Hitesh Sahu

XGBoost (Extreme Gradient Boosting) Explained

Learn how XGBoost works, including gradient boosting, decision trees, residual learning, regularization, and why XGBoost is one of the most powerful machine learning algorithms for structured and tabular data.

Written by Hitesh Sahu, a passionate developer and blogger.

Tue May 26 2026

XGBoost (Extreme Gradient Boosting)

XGBoost is an optimized gradient boosting algorithm that combines multiple decision trees sequentially to build highly accurate predictive models.

It builds on two ideas:

  • Gradient Boosting
  • Decision Trees

It is widely used for:

  • structured/tabular data
  • classification
  • regression
  • ranking problems

XGBoost became extremely popular because of:

  • high accuracy
  • speed
  • scalability
  • strong Kaggle competition performance

Sample Code

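import xgboost as xgb

# Load the demo data from the XGBoost repository (LIBSVM text format).
# Recent XGBoost releases expect the file format to be named in the URI.
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train?format=libsvm')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test?format=libsvm')

# Booster parameters, passed as a dict
param = {
    'max_depth': 2,                  # maximum depth of each tree
    'eta': 1,                        # learning rate (step size shrinkage)
    'objective': 'binary:logistic',  # binary classification, outputs probabilities
}
num_round = 2  # number of boosting rounds, i.e. trees
bst = xgb.train(param, dtrain, num_round)

# Predict: one positive-class probability per test row
preds = bst.predict(dtest)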

Core Idea

XGBoost builds multiple decision trees sequentially.

Each new tree is trained on the errors (residuals) left by the trees before it, so every round corrects part of what the current model still gets wrong.

High-Level Workflow

flowchart TD

    A[Training Data]

    A --> B[Tree 1]

    B --> C[Prediction Error]

    C --> D[Tree 2 Learns Residuals]

    D --> E[Updated Prediction]

    E --> F[More Trees Added]

    F --> G[Final Strong Model]

Why "Boosting"?

Boosting means combining many weak learners into one strong learner.

  • Weak learner: a model only slightly better than random guessing
  • Strong learner: a highly accurate predictor

Ensemble Learning

XGBoost is an ensemble learning algorithm: its final prediction combines (sums) the outputs of many decision trees.

flowchart LR

    A[Tree 1]
    B[Tree 2]
    C[Tree 3]
    D[Tree N]

    A --> E[Combined Prediction]
    B --> E
    C --> E
    D --> E

Gradient Boosting Concept

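Each new tree is fit using the gradient of the loss function with respect to the current predictions (XGBoost also uses the second derivative), so adding it to the model reduces the loss.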

$$F_m(x) = F_{m-1}(x) + h_m(x)$$

Where:

  • $F_m(x)$ = updated model
  • $h_m(x)$ = new tree correcting errors
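To make the role of the gradient concrete, here is a minimal sketch. It assumes squared-error loss, $l = \frac{1}{2}(y - \hat{y})^2$, which is only one of the losses XGBoost supports, and the function name is hypothetical. Under that assumption the negative gradient with respect to the prediction is exactly the residual, so fitting the gradient and fitting the errors are the same thing.

```python
def squared_error_gradient(y_true, y_pred):
    """Derivative of 0.5 * (y_true - y_pred) ** 2 with respect to y_pred."""
    return y_pred - y_true


targets = [3.0, 5.0, 8.0]
current = [4.0, 4.0, 4.0]  # F_{m-1}(x): predictions of the model so far

# The new tree h_m is fit to the negative gradient at the current predictions.
negative_gradients = [-squared_error_gradient(y, p) for y, p in zip(targets, current)]
residuals = [y - p for y, p in zip(targets, current)]

# Idealized h_m that reproduces the negative gradient exactly (an assumption;
# a real tree only approximates it).
updated = [p + h for p, h in zip(current, negative_gradients)]  # F_m = F_{m-1} + h_m

print(negative_gradients == residuals)  # the two coincide for squared error
print(updated)
```

For other losses (for example logistic loss) the negative gradient is no longer the plain residual, but the update rule keeps the same additive form.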

Training Process

Step 1

Train first decision tree.

Step 2

Compute prediction errors.

$$\text{Residual} = y - \hat{y}$$

Step 3

Train next tree on residuals.

Step 4

Add new tree predictions to existing model.

Step 5

Repeat iteratively.
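The five steps can be sketched in plain Python. This is a toy illustration, not how XGBoost is implemented: it assumes squared-error loss, one-dimensional inputs, and one-split trees (stumps), and the names `fit_stump`, `boost` and `mse` are made up for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to 1-D inputs.

    Tries every split point and keeps the one with the lowest squared error.
    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees):
    """Each tree is fit to what the previous trees got wrong."""
    trees = []
    preds = [0.0] * len(ys)  # no trees yet, so the first tree is fit to y itself
    for _ in range(n_trees):                              # Step 5: repeat
        residuals = [y - p for y, p in zip(ys, preds)]    # Step 2
        tree = fit_stump(xs, residuals)                   # Steps 1 and 3
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]  # Step 4
    return trees


def mse(trees, xs, ys):
    """Mean squared error of the summed tree outputs on (xs, ys)."""
    preds = [sum(tree(x) for tree in trees) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

mse_1 = mse(boost(xs, ys, n_trees=1), xs, ys)
mse_5 = mse(boost(xs, ys, n_trees=5), xs, ys)
print(mse_1, mse_5)  # training error drops as trees are added
```

The drop in training error says nothing about generalization on its own; that is what regularization and the learning rate are for.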

Example Flow

sequenceDiagram

    participant D as Dataset
    participant T1 as Tree 1
    participant T2 as Tree 2
    participant T3 as Tree 3

    D->>T1: Initial Training

    T1->>T2: Residual Errors

    T2->>T3: Remaining Errors

    T3-->>D: Final Prediction

Objective Function

XGBoost minimizes:

$$\mathcal{L} = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)$$

Where:

  • $l$ = loss function
  • $\Omega$ = regularization term
  • $f_k$ = decision trees
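As a rough illustration of how the two sums combine, the sketch below evaluates the objective for a toy model. It assumes squared-error loss for $l$ and takes the per-tree penalties $\Omega(f_k)$ as already computed numbers; both choices, the data, and the function names are assumptions made for the example.

```python
def squared_error(y, y_hat):
    """Per-example loss l(y, y_hat); squared error is assumed here."""
    return (y - y_hat) ** 2


def objective(targets, predictions, tree_penalties, loss=squared_error):
    """Sum of per-example losses plus the sum of per-tree penalties."""
    data_term = sum(loss(y, p) for y, p in zip(targets, predictions))
    penalty_term = sum(tree_penalties)
    return data_term + penalty_term


targets = [1.0, 0.0, 2.0]
predictions = [0.5, 0.0, 2.5]   # y_hat_i from the current ensemble
tree_penalties = [0.25, 0.25]   # Omega(f_k) for two trees (made-up values)

total = objective(targets, predictions, tree_penalties)
print(total)
```

A model that fits the data slightly worse can still have the lower objective if its trees are simpler, which is the trade-off the regularization term encodes.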

Regularization

XGBoost includes regularization to reduce overfitting.

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \|w\|^2$$

Where:

  • $T$ = number of leaves
  • $w$ = leaf weights
  • $\gamma, \lambda$ = regularization parameters
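The sketch below shows what $\gamma$ and $\lambda$ do in practice. `omega` is the formula above. The other two functions are not stated in this article: they follow the standard XGBoost derivation, in which $G$ and $H$ are the sums of first and second derivatives of the loss over the examples in a leaf, the optimal leaf weight is $w^* = -G / (H + \lambda)$, and a split is kept only if its gain is positive after subtracting $\gamma$. All function names and numbers are illustrative.

```python
def omega(leaf_weights, gamma, lam):
    """Regularization term: gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(grad_sum, hess_sum, lam):
    """w* = -G / (H + lambda); a larger lambda shrinks the weight toward zero."""
    return -grad_sum / (hess_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, gamma, lam):
    """Loss reduction from splitting a leaf in two, minus the cost gamma of the extra leaf."""
    def score(g, h):
        return g * g / (h + lam)

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


penalty = omega([0.5, -1.0, 2.0], gamma=1.0, lam=2.0)

w_weak_reg = optimal_leaf_weight(grad_sum=-4.0, hess_sum=3.0, lam=1.0)
w_strong_reg = optimal_leaf_weight(grad_sum=-4.0, hess_sum=3.0, lam=5.0)

gain_no_gamma = split_gain(-4.0, 2.0, 4.0, 2.0, gamma=0.0, lam=0.0)
gain_high_gamma = split_gain(-4.0, 2.0, 4.0, 2.0, gamma=10.0, lam=0.0)

print(penalty, w_weak_reg, w_strong_reg, gain_no_gamma, gain_high_gamma)
```

With the larger $\lambda$ the same leaf gets a smaller weight, and with the larger $\gamma$ the same split has negative gain and would be pruned.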

Why XGBoost is Powerful

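| Feature | Benefit |
| --- | --- |
| Gradient boosting | High accuracy |
| Regularization | Reduces overfitting |
| Parallel processing | Faster training (split finding is parallelized) |
| Tree pruning | Drops splits whose gain does not exceed gamma |
| Missing value handling | Learns a default split direction for missing values |
| Sparse optimization | Efficient handling of sparse inputs |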

Important Hyperparameters

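| Parameter | Purpose |
| --- | --- |
| n_estimators | Number of trees (boosting rounds) |
| max_depth | Maximum depth of each tree |
| learning_rate | Step size shrinkage applied to each tree |
| subsample | Fraction of rows sampled per tree |
| colsample_bytree | Fraction of features sampled per tree |
| gamma | Minimum loss reduction required to make a split |
| lambda | L2 regularization on leaf weights |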
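A hedged sketch of how these parameters might be written down for the native `xgb.train` interface. The values are illustrative starting points, not recommendations. Two naming details are worth knowing: `n_estimators` is the name used by the scikit-learn style wrapper, while `xgb.train` takes the number of trees as its `num_boost_round` argument, and `lambda` is spelled `reg_lambda` in the wrapper because `lambda` is a Python keyword. The range checks reflect the documented valid ranges as I understand them, so verify them against the parameter reference for your version; `RANGES` and `out_of_range` are hypothetical helpers.

```python
params = {
    "max_depth": 4,
    "eta": 0.1,               # alias: learning_rate
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "gamma": 0.0,             # alias: min_split_loss
    "lambda": 1.0,            # alias: reg_lambda
    "objective": "binary:logistic",
}
num_boost_round = 200  # plays the role of n_estimators

# Assumed valid ranges for the numeric parameters above.
RANGES = {
    "max_depth": lambda v: v >= 0,
    "eta": lambda v: 0 <= v <= 1,
    "subsample": lambda v: 0 < v <= 1,
    "colsample_bytree": lambda v: 0 < v <= 1,
    "gamma": lambda v: v >= 0,
    "lambda": lambda v: v >= 0,
}


def out_of_range(candidate):
    """Return the names of parameters whose values fall outside RANGES."""
    return [name for name, ok in RANGES.items()
            if name in candidate and not ok(candidate[name])]


print(out_of_range(params))
print(out_of_range({**params, "subsample": 1.5}))

# With xgboost installed and a DMatrix called dtrain, training would look like:
# bst = xgb.train(params, dtrain, num_boost_round=num_boost_round)
```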

Learning Rate

Controls how much each tree contributes to the model.

$$F_m(x) = F_{m-1}(x) + \eta h_m(x)$$

Where:

  • $\eta$ = learning rate

Small learning rate:

  • slower learning, so more trees are needed
  • often better generalization
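The "slower learning" part can be seen in a small simulation. It makes a strong simplifying assumption: each tree $h_m$ reproduces the current residual exactly, so the only thing slowing convergence is $\eta$. The function name and tolerance are made up for the example, and the sketch says nothing about generalization.

```python
def rounds_to_converge(target, eta, tol=1e-3, max_rounds=10_000):
    """Count boosting rounds until the prediction is within tol of the target."""
    prediction = 0.0
    for m in range(1, max_rounds + 1):
        h = target - prediction   # idealized tree: fits the residual exactly
        prediction += eta * h     # F_m = F_{m-1} + eta * h_m
        if abs(target - prediction) < tol:
            return m
    return max_rounds


fast = rounds_to_converge(target=10.0, eta=1.0)
medium = rounds_to_converge(target=10.0, eta=0.5)
slow = rounds_to_converge(target=10.0, eta=0.1)
print(fast, medium, slow)
```

Under this idealization the remaining error shrinks by a factor of $(1 - \eta)$ per round, which is why a smaller learning rate is usually paired with a larger number of trees.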

Decision Tree Structure

flowchart TD

    A[Feature Split]

    A -->|Condition True| B[Left Branch]

    A -->|Condition False| C[Right Branch]

    B --> D[Prediction]

    C --> E[Prediction]
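The structure in the diagram maps onto a small recursive data type. This is a minimal sketch with a hypothetical `Node` class, not XGBoost's internal representation: an internal node holds a feature index and a threshold, a leaf holds a prediction value, and rows for which the condition is true go left.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature to test (internal nodes)
    threshold: Optional[float] = None  # split condition: row[feature] < threshold
    left: Optional["Node"] = None      # branch taken when the condition is true
    right: Optional["Node"] = None     # branch taken when the condition is false
    value: Optional[float] = None      # prediction, set only on leaves

    def predict(self, row):
        if self.value is not None:     # leaf: return its prediction
            return self.value
        branch = self.left if row[self.feature] < self.threshold else self.right
        return branch.predict(row)


tree = Node(feature=0, threshold=2.5,
            left=Node(value=-0.4),
            right=Node(value=0.7))

print(tree.predict([1.0]))  # condition true, left leaf
print(tree.predict([3.0]))  # condition false, right leaf
```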

XGBoost Pipeline

flowchart TD

    A[Raw Data]

    A --> B[Feature Engineering]

    B --> C[Train/Test Split]

    C --> D[XGBoost Training]

    D --> E[Model Evaluation]

    E --> F[Predictions]

Limitations

| Limitation | Description |
| --- | --- |
| Can overfit | Especially with deep trees |
| Large models | Memory intensive |
| Less effective for images/text | Deep learning usually performs better on unstructured data |
| Hyperparameter tuning needed | Many parameters to tune |

XGBoost vs Random Forest

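| XGBoost | Random Forest |
| --- | --- |
| Trees built sequentially | Trees built independently (can be trained in parallel) |
| Boosting | Bagging |
| Each tree learns the residuals of the previous ones | Trees are independent of each other |
| Often higher accuracy when tuned | Simpler |
| More tuning required | Easier to use |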

XGBoost vs Neural Networks

| XGBoost | Neural Networks |
| --- | --- |
| Excellent for tabular data | Excellent for unstructured data |
| Usually faster to train | Usually slower to train |
| Works with less data | Large datasets preferred |
| More interpretable | Less interpretable |

Applications of XGBoost

Common Use Cases

XGBoost is often a strong choice when:

  • dataset is tabular
  • features are structured
  • dataset size is moderate
  • interpretability matters

Example Use Cases

  • Fraud Detection
  • Credit Scoring
  • Recommendation Systems
  • Customer Churn
  • Sales Forecasting
  • Medical Prediction
  • Kaggle Competitions

Advantages

| Advantage | Description |
| --- | --- |
| High accuracy | Excellent predictive power |
| Handles tabular data well | Widely used in industry |
| Fast training | Optimized implementation |
| Robust to missing values | Automatic handling |
| Feature importance support | Interpretability |
