What Are AI Models and How to Pick the Right One?

Step-by-step overview of AI model development, including generative AI, large language models, training and inference workflows, GPU computing, and practical learning resources.


Model

A model is a program that has been trained on a set of data to recognize certain patterns or make certain decisions without further human intervention.

Model = Algorithm + Training Data → learned parameters (weights)

Training vs Inference

AI Workflow:

 Data Preparation 
  |--> Model Training 
     |--> Optimization 
         |--> Inference/Deployment

🦾 Model Training

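Compute intensive
Forward + backward pass (backpropagation computes the gradients used to update the weights)
Multi-GPU scaling
High memory + compute demand
Uses NCCL (NVIDIA Collective Communications Library), NVLink, and RDMA for fast GPU-to-GPU communication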

🚀 Model Inference

The process of running new, unseen data through a trained AI model to make a prediction or solve a task.

Latency optimized (fast response per request)
Forward pass only
Often containerized and served on Kubernetes

Inference

Inference is an ML model in action.

| 🦾 Training | 🚀 Inference |
|---|---|
| Model learning | Model usage |
| High compute + memory | Lower latency focus |
| Batch workloads | Real-time workloads |
| Multi-GPU scaling | Edge + cloud deployment |
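To make the forward/backward distinction concrete, here is a minimal NumPy sketch on hypothetical toy data (a one-feature linear model, not a GPU workload). Training repeats a forward pass, a backward pass that computes gradients of the mean squared error, and a parameter update; inference reuses the learned parameters and runs only the forward pass. The learning rate and epoch count are illustrative choices.

```python
import numpy as np

# Hypothetical toy data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2 * X + 1 + rng.normal(0, 0.05, size=100)

w, b = 0.0, 0.0  # model parameters, learned during training
lr = 0.1         # illustrative learning rate

def forward(x, w, b):
    """Forward pass: compute predictions from inputs and parameters."""
    return w * x + b

# Training: forward pass + backward pass (gradients) + parameter update
for epoch in range(500):
    pred = forward(X, w, b)
    err = pred - y
    grad_w = 2 * np.mean(err * X)  # d(MSE)/dw
    grad_b = 2 * np.mean(err)      # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

# Inference: forward pass only, with the learned parameters frozen
x_new = np.array([0.5])
print(forward(x_new, w, b))  # close to 2 * 0.5 + 1 = 2.0
```

In a real deep learning framework the backward pass is computed automatically and spread across GPUs, but the shape of the loop is the same.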

Quantization

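The process of reducing the numerical precision of a model's weights and activations.

For example, reducing precision from 32-bit floating point (FP32) to 8-bit integers (INT8) can:

Improve latency
Save power
Reduce memory usage (INT8 weights take a quarter of the space of FP32 weights)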
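One common way to go from FP32 to INT8 is symmetric per-tensor quantization: pick a scale so the largest absolute weight maps to 127, then round. The sketch below assumes NumPy and a hypothetical random weight matrix; the function names are illustrative, and production toolkits such as TensorRT or PyTorch add calibration data, per-channel scales, and zero points on top of this idea.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] onto [-127, 127].

    Assumes the weights are not all zero (otherwise scale would be 0).
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale) -> np.ndarray:
    """Approximate reconstruction of the original FP32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(42)
w_fp32 = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)  # hypothetical layer weights

q, scale = quantize_int8(w_fp32)
w_restored = dequantize(q, scale)

print(w_fp32.nbytes, q.nbytes)              # 262144 65536 (4x smaller)
print(np.max(np.abs(w_fp32 - w_restored)))  # rounding error, at most about scale / 2
```

The memory saving is exact (1 byte instead of 4 per weight), while the accuracy impact depends on how much the rounding error matters for the particular model.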

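Precision vs Model Size vs Inference Performance

Illustrative example for a hypothetical 100 MB FP32 model (actual speedups and accuracy loss depend on the model, hardware, and quantization method):

| Precision | Model Size | Inference Speed | Accuracy |
|---|---|---|---|
| 32-bit (FP32 / Full Precision) | 100 MB | 1x | 95% |
| 16-bit (FP16 / Half Precision) | 50 MB | 1.8x | 94.8% |
| 8-bit (INT8 Quantized) | 25 MB | 3x | 94% |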
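The Model Size column follows from simple arithmetic: weight storage is roughly the parameter count times the bytes per parameter (4 for FP32, 2 for FP16, 1 for INT8). The 25-million parameter count below is a hypothetical value chosen to reproduce the table; speed and accuracy cannot be derived this way and have to be measured on real hardware.

```python
# Model size ≈ number of parameters × bytes per parameter (ignoring file metadata/overhead)
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}

def model_size_mb(num_params: int, precision: str) -> float:
    """Approximate storage for the weights, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6

num_params = 25_000_000  # hypothetical model with 25M parameters
for precision in ("FP32", "FP16", "INT8"):
    print(precision, model_size_mb(num_params, precision), "MB")
# FP32 100.0 MB
# FP16 50.0 MB
# INT8 25.0 MB
```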

`EDA` (Exploratory Data Analysis) for AI Models

Process of analyzing and visualizing data to understand its characteristics before training an AI model.

EDA is usually the first step in data analysis: it builds insight into the data and surfaces potential issues early.

It is used to understand the data before training, revealing patterns, problems, and features that can be used to train a model.

Common techniques include:

1. `N-Gram` analysis

Captures more context than single words, and the relationships between words, by analyzing sequences of n consecutive words (e.g., bigrams, trigrams).

Unigram: a single word
Bigram: two consecutive words
Trigram: three consecutive words
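A minimal pure-Python sketch of how n-grams are formed, assuming naive lowercase whitespace tokenization (scikit-learn's vectorizers use a regex tokenizer instead). The `ngrams` helper is hypothetical, written for illustration:

```python
def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return all sequences of n consecutive words (naive whitespace tokenization)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "deep learning needs lots of data"
print(ngrams(sentence, 1))  # unigrams: 6 single words
print(ngrams(sentence, 2))  # [('deep', 'learning'), ('learning', 'needs'), ('needs', 'lots'), ('lots', 'of'), ('of', 'data')]
print(ngrams(sentence, 3))  # trigrams: 4 three-word sequences
```

A text with w words yields w - n + 1 n-grams, so longer n-grams carry more context but each specific sequence occurs less often.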

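import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Example: extract the most frequent bigrams (assumes df is a pandas DataFrame with a 'text_column' of strings)
vectorizer = CountVectorizer(ngram_range=(2, 2))
bigram_matrix = vectorizer.fit_transform(df['text_column'])  # sparse (documents x bigrams) count matrix
totals = np.asarray(bigram_matrix.sum(axis=0)).ravel().tolist()  # total count of each bigram
top_bigrams = sorted(zip(vectorizer.get_feature_names_out(), totals), key=lambda p: p[1], reverse=True)[:5]
print(top_bigrams)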

Example output:

[('machine learning', 100), ('artificial intelligence', 80), ('deep learning', 60), ('natural language', 50), ('neural networks', 40)]

2. Word frequency analysis

Identify the most common words in the dataset to understand prevalent themes and topics.

from collections import Counter 

# Example: Count word frequencies
word_counts = Counter(" ".join(df['text_column']).split())
most_common_words = word_counts.most_common(10)
print(most_common_words)

Example output:

[('the', 500), ('and', 450), ('to', 400), ('is', 350), ('in', 300), ('it', 250), ('of', 200), ('was', 150), ('for', 100), ('with', 50)]
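The sample output above is dominated by function words such as "the" and "and", which say little about the topics in the data. A common refinement is to lowercase the text and skip stopwords before counting. The stopword set and the `top_content_words` helper below are hypothetical and deliberately tiny; scikit-learn (`ENGLISH_STOP_WORDS`) and NLTK ship fuller lists.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"the", "and", "to", "is", "in", "it", "of", "was", "for", "with", "a"}

def top_content_words(texts, k=10):
    """Count lowercase words across all texts, skipping stopwords."""
    words = " ".join(texts).lower().split()
    return Counter(w for w in words if w not in STOPWORDS).most_common(k)

reviews = ["The battery is great and the screen is great", "Battery life was poor"]
print(top_content_words(reviews, k=3))  # [('battery', 2), ('great', 2), ('screen', 1)]
```

Ties keep the order in which words were first seen, which is why "battery" is listed before "great".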

3. Descriptive Statistical analysis

Calculate summary statistics (e.g., mean, median, standard deviation) to understand the distribution of numerical features.

# Example: Calculate summary statistics for a numerical column
print(df['numerical_column'].describe())

Example output (truncated; `describe()` also reports the 75% quartile and the max):

count    1000.000000
mean       50.123456
std        10.987654
min        20.000000
25%        40.000000
50%        50.000000
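To see what `describe()` is reporting, the same statistics can be computed with Python's standard `statistics` module on a small hypothetical column. `stdev` uses the sample (n - 1) convention, which matches the pandas default, and `method="inclusive"` gives the same linear-interpolation quartiles pandas uses:

```python
import statistics

values = [20, 40, 50, 60, 80]  # hypothetical numerical column

mean = statistics.mean(values)      # average value
median = statistics.median(values)  # middle value (the 50% row in describe())
std = statistics.stdev(values)      # sample standard deviation (divides by n - 1)
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")  # 25%, 50%, 75%

print(mean, median, round(std, 2), q1, q2, q3)  # 50 50 22.36 40.0 50.0 60.0
```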

4. Data visualization (e.g., histograms, word clouds, scatter plots)

Use visualizations to explore data distributions and relationships between features.


import matplotlib.pyplot as plt
from wordcloud import WordCloud 

# Example: Create a word cloud
text = " ".join(df['text_column'])
wordcloud = WordCloud(width=800, height=400).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

Example output: A word cloud visualization showing the most common words in the dataset, with larger words representing higher frequency.

EDA feeds into the broader model-development workflow. Common steps include:

1. Data Collection

Gather relevant data for the task at hand.
Example: For a sentiment analysis model, collect a dataset of text reviews labeled with sentiment (positive, negative, neutral).

2. Data Cleaning

Remove duplicates, handle missing values, and correct errors in the data.

import pandas as pd
# Load dataset
df = pd.read_csv('reviews.csv')
# Remove duplicates
df = df.drop_duplicates()
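# Handle missing values by forward-filling from the previous row
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is the replacement)
df = df.ffill()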
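Forward-filling, as in the snippet above, copies the previous row's value into each gap. That is reasonable for ordered data such as time series, but it can silently assign the wrong label to a review. The sketch below, on a hypothetical four-row DataFrame, compares forward fill with simply dropping rows whose label is missing:

```python
import pandas as pd

# Hypothetical toy reviews with one exact duplicate row and one missing label
df = pd.DataFrame({
    "review_text": ["Great phone", "Great phone", "Battery died fast", "Okay overall"],
    "sentiment":   ["positive",    "positive",    None,                "neutral"],
})

df = df.drop_duplicates()  # removes the repeated "Great phone" row
print(df.isna().sum())     # missing values per column: sentiment has 1

# Forward fill labels "Battery died fast" as positive (copied from the row above),
# which is likely wrong; dropping the unlabeled row avoids inventing a label.
filled = df.ffill()
dropped = df.dropna(subset=["sentiment"])
print(len(df), len(filled), len(dropped))  # 3 3 2
```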

3. Data Visualization

Use visualizations to understand data distribution and relationships.

import matplotlib.pyplot as plt
# Visualize sentiment distribution
df['sentiment'].value_counts().plot(kind='bar')
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()

4. Feature Engineering

Create new features from existing data to improve model performance.

# Example: Create a feature for review length
df['review_length'] = df['review_text'].apply(len)
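Beyond the character-length feature above, a few other cheap numeric features can be derived with pandas string methods. The extra features here (word count and exclamation marks) are illustrative choices on hypothetical data, not a recommendation for every dataset; `.str.len()` gives the same result as `.apply(len)` for strings but returns NaN instead of raising on missing values.

```python
import pandas as pd

df = pd.DataFrame({"review_text": ["Great phone", "Battery died after two days!"]})  # hypothetical

df["review_length"] = df["review_text"].str.len()              # number of characters
df["word_count"] = df["review_text"].str.split().str.len()     # number of whitespace-separated words
df["exclamations"] = df["review_text"].str.count("!")          # rough proxy for emphasis
print(df)
```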

5. Data Splitting

Split the dataset into training, validation, and test sets.

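from sklearn.model_selection import train_test_split
# Hold out 20% of the data as the test set
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
# Carve a validation set out of the remaining data (25% of 80% = 20% overall, giving a 60/20/20 split)
train_df, val_df = train_test_split(train_df, test_size=0.25, random_state=42)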
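When the sentiment classes are imbalanced, a purely random split can leave the test set with too few examples of the minority class. Passing `stratify` to `train_test_split` keeps the class proportions the same in each split. The 80/20 positive/negative toy data below is hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 positive, 20 negative
df = pd.DataFrame({
    "review_text": [f"review {i}" for i in range(100)],
    "sentiment": ["positive"] * 80 + ["negative"] * 20,
})

# stratify keeps the positive/negative ratio identical in train and test
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["sentiment"]
)
print(train_df["sentiment"].value_counts())  # 64 positive, 16 negative
print(test_df["sentiment"].value_counts())   # 16 positive, 4 negative
```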

6. Model Selection

Choose an appropriate model architecture based on the task and data characteristics.

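from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
# Logistic regression needs numeric features, so convert raw review text to TF-IDF
# vectors first; the pipeline applies both steps in fit() and predict()
model = make_pipeline(TfidfVectorizer(), LogisticRegression())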

Training Algorithm (Optimizer) Selection

Select a training algorithm and optimization method to train the model.
Example: Use stochastic gradient descent (SGD) to optimize the model's parameters during training.

Common optimization algorithms include:

| Algorithm | Description |
|---|---|
| Stochastic Gradient Descent (SGD) | Iteratively updates model parameters using the gradient computed on a random subset (mini-batch) of the training data. |
| Adam | An adaptive learning rate optimization algorithm that combines momentum with RMSProp-style per-parameter scaling, building on ideas from AdaGrad and RMSProp. |
| RMSProp | Adjusts the learning rate for each parameter using an exponentially decaying average of recent squared gradients. |
| Adagrad | Adapts the learning rate for each parameter based on the accumulated sum of all past squared gradients, so parameters with large past gradients take smaller steps. |
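The update rules in the table can be compared side by side on a toy one-dimensional loss, f(x) = (x - 3)^2. This is a sketch under simplifying assumptions: the exact gradient stands in for a mini-batch gradient (so "SGD" here reduces to plain gradient descent), and the learning rates and step counts are illustrative; the Adam decay rates 0.9 and 0.999 are its commonly used defaults.

```python
import math

def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)**2, minimized at x = 3."""
    return 2 * (x - 3)

def sgd(x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * grad(x)                         # fixed-size step along the gradient
    return x

def adagrad(x, lr=0.5, eps=1e-8, steps=500):
    s = 0.0
    for _ in range(steps):
        g = grad(x)
        s += g * g                                # sum of all past squared gradients
        x -= lr * g / (math.sqrt(s) + eps)        # step shrinks as s grows
    return x

def rmsprop(x, lr=0.1, rho=0.9, eps=1e-8, steps=500):
    v = 0.0
    for _ in range(steps):
        g = grad(x)
        v = rho * v + (1 - rho) * g * g           # decaying average of squared gradients
        x -= lr * g / (math.sqrt(v) + eps)
    return x

def adam(x, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g                 # momentum: decaying average of gradients
        v = b2 * v + (1 - b2) * g * g             # RMSProp-style average of squared gradients
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)  # bias correction for early steps
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

for name, optimizer in [("SGD", sgd), ("Adagrad", adagrad), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, round(optimizer(0.0), 3))  # each should end near the minimum at x = 3
```

With a constant learning rate, RMSProp can keep hovering slightly around the minimum instead of settling exactly, which is one reason learning-rate schedules are common in practice.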

7. Model Training

Train the model on the training data.

# Train the model
model.fit(train_df['review_text'], train_df['sentiment'])

8. Model Evaluation

Evaluate the model's performance on the test set using appropriate metrics.

from sklearn.metrics import accuracy_score

# Predict on the test set
predictions = model.predict(test_df['review_text'])

# Calculate and print the accuracy of the model on the test set
accuracy = accuracy_score(test_df['sentiment'], predictions)
print(f"Accuracy: {accuracy}")
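Accuracy alone can hide poor performance on one class, especially when classes are imbalanced, so it is common to also report precision, recall, and F1 for the class of interest. A minimal sketch with hypothetical labels and predictions, using functions from `sklearn.metrics`:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true labels and predictions for a binary sentiment task
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg"]

print(accuracy_score(y_true, y_pred))                   # 5 of 8 correct: 0.625
print(precision_score(y_true, y_pred, pos_label="pos")) # 2 of 4 predicted "pos" are right: 0.5
print(recall_score(y_true, y_pred, pos_label="pos"))    # 2 of 3 actual "pos" found: ≈ 0.667
print(f1_score(y_true, y_pred, pos_label="pos"))        # harmonic mean of the two: 4/7 ≈ 0.571
```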

Written by Hitesh Sahu, a passionate developer and blogger.

Tue Feb 24 2026
