Hitesh Sahu
Hitesh SahuHitesh Sahu
  1. Home
  2. ›
  3. posts
  4. ›
  5. …

  6. ›
  7. 3 0 LLM

Loading ⏳
Fetching content, this won’t take long…


💡 Did you know?

🍌 Bananas are berries, but strawberries are not.

🍪 This website uses cookies

No personal data is stored on our servers however third party tools Google Analytics cookies to measure traffic and improve your website experience. Learn more

AI-GenAI

  • AI-GenAI Index

  • NVIDIA AI-LLM Developers Certification Path

  • Understanding Generative AI

  • What is AI Models and How to pick the right one?

  • How to Choose the Right AI Model for Your Use Case

  • What are Transformer Models?

  • Retrieval-Augmented Generation (RAG) for AI Applications

  • LLMs & Foundation Models Explained

  • Using LLMs in Development

  • Using LLMs in Production

  • Ethical AI vs Responsible AI vs Trustworthy AI

  • Generative Adversarial Networks (GANs) Explained

  • U-Net Explained

  • Understanding CLIP: Connecting Images and Text in Generative AI

  • Diffusion Models Explained

  • The Economic Impact of Generative AI

  • NVIDIA Certified Associate Generative AI (NCA-GENL) Practice Questions

Cover Image for LLMs & Foundation Models Explained

LLMs & Foundation Models Explained

A practical guide to Large Language Models (LLMs) and foundation models, covering architectures, training concepts, fine-tuning, inference, embeddings, RAG, and real-world AI application development.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Wed May 13 2026

Share This on

← Previous

NVIDIA NGC Catalog: GPU Optimized Containers, AI Models and Enterprise AI Infrastructure

Next →

🎿 Beginner’s Guide to Skiing

What is Large Language Model (LLM)

A Large Language Model is a sophisticated mathematical function that predicts what word comes next for any piece of text"

LLM is a type of foundation model specifically designed to understand and generate human language.

White Paper: https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

🧱 Foundation Models (FMs)

Large-scale models trained on broad data that can be adapted to a wide range of downstream tasks.

  • Examples: GPT-3, BERT, DALL-E, Stable Diffusion

Characteristics:

  • Trained on massive datasets (text, images, code)
  • Capable of zero-shot and few-shot learning
  • Serve as a base for fine-tuning on specific tasks

Autoregressive language model

A type of language model that generates text by predicting the next word in a sequence based on the previous words.

  • Example: GPT-3, LLaMA, Mistral, Falcon

🧠 Large Language Models (LLMs)

A subset of foundation models that are specifically designed to understand and generate human language.

Provider Type Description
AWS Bedrock aws_bedrock AWS Bedrock API
Azure OpenAI azure_openai Azure OpenAI API
Hugging Face huggingface Hugging Face API
Hugging Face Inference huggingface_inference Hugging Face Inference API, Endpoints, and TGI
LiteLLM litellm LiteLLM API
NVIDIA NIM nim NVIDIA Inference Microservice (NIM)
OCI Generative AI oci OCI Generative AI
OpenAI openai OpenAI API

Examples: GPT-3, BERT, T5

  • Characteristics:
    • Trained on vast amounts of text data
    • Able to recognize and interpret human language
    • Flexible: can perform tasks like text generation, translation, summarization, and question-answering

Parameter Tuning in LLM

Use low temperature and low top-p for agents, planners, tool calls, and structured outputs.

Use higher temperature and top-p for creative content generation and brainstorming.

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input="Generate deployment steps for a Grafana dashboard import.",
    temperature=0.1,
    top_p=0.2
)

print(response.output_text)

Choosing Top-P & Temperature

Use Case 🌡️ Temperature Top-p 🎲
JSON generation 0.0 0.1
Tool calling 0.0 0.1
Code generation 0.1 0.2
RAG QA 0.2 0.8
Summarization 0.3 0.9
Creative writing 0.8 0.95
Brainstorming 1.0 1.0

1. Temperature 🌡️

Changes the shape of the probability distribution.

Temperature = changes the probabilities of tokens.

Temperature↓  →  Randomness↓  →  Determinism↑\text{Temperature} \downarrow \;\rightarrow\; \text{Randomness} \downarrow \;\rightarrow\; \text{Determinism} \uparrowTemperature↓→Randomness↓→Determinism↑

❄® Low Temperature  →  Less Randomness  →  More Confident Choices\text{❄️ Low Temperature} \;\rightarrow\; \text{Less Randomness} \;\rightarrow\; \text{More Confident Choices}❄R◯ Low Temperature→Less Randomness→More Confident Choices

🔥 High Temperature  →  More Randomness  →  More Creative Choices\text{🔥 High Temperature} \;\rightarrow\; \text{More Randomness} \;\rightarrow\; \text{More Creative Choices}🔥 High Temperature→More Randomness→More Creative Choices


2. Top-p / Nucleus Sampling 🎲

Top K dynamically adjusts the number of tokens considered based on their cumulative probability.

Top=P sampling selects the smallest set of tokens whose cumulative probability exceeds a threshold P (a value between 0 and 1).

Top-p = changes which tokens are allowed to participate in sampling.

🎲 Top-p (Nucleus Sampling)↓  →  Fewer Candidate Tokens  →  More Deterministic Output\text{🎲 Top-p (Nucleus Sampling)} \downarrow \;\rightarrow\; \text{Fewer Candidate Tokens} \;\rightarrow\; \text{More Deterministic Output}🎲 Top-p (Nucleus Sampling)↓→Fewer Candidate Tokens→More Deterministic Output

🎲 Top-p (Nucleus Sampling)↑  →  More Candidate Tokens  →  More Diverse Output\text{🎲 Top-p (Nucleus Sampling)} \uparrow \;\rightarrow\; \text{More Candidate Tokens} \;\rightarrow\; \text{More Diverse Output}🎲 Top-p (Nucleus Sampling)↑→More Candidate Tokens→More Diverse Output

Example

Tokens:

Token Probability
"the" 40%
"a" 25%
"this" 15%
"that" 10%
Others 10%

Top-p = 1.0

All tokens remain eligible. Maximum diversity.

Eg: Where every bean begins a new adventure.

Top-p = 0.9

The model will consider the smallest number of tokens whose combined probability is 90%.

the   40%
a     25%   -> 65%
this  15%   -> 80%
that  10%   -> 90%

The remaining low-probability tokens are discarded.

Eg. Wake up to a cup of inspiration.

Top-p = 0.3

the 40%

Only the highest-probability token is eligible.

Output becomes highly deterministic.

Eg. Fresh coffee, every day.

Typical Values

Top-p Behavior Meaning
0.1 - 0.3 Very deterministic Less Choices
0.5 - 0.7 Controlled Fresh coffee, every day.
0.8 - 0.95 Balanced Medium
1.0 Maximum diversity Creative

3. Max Tokens 🗨

Limits the number of output tokens the model can generate.

Example: Max Tokens = 50

The model stops after approximately 50 output tokens.

Small Value

Max Tokens = 20

Response may be cut off.

The migration plan consists of three phases:
1. Assessment
2. Pilot
3...

Large Value:

Max Tokens = 2000

The model can provide a detailed answer.

Parameter Controls Effect
Top-p Randomness / token selection How creative or deterministic the output is
Max Tokens Response length How long the model is allowed to generate

Prompting Techniques in Generative AI 💬

Technique Mental Model Examples Provided?
Zero-Shot "Just do it" No
One-Shot "Here is one example" One
Few-Shot "Learn from these examples" Multiple
CoT "Think step by step" Optional
System Prompt "Behave like this" Persistent instruction

1. Zero-Shot Prompting 👨‍🦯

Blindly asking LLM to generate text without giving a direction or example

  • useful for simple and well-defined tasks
  • Most common
  • fast inference

Example: "Suggest newborn baby name"

Expected: Random baby names

Aarav — peaceful, calm
Vihaan — dawn, new beginning
Ivaan — God’s gracious gift
Reyansh — ray of light
...

2. One-Shot Prompting ☝

When a single example clarifies task format or style; helps guide the model with minimal context

The model receives:

  • one example
  • then the real task

Best For

  • formatting guidance
  • classification tasks
  • lightweight context steering

Example: "Suggest newborn baby name starting with A eg: Aaryan"

Expected: Indian Baby names starting with A

Aarav — peaceful, calm
Aadvik — unique
Ayaan — gift of God
Atharv — wisdom, knowledge
...

3. Few-Shot Prompting 📝

When multiple examples are needed to teach the model patterns or nuanced behavior

The model learns patterns from multiple examples.

Best For

  • nuanced tasks
  • structured outputs
  • custom formatting
  • behavior steering

Example: "Suggest newborn baby name starting with A eg: Aaryan"

Expected:

Aarav — peaceful
Aaryan — noble
Ayaan — gift of God
...

4. Chain-of-Thought (CoT) 🔗

When reasoning or multi-step logic is required; improves reasoning accuracy by generating intermediate steps Chain-of-Thought prompting encourages:

  • intermediate reasoning
  • multi-step thinking

CoT improves:

  • reasoning accuracy
  • logical consistency
  • math performance
  • planning tasks

Especially useful for:

  • LLM agents
  • coding tasks
  • complex workflows

Example

Question:
If a train travels 60 km/h for 2 hours,
how far does it travel?

Let's think step by step.

Expected reasoning:

Distance = Speed × Time
60 × 2 = 120 km

5. System Prompting 📜

When you want to control model behavior, tone, safety, or output formatting consistently

System prompts define:

  • model behavior
  • personality
  • rules
  • tone
  • response style

Best For

  • chatbots
  • enterprise AI
  • compliance
  • formatting rules
  • safety policies

Example

You are a professional support assistant.
Always respond politely and concisely.

🔁 Transfer Learning

Using a model pretrained on a large dataset and adapting it for a related task with limited new data.

A model trained on millions of images already understands edges, textures, faces, animals, etc.

You fine-tune it to detect:

  • cancer cells
  • defective products
  • cats vs dogs
  • traffic signs

Advantages

  • Faster training
  • Less data required (1000 vs 1 million)
  • Better accuracy
  • Lower compute cost
  • Works well for small datasets

Disadvantage

  • Source task and target task should be somewhat related
  • Biases from pretrained data can transfer
  • Large models may still be expensive

Popular models

Computer Vision

  • ResNet
  • VGGNet
  • EfficientNet
  • YOLO

NLP

  • BERT
  • GPT
  • T5

🎛 Fine-Tuning

Fine-tuning adapts a model to a specific task.

  • Tune model to understand domain-specific language eg medical, legal, finance
  • Adapt model using smaller domain dataset

Transfer learning vs Fine-tuning

  • Transfer learning = broader concept
  • Fine-tuning = one implementation approach

Use cases:

  • domain-specific language
  • structured outputs
  • company-specific style

🧗 Pretraining

Train on massive internet text

  • Only makes sense for large organizations with unique data and resources

When Should You Pretrain a Model?

Pretraining an LLM is extremely expensive.

Typical requirements:

  • hundreds of billions of tokens
  • months of training
  • tens of millions of dollars

For most application teams, pretraining should be an option of last resort. It only makes sense when the domain is highly specialized and existing models cannot be adapted effectively.

Typical scale:

Stage Data Size
Pretraining billions of tokens
Fine-tuning thousands of examples

⚗️ Knowledge Distillation

Large, powerful model (Teacher) transfers learned behavior to a smaller model (Student), enabling similar performance with lower compute and memory usage.

  • Knowledge Distillation → senior employee mentoring a junior employee

Example

  • Using a large GPT model to train a lightweight chatbot model for mobile devices.

Use case:

  • Mainly used to deploy efficient models on edge/mobile devices?
Concept Main Goal
Transfer Learning Reuse learned knowledge
Fine-Tuning Adapt pretrained model to specific task
Knowledge Distillation Compress knowledge into smaller model

Decision Ladder

flowchart TD
    A[Start with prompting] --> B{Good enough?}
    B -- Yes --> Z[Deploy]
    B -- No --> C[Try RAG]
    C --> D{Good enough?}
    D -- Yes --> Z
    D -- No --> E[Try fine-tuning]
    E --> F{Good enough?}
    F -- Yes --> Z
    F -- No --> G[Consider pretraining as last resort]

Therefore, it should be considered a last resort.

Most applications use:

  • Prompting
  • RAG
  • Fine-tuning

RLHF (Reinforcement Learning From Human Feedback

RLHF trains a reward model that scores answers.

  • Higher scores go to responses that are more helpful, honest, and harmless.

We can describe the reward idea as:

r=Reward(response∣prompt)r = \text{Reward}(\text{response} \mid \text{prompt})r=Reward(response∣prompt)

Then the model is optimized to produce responses with higher expected reward:

max⁡πE[r]\max_{\pi} \mathbb{E}[r]πmax​E[r]

where π\piπ is the model’s response policy.

RLHF Flow Diagram

flowchart TD
    P[Prompt] --> G[Model generates candidate responses]
    G --> H[Humans score responses]
    H --> RM[Train reward model]
    RM --> FT[Further train model to prefer high-reward responses]

This is one reason chat systems feel more aligned, polite, and useful than raw base models.

🕵🏻 Agents

Agents use LLMs to perform multi-step reasoning and actions.

Example task:

Research BetterBurgers competitors

Agent plan:

  1. Search competitors
  2. Visit websites
  3. Summarize each company

Agent Workflow Diagram

flowchart TD
    U[User goal] --> P[LLM plans steps]
    P --> S[Search]
    S --> V[Visit websites]
    V --> R[Read content]
    R --> M[Summarize findings]
    M --> O[Return final answer]

Agents are still an active research area, but the core idea is already useful: combine reasoning, planning, and tools to solve multi-step tasks.

The LLM acts as a controller that decides which tools to use.

← Previous

NVIDIA NGC Catalog: GPU Optimized Containers, AI Models and Enterprise AI Infrastructure

Next →

🎿 Beginner’s Guide to Skiing

AI-GenAI/3-0-LLM
Let's work together
+49 176-2019-2523
hiteshkrsahu@gmail.com
WhatsApp
Skype
Munich 🥨, Germany 🇩🇪, EU
Playstore
Hitesh Sahu's apps on Google Play Store
Need Help?
Let's Connect
Navigation
  Home/About
  Skills
  Work/Projects
  Lab/Experiments
  Contribution
  Awards
  Art/Sketches
  Thoughts
  Contact
Links
  Sitemap
  Legal Notice
  Privacy Policy

Made with

NextJS logo

NextJS by

hitesh Sahu

| © 2026 All rights reserved.