
AI Models and LLM Development with NVIDIA

Step-by-step overview of AI model development, including generative AI, large language models, training and inference workflows, GPU computing, and practical learning resources.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Thu Feb 19 2026


Model

A model is a program that has been trained on a set of data to recognize certain patterns or make certain decisions without further human intervention.

Model = Trained Algorithm + Data

Inference

The process of running unseen data through a trained AI model to make a prediction or solve a task. Inference is an ML model in action.
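To make "Model = Trained Algorithm + Data" and inference concrete, here is a minimal sketch using a toy nearest-centroid classifier (the data and class names are made up for illustration). Training fixes the model's parameters from data; inference then applies those frozen parameters to unseen inputs:

```python
# Minimal sketch: training produces the "model" (here, per-class
# centroids); inference runs unseen data through that trained model.

def train(samples):
    """samples: list of (feature, label) pairs -> learned centroids."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}  # the trained "model"

def infer(model, x):
    """Inference: predict the class whose centroid is nearest to x."""
    return min(model, key=lambda y: abs(model[y] - x))

model = train([(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")])
print(infer(model, 1.5))   # -> low
print(infer(model, 8.5))   # -> high
```

The same separation holds for LLMs: training is the expensive, one-time optimization; inference is the cheap, repeated "model in action" step.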

Foundation Models (FMs)

Large-scale models trained on broad data that can be adapted to a wide range of downstream tasks.

  • Examples: GPT-3, BERT, DALL-E, Stable Diffusion
  • Characteristics:
    • Trained on massive datasets (text, images, code)
    • Capable of zero-shot and few-shot learning
    • Serve as a base for fine-tuning on specific tasks

Large Language Model (LLM)

A type of foundation model specifically designed to understand and generate human language.

  • Examples: GPT-3, BERT, T5
  • Characteristics:
    • Trained on vast amounts of text data
    • Able to recognize and interpret human language
    • Flexible: can perform tasks like text generation, translation, summarization, and question-answering

Word Embedding

A technique for representing words as dense vectors in a continuous vector space, capturing semantic relationships between them.

  • Examples: Word2Vec, GloVe, FastText
  • Characteristics:
    • Each word is represented as a high-dimensional vector
    • Similar words have similar vector representations
    • Enables models to understand context and relationships between words
    • Foundation for many NLP tasks and models, including LLMs
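The "similar words have similar vectors" property can be illustrated with cosine similarity. The 3-dimensional vectors below are invented for the example; real embeddings from Word2Vec or GloVe have hundreds of learned dimensions:

```python
# Toy word embeddings: semantically close words get close vectors,
# measured here by cosine similarity. Vectors are hand-made for
# illustration, not learned.
import math

embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller
```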

Transformer Models

Deep learning models that use attention to capture relationships between elements of a sequence.

  • Paper: "Attention Is All You Need" (Vaswani et al., 2017)
  • Examples: BERT, GPT-3, T5

Key Components:

  • Self-Attention Mechanism: Allows the model to weigh the importance of different words in a sentence when making predictions.
  • Multi-Head Attention: Enables the model to focus on different parts of the input simultaneously.
  • Feed-Forward Networks: Process the output of the attention mechanism to produce the final output.

Characteristics:

  • Highly parallelizable, making training efficient on large datasets
  • Capable of capturing long-range dependencies in text
  • Has become the standard architecture for LLMs and many other NLP tasks

Transformer Key Components

1. Tokenization

Break text into tokens.

Example: unbelievable → un + believ + able

Why? Reduces vocabulary size.

Common tokenizers: BPE, WordPiece, SentencePiece
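The unbelievable → un + believ + able split can be sketched with a greedy longest-match tokenizer. Real BPE/WordPiece vocabularies are learned from data; the tiny vocabulary below is an assumption for illustration:

```python
# Sketch of subword tokenization: greedy longest-match against a
# small, hand-made vocabulary (real vocabularies are learned).

VOCAB = {"un", "believ", "able", "do", "ing", "read"}

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        # take the longest vocabulary entry matching at position i
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("unbelievable"))  # -> ['un', 'believ', 'able']
```

Because rare words decompose into reusable subwords, the vocabulary stays small while still covering unseen words.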

2. Embeddings

Convert token IDs → vectors (numbers with meaning).

Without embeddings, tokens are just arbitrary ID numbers with no notion of meaning or similarity.
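The lookup itself is just indexing into a matrix: each token ID selects a row. The matrix here is random for illustration; in a real model it is trained along with the rest of the network:

```python
# Sketch of an embedding lookup: token ID -> row of a matrix.
import random

random.seed(0)
vocab_size, dim = 10, 4
embedding_matrix = [[random.uniform(-1, 1) for _ in range(dim)]
                    for _ in range(vocab_size)]

token_ids = [3, 7, 1]                    # output of the tokenizer
vectors = [embedding_matrix[t] for t in token_ids]
print(len(vectors), len(vectors[0]))     # -> 3 4
```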

3. Positional Encoding

Adds order information.

Transformers have no built-in notion of sequence order, so position information must be injected explicitly.
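One common injection scheme is the sinusoidal encoding from "Attention Is All You Need": even dimensions use sine, odd dimensions cosine, at wavelengths that grow geometrically with the dimension index. A minimal sketch:

```python
# Sinusoidal positional encoding for one position: each dimension
# oscillates at a different wavelength, giving every position a
# unique, smoothly varying fingerprint that is added to the
# token's embedding.
import math

def positional_encoding(position, dim):
    pe = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

print(positional_encoding(0, 4))  # -> [0.0, 1.0, 0.0, 1.0]
```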

4. Self-Attention (Extremely Important)

Allows model to:

  • Look at all words at once
  • Determine which words are important

Example: "The movie had a slow start but was amazing."

The model focuses more on “amazing” because of the contrast word “but”.
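The mechanics behind "look at all words and weigh their importance" is scaled dot-product attention. A minimal sketch with tiny hand-picked vectors (real models use learned, high-dimensional projections):

```python
# Scaled dot-product attention: query-key dot products become
# softmax weights, which mix the value vectors.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # output = weighted sum of the value vectors
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, weights = attention([1.0, 0.0], keys, values)
print(weights)  # the first key matches the query, so it gets more weight
```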

5. Multi-Head Attention

Multiple attention mechanisms running in parallel.

Each head learns different relationships:

  • Syntax
  • Emotion
  • Topic
  • Long-distance dependencies

An exam question may ask:

Q: Why multi-head instead of single-head?
A: To learn multiple representation subspaces simultaneously.
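"Multiple attention mechanisms in parallel" can be sketched by running the same attention routine with different queries per head and concatenating the results. Real heads use learned projection matrices; here a second query vector stands in for a second projection:

```python
# Sketch of multi-head attention's core idea: independent heads
# attend over the same sequence, and their outputs are concatenated.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def head(query, keys, values):
    d = len(query)
    w = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                 for key in keys])
    return [sum(wi * v[i] for wi, v in zip(w, values))
            for i in range(len(values[0]))]

tokens = [[1.0, 0.0], [0.0, 1.0]]
# Each head uses a different query "projection", so each attends to
# a different part of the sequence.
h1 = head([1.0, 0.0], tokens, tokens)
h2 = head([0.0, 1.0], tokens, tokens)
multi_head_output = h1 + h2   # concatenation of head outputs
print(len(multi_head_output))  # -> 4
```

The two heads produce different outputs from the same input, which is exactly the "multiple representation subspaces" the exam answer refers to.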

Encoder vs Decoder

Encoder-only (BERT)

If task = understand input → Encoder-only

  • Understand text
  • Classification
  • Sentiment
  • Search ranking

Decoder-only (GPT)

If task = generate output → Decoder-only

  • Generate text
  • Chat
  • Story writing
  • Code

Encoder-Decoder (T5, BART)

If task = both understand + generate → Encoder-Decoder

  • Translation
  • Summarization
  • Text transformation
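The structural difference between the two halves comes down to the attention mask: an encoder token may attend to every position (bidirectional), while a decoder token may only attend to itself and earlier positions (causal), which is what enables left-to-right generation. A minimal sketch:

```python
# Encoder vs decoder attention masks: mask[i][j] is True when
# position i may attend to position j.

def attention_mask(seq_len, causal):
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]

encoder_mask = attention_mask(3, causal=False)  # all positions visible
decoder_mask = attention_mask(3, causal=True)   # lower-triangular
print(decoder_mask[0])  # -> [True, False, False]
print(decoder_mask[2])  # -> [True, True, True]
```

Encoder-decoder models like T5 and BART combine both: a bidirectional encoder over the input, and a causal decoder that also cross-attends to the encoder's output.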