
Using LLMs in Development

Practical examples of how large language models are integrated into real production systems, from support automation and knowledge retrieval to developer tooling, code generation, and intelligent assistants.

Written by Hitesh Sahu, a passionate developer and blogger.

Sat Mar 07 2026


Using LLMs in Software Applications

Large Language Models are not just research tools anymore.
They are increasingly becoming components inside real software systems.

Instead of building complex ML pipelines from scratch, engineers can now integrate an LLM using a prompt and an API call.

In this article we explore how LLMs are used in production software, based on ideas from Andrew Ng's Generative AI for Everyone – Week 2.


Classic ML vs Generative AI Workflow

The workflow difference is significant:

  • Supervised learning

    • Get labeled data
    • Train AI model
    • Deploy model
    • Can take months
  • Prompt-based AI

    • Specify prompt
    • Deploy model
    • Can take minutes, hours, or days
```mermaid
flowchart TD
    A[Supervised Learning] --> A1[Get labeled data]
    A1 --> A2[Train AI model on data]
    A2 --> A3[Deploy and run model]

    B[Prompt-Based AI] --> B1[Specify prompt]
    B1 --> B2[Deploy and run model]
```

This is one of the biggest reasons LLMs are attractive in product development: they dramatically reduce time to first prototype.

Before LLMs, a common way to build a text application was supervised learning. For example, if a restaurant wanted to monitor online reviews, the team would:

Input → Labeled Data → Model Training → Deployment

Example:

Input: Restaurant reviews
Output: Sentiment (Positive / Negative)

This process could take months:

  1. Collect labeled examples
  2. Train an AI model
  3. Deploy the model

The system learns a mapping from input text A to output label B. For sentiment classification:

f(A) = B

where:

  • A is the review text
  • B ∈ {Positive, Negative}

For example:

A = "Best soup dumplings I've ever eaten." ⇒ B = Positive
A = "Not worth the 3 month wait for a reservation." ⇒ B = Negative

This approach works, but it is often slow because it depends on dataset creation and model training.



Prompt-Based Development

Instead of training a classifier, we can simply write a prompt.

Example:

```python
prompt = """
Classify the following review
as either positive or negative:

The banana pudding was really tasty!
"""

response = llm_response(prompt)
print(response)
```

Expected output: Positive

This works because large language models already have general knowledge learned during pretraining.
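The course leaves `llm_response` abstract. A minimal sketch of the glue code around such a call might look like this, assuming `llm_response` wraps some provider API; `build_sentiment_prompt` and `parse_label` are hypothetical helpers, not part of the course material:

```python
def build_sentiment_prompt(review: str) -> str:
    """Wrap a raw review in the classification instruction."""
    return (
        "Classify the following review "
        "as either positive or negative:\n\n"
        f"{review}"
    )

def parse_label(raw_response: str) -> str:
    """Normalize a free-form model reply ('Positive.', ' NEGATIVE\n') to a label."""
    text = raw_response.strip().strip(".").lower()
    if "positive" in text:
        return "positive"
    if "negative" in text:
        return "negative"
    return "unknown"  # model answered off-format; caller should retry or log

prompt = build_sentiment_prompt("The banana pudding was really tasty!")
# response = llm_response(prompt)  # provider call, assumed available
print(parse_label("Positive."))  # → positive
```

Normalizing the raw reply matters because models often answer with extra punctuation or casing, while downstream code needs a clean label to branch on.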

Real Software Applications of LLMs

LLMs can power many types of applications.

Writing Applications

Examples:

  • drafting emails
  • generating reports
  • marketing copy
  • summarizing documents

Architecture:

User → Prompt → LLM → Generated Text

Reading Applications

LLMs can understand and extract information from text.

Example tasks:

  • summarization
  • information extraction
  • sentiment analysis
  • document classification

Example prompt:

Classify the sentiment of the following review:

"The mochi is excellent!"

Output: Positive

Chat Applications

LLMs also power conversational systems.

Example interaction:

User: I'd like a cheeseburger for delivery
Bot: Sure. Anything else?
User: That's all
Bot: It will arrive in 20 minutes

These systems combine:

  • prompts
  • conversation memory
  • business logic
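A minimal sketch of the conversation-memory piece, assuming a chat-style message format in which the full history is replayed to the model on every turn; `call_llm` is a hypothetical stand-in for the provider API:

```python
SYSTEM_PROMPT = "You are a food-ordering assistant. Be brief."

def make_conversation():
    # conversation memory is just an ordered list of messages
    return [{"role": "system", "content": SYSTEM_PROMPT}]

def add_turn(messages, role, content):
    messages.append({"role": role, "content": content})
    return messages

conversation = make_conversation()
add_turn(conversation, "user", "I'd like a cheeseburger for delivery")
add_turn(conversation, "assistant", "Sure. Anything else?")
add_turn(conversation, "user", "That's all")
# reply = call_llm(conversation)  # the model sees the whole history each turn
print(len(conversation))  # → 4 (system prompt + three turns)
```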

Lifecycle of a Generative AI Project

Building an AI system is an iterative engineering process.

Typical lifecycle:

  • Scope project
  • Build or improve system
  • Internal evaluation
  • Deploy and monitor

Lifecycle Diagram

```mermaid
flowchart LR
    S[Scope project] --> B[Build or improve system]
    B --> E[Internal evaluation]
    E --> D[Deploy and monitor]
    D --> B
```

A prototype may look good on a simple example, but fail on a slightly different one. For instance, a sentiment model may correctly label:

"The custard tart was amazing!" → Positive

but incorrectly label:

"My pasta was cold" → Positive

This shows why evaluation is essential. A working demo is not the same thing as a reliable product.

This loop is central to real LLM engineering: you ship a prototype, observe failure cases, improve prompts or architecture, and repeat.

Example failure:

Prompt:
Classify sentiment

Input:
"My pasta was cold"

LLM Output:
Positive

Engineers must analyze failures and improve the system.
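The evaluate-and-improve step can be sketched as a tiny accuracy loop over a labeled set; `classify` below is a deliberately naive keyword stand-in for the prompted LLM so the loop runs on its own, and the examples are the ones from this article:

```python
EVAL_SET = [
    ("The custard tart was amazing!", "positive"),
    ("My pasta was cold", "negative"),
]

def classify(review: str) -> str:
    # stand-in for the prompted LLM; in practice this would be an API call
    return "negative" if "cold" in review.lower() else "positive"

def accuracy(examples) -> float:
    # the "internal evaluation" step: score the system on labeled examples
    correct = sum(1 for text, label in examples if classify(text) == label)
    return correct / len(examples)

print(accuracy(EVAL_SET))  # → 1.0 on this toy set
```

In practice the eval set grows every time a new failure is observed in production, which is what closes the loop.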

Improving LLM Performance

Building AI systems is highly empirical.

We improve performance through experimentation.

Common techniques include:

1. Prompting

Prompting is usually the first and cheapest lever.

You change the instructions, add examples, clarify format, or provide constraints.

2. Retrieval Augmented Generation (RAG)

RAG gives the LLM access to external data sources so it can answer questions using organization-specific information rather than relying only on its built-in knowledge.

3. Fine-tuning

Fine-tuning adapts a model to your task, style, or domain.

4. Pretraining

Pretraining means training an LLM from scratch.

This is the most expensive and hardest option, and usually the last resort.

Improvement Loop Diagram

```mermaid
flowchart LR
    I[Idea] --> P[Prompt]
    P --> R[LLM response]
    R --> I
```

Cost Intuition

Estimate LLM cost using tokens. Roughly:

1 token ≈ 3/4 word

If a person reads about 250 words per minute, then in one hour they consume about:

60 × 250 = 15,000 words

If the system also processes a similar amount of prompt text, total words might be around:

15,000 + 15,000 = 30,000 words

Converting words to tokens:

30,000 words ≈ 40,000 tokens

If cost is about $0.002 per 1K tokens, then the total estimated cost is:

40 × 0.002 = $0.08
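The arithmetic above fits in a few lines. The 3/4 words-per-token ratio and the $0.002 per 1K tokens price are the illustrative figures used here, not a real provider's price sheet:

```python
WORDS_PER_TOKEN = 0.75          # rough rule of thumb: 1 token ≈ 3/4 word
PRICE_PER_1K_TOKENS = 0.002     # illustrative price, not a real quote

def estimate_cost(words: int) -> float:
    """Back-of-envelope cost of processing `words` words of text."""
    tokens = words / WORDS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS

words = 60 * 250 * 2  # one hour of reading output plus similar prompt input
print(round(estimate_cost(words), 2))  # → 0.08
```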

Retrieval Augmented Generation (RAG)

How RAG Works

The slides break RAG into three steps:

  1. Search relevant documents for an answer
  2. Insert retrieved text into the prompt
  3. Generate the answer from the updated prompt

Mermaid RAG Flow

```mermaid
flowchart TD
    Q[User question] --> R1[Retrieve relevant documents]
    R1 --> R2[Insert retrieved context into prompt]
    R2 --> LLM[LLM generates answer]
    LLM --> A[Grounded response]
```

RAG gives the model access to external knowledge, which allows it to answer questions about private or up-to-date data.

Conceptually, the prompt becomes:

Prompt = Instruction + Retrieved Context + Question

For example:

Answer = LLM(Instruction + Parking Policy + Question)

This is powerful because the LLM is being used more as a reasoning engine than as a pure source of facts. It reads relevant text and uses that text to formulate an answer.
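The three steps can be sketched with a toy keyword retriever. A production system would use embeddings and a vector store; the documents and the parking question below are invented examples matching the article's scenario:

```python
DOCS = {
    "parking": "Employees may park in lot B after 6pm on weekdays.",
    "vacation": "Vacation requests need two weeks notice.",
}

def retrieve(question: str) -> str:
    # step 1: pick the document with the most word overlap (toy retrieval)
    q = question.lower()
    best = max(DOCS, key=lambda k: sum(w in q for w in DOCS[k].lower().split()))
    return DOCS[best]

def build_rag_prompt(question: str) -> str:
    # step 2: insert retrieved text into the prompt
    context = retrieve(question)
    return (
        "Answer using only the context below.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}"
    )

# step 3 would be: answer = llm(build_rag_prompt(question))
prompt = build_rag_prompt("Can I park in lot B after work?")
print("lot B" in prompt)  # → True: the parking policy was retrieved
```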

Fine-Tuning

Fine-tuning adapts a model to a specific task.

Pretraining: Train on massive internet text

Fine-tuning: Adapt model using smaller domain dataset

Typical scale:

Stage         Data size
Pretraining   billions of tokens
Fine-tuning   thousands of examples

Use cases:

  • domain-specific language
  • structured outputs
  • company-specific style

When Should You Pretrain a Model?

Pretraining an LLM is extremely expensive.

Typical requirements:

  • hundreds of billions of tokens
  • months of training
  • tens of millions of dollars

For most application teams, pretraining should be an option of last resort. It only makes sense when the domain is highly specialized and existing models cannot be adapted effectively.

Decision Ladder

```mermaid
flowchart TD
    A[Start with prompting] --> B{Good enough?}
    B -- Yes --> Z[Deploy]
    B -- No --> C[Try RAG]
    C --> D{Good enough?}
    D -- Yes --> Z
    D -- No --> E[Try fine-tuning]
    E --> F{Good enough?}
    F -- Yes --> Z
    F -- No --> G[Consider pretraining as last resort]
```


Most applications use:

  • prompting
  • RAG
  • fine-tuning
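The ladder can be read as a simple escalation policy: try the cheapest technique first and escalate only when evaluation says quality is not good enough. `evaluate` below is a hypothetical callback reporting whether the system meets the quality bar with a given technique:

```python
LADDER = ["prompting", "RAG", "fine-tuning", "pretraining"]

def choose_technique(evaluate) -> str:
    """Walk the ladder and stop at the first technique that is good enough."""
    for technique in LADDER:
        if evaluate(technique):
            return technique
    return "pretraining"  # last resort

# toy evaluation: suppose prompting alone is not enough but RAG is
print(choose_technique(lambda t: t != "prompting"))  # → RAG
```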

Choosing the Right Model Size

Different tasks require different model sizes.

Model size         Capabilities
1B parameters      basic tasks
10B parameters     moderate reasoning
100B+ parameters   complex reasoning

Example mapping:

Task                       Model size
Sentiment classification   small
Chatbot                    medium
Brainstorming assistant    large

Closed vs Open Source Models

There are two major deployment strategies.

Closed Models

Examples:

  • OpenAI
  • Anthropic
  • Google

Advantages:

  • strong performance
  • easy API integration

Disadvantages:

  • vendor lock-in
  • data privacy concerns

Open Source Models

Examples:

  • LLaMA
  • Mistral
  • Falcon

Advantages:

  • full control
  • on-prem deployment
  • better privacy

Disadvantages:

  • infrastructure complexity
  • weaker models (sometimes)


RLHF

RLHF (Reinforcement Learning from Human Feedback) trains a reward model that scores answers. Higher scores go to responses that are more helpful, honest, and harmless.

We can describe the reward idea as:

r = Reward(response | prompt)

Then the model is optimized to produce responses with higher expected reward:

max_π E[r]

where π is the model's response policy.
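A toy illustration of the reward idea: score candidate responses and prefer the higher-reward one. The heuristic `reward` function below is invented for the sketch; a real reward model is a trained neural network, not hand-written rules:

```python
def reward(response: str) -> float:
    # stand-in reward: mildly prefer helpful, substantive answers (toy heuristic)
    score = 0.0
    if "sorry" not in response.lower():
        score += 1.0  # penalize unhelpful refusals
    score += min(len(response.split()), 10) / 10  # mildly prefer substance
    return score

candidates = [
    "Sorry, I can't help with that.",
    "Here are three options for fixing the bug, starting with the simplest.",
]

best = max(candidates, key=reward)
print(best.startswith("Here"))  # → True: the helpful answer scores higher
```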

RLHF Flow Diagram

```mermaid
flowchart TD
    P[Prompt] --> G[Model generates candidate responses]
    G --> H[Humans score responses]
    H --> RM[Train reward model]
    RM --> FT[Further train model to prefer high-reward responses]
```

This is one reason chat systems feel more aligned, polite, and useful than raw base models.


Tool Use

LLMs are powerful, but they are not reliable at everything. In particular, they often struggle with precise arithmetic or actions that require external systems.

The course shows a food-ordering example. A user says:

Send me a burger!

A naive chatbot may simply reply:

Ok, it’s on the way!

But that is not enough. A real system must gather order details, confirm the address, and call the ordering backend.

Tool-Based Ordering Flow

```mermaid
flowchart TD
    U[User message] --> L[LLM interprets request]
    L --> T[Call ordering tool]
    T --> C[Show confirmation to user]
    C --> Y{User confirms?}
    Y -- Yes --> O[Place order]
    Y -- No --> X[Cancel or revise]
```

The tool call might conceptually look like:

ORDER(Burger, 9876, "1234 My Street")

This makes the LLM part of a larger application architecture rather than the whole system.
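One common way to wire this up (not prescribed by the course) is to have the model emit a structured call that application code parses and dispatches. The JSON shape and the `place_order` backend below are illustrative, not a specific framework's API:

```python
import json

def place_order(item: str, item_id: int, address: str) -> str:
    # stand-in for the real ordering backend
    return f"Order placed: {item} (#{item_id}) to {address}"

TOOLS = {"ORDER": place_order}

def dispatch(llm_output: str) -> str:
    """Parse a structured tool call emitted by the model and route it."""
    call = json.loads(llm_output)
    return TOOLS[call["tool"]](*call["args"])

# Suppose the model replied with this structured call:
llm_output = '{"tool": "ORDER", "args": ["Burger", 9876, "1234 My Street"]}'
print(dispatch(llm_output))  # → Order placed: Burger (#9876) to 1234 My Street
```

Keeping the model's output machine-parseable is the key design choice: the LLM decides *what* to do, but deterministic code decides *how* it happens.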


Tools for Reasoning

The slides also show that LLMs are not always good at exact math.

Question:

How much would I have after 8 years if I deposit $100 at 5% interest?

A model may produce the wrong number if it tries to reason directly in text. The more reliable method is tool use:

100 × 1.05^8 ≈ 147.75

So the LLM should call an external calculator:

CALCULATOR(100 × 1.05^8)

Math Tool Flow

```mermaid
flowchart TD
    Q[User asks math question] --> LLM[LLM recognizes need for precise calculation]
    LLM --> Calc[External calculator]
    Calc --> Result[147.75]
    Result --> Answer[LLM returns grounded answer]
```

This is an important engineering lesson: do not force the LLM to do tasks that a specialized tool can do more reliably.
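The calculation itself is one line of code, which is exactly why it should be delegated. Note that 100 × 1.05^8 = 147.7455..., i.e. $147.75 to the nearest cent:

```python
def compound(principal: float, rate: float, years: int) -> float:
    """Compound interest: principal grown at `rate` for `years` periods."""
    return principal * (1 + rate) ** years

print(round(compound(100, 0.05, 8), 2))  # → 147.75
```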

Agents

Agents use LLMs to perform multi-step reasoning and actions.

Example task:

Research BetterBurgers competitors

Agent plan:

  1. Search competitors
  2. Visit websites
  3. Summarize each company

Agent Workflow Diagram

```mermaid
flowchart TD
    U[User goal] --> P[LLM plans steps]
    P --> S[Search]
    S --> V[Visit websites]
    V --> R[Read content]
    R --> M[Summarize findings]
    M --> O[Return final answer]
```

Agents are still an active research area, but the core idea is already useful: combine reasoning, planning, and tools to solve multi-step tasks.

The LLM acts as a controller that decides which tools to use.
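The controller idea can be sketched as a fixed three-step loop. Here `search`, `visit`, and `summarize` are stubbed stand-ins for real tools, and the plan is hard-coded; a real agent would ask the LLM to produce and revise the plan:

```python
def search(query: str):
    # stub: a real tool would hit a search API
    return ["betterburgers-rival-a.com", "betterburgers-rival-b.com"]

def visit(url: str) -> str:
    # stub: a real tool would fetch and parse the page
    return f"content of {url}"

def summarize(pages) -> str:
    # stub: a real agent would ask the LLM to summarize each page
    return f"Summarized {len(pages)} competitor sites."

def run_agent(goal: str) -> str:
    urls = search(goal)               # step 1: search competitors
    pages = [visit(u) for u in urls]  # step 2: visit websites
    return summarize(pages)           # step 3: summarize findings

print(run_agent("Research BetterBurgers competitors"))  # → Summarized 2 competitor sites.
```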

Key Insight

The most important shift is this:

  • LLMs are not just knowledge sources.
  • They are reasoning engines that process information.

Instead of asking "What does the model know?", we ask "What information can we give the model so it can reason about it?"
