Using LLMs in Development
Practical examples of how large language models are integrated into real production systems, from support automation and knowledge retrieval to developer tooling, code generation, and intelligent assistants.
Using LLMs in Software Applications
Prompt-Based Development
Instead of training a classifier, we can simply write a prompt.
Example:
prompt = """
Classify the following review
as either positive or negative:
The banana pudding was really tasty!
"""
response = llm_response(prompt)
print(response)
Expected output: Positive
This works because large language models already have general knowledge learned during pretraining.
Classic ML vs Generative AI Workflow
The workflow difference is significant:
Supervised learning
- Get labeled data
- Train AI model
- Deploy model
- Can take months
Prompt-based AI
- Specify prompt
- Deploy model
- Can take minutes, hours, or days
flowchart TD
A[Supervised Learning] --> A1[Get labeled data]
A1 --> A2[Train AI model on data]
A2 --> A3[Deploy run model]
B[Prompt-Based AI] --> B1[Specify prompt]
B1 --> B2[Deploy run model]
This is one of the biggest reasons LLMs are attractive in product development: they dramatically reduce time to first prototype.
Before LLMs, a common way to build a text application was supervised learning.
For example, if a restaurant wanted to monitor online reviews, the team would:
Example:
Input: Restaurant reviews Output: Sentiment (Positive / Negative)
This process could take months.
Generative AI changes this dramatically.
- Collect labeled examples
- Train an AI model
- Deploy the model
The system learns a mapping from input text to output label . For sentiment classification:
where:
- is the review text
For example:
This approach works, but it is often slow because it depends on dataset creation and model training.
Instead of training a model, we can use prompting.
prompt = """
Classify the following review
as having either a positive or
negative sentiment:
The banana pudding was really
tasty!
"""
response = llm_response(prompt)
print(response)
Development time often drops from months to hours or days.
Lifecycle of a GenAI Project
Building an AI system is an iterative engineering process.
Typical lifecycle:
- 📝 Scope project: what you want to build, what problem you want to solve, and what success looks like.
- 🏗️ Build or improve system
- 📋 Internal evaluation
- 🚀 Deploy and monitor
Lifecycle Diagram
flowchart TD
S[Scope project 📝] --> B[🏗️ Build or improve system]
B --> E[Internal evaluation 📋]
E --> D[Deploy and monitor 🚀]
D --> B
A prototype may look good on a simple example, but fail on a slightly different one.
A working demo is not the same thing as a reliable product.
This loop is central to real LLM engineering. You ship a prototype, observe failure cases, improve prompts or architecture, and repeat.
This loop repeats continuously.
Engineers must analyze failures and improve the system.
Improving LLM Performance
Building AI systems is highly empirical.
We improve performance through experimentation.
Common techniques include:
1. Prompting
Prompting is usually the first and cheapest lever.
You change the instructions, add examples, clarify format, or provide constraints.
2. Retrieval Augmented Generation (RAG)
RAG gives the LLM access to external data sources so it can answer questions using organization-specific information rather than relying only on its built-in knowledge.
3. Fine-tuning
Fine-tuning adapts a model to your task, style, or domain.
4. Pretraining
Pretraining means training an LLM from scratch.
This is the most expensive and hardest option, and usually the last resort.
Improvement Loop Diagram
flowchart LR
I[Idea] --> P[Prompt]
P --> R[LLM response]
R --> I
Cost Intuition
Estimate LLM cost using tokens. Roughly:
If a person reads about words per minute, then in one hour they consume about:
If the system also processes a similar amount of prompt text, total words might be around:
Converting words to tokens:
If cost is about:
then the total estimated cost is:
So 8 cents can keep 1 user busy for 1 hour.
Tool Use with LLMs
LLMs call external tools for doing a task
LLMs are powerful, but they are not reliable at everything.
LLM often struggle with precise arithmetic or actions that require external systems.
Example:
LLMs are not always good at exact math.
Question:
How much would I have after 8 years if I deposit $100 at 5% interest?
A model may produce the wrong number if it tries to reason directly in text. The more reliable method is tool use:
So the LLM should call an external calculator:
Math Tool Flow
flowchart TD
Q[User asks math question] --> LLM[LLM recognizes need for precise calculation]
LLM --> Calc[External calculator]
Calc --> Result[147.74]
Result --> Answer[LLM returns grounded answer]
This is an important engineering lesson: do not force the LLM to do tasks that a specialized tool can do more reliably.
Real Software Applications of LLMs
LLMs can power many types of applications.
Writing Applications
Examples:
- drafting emails
- generating reports
- marketing copy
- summarizing documents
Architecture:
User → Prompt → LLM → Generated Text
Reading Applications
LLMs can understand and extract information from text.
Example tasks:
- summarization
- information extraction
- sentiment analysis
- document classification
Example prompt:
Classify the sentiment of the following review:
Output: "The mochi is excellent!"
Chat Applications
LLMs also power conversational systems.
Example interaction:
User: I'd like a cheeseburger for delivery
Bot: Sure. Anything else?
User: That's all
Bot: It will arrive in 20 minutes
These systems combine:
- prompts
- conversation memory
- business logic
