What You'll Learn
This guide answers the questions executives ask most when evaluating AI investments:
• How do LLMs actually work?
Understand the fundamentals without needing a technical background (~5 min)
• What automation patterns exist?
Learn when to use workflows vs. agents vs. agentic systems (~7 min)
• What will this cost and what ROI can I expect?
Calculate real costs and understand fixed vs. variable expenses (~5 min)
• What are the risks and how do I mitigate them?
Understand hallucinations, context limits, and practical solutions (~5 min)
Total reading time: ~20 minutes. Use the navigation on the left to skip to any section.
Text In, Text Out
"What is the capital of France?"
"The capital of France is Paris."
That's the entire interface: text in, text out. This simple interface is why LLMs are so versatile for business:
Input can be...
Customer emails, contracts, reports, database queries, or images
Output can be...
Summaries, classifications, responses, or instructions for other systems
Result...
Easy integration into any business process without complex APIs
You don't need to understand neural networks or transformers to use LLMs effectively. You just need to understand what they can do—and that comes from how they're trained.
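The whole interface can be sketched in a few lines. The `call_llm` function below is a stand-in, stubbed so the example runs; it is not any particular vendor's API.

```python
# Minimal sketch of the text-in, text-out interface. `call_llm` is a
# placeholder for any provider's API; stubbed here so the example runs.
def call_llm(prompt: str) -> str:
    """Pretend LLM: returns a canned answer for the demo prompt."""
    canned = {"What is the capital of France?": "The capital of France is Paris."}
    return canned.get(prompt, "I'm not sure.")

# The same interface works whether the input is a question, an email,
# or a contract, and whether the output is an answer, a summary, or a label.
answer = call_llm("What is the capital of France?")
print(answer)  # The capital of France is Paris.
```

Swapping in a real client changes only the body of `call_llm`; everything built on top of the text-in, text-out contract stays the same.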
Four Key Training Aspects
Modern LLMs that power applications like ChatGPT are trained to excel in multiple dimensions
Statistical Likelihood
Trained on massive datasets (web, books, research) to generate text based on patterns observed in the data.
Human Preference
Fine-tuned with human feedback to produce responses that people find helpful, accurate, and appropriate.
Instruction Following
Specifically trained to understand and follow instructions, making them useful for task completion.
Tool Calling
Enhanced to interact with external tools and APIs, extending capabilities beyond text generation.
The Three Flavors of AI Automation
Understanding these three patterns helps you evaluate opportunities and make build vs. buy decisions
Which Pattern Fits Your Problem?
Choose Workflows when...
You have well-defined steps, need predictability, and want full control over the process
Choose Agents when...
User needs vary, tasks require exploration, and you need flexibility over rigid control
Choose Systems when...
Problems span multiple domains, require specialized expertise, and justify the complexity
AI Workflows: Code in Control
In workflows, you design each step. LLMs are tools called at specific points for text generation or analysis. This example shows invoice processing automation.
Receive Invoice (Code)
System receives PDF invoice via email or upload
Extract Data (LLM)
LLM extracts structured information from invoice
Validate Data (Code)
Code checks extracted data against business rules
Classify Category (LLM)
LLM categorizes expense (Office Supplies, Travel, etc.)
Route Decision (Logic)
Code routes based on amount and category
Generate Summary (LLM)
LLM creates human-readable summary for approval
Send Notification (Code)
System sends email with summary to approver
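The steps above can be sketched as plain functions, with code deciding the path and the model call stubbed out. All names and thresholds here (`llm`, the $1,000 routing cutoff) are illustrative, not a prescribed implementation.

```python
import json

def llm(prompt: str) -> str:
    # Stubbed model call so the sketch runs; swap in a real API client.
    if "extract" in prompt.lower():
        return '{"vendor": "Acme", "amount": 420.00}'
    if "categorize" in prompt.lower():
        return "Office Supplies"
    return "Invoice from Acme for $420.00 (Office Supplies)."

def extract_data(invoice_text: str) -> dict:
    return json.loads(llm(f"Extract vendor and amount as JSON.\n\n{invoice_text}"))

def validate(data: dict) -> bool:
    # Business rules live in deterministic code, not in the model.
    return bool(data.get("vendor")) and data.get("amount", 0) > 0

def classify(data: dict) -> str:
    return llm(f"Categorize this expense: {data}")

def route(data: dict, category: str) -> str:
    # Illustrative threshold: small invoices skip manager review.
    return "auto-approve" if data["amount"] < 1000 else "manager-review"

def process_invoice(invoice_text: str) -> dict:
    data = extract_data(invoice_text)
    if not validate(data):
        return {"status": "escalate"}  # human-in-the-loop on bad extractions
    category = classify(data)
    return {"status": route(data, category),
            "summary": llm(f"Summarize for the approver: {data}, {category}")}

print(process_invoice("ACME invoice, total $420.00"))
```

Note that the LLM only appears inside three narrow functions; the control flow is ordinary code, which is what makes the workflow predictable and testable.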
Key Characteristics
Predictable
Every execution follows the same path you designed
Controllable
You control exactly when and how LLMs are used
Testable
Easy to test each step independently
Production Considerations
In production workflows, include error handling for when validation or LLM steps fail. Build in retry logic with improved prompts, and human-in-the-loop escalation for edge cases that automated validation flags.
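A minimal sketch of that retry-and-escalate pattern. The model call and the validation rule are both stubbed for illustration; the shape of the loop is the point.

```python
def llm(prompt: str) -> str:
    # Stub: returns a clean answer only when the prompt is strict.
    return "42.00" if "ONLY the number" in prompt else "about forty-two dollars"

def valid(output: str) -> bool:
    try:
        float(output)
        return True
    except ValueError:
        return False

def extract_amount(text: str, max_retries: int = 2) -> dict:
    prompt = f"What is the invoice total?\n\n{text}"
    for _ in range(max_retries + 1):
        output = llm(prompt)
        if valid(output):
            return {"status": "ok", "amount": float(output)}
        # Tighten the prompt on retry instead of repeating the same request.
        prompt = f"Return ONLY the number, with no words.\n\n{text}"
    return {"status": "needs_human_review"}  # escalate persistent failures

print(extract_amount("Invoice total: forty-two dollars"))
```

The key design choice is that failures are cheap: a failed validation triggers a better prompt, and repeated failure routes to a person rather than silently passing bad data downstream.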
AI Agents: LLM in Control
In agent systems, the LLM makes decisions about what to do next. The LLM can maintain conversations, call tools, and adapt to user needs. Here are two examples of agents in action.
How ChatGPT Works
Each response is generated by feeding the entire conversation history (including system instructions) into the LLM
Starting state: System prompt and user message in chat history
Chat History
You are a helpful assistant that provides concise, accurate answers.
What is machine learning?
The conversation grows with each exchange. The LLM sees the full history every time, allowing it to maintain context and coherence.
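That loop can be sketched as follows. `generate` is a stand-in for the model, and the message format mirrors common chat APIs without being any specific vendor's.

```python
def generate(messages: list[dict]) -> str:
    # Stub: a real model conditions on the entire message list.
    last_user = [m for m in messages if m["role"] == "user"][-1]["content"]
    return f"(answer to: {last_user})"

history = [{"role": "system",
            "content": "You are a helpful assistant that provides concise, accurate answers."}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = generate(history)  # the full history goes in on every turn
    history.append({"role": "assistant", "content": reply})
    return reply

chat("What is machine learning?")
chat("Give me an example.")  # the model still sees turn one
print(len(history))  # 5 messages: system prompt + two user/assistant pairs
```

This is also why long conversations cost more: every turn re-sends everything before it, so input tokens grow with the length of the chat.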
AI Agents with Tool Calling
LLMs can be enhanced to call external tools, enabling them to query databases, generate visualizations, and perform complex multi-step tasks
To measure profitability by SKU, I'll pull sales and cost data from the database.
You are a data analysis assistant.
- Use query_database(sql) to fetch data
- Use run_regression(sql, formula) for statistical modeling
- Use make_chart(vega_spec) to visualize results
Always explain your reasoning, show steps clearly, and ensure outputs are accurate and interpretable.
I want to understand product profitability by SKU.
Here's the profitability analysis:
Can you create a visualization to compare profitability by SKU?
Here's the visualization comparing profitability by SKU. SKU-C shows the highest profit margin at 38%. What would you like to explore next?
Tools called: query_database(), make_chart()
By combining LLMs with tool calling, AI agents can break down complex tasks, execute code, query databases, and provide rich, interactive responses.
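A stripped-down version of that loop might look like this. The model's decision step is stubbed, and the tool name and fake data are taken from the example above; real systems use the provider's structured tool-call format.

```python
def query_database(sql: str) -> list:
    # Fake rows standing in for a real database query.
    return [("SKU-A", 0.21), ("SKU-B", 0.17), ("SKU-C", 0.38)]

TOOLS = {"query_database": query_database}

def llm_decide(history: list) -> dict:
    # Stub: asks for data first, then answers. A real model emits
    # structured tool calls based on the conversation so far.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "query_database",
                "args": {"sql": "SELECT sku, margin FROM sales"}}
    return {"answer": "SKU-C has the highest profit margin at 38%."}

def run_agent(question: str) -> str:
    history = [{"role": "user", "content": question}]
    while True:
        step = llm_decide(history)  # the LLM, not your code, picks the next step
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"role": "tool", "content": str(result)})

print(run_agent("Which SKU is most profitable?"))
```

Contrast this with the workflow sketch earlier: here the loop has no fixed path, and the model decides when to call a tool and when to stop. That flexibility is exactly what makes agents harder to test and control.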
Agentic Systems: Coordinated Intelligence
Multiple specialized agents collaborate on complex tasks. Each agent has domain expertise and tools. An orchestrator coordinates the work and synthesizes findings.
"Analyze potential acquisition of TechCorp Inc."
User requests comprehensive M&A due diligence analysis
Why Agentic Systems?
Specialized Expertise
Each agent has domain-specific training, tools, and context
Parallel Processing
Multiple agents work simultaneously on different aspects
Complex Problem Solving
Handles multi-domain problems that single agents can't solve
When This Pattern Pays Off
Agentic systems make sense when the problem has these characteristics:
✅ Good Fit When:
- Task requires 3+ distinct areas of expertise (legal + financial + compliance)
- Each domain needs specialized tools or data sources
- Volume justifies 6-12 month implementation timeline
- Manual process currently requires multiple teams/handoffs
❌ Wrong Choice When:
- A single agent with 2-3 tools would work
- Process is linear and predictable (use workflow)
- You haven't proven ROI with simpler patterns first
- You lack internal expertise to maintain complex systems
Example: M&A due diligence on a $50M acquisition—replaces 4 weeks of work across legal, financial, risk, and market research teams. Justifies the complexity.
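One possible shape for the orchestrator, with each specialist agent reduced to a stub. The agent names and their findings are invented for illustration; in a real system each stub would be a full agent with its own tools, data sources, and context.

```python
# Hypothetical specialist agents; each would normally run its own tool-calling loop.
def legal_agent(task: str) -> str:
    return "No outstanding litigation found."

def financial_agent(task: str) -> str:
    return "Revenue growing 20% YoY; debt load is moderate."

def risk_agent(task: str) -> str:
    return "Key-person risk identified in engineering leadership."

SPECIALISTS = {"legal": legal_agent, "financial": financial_agent, "risk": risk_agent}

def orchestrate(request: str) -> str:
    # In practice agents can run in parallel; sequential here for clarity.
    findings = {name: agent(request) for name, agent in SPECIALISTS.items()}
    sections = "\n".join(f"- {name}: {text}" for name, text in findings.items())
    return f"Due diligence summary for: {request}\n{sections}"

print(orchestrate("Analyze potential acquisition of TechCorp Inc."))
```

The orchestrator's two jobs are visible here: fan the request out to specialists, then synthesize their findings into one answer. Everything hard lives inside the agents themselves.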
Where LLMs Add Business Value
Explore high-ROI use cases across business functions. Each is tagged with its recommended automation pattern and implementation complexity.
Start with Quick Wins
For fastest ROI, start with high ROI + low complexity use cases like document processing, ticket routing, or resume screening. Build momentum before tackling complex implementations.
Document Processing
High ROI, Low Complexity
Ticket Classification
High ROI, Low Complexity
Resume Screening
High ROI, Low Complexity
Understanding Costs & ROI
LLMs use token-based pricing. Calculate the true costs and compare to time savings to understand your ROI.
⚠️ Remember: Variable Costs Are Only Part of the Picture
The calculator above shows variable costs (API fees that scale with usage). But successful AI implementations also require fixed costs for implementation and maintenance.
💰 Variable Costs (Ongoing)
- API fees per token (shown above)
- Infrastructure costs (if self-hosting)
- Scale linearly with usage
🔧 Fixed Costs (One-Time + Maintenance)
- Understanding use cases and requirements
- Development, testing, and integration
- Prompt engineering and fine-tuning
- Ongoing monitoring, iteration, and support
- Training employees on new workflows
Rule of thumb: For simple workflows, fixed costs might equal 3-6 months' worth of variable costs. For complex agentic systems, fixed costs can equal 12-24 months' worth of variable costs. Plan accordingly.
Understanding the Economics
Token-Based Pricing
You pay per token (roughly 4 characters). Longer inputs/outputs cost more. Calculator assumes 60/40 input/output split—adjust for your use case.
Right-Sizing Models
Use the smallest model that works. Small models are 10-100x cheaper than large ones.
Start Simple
Begin with simple workflows to minimize fixed costs and prove ROI before scaling.
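As a rough sanity check, the arithmetic behind a variable-cost estimate looks like this, using the 60/40 input/output split mentioned above. The prices and volumes are illustrative, not current rates.

```python
# Back-of-the-envelope monthly API cost, assuming a 60/40 input/output
# token split. Prices are per million tokens and purely illustrative.
def monthly_cost(requests: int, tokens_per_request: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    input_tokens = requests * tokens_per_request * 0.6
    output_tokens = requests * tokens_per_request * 0.4
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# 10,000 requests/month at 2,000 tokens each, $1.25 in / $10.00 out per 1M tokens
print(round(monthly_cost(10_000, 2_000, 1.25, 10.00), 2))  # 95.0
```

Notice how the output price dominates even at a 40% share, which is one reason right-sizing the model and keeping responses short both matter.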
The Hallucination Problem
LLMs can generate plausible-sounding but incorrect information. Understanding this limitation is critical for deployment decisions.
Try It: Same Prompt, Different Answers
What year was the Eiffel Tower completed?
In real usage, LLMs might occasionally generate any of the following responses.
The Eiffel Tower was completed in 1889.
Even simple factual questions can produce incorrect answers; running the same prompt repeatedly can surface these variations.
Why Hallucinations Happen
Pattern Matching, Not Knowledge
LLMs generate text that sounds right based on patterns they've seen, not actual facts they "know." Think of it like autocomplete on steroids—it predicts what words should come next, not what's true.
Gaps in Training
If the correct information wasn't in the training data—or if contradictory information was—the model will guess. And confident-sounding guesses are often more convincing than admitting uncertainty.
Trained to Be Helpful
Models are trained to provide answers, not to say "I don't know." Sometimes being helpful and confident is prioritized over accuracy—a design trade-off you need to account for in your deployment.
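A toy illustration of that "autocomplete" behavior: the model samples the next token from a probability distribution, so the same prompt can yield different answers. The distribution below is invented for the demo.

```python
import random

# Invented next-token distribution for the Eiffel Tower question;
# real models produce one over a vocabulary of ~100k tokens.
next_year_probs = {"1889": 0.85, "1887": 0.10, "1925": 0.05}

def sample_completion(seed: int) -> str:
    rng = random.Random(seed)
    year = rng.choices(list(next_year_probs),
                       weights=list(next_year_probs.values()))[0]
    return f"The Eiffel Tower was completed in {year}."

# Different sampling runs can disagree, even on a simple fact.
answers = {sample_completion(seed) for seed in range(20)}
print(answers)
```

The model has no separate "fact store" to consult; the wrong years are simply lower-probability continuations that sampling will occasionally pick.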
Mitigation Strategies
Risk Tolerance Framework
Low-Stakes: Acceptable
Marketing copy, brainstorming, first drafts. Errors are easily caught and inconsequential.
Medium-Stakes: Mitigate
Customer support, research, analysis. Require grounding, verification, or human review.
High-Stakes: Human-in-the-Loop
Legal, medical, financial decisions. Never trust LLM alone. Always require expert human review.
Context Windows: Your Working Memory
LLMs have a limited "context window"—the amount of text they can process at once. This affects what you can do and how much it costs.
Cost Impact
Larger context = higher cost. This request uses 27K input tokens. At $1.25/1M tokens (GPT-5), that's $0.0338 per request.
Strategies for Large Documents
Summarization
Summarize long documents in chunks, then process summaries. Works for initial filtering.
Chunking
Break documents into sections, process individually. Best when questions target specific sections.
RAG (Retrieval)
Search for relevant sections first, then only send those to LLM. Most efficient for large knowledge bases.
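A sketch of the retrieve-then-send idea, using keyword overlap as a stand-in for a real embedding search. The document and chunk size are invented for the demo.

```python
def chunk(text: str, size: int = 200) -> list[str]:
    # Split a long document into fixed-size word chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Keyword overlap stands in for embedding similarity scoring.
    q_words = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)[:top_k]

doc = ("Shipping policy: orders ship in two days. " * 50
       + "Refund policy: refunds are issued within 30 days of purchase. " * 50)
relevant = retrieve("What is the refund policy?", chunk(doc))
# Only `relevant` (a fraction of the document) would be sent to the LLM.
print(len(relevant), "chunks selected out of", len(chunk(doc)))
```

The economics follow directly: the model only ever sees the retrieved slice, so cost stays roughly constant even as the knowledge base grows.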
Context Limits by Automation Tier
Workflows: Process large batches by chunking. Each chunk is independent.
Agents: Conversation history fills up. May need to summarize or truncate older messages.
Agentic Systems: Agents pass information via handoffs and summaries, not full context sharing.
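For the conversation-history limit, one common approach is to fold older turns into a summary once the history grows long. A sketch, with word counts approximating tokens and the summarization step stubbed; a real system would use the model's tokenizer and ask the LLM to write the summary.

```python
def approx_tokens(messages: list[dict]) -> int:
    # Crude proxy: count words. Real systems use the model's tokenizer.
    return sum(len(m["content"].split()) for m in messages)

def truncate(history: list[dict], limit: int = 50) -> list[dict]:
    if approx_tokens(history) <= limit:
        return history
    system, old, recent = history[0], history[1:-2], history[-2:]
    # A real system would ask the LLM to summarize `old`; stubbed here.
    summary = {"role": "system",
               "content": f"(summary of {len(old)} earlier messages)"}
    return [system, summary, *recent]

history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    history.append({"role": "user",
                    "content": f"question number {i} with several extra words"})
    history.append({"role": "assistant",
                    "content": f"answer number {i} with several extra words"})

print(len(truncate(history)))  # 4: system, summary, last user/assistant pair
```

The trade-off is visible in the code: truncation caps cost and keeps the agent within its window, but details buried in the summarized turns are no longer available verbatim.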