Executive Summary
Artificial Intelligence is becoming table stakes for maintaining competitiveness, even for organizations that do not consider themselves "tech companies." Mid-market firms, with their leaner teams and tighter margins, can capture outsized value from AI by automating knowledge-heavy processes and unlocking new insights—provided they adopt a deliberate, risk-aware approach.
This report distills the lessons we have learned from dozens of conversations with CEOs and functional leaders who know they need AI but are unsure where to start. We first build a shared vocabulary (Foundations) and a clear-eyed view of what current large language models can—and cannot—do (Model Strengths & Limitations). We then introduce a scoring framework that helps executives identify the highest-impact, lowest-friction use cases hiding inside their own operations (Identifying Opportunities).
The playbook that follows turns strategy into action: configure the Big Three (prompt, context, tools), execute tasks with agentic systems, and implement an evaluation loop that keeps outputs reliable as models evolve. Throughout, we surface practical architectures, governance checkpoints, and cost-control tactics so teams can move fast without sacrificing safety or ROI.
Finally, we spotlight the "lethal trifecta" of AI security—private data access, exposure to untrusted input, and outbound communication—and outline engineering patterns that break at least one side of that triangle at all times.
Key Outcomes
- Prioritize projects that deliver measurable value within one to two quarters
- Deploy AI responsibly across core functions
- Build internal momentum for a longer-term roadmap of increasingly sophisticated capabilities
Foundations
The current wave of AI tools, products, and workflows is evolving rapidly. This is exciting and opens up many opportunities to create value, but it is difficult to keep up. Let's begin by defining the key words, terms, products, and companies that will form the foundation of our framework.
Essential Terms
Large language model (LLM)
A software system, built by analyzing a massive body of diverse text (books, websites, reference materials, online discussions, and professional documents), that estimates the most likely helpful response based on patterns in human communication.
Artificial intelligence (AI)
A computer system that connects a powerful model (like an LLM) with other systems or components to create versatile computational systems that exhibit intelligent behavior by analyzing data, understanding context, and generating appropriate responses or actions.
Context
Additional task-specific information (text, images, files) that helps an LLM understand the setting for your prompt and grounds its responses. Context augments the LLM's extensive, generic world knowledge with task-relevant information.
Prompt
A written set of instructions given to an LLM. This often includes the description of a persona or role ("you are a digital marketing specialist" or "you are a senior software engineer with expertise in Python"), relevant context, task instructions, output format expectations, and ideally a validation or testing strategy the LLM can use to verify its output.
Tools
Connections, integrations, and extensions that an AI system can invoke to extend its capabilities beyond generating text. Examples include reading or writing files, sending emails, accessing a database, searching the web, and interacting with external APIs.
Agent
An LLM with access to tools running in a loop. This means the AI can work independently on complex tasks by thinking through problems step-by-step, using different tools as needed, and adjusting its approach based on results—similar to how a skilled assistant would complete a project from start to finish.
Additional Terms
Embeddings
A numerical summary of the semantic meaning of a piece of data. The actual numbers are not meaningful, but they have the useful property that similar data will have similar numbers.
Vector Store
A storage system or database for storing embeddings. Key properties include fast comparison of embeddings for different data, storage of original data, and storage of metadata like original file name/url/page number/etc.
Retrieval Augmented Generation (RAG)
An agentic system in which (1) embeddings are generated and stored for domain-specific data and documents, (2) user queries (prompts) are embedded, (3) similar or relevant document chunks are retrieved and added to the LLM's context, and (4) the LLM responds to the user query.
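The retrieval step can be sketched with a toy bag-of-words embedding. Real systems use a learned embedding model and a dedicated vector store; the documents and query below are illustrative only.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector over lowercase tokens.
    Production systems use a learned embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# (1) Embed and store domain documents (a stand-in for the vector store).
documents = [
    "Refund requests must be filed within 30 days of purchase.",
    "Our enterprise plan includes priority support and SSO.",
    "Quarterly invoices are emailed on the first business day.",
]
store = [(doc, embed(doc)) for doc in documents]

# (2) Embed the user query, then (3) retrieve the most similar chunk.
query = "How long do customers have to request a refund?"
q_vec = embed(query)
best_doc, _ = max(store, key=lambda item: cosine(q_vec, item[1]))

# (4) The retrieved chunk is prepended to the LLM's context before answering.
prompt_context = f"Context: {best_doc}\n\nQuestion: {query}"
print(best_doc)
```

In production, the cosine comparison and storage are handled by the vector store, and documents are split into chunks before embedding.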
Reasoning or Thinking Models
Advanced AI models that show their step-by-step problem-solving process before providing an answer, similar to how a human might walk through their analysis. These models excel at complex tasks requiring logic, mathematics, coding, and strategic planning by breaking problems into smaller components.
Model Context Protocol (MCP) Server
An open standard developed by Anthropic that enables AI assistants to securely connect with external data sources and tools through a unified protocol. MCP servers act as bridges between AI systems and your organization's databases, APIs, and internal tools, allowing AI to access real-time information while maintaining security boundaries.
Token
The basic unit of text that an LLM processes—roughly equivalent to a word or part of a word. Understanding tokens helps estimate costs and context limits, as most AI services charge per token processed.
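A back-of-the-envelope estimate makes token-based pricing concrete. The tokens-per-word ratio and the per-token prices below are illustrative assumptions, not quoted rates from any provider.

```python
# Rough cost estimate for a single LLM call.
# ~1.3 tokens per English word is a common rule of thumb; the prices
# below are placeholders, not real vendor pricing.
TOKENS_PER_WORD = 1.3
PRICE_PER_1K_INPUT = 0.003   # assumed $ per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.015  # assumed $ per 1K output tokens

def estimate_cost(input_words, output_words):
    input_tokens = input_words * TOKENS_PER_WORD
    output_tokens = output_words * TOKENS_PER_WORD
    return ((input_tokens / 1000) * PRICE_PER_1K_INPUT
            + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT)

# Example: a 2,000-word report summarized into 300 words.
print(f"${estimate_cost(2000, 300):.4f}")
```

Even rough estimates like this help compare candidate use cases before committing to a pilot, since costs scale linearly with the volume of text processed.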
Fine-tuning
The process of customizing a pre-trained AI model with your organization's specific data, terminology, and use cases—like training a new consultant on your company's unique methodologies and client base.
API (Application Programming Interface)
The technical bridge that allows your business systems to communicate with AI services programmatically, enabling automation and integration into existing workflows.
Hallucination
When an AI generates plausible-sounding but factually incorrect information. Understanding this limitation is crucial for risk management and establishing appropriate verification processes.
Multimodal AI
Systems that can process and generate multiple types of content—text, images, audio, and video—enabling more comprehensive analysis and communication capabilities.
OpenAI
The company behind ChatGPT and GPT models, a major player in commercial AI services offering both consumer and enterprise solutions.
Anthropic
Creator of Claude AI, focused on building helpful, harmless, and honest AI systems with strong emphasis on safety and reliability for enterprise use.
Google Gemini
Google's flagship AI model family that powers various Google products and services. Available in multiple versions (Ultra, Pro, Flash) for different performance needs, Gemini integrates deeply with Google Workspace tools like Docs, Sheets, and Gmail, making it particularly relevant for organizations already using Google's enterprise ecosystem.
Microsoft Copilot
Microsoft's AI assistant integrated across Office 365 and enterprise tools, designed to enhance productivity within familiar business applications.
Perplexity
An AI-powered search engine that provides sourced, conversational answers to queries—useful for research and competitive intelligence.
Workflow Automation
Using AI to connect and orchestrate multiple business processes, reducing manual work and improving consistency across operations.
Guardrails
Technical and procedural controls that ensure AI systems operate within defined parameters, maintaining compliance, brand standards, and risk thresholds.
Inference
The process of an AI model generating responses to inputs in real-time—the operational phase where business value is created from trained models.
Model Governance
The framework of policies, procedures, and oversight mechanisms ensuring responsible AI deployment, including version control, access management, and performance monitoring.
Evaluations (Evals)
Systematic testing and measurement processes to assess AI model performance, accuracy, and reliability for specific business use cases. Like quality assurance in traditional software, evaluations help organizations validate that AI outputs meet required standards, identify edge cases where models may fail, and track performance over time.
Model Strengths & Limitations
LLMs are incredibly powerful, but they are not good at everything. Understanding these boundaries is crucial for successful implementation.
Strengths
- Summarization: Excellent at understanding, synthesizing, and summarizing language across documents
- Exploration: Can quickly iterate and explore diverse options when properly configured
- Web Research: Can search hundreds of websites and produce comprehensive reports
- Coding: World-class coding assistants that significantly increase productivity
Limitations
- Multi-step Reasoning: Performance degrades with complex logic requiring multiple reasoning steps
- Causal Reasoning: Cannot truly understand cause-and-effect relationships
- Mathematical Reasoning: Struggles with multi-step equations and complex calculations
- Hallucination: May generate plausible but false information
Identifying Opportunities for AI
A key question for any business decision maker is how to identify where AI can be applied effectively in their organization.
In the table below we present the most important project or task properties to consider when trying to identify where AI can add value in your organization. Review each property and think about your current business processes—tasks that align closely with these properties are strong candidates for AI-driven improvement. This framework is designed to help leadership quickly spot high-impact opportunities and avoid common pitfalls when evaluating where to start with AI.
| Property | Weight (/5) | Explanation | Variable | Example Tasks |
|---|---|---|---|---|
| Repetitive & Rule-Based | 5 | The work follows a predictable pattern or a set of defined, logical rules that can be learned and consistently applied. | R | Invoice processing, data entry, form categorization, basic report generation. |
| Data-Intensive | 5 | Success depends on processing, synthesizing, or retrieving information from large volumes of text, code, or other data formats. | D | Market research analysis, legal e-discovery, summarizing scientific literature, log analysis. |
| Pattern Recognition Dependent | 5 | The core activity involves identifying trends, anomalies, classifications, or clusters within data that may not be obvious to humans at scale. | P | Fraud detection, sentiment analysis, customer churn prediction, medical image screening. |
| Generative in Nature | 5 | The primary output involves creating new content, code, or structured data based on a prompt or existing information. | G | Writing email drafts, generating marketing copy, creating code snippets, summarizing meetings. |
| Experiment-Based | 4 | The task requires rapid iteration or the generation of multiple variations to test hypotheses or explore creative options. | E | A/B testing ad copy, brainstorming product names, simulating customer dialogues, experimenting with different user interface concepts, generating test data. |
| Labor-Intensive | 4 | The task requires a significant number of human hours to complete, making automation a high-value proposition for cost and time savings. | L | Document review, audio transcription, moderating user-generated content, tagging images. |
| Objective & Verifiable | 4 | The quality of the output can be measured against clear, objective criteria, making it possible to validate the AI's performance. | O | Answering factual questions, checking code for syntax errors, data validation, comparing documents. |
| Prone to Human Error | 3 | The task is tedious or requires such high attention to detail that humans are likely to make mistakes due to fatigue or oversight. | H | Data migration and cleanup, proofreading for basic errors, reconciling large financial statements. |
| Low Requirement for Emotional Intelligence | 3 | The task does not depend on deep empathy, complex negotiation, or nuanced interpersonal skills to be completed successfully. | I | Tier-1 technical support, scheduling logistics, routing customer inquiries, data classification. |
How to Calculate a Weighted Score for AI Opportunity Assessment
To systematically evaluate and prioritize AI opportunities, use this weighted scoring formula:
AI Opportunity Score = (R×5) + (D×5) + (P×5) + (G×5) + (E×4) + (L×4) + (O×4) + (H×3) + (I×3)
Score each task from 0-5 for each property using the variables from the table above.
Score interpretation:
- 140-190: Excellent AI candidate - high priority for implementation
- 100-139: Good AI candidate - strong potential for automation
- 60-99: Moderate AI candidate - consider after higher priorities
- Below 60: Poor AI candidate - likely not worth pursuing with current technology
Maximum possible score: 190 points
Spreadsheet Implementation:
1. Create columns for each variable (R, D, P, G, E, L, O, H, I)
2. Score each task from 0-5 for each property
3. Use formula: =(R*5)+(D*5)+(P*5)+(G*5)+(E*4)+(L*4)+(O*4)+(H*3)+(I*3)
4. Sort tasks by final score to prioritize implementation order
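The same calculation can be sketched in Python for teams that prefer code to spreadsheets. The sample ratings below are hypothetical, not taken from any real assessment.

```python
# Weighted AI-opportunity score, mirroring the spreadsheet formula above.
# R, D, P, G carry weight 5; E, L, O weight 4; H, I weight 3.
WEIGHTS = {"R": 5, "D": 5, "P": 5, "G": 5,
           "E": 4, "L": 4, "O": 4, "H": 3, "I": 3}

def ai_opportunity_score(ratings):
    """ratings: dict mapping each variable (R, D, ...) to a 0-5 score."""
    for var in WEIGHTS:
        if not 0 <= ratings[var] <= 5:
            raise ValueError(f"{var} must be between 0 and 5")
    return sum(WEIGHTS[var] * ratings[var] for var in WEIGHTS)

# Hypothetical task rated against each property:
sample_task = {"R": 5, "D": 4, "P": 3, "G": 2,
               "E": 1, "L": 5, "O": 4, "H": 3, "I": 2}
print(ai_opportunity_score(sample_task))  # 125
```

Sorting a list of such dicts by `ai_opportunity_score` reproduces step 4 of the spreadsheet workflow.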
How to Use This Framework
Start by generating a list of tasks or processes within your organization. For each task, review the properties in the table and assign a score based on how closely the task matches each property.
Tasks that score highly—especially those that are repetitive, data-intensive, pattern recognition dependent, or generative—are prime candidates for AI automation or augmentation.
This approach helps leadership quickly identify where AI can deliver the greatest impact, prioritize projects, and build a roadmap for implementation. Focus first on high-value, low-risk opportunities to build momentum and confidence before tackling more complex or sensitive areas.
Example Scoring Grid
| Task/Process | R | D | P | G | E | L | O | H | I | Score | Priority |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Monthly Financial Report Generation | 5 | 5 | 3 | 4 | 2 | 5 | 5 | 4 | 5 | 160 | Excellent |
| Customer Support Email Triage | 4 | 4 | 5 | 3 | 1 | 5 | 3 | 3 | 4 | 137 | Good |
| Contract Review & Analysis | 3 | 5 | 4 | 2 | 1 | 5 | 3 | 3 | 3 | 124 | Good |
| Social Media Content Creation | 3 | 2 | 3 | 5 | 5 | 3 | 2 | 2 | 3 | 120 | Good |
| Employee Performance Reviews | 2 | 3 | 2 | 2 | 1 | 3 | 1 | 2 | 1 | 74 | Moderate |
| Executive Strategy Sessions | 1 | 2 | 1 | 1 | 2 | 2 | 0 | 1 | 0 | 44 | Poor |
Implementation Playbook
Identifying the opportunity is the first and most critical step in AI success. The subsequent steps are: (1) framing the problem for success, (2) executing the task with an agentic system, and (3) reviewing the output.
Frame the Problem (The Big Three)
Prompt
What instructions would you give to a teammate to complete the task without further explanation?
Context
What reports, datasets, or examples would you refer to if solving this problem yourself?
Tools
What systems would you access, or what calculations would you perform, as part of your workflow?
Effective Prompt Structure
# PROMPT TASK
Role: You are a <role_description>
# Task Overview
<client> works in <industry> doing <value_add>
<client> is trying to optimize...
This task is important because <reason>
Results will be used by <team_member>
# Desired output
Successfully completing the task will require you to <output_description>
The <output> must be <quality1>, <quality2>, and <quality3>
# Verification Strategy
To ensure proper completion <testing_strategy>
# Tasks
1. <step1>
2. <step2>
3. ...
Execute the Task
- Choose your AI client (ChatGPT, Claude Desktop, Gemini, etc.)
- Attach relevant files for context
- Set up necessary tools
- Copy/paste your prompt and execute
Review the Output
Start by checking for completeness and accuracy. Compare results against known benchmarks. For objective tasks, use automated checks. For subjective outputs, gather stakeholder feedback. Document issues to improve future prompts and configurations.
Advanced Strategies & Architectures
Evaluation Framework
The LLM landscape changes rapidly. Something you build today may not work the same tomorrow. An essential step for reliability is setting up an evaluation framework.
Basic Evaluation Requirements
1. Prompt – the exact question or task you give the AI
2. Background material/tools – documents, data, or tools the AI can use
3. Good answer description – desired result characteristics
4. Example answer – human-written reference response
5. Scoring method – simple rating system (0-5 or pass/fail)
6. Target score – minimum acceptable performance
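These six requirements can be wired into a minimal harness. In the sketch below, `run_model` is a stub standing in for your actual LLM call, and the scoring method is a deliberately simplistic pass/fail keyword check; real evaluations often use human raters or an LLM judge comparing against the reference answer.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str      # 1. the exact question or task given to the AI
    context: str     # 2. background material the AI can use
    rubric: str      # 3. description of a good answer
    reference: str   # 4. human-written example answer

def run_model(prompt, context):
    # Stub: replace with a real LLM call in your own harness.
    return "Refunds are accepted within 30 days of purchase."

def score(output, case):
    # 5. Scoring method: pass/fail keyword check (simplistic on purpose;
    # real evals would compare against case.reference and case.rubric).
    return 1 if "30 days" in output else 0

TARGET = 0.8  # 6. minimum acceptable pass rate

cases = [EvalCase(
    prompt="What is our refund window?",
    context="Policy: refunds within 30 days of purchase.",
    rubric="Must state the 30-day window accurately.",
    reference="Customers may request a refund within 30 days.",
)]

pass_rate = sum(score(run_model(c.prompt, c.context), c) for c in cases) / len(cases)
print(f"pass rate: {pass_rate:.0%}, target met: {pass_rate >= TARGET}")
```

Running this suite on every model or prompt change turns "the output looks fine" into a measurable, repeatable check.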
Security Considerations: The Lethal Trifecta
Critical Security Triangle
- 1. Access to your private data — one of the most common purposes of tools
- 2. Exposure to untrusted content — any mechanism for malicious text to reach your LLM
- 3. Ability to externally communicate — pathways that could exfiltrate data
When these three capabilities intersect, they create a direct pipeline for attackers: untrusted content can instruct the model to read your most sensitive files and immediately transmit them outside your perimeter. Because the entire sequence happens inside the agent's own reasoning loop, conventional security layers often have no visibility or control.
Mitigation Strategy
The only reliable mitigation is to ensure at least one side of the triangle is disabled at all times. If an agent must handle confidential information, restrict both its exposure to arbitrary inputs and its ability to call outbound services. If it must process untrusted input, run it in a tightly controlled sandbox with synthetic data and no external network access.
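This rule can also be enforced mechanically before an agent is launched. A minimal sketch, with illustrative capability names not tied to any specific framework:

```python
# Guardrail: refuse to launch an agent whose configuration enables all
# three sides of the lethal trifecta at once.
TRIFECTA = {"private_data_access", "untrusted_input", "outbound_communication"}

def check_agent_config(capabilities):
    """Return the risky capabilities that are enabled; raise if all
    three sides of the triangle are active simultaneously."""
    enabled = TRIFECTA & set(capabilities)
    if enabled == TRIFECTA:
        raise ValueError(
            "Unsafe configuration: disable at least one of "
            + ", ".join(sorted(TRIFECTA))
        )
    return enabled

# A document-summarization agent that reads private files but runs in a
# sandbox with no network access keeps only one side of the triangle active.
check_agent_config(["private_data_access"])
# A web-research agent that touches untrusted pages and can call external
# services must be denied access to confidential data.
check_agent_config(["untrusted_input", "outbound_communication"])
```

A check like this belongs in the deployment pipeline, not just in policy documents, so that no configuration change can quietly re-enable the third side of the triangle.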