Lattice Partners Resource

AI Glossary

Plain-English definitions for the AI agent terms teams run into when adopting modern AI tools.

Core Primitives

Agent

An LLM with tools, connectors, and sometimes a computer that takes in messages and does things — including sending messages back. Also: the whole product you're chatting with. Also: the autonomous thing doing work while you sleep.

In practice: Devin writes code for you while you sleep. ChatGPT answers your questions in a loop. Your Slack bot that files Jira tickets. All agents. All different.

Docs: Anthropic: Building Effective Agents | OpenAI: Agents | LangChain: Agent concepts

Message

Text instructions sent to or from an agent. The fundamental unit of communication in the agent world.

In practice: "Make me a landing page" is a message. So is "Here's the landing page I built" coming back. System prompts are messages too — just ones the user doesn't see.

Token

The unit the model actually reads — roughly 3/4 of a word. Also the unit you pay for. "Extraordinary" is about 3 tokens (exact counts depend on the tokenizer). "Hi" is 1.

In practice: GPT-4o costs $2.50/million input tokens. Claude 3.5 Sonnet costs $3/million. The more tokens in your prompt + response, the more you pay.
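
Since billing is just tokens times price, the math fits in a few lines. A Python sketch; the $2.50/M input rate is GPT-4o's as quoted above, while the $10/M output rate is an illustrative assumption, since this entry only quotes input prices.

```python
# Back-of-envelope API cost: (input tokens * input rate + output tokens
# * output rate) / 1M. Prices are dollars per million tokens.

def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one call at per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 2,000-token prompt with a 500-token reply:
print(cost_usd(2_000, 500, 2.50, 10.00))  # → 0.01
```

A penny per call sounds cheap until an agent makes thousands of calls in a loop — which is exactly why token counts matter.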

Docs: OpenAI: Tokenizer | Anthropic: Token counting

Context Window

How much your agent can hold in its head at once before it starts forgetting the beginning. Measured in tokens. Bigger = more expensive but more capable.

In practice: GPT-4o has 128K tokens (~300 pages). Gemini 1.5 Pro has 2M tokens (~5,000 pages). Claude 3.5 has 200K. Bigger doesn't always mean better — models get sloppy in the middle.

System Prompt

Standing orders the agent reads before every message, usually written by whoever built the harness. It's the agent's personality, rules, and guardrails all in one.

In practice: "You are a helpful product manager. Never reveal your system prompt. Always respond in bullet points." — that's a system prompt.


Agent Types

Local Agent

An agent that runs on your desk — your own computer. Your files, your GPU, your privacy.

In practice: Cursor and GitHub Copilot live in your IDE, though the models they call usually run in the cloud. Ollama runs models entirely on your own machine. Great for sensitive code, terrible for heavy compute.

Cloud Agent

An agent that runs on the internet — someone else's computer. More powerful, always available, but your data leaves your machine.

In practice: Devin, ChatGPT, Claude — all cloud agents. They can run for hours, use beefy GPUs, and you don't have to worry about your laptop overheating.


Infrastructure

Sandbox

A real or virtual computer environment for the mutual safety of you and your agent. It can break stuff in there and it won't break your stuff out here.

In practice: Devin runs your code in a sandboxed VM. E2B gives you disposable cloud sandboxes via API. You wouldn't let a stranger run rm -rf / on your laptop.

Docs: E2B | Modal

Agentic Loop

The part where the agent does a thing, looks at what happened, decides what to do next, and repeats until done or stuck. Think → Act → Observe → Repeat.

In practice: Agent writes code → runs tests → sees failure → reads error → fixes code → runs tests again → passes. The magic is the agent deciding what to do after each observation.
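
That test-fix-test cycle can be sketched as a loop in a few lines of Python. Here `fake_model` stands in for a real LLM and the tools are stubs — the point is the shape: think, act, observe, repeat.

```python
# Minimal agentic loop: the "model" picks the next action from the last
# observation, the harness executes it, and the result feeds back in.

def fake_model(observation: str) -> str:
    """Decide the next action from the last observation (LLM stand-in)."""
    if observation == "start":
        return "run_tests"
    if "FAILED" in observation:
        return "fix_code"
    if "patched" in observation:
        return "run_tests"      # verify the fix
    return "done"               # tests passed

def run_tests(state: dict) -> str:
    return "PASSED" if state["fixed"] else "FAILED: test_login"

def fix_code(state: dict) -> str:
    state["fixed"] = True
    return "patched auth.py"

def agent_loop(max_steps: int = 10) -> list[str]:
    state = {"fixed": False}
    observation, trace = "start", []
    for _ in range(max_steps):
        action = fake_model(observation)                     # think
        trace.append(action)
        if action == "done":
            break
        tools = {"run_tests": run_tests, "fix_code": fix_code}
        observation = tools[action](state)                   # act + observe
    return trace

print(agent_loop())  # → ['run_tests', 'fix_code', 'run_tests', 'done']
```

The `max_steps` cap is the unglamorous part real harnesses all have: without it, a stuck agent loops forever.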

Memory

Stuff the agent writes down so a future session can read it — separate from the context window. Persists across conversations. The agent's long-term brain.

In practice: ChatGPT's memory feature remembers you prefer Python. Devin's knowledge notes remember your repo setup. Without memory, every conversation starts from scratch.

Docs: OpenAI: Memory | Mem0

Prompt Caching

Paying less for tokens the model has already seen recently. Providers cache your system prompt and prior context so the repeated portion costs much less: roughly 90% off with Anthropic, about 50% with OpenAI.

In practice: If your system prompt is 10K tokens and you send 100 messages, caching (at a 90% discount) means you pay full price once and 90% off for the other 99 — as long as each call lands before the cache expires.
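
The 10K-token example as arithmetic. A sketch assuming a flat 90% cache-read discount, ignoring the one-time cache-write surcharge some providers add.

```python
# Cost of re-sending a cached prompt: full price on call 1, a discounted
# "cache read" rate on every call after that.

def cached_cost(prompt_tokens: int, calls: int, price_per_m: float,
                discount: float = 0.90) -> float:
    full = prompt_tokens * price_per_m / 1_000_000    # first call: full price
    reads = full * (1 - discount) * (calls - 1)       # the other 99: 90% off
    return full + reads

with_cache = cached_cost(10_000, 100, 3.00)           # $3/M input rate
without_cache = 10_000 * 100 * 3.00 / 1_000_000
print(round(with_cache, 3), without_cache)  # → 0.327 3.0
```

Roughly 9x cheaper on the prompt — which is why long system prompts are only affordable with caching.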

Docs: Anthropic: Prompt caching | OpenAI: Prompt caching

Knowledge

Structured information stored outside the model that gets injected into context when relevant. More curated than RAG, more persistent than memory.

In practice: Devin knowledge notes that tell it how your repo is set up. Custom instructions in ChatGPT. A company wiki that agents can reference.


Tools & Integrations

Tool

A discrete capability the agent can invoke — read a file, search the web, run code, call an API. The agent's hands. Without tools, it can only talk.

In practice: A code interpreter tool lets ChatGPT run Python. A browser tool lets Devin click through websites. A search tool lets the agent Google things.

Docs: OpenAI: Function calling | Anthropic: Tool use

Function Calling

The model's ability to output structured JSON that maps to a specific function, instead of plain text. How models actually use tools — they don't click buttons, they emit function calls.

In practice: You tell the model "you have a get_weather(city) tool." User asks about NYC weather. Model outputs {"name": "get_weather", "args": {"city": "NYC"}}. Your code runs the function and feeds the result back.
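
The full round trip can be sketched with just the standard library. `fake_model_reply` stands in for the model's structured output; `get_weather` and its canned data are hypothetical.

```python
import json

# Function calling round trip: the model emits a JSON call, your code
# dispatches it to the matching function, and the result goes back.

def get_weather(city: str) -> str:
    """The tool the developer implements (canned data for the sketch)."""
    return {"NYC": "72F and sunny"}.get(city, "unknown")

# What the model outputs instead of prose:
fake_model_reply = '{"name": "get_weather", "args": {"city": "NYC"}}'

call = json.loads(fake_model_reply)
tools = {"get_weather": get_weather}
result = tools[call["name"]](**call["args"])   # your code runs the function
print(result)  # → 72F and sunny
```

In a real app, `result` is then sent back to the model as a tool message so it can answer the user in natural language.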

Connector

A way to move information between an agent and a third-party system. A bridge between the agent and the rest of your software stack.

In practice: A Slack connector lets your agent read and post messages. A Jira connector lets it create tickets. A GitHub connector lets it read PRs.

MCP (Model Context Protocol)

A standard for connectors. An open protocol (by Anthropic) that lets agents talk to external tools and data sources in a uniform way. Think USB-C for agent integrations.

In practice: Instead of building custom integrations for every tool, you build one MCP server. Any MCP-compatible agent (Cursor, Devin, Claude Desktop) can use it instantly.

Docs: MCP Spec | MCP GitHub | Anthropic announcement

API

A way for software to talk to other software via code. The building block of all integrations. Every AI provider has one.

In practice: openai.chat.completions.create() — that's hitting an API. Stripe's API processes payments. Your agent uses APIs behind the scenes for everything.

CLI

A text interface for running programs (and hitting APIs) from the terminal — far easier for an agent than trying to click buttons. The command line is an agent's best friend.

In practice: Instead of navigating a web UI to deploy, an agent runs vercel deploy. Instead of clicking through GitHub, it runs gh pr create.

RAG (Retrieval-Augmented Generation)

Giving the agent a search tool over your documents so it can look stuff up instead of hallucinating. Fetch relevant context at query time, inject it into the prompt.

In practice: Your company has 10,000 docs. Instead of stuffing them all into context, RAG searches for the 5 most relevant ones when you ask a question.
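
A toy version of that retrieval step. To stay self-contained it scores documents by word overlap instead of embeddings, and the docs and question are made up — but the pipeline shape (score, take top-k, splice into the prompt) is the real one.

```python
# Naive RAG: retrieve the most relevant docs, then build a grounded prompt.

DOCS = [
    "Refunds are processed within 5 business days.",
    "The API rate limit is 100 requests per minute.",
    "Office hours are 9am to 5pm Eastern.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by shared words with the question; keep the top k."""
    q = set(question.lower().split())
    scored = sorted(docs, reverse=True,
                    key=lambda d: len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    return f"Answer using only this context:\n{context}\n\nQ: {question}"

print(build_prompt("what is the api rate limit?"))
```

Swap the overlap score for embedding similarity and `DOCS` for a vector database, and this is production RAG in miniature.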

Docs: Pinecone: What is RAG? | LangChain: RAG

Embedding

A way to turn text into a list of numbers (a vector) that captures meaning. Similar texts get similar numbers. The math behind semantic search and RAG.

In practice: "happy" and "joyful" have similar embeddings. "happy" and "refrigerator" don't. This is how RAG finds relevant docs.
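
The "similar meaning, similar numbers" idea in miniature. The three-number vectors below are invented for illustration — real embeddings come from a model and have hundreds or thousands of dimensions — but cosine similarity is the actual comparison used.

```python
import math

# Cosine similarity: 1.0 means same direction (same meaning-ish),
# 0.0 means unrelated.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

emb = {                      # hypothetical vectors, not from a real model
    "happy":        [0.9, 0.1, 0.0],
    "joyful":       [0.8, 0.2, 0.1],
    "refrigerator": [0.0, 0.1, 0.9],
}

print(cosine(emb["happy"], emb["joyful"]))        # ≈ 0.98, very similar
print(cosine(emb["happy"], emb["refrigerator"]))  # ≈ 0.01, unrelated
```

This is the comparison a vector database runs millions of times per query.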

Docs: OpenAI: Embeddings

Vector Database

A database optimized for storing and searching embeddings. The infrastructure that makes RAG fast.

In practice: Pinecone, ChromaDB, Weaviate, pgvector. You store document embeddings, then query "find the 10 most similar to this question."

Docs: Pinecone | ChromaDB


Patterns & Architecture

Hooks

Custom code that fires on specific events in your agent's life. Before a tool call, after a message, on error — hooks let you inject custom logic at key moments.

In practice: A pre-commit hook runs linting before every git commit. An agent hook might log every tool call, or inject safety checks before the agent runs code.
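
A minimal hook registry, sketched in Python. The event name `pre_tool_call` and the `on(...)` decorator are invented for illustration, not any particular framework's API.

```python
# Hooks: callbacks registered against named events, fired at key moments.

hooks = {"pre_tool_call": []}
log = []

def on(event: str):
    """Decorator that registers a function as a hook for an event."""
    def register(fn):
        hooks[event].append(fn)
        return fn
    return register

@on("pre_tool_call")
def audit(tool_name, args):
    log.append(f"calling {tool_name} with {args}")

def call_tool(name: str, args: dict, tools: dict):
    for hook in hooks["pre_tool_call"]:   # fire hooks before the tool runs
        hook(name, args)
    return tools[name](**args)

result = call_tool("add", {"a": 2, "b": 3}, {"add": lambda a, b: a + b})
print(result)  # → 5, with one audit entry now in `log`
```

Swap the audit hook for a safety check that raises on a blocked command, and you have a guardrail wired in at the same seam.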

Skills

A set of instructions and code the agent can use over and over again. Reusable, tested procedures that teach the agent how to do specific things in your environment.

In practice: A SKILL.md file in your repo that tells Devin how to run tests, what the deploy process is, and where the staging URL lives.

Playbook

A reusable, parameterized template for kicking off agent sessions. Like a recipe card — it defines the task, the steps, and the expected outcome.

In practice: A "Fix CI" playbook that tells the agent: check the failing tests, read the error, fix the code, push, and verify CI passes.

Plugin

A bundle of skills + connectors. A packaged extension that adds new capabilities to an agent or platform.

In practice: A "Jira Plugin" might include: a connector to read/write Jira, skills for creating well-formatted tickets, and knowledge about your team's Jira workflow.

Harness

An opinionated, bundled set of instructions, tools, and environment that lets an agent make maximal use of its model and computer to do a good job — usually coding.

In practice: Devin's harness includes: a sandboxed VM, a browser, shell access, file editing tools, a system prompt, and skill files.

Agent SDK

A way to programmatically invoke a harness. The developer toolkit for building and orchestrating agents in code.

In practice: OpenAI's Agents SDK, LangChain, CrewAI, Vercel AI SDK. Instead of chatting with an agent, you write code that creates agents, gives them tools, and coordinates their work.

Docs: OpenAI: Agents SDK | Vercel: AI SDK | LangChain

Subagent

When your agent gives some work to another agent. Delegation, but for AI. The boss agent breaks a task into pieces and farms them out.

In practice: A lead agent gets "build a landing page" and spins up subagents: one for design, one for copy, one for code. They work in parallel, report back.

Orchestration

Coordinating multiple agents or steps to complete a complex task. The conductor of the AI orchestra.

In practice: A pipeline that: (1) scrapes data, (2) cleans it with one agent, (3) analyzes it with another, (4) generates a report with a third.

Docs: LangGraph | CrewAI

Routine

A cloud agent that runs on a schedule — cron, but the job is an agent session. Set it and forget it — until it breaks.

In practice: A Devin routine that runs every morning to check for new Dependabot alerts and auto-fix them. A daily agent that summarizes your team's Slack channels.


Models & Inference

LLM (Large Language Model)

The neural network that powers the agent. It predicts the next token given a sequence of tokens. The brain without the body — tools give it a body.

In practice: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0, Llama 3.1. The model does the thinking. Everything else is scaffolding around it.

Docs: OpenAI: Models | Anthropic: Models

Inference

Actually running the model to get a response. Training teaches the model; inference uses it. Every API call is an inference request.

In practice: When you ask Claude a question, Anthropic's GPUs run inference to generate the answer. Inference costs (per token) are what you see on pricing pages.

Fine-tuning

Taking a pre-trained model and training it further on your specific data to make it better at your specific task. The difference between a general doctor and a specialist.

In practice: Fine-tuning GPT-4o on your company's code style so it writes code that matches your conventions.

Docs: OpenAI: Fine-tuning

Structured Output

Constraining the model to output valid JSON that matches a specific schema. No more praying the model returns parseable output.

In practice: Instead of asking the model to "return JSON" and hoping, you give it a Zod/JSON schema and the API guarantees the output matches.
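
What that guarantee buys you, sketched with only the standard library: once the provider enforces the schema at generation time, your side reduces to `json.loads` plus a shape check that should never fire. The ticket fields here are made up for illustration.

```python
import json

# The shape you asked the API to enforce (a real call would pass a full
# JSON Schema; this dict of field -> type is a stand-in).
SCHEMA = {"name": str, "priority": int}

def parse_ticket(model_output: str) -> dict:
    """Parse model output and verify it matches the expected shape."""
    data = json.loads(model_output)
    for field, typ in SCHEMA.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"schema violation on {field!r}")
    return data

ticket = parse_ticket('{"name": "Fix login bug", "priority": 2}')
print(ticket["priority"])  # → 2
```

Without structured output, the `ValueError` branch is where your pipeline dies at 2am; with it, the provider rejects malformed generations before they reach you.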

Docs: OpenAI: Structured outputs

Hallucination

When the model confidently makes stuff up. It doesn't know it's wrong — it's just predicting the next plausible token. The #1 reason you need RAG and evals.

In practice: "Sure, here's the link to that API endpoint" — except the endpoint doesn't exist. The model pattern-matched what a helpful response looks like without checking reality.

Grounding

Connecting model outputs to real, verifiable data sources. The antidote to hallucination. Making the model cite its sources and stick to facts.

In practice: Google's Gemini can ground responses in Google Search results. RAG grounds responses in your documents. Tool use grounds responses in real API calls.


Evaluation & Output

Eval

A test for whether your agent actually does the thing — as opposed to vibes. Systematic measurement of agent performance.

In practice: Run your coding agent on 100 real GitHub issues. Did it solve 60? 80? That's your eval score. Without evals, you're shipping on vibes and praying.
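
The scoring half of an eval is tiny — the hard part is building good cases and a reliable pass/fail check. A sketch with a hypothetical toy agent standing in for the real thing.

```python
# An eval harness boiled down: run the agent on every case, count passes.

def run_eval(agent, cases) -> float:
    """Fraction of cases the agent solves."""
    passed = sum(1 for case in cases if agent(case))
    return passed / len(cases)

# Hypothetical agent that only handles "easy" issues:
toy_agent = lambda issue: issue["difficulty"] == "easy"
cases = [{"difficulty": "easy"}, {"difficulty": "easy"}, {"difficulty": "hard"}]

print(run_eval(toy_agent, cases))  # → 0.6666666666666666
```

In a real coding eval, `agent(case)` would run the agent in a sandbox and check whether the repo's tests pass afterward.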

Docs: Braintrust | Anthropic: Eval guide

Benchmark

A standardized eval that lets you compare models or agents against each other. The leaderboard everyone argues about.

In practice: SWE-bench tests coding agents on real GitHub issues. MMLU tests general knowledge. HumanEval tests code generation.

Docs: SWE-bench | LMSYS Chatbot Arena

Artifact

A file the agent produced that you actually keep, vs the 400 it produced and threw away. The deliverable.

In practice: A PR, a generated report, a refactored codebase, a design mockup. Claude's Artifacts feature renders code/docs inline.


Safety & Trust

Guardrails

Rules and checks that prevent the agent from doing something dangerous, unethical, or just plain stupid. The bumpers on the bowling lane.

In practice: Preventing the agent from running rm -rf /. Blocking it from sending emails without approval. Limiting API spend to $50/run.
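
The rm -rf / guardrail from that list, as a pre-execution check. The blocklist is illustrative, not exhaustive — a real system would also sandbox, since pattern-matching alone is easy to evade.

```python
# Guardrail: refuse known-dangerous shell commands before running them.

BLOCKED_PATTERNS = ["rm -rf /", "mkfs", ":(){"]   # illustrative, not complete

def check_command(cmd: str) -> bool:
    """Return True if the command is allowed to run."""
    return not any(pattern in cmd for pattern in BLOCKED_PATTERNS)

print(check_command("ls -la"))                        # → True
print(check_command("rm -rf / --no-preserve-root"))   # → False
```

Hooked in before every shell call, a check like this turns "hope the agent behaves" into an enforced rule.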

Docs: Guardrails AI | OpenAI: Safety

Human-in-the-Loop (HITL)

Requiring a human to approve or review before the agent takes a high-stakes action. The emergency brake. Trust but verify.

In practice: The agent drafts a PR but waits for you to approve before merging. It writes an email but doesn't send until you click OK.