AI Buzzwords Explained: LLM, Token, RAG & More

At some point every conversation about technology turned into a wall of AI vocabulary. People throw around "LLM," "RAG," "hallucination," and "embedding" like everyone in the room already knows what they mean. I didn't, not really, not in a way I could explain back to someone else. So I sat down and actually learned what these terms mean, one at a time, without the marketing fog around them. Here's the plain-language version.

LLM (Large Language Model)

An LLM is the core engine behind tools like ChatGPT or Claude. It's a model trained on huge amounts of text to predict what word (or part of a word) comes next in a sequence. At its foundation, that's all it is: a very sophisticated next-word predictor. OpenAI and Anthropic are two of the better-known organizations building and training these models, and the "large" part refers to the scale of both the training data and the model itself (its number of parameters).
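To make "next-word predictor" concrete, here is a toy sketch in Python. It only counts which word tends to follow which in one made-up sentence, which is nothing like the scale or the neural-network machinery of a real LLM, but the basic question it answers ("given this, what comes next?") is the same. The function names and the training sentence are my own, purely for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts

def predict_next(model, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text; a real model trains on vastly more than one sentence.
training_text = "the cat sat on the mat and the cat slept"
model = train_bigram_model(training_text)

print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" only once
print(predict_next(model, "zebra"))  # never seen in training, so no prediction
```

A real LLM replaces the counting table with a neural network and looks at far more than one previous word, but it is still picking a likely continuation.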

Token

A token is the small chunk of text a model actually processes, and it's not always a full word. Sometimes it's a word, sometimes part of a word, sometimes just a punctuation mark. A model's "context window" is measured in tokens, not words or characters. This matters in practice because tokens are usually how usage is priced, and the context window sets how much text a model can actually "see" at once.
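Here is a rough sketch of the idea in Python. The splitting rule (chop long words into 4-character pieces) is something I made up to mimic sub-word tokens; real tokenizers learn their pieces from data and will split text differently. The price in the example is also hypothetical.

```python
import re

def toy_tokenize(text):
    """Split text into word pieces and punctuation marks (toy rule, not a real tokenizer)."""
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", text):
        # Break long words into chunks of up to 4 characters to mimic sub-word tokens.
        if piece.isalnum() and len(piece) > 4:
            tokens.extend(piece[i:i + 4] for i in range(0, len(piece), 4))
        else:
            tokens.append(piece)
    return tokens

def fits_in_context(tokens, context_window):
    """Check whether the token count is within the model's context window."""
    return len(tokens) <= context_window

def estimate_cost(num_tokens, price_per_million_tokens):
    """Estimate cost when usage is billed per million tokens (price is hypothetical)."""
    return num_tokens / 1_000_000 * price_per_million_tokens

tokens = toy_tokenize("Tokenization isn't magic.")
print(tokens)       # three words and a period become nine tokens
print(len(tokens))
print(fits_in_context(tokens, context_window=8))
print(estimate_cost(500_000, price_per_million_tokens=2.0))
```

The point is the mismatch: a three-word sentence can be many more than three tokens, and it's the token count that decides whether text fits and what it costs.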

Prompt

A prompt is just the input you give the model: the instructions or question you type in. It sounds simple, but how you phrase a prompt has a real effect on the quality of what comes back. I wrote a full guide on prompt writing because this is genuinely a skill, not just typing a question and hoping for the best.

Fine-Tuning

Fine-tuning is taking an already-trained model and training it further on a smaller, specific set of data so it gets better at a particular task or adopts a particular style. Think of it as a general-purpose model getting specialized training afterward, rather than being built from scratch for that one job.

RAG (Retrieval-Augmented Generation)

RAG is a technique where the model doesn't rely only on what it learned during training. Instead, the system first retrieves relevant information from an external source (like a document database) and hands it to the model, which uses that retrieved content to generate its answer. This is one of the main ways companies make AI tools that can answer questions about specific, current, or private information that wasn't part of the model's original training.
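The retrieve-then-generate flow can be sketched in a few lines of Python. Everything here is a simplified assumption: the three "documents" are invented, retrieval is plain word overlap (real systems typically use embeddings and a vector database), and the sketch stops at building the prompt rather than calling an actual model.

```python
import re

# Invented example documents standing in for a company's private knowledge base.
DOCUMENTS = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available by email on weekdays.",
    "Shipping takes 3 to 5 business days.",
]

def words(text):
    """Lowercase the text and return its set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Put the retrieved text in front of the question before it goes to a model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long does shipping take?", DOCUMENTS)
print(prompt)
```

The model never needed to "know" the shipping policy. It just gets the right snippet placed in its prompt at question time.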

Hallucination

A hallucination is when a model confidently states something that is false or made up. It's not lying in the human sense. The model has no concept of truth; it's generating plausible-sounding text, and sometimes "plausible-sounding" and "true" don't line up. This is exactly why fact-checking AI output matters, especially for anything specific like names, dates, or numbers.

Embedding

An embedding is a way of converting text (or images, or audio) into a list of numbers that represents its meaning, so that similar concepts end up mathematically close to each other. This is the underlying mechanism that lets software treat "puppy" and "dog" as related concepts, even though they're different words. Open platforms like Hugging Face host huge numbers of models that produce embeddings, which is part of why the term comes up so often in technical discussions.
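"Mathematically close" usually means something like cosine similarity, which measures how closely two lists of numbers point in the same direction. Here is a small Python sketch. The three-number vectors are ones I made up by hand; real embeddings come from a trained model and typically have hundreds or thousands of numbers each.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two vectors point the same way (1.0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, not output from any real embedding model.
toy_embeddings = {
    "puppy": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "spreadsheet": [0.1, 0.2, 0.9],
}

puppy_dog = cosine_similarity(toy_embeddings["puppy"], toy_embeddings["dog"])
puppy_sheet = cosine_similarity(toy_embeddings["puppy"], toy_embeddings["spreadsheet"])

print(f"puppy vs dog:         {puppy_dog:.2f}")
print(f"puppy vs spreadsheet: {puppy_sheet:.2f}")
```

"Puppy" and "dog" score close to 1, while "puppy" and "spreadsheet" score much lower. That gap is all "related concepts" means to the software.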

Inference

Inference is simply the act of running a trained model to get an output, as opposed to training it. When people talk about "inference costs" or "inference speed," they're talking about what happens every time you actually use the model, not the one-time (and, per run, much more expensive) process of building it in the first place.

Multimodal

Multimodal means a model can handle more than one type of input or output, like text and images together, instead of being limited to just text. A multimodal model can look at a photo and describe it, or take a text description and generate an image. This used to be a special feature; now it's quickly becoming the default expectation.

Agent

An agent is an AI system set up to take a sequence of actions toward a goal, rather than just answering a single question. Instead of one prompt and one response, an agent might break a task into steps, use tools, check its own work, and keep going until the task is done. It's one of the more loosely used buzzwords right now, so when someone says "agent," it's worth asking what they actually mean by it.
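The loop at the heart of most agent setups (decide, act, check, repeat) can be sketched in Python. This is a toy under loud assumptions: the "tools" are two trivial math functions, and `choose_tool` is a hard-coded rule standing in for the decision a real model would make. No AI is involved; the point is only the shape of the loop.

```python
def double(value):
    return value * 2

def add_one(value):
    return value + 1

# The tools the agent is allowed to use (hypothetical, for illustration only).
TOOLS = {"double": double, "add_one": add_one}

def choose_tool(value, target):
    """Stand-in for the model's decision: pick the next tool to call."""
    if 0 < value * 2 <= target:
        return "double"
    return "add_one"

def run_agent(start, target, max_steps=20):
    """Keep choosing and applying tools until the goal is met or steps run out."""
    value = start
    actions = []
    for _ in range(max_steps):
        if value == target:  # check: is the goal already met?
            break
        tool_name = choose_tool(value, target)  # decide
        value = TOOLS[tool_name](value)         # act
        actions.append(tool_name)
    return value, actions

final_value, actions = run_agent(start=3, target=14)
print(final_value, actions)
```

Note the `max_steps` limit. Real agent systems need some kind of stopping rule too, because a loop that keeps acting until "done" can otherwise run indefinitely.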

Why Learning These Actually Helps

None of these terms are complicated once you strip away the hype around them. The real benefit of knowing them isn't sounding smart in a meeting; it's being able to tell when a product claim is meaningful and when it's just buzzword soup. Once you know what RAG and fine-tuning are, for example, you can ask sharper questions about how a tool you're considering works under the hood, instead of just nodding along.