Text generation is the most visible capability of modern AI. LLMs can write emails, explain concepts, summarise documents, translate languages, write code, compose poetry, and hold extended conversations. Understanding how this works — and where it breaks down — makes you a much better user of these tools.
How text is generated — token by token
When you send a message to an LLM, it doesn't generate the whole response at once. It generates one token at a time, each token selection influenced by everything that came before. This is why you see the text "streaming" in: each chunk that appears is a freshly generated token, not the model typing character by character.
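The token-by-token loop can be sketched with a toy stand-in for the model. Here a hard-coded bigram table (entirely made up for illustration) plays the role of the neural network that, in a real LLM, produces the next-token distribution from the full context:

```python
import random

# Toy stand-in for a language model: a bigram table mapping each token
# to a distribution over possible next tokens. A real LLM computes this
# distribution with a neural network conditioned on the whole context.
BIGRAMS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.7, "<end>": 0.3},
    "sat": {"<end>": 1.0},
}

def generate(max_tokens=10, seed=None):
    """Generate one token at a time until <end> or max_tokens."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = BIGRAMS[tokens[-1]]          # distribution over next tokens
        choices, weights = zip(*dist.items())
        next_token = rng.choices(choices, weights=weights)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)           # the context grows by one token
    return tokens[1:]  # drop the <start> marker
```

The structure is the same as in a real system: sample one token, append it to the context, repeat. Only the way the distribution is computed differs.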
At each step, the model produces a probability distribution over every token in its vocabulary — typically tens of thousands of tokens, with roughly 50,000 being common. The temperature setting controls how the model samples from this distribution:
- Low temperature (0.1–0.3) — almost always picks the most probable token. Consistent, predictable, useful for factual or code tasks.
- High temperature (0.8–1.2) — more likely to pick less probable tokens. Creative, varied, sometimes surprising. Useful for creative writing, brainstorming.
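Mechanically, temperature divides the model's raw scores (logits) before the softmax: low values sharpen the distribution, high values flatten it. A minimal sketch, using made-up logits for three candidate tokens:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample a token from logits scaled by temperature.

    Dividing logits by the temperature before the softmax sharpens the
    distribution (low T) or flattens it (high T).
    """
    scaled = {tok: l / temperature for tok, l in logits.items()}
    max_l = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(l - max_l) for tok, l in scaled.items()}
    total = sum(exp.values())
    tokens = list(exp)
    weights = [exp[t] / total for t in tokens]
    return rng.choices(tokens, weights=weights)[0]

# Hypothetical logits for the next token after "The capital of France is".
logits = {"Paris": 5.0, "Lyon": 2.0, "banana": 0.1}
```

At temperature 0.1 this picks "Paris" essentially every time; at 2.0 the lower-scored tokens start appearing, which is the varied behaviour the bullet points above describe.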
What LLMs are genuinely excellent at
- Drafting, editing, and improving text — emails, reports, essays, documentation
- Summarisation — distilling long documents into key points
- Translation — approaching human quality in major language pairs
- Explanation — breaking down complex topics in accessible language
- Code generation — writing, debugging, and explaining code across many languages
- Classification — categorising text by topic, sentiment, or intent
- Extraction — pulling structured information from unstructured text
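A common pattern for the extraction task in the list above is to ask the model for JSON and parse the reply. A sketch of that pattern, with the model call stubbed out (`call_llm` is a placeholder returning a canned reply, not a real API):

```python
import json

EXTRACTION_PROMPT = """Extract the person's name and email from the text below.
Reply with JSON only, in the form {{"name": ..., "email": ...}}.

Text: {text}"""

def call_llm(prompt):
    """Placeholder for a real LLM API call. Returns a canned reply here
    so the surrounding parsing logic can be shown end to end."""
    return '{"name": "Ada Lovelace", "email": "ada@example.com"}'

def extract_contact(text):
    reply = call_llm(EXTRACTION_PROMPT.format(text=text))
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        # Models sometimes wrap JSON in prose or code fences; production
        # code needs a fallback (re-prompting, or stripping the fences).
        raise ValueError(f"Model reply was not valid JSON: {reply!r}")
```

The design point is that the prompt constrains the output format, and the calling code still validates it rather than trusting the model.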
The hallucination problem — why LLMs confabulate
LLMs generate text by predicting the most plausible continuation — not by retrieving verified facts. When asked about something outside their training data, or something requiring precise recall, they confidently generate text that sounds correct but may be entirely fabricated.
This isn't a bug being actively fixed — it's a fundamental consequence of how these systems work. Mitigation strategies include: grounding models in retrieved documents (RAG), giving models access to search tools, and training models to express uncertainty. But hallucination cannot be fully eliminated in pure text generation systems.
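The grounding idea behind RAG can be sketched with a toy keyword retriever. Real systems use embedding-based search over a vector index; everything here (the scoring, the prompt wording) is illustrative:

```python
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (a toy stand-in
    for embedding-based retrieval) and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_grounded_prompt(query, documents):
    """Prepend retrieved passages so the model can answer from the
    provided text instead of confabulating from memory."""
    context = "\n".join(retrieve(query, documents))
    return ("Answer using ONLY the context below. "
            "If the answer is not there, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

The mitigation is structural: the prompt both supplies verified text and instructs the model to admit when the answer is absent, which reduces (but does not eliminate) confabulation.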
Treat LLM outputs as a smart first draft that needs fact-checking, not as a source of truth. Use LLMs for generation and drafting; use other tools and your own knowledge for verification.
Prompt sensitivity
LLM outputs are highly sensitive to how you phrase your input. "Summarise this article" and "Give me the three most important takeaways from this article in bullet points" will produce very different outputs — even with the same input. This is why prompt engineering is a genuinely useful skill, not just marketing jargon.
Key takeaways
- LLMs generate text one token at a time, sampling from a probability distribution
- Temperature controls creativity: low = consistent, high = creative and varied
- LLMs excel at drafting, summarising, translating, explaining, and coding
- Hallucination is fundamental to how LLMs work — always verify factual claims
- Prompt phrasing significantly affects output — prompt engineering matters