Episode 2: Behind the Curtain – How Language Models Think and See the World
Dive deep into the engine room of Large Language Models. We explore how LLMs process information, how that processing differs from human thinking, and phenomena like "hallucinations." Learn about tokenization, auto-regressive models, the Transformer architecture, and how settings like "temperature" affect the output.
Transcript
Understanding How LLMs Process Information (How They "Think"):
Tokenization: LLMs do not process text character by character, but rather break it down into "multiletter chunks called tokens." These tokens typically range from three to four characters but can be longer for common words. The set of tokens a model uses is its "vocabulary." (Page 23)
Deterministic Tokenizers: Unlike human reading, LLMs use deterministic tokenizers. This means typos can significantly alter how a word is tokenized (e.g., "ghost" is one token, "gohst" can be three). (Page 24)
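A minimal sketch of this behavior in Python, using the tiktoken library and its cl100k_base encoding (both assumptions; the episode does not name a specific tokenizer). Exact token counts vary by tokenizer, so the split for "gohst" may differ from the book's example.

```python
# Tokenization sketch; tiktoken and the cl100k_base encoding are assumptions,
# and exact token counts will differ between tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["ghost", "gohst"]:
    token_ids = enc.encode(word)                    # deterministic: same input, same tokens
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{word!r}: {len(token_ids)} token(s) -> {pieces}")
```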
Seeing Text Differently: LLMs lack the intuitive understanding of visual aspects of text that humans possess (e.g., round vs. square letters, ASCII art, accents). Processing text with these variations requires significant computational effort from the model. (Page 26)
Token Counting and the Context Window: The number of tokens is crucial because it dictates the length of the text from the model's perspective, impacting processing time, computational cost, and the critical limitation of the "context window." The context window is the maximum amount of text (prompt + completion) an LLM can handle at once. Exceeding this limit results in an error. (Page 28) The size of the context window is a recurring limitation that prompt engineers must manage.
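As a rough illustration, a prompt can be checked against the context window before it is sent to the model. The 8,192-token limit and the number of tokens reserved for the completion below are illustrative assumptions, not figures from the episode.

```python
# Pre-flight context-window check (sketch). CONTEXT_WINDOW and MAX_COMPLETION_TOKENS
# are assumed values; real limits depend on the model in use.
import tiktoken

CONTEXT_WINDOW = 8192        # maximum tokens the model can handle (prompt + completion)
MAX_COMPLETION_TOKENS = 512  # tokens reserved for the model's answer

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str) -> bool:
    # The prompt and the requested completion share one window; exceeding it is an error.
    return len(enc.encode(prompt)) + MAX_COMPLETION_TOKENS <= CONTEXT_WINDOW

print(fits_in_context("Summarize the following report: ..."))
```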
One Token at a Time: LLMs generate text sequentially, one token at a time. This process is described as "multiple tokens to a single token": the model takes all the tokens so far, predicts the next one, appends it, and repeats. (Page 28) This highlights the auto-regressive nature of LLM generation.
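A sketch of that auto-regressive loop; model_next_token, VOCAB_SIZE, and EOS_ID are hypothetical placeholders standing in for a real model call, with a random token id used so the loop actually runs.

```python
# Auto-regressive generation sketch: many tokens in, one token out, appended and repeated.
# model_next_token, VOCAB_SIZE, and EOS_ID are hypothetical placeholders, not a real model.
import random

VOCAB_SIZE = 50_000
EOS_ID = 0  # pretend end-of-sequence token id

def model_next_token(tokens: list[int]) -> int:
    # A real LLM would turn the whole token sequence into a probability distribution
    # over its vocabulary and pick one token; here we fake it with a random id.
    return random.randrange(VOCAB_SIZE)

def generate(prompt_tokens: list[int], max_new_tokens: int = 50) -> list[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_id = model_next_token(tokens)  # "multiple tokens to a single token"
        tokens.append(next_id)              # the new token becomes part of the next input
        if next_id == EOS_ID:               # stop once the model signals it is done
            break
    return tokens

print(len(generate([101, 2023, 2003])))     # grows by at most max_new_tokens
```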
Lack of "Thinking Time": Unlike humans who can pause and reflect, LLMs must produce a token at every step and "can't stall." (Page 29) This constraint underlies the need for prompt engineering techniques that guide the model towards better reasoning.
Temperature and Probabilities: LLMs don't just pick the most likely token; they compute a probability distribution over all possible tokens. The process of choosing the actual token is called "sampling." (Page 32)
These probabilities are often returned as "logprobs" (natural logarithms of probabilities), which are always less than or equal to 0. A logprob of 0 indicates certainty. (Page 33)
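To make the relationship concrete, a logprob is converted back to a probability with the exponential function; the tokens and logprob values below are invented for illustration.

```python
# Converting logprobs (natural logs of probabilities) back to probabilities.
# The tokens and logprob values are made up for illustration.
import math

logprobs = {" Paris": -0.02, " London": -4.1, " Berlin": -5.3}

for token, lp in logprobs.items():
    assert lp <= 0.0                             # logprobs are never positive
    print(f"{token!r}: p = {math.exp(lp):.4f}")  # a logprob of exactly 0 would mean p = 1.0
```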
The "temperature" parameter controls the randomness of token selection. A temperature of 0 selects the most likely token deterministically, while higher temperatures introduce more variability and can lead to "inaccurate" but potentially more creative completions. (Page 34-35)