#4 MLOps & AI/ML

Episode 4: The Building Blocks – Designing and Evaluating LLM Applications

How do you actually build an application with an LLM at its core? We dissect "the loop" – from the user's problem to the model's output and back. Learn about the feedforward pass, the complexity of the loop, and how to evaluate the quality of LLM applications, both offline and online.

Transcript

Chapter 4 builds upon the information from the previous chapters. Chapter 2 explained how LLMs function primarily as document completion models that predict content one token at a time. Chapter 3 detailed how the chat API is built on top of these models by completing conversation transcripts, noting that a chat model is still essentially a document completion model, just one that operates on conversational documents.
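
A minimal sketch (my own illustration, not code from the book) of that idea: a chat exchange can be flattened into a single conversational document that a completion model extends one token at a time. The role labels and layout here are purely illustrative, not any particular vendor's chat format.

```python
# Illustrative only: render a chat conversation as a plain-text "document"
# that a completion model can extend. The transcript format is an assumption,
# not a real API's wire format.

messages = [
    {"role": "system", "content": "You are a concise proofreading assistant."},
    {"role": "user", "content": "Please fix: 'Their going to the store tomorow.'"},
]

def render_transcript(messages):
    """Flatten chat messages into one conversational document."""
    lines = []
    for m in messages:
        lines.append(f"{m['role'].upper()}: {m['content']}")
    # End with the assistant's label so the model "completes" the next turn.
    lines.append("ASSISTANT:")
    return "\n".join(lines)

print(render_transcript(messages))
# The underlying model simply predicts the tokens that plausibly follow this text.
```
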
Chapter 4 focuses on the concept that the LLM application serves as a transformation layer. This layer iteratively and statefully converts real-world needs into text that LLMs can process, and then converts the data provided by the LLMs back into information and action addressing those real-world needs.
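
A stripped-down sketch of one pass through that transformation layer, under obvious simplifications: `call_llm` is a stand-in for whatever completion API the application actually uses, and the translation steps are the point rather than the model call itself.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real completion API call (assumed, not a real client)."""
    return "Day 1: Alfama and the castle. Day 2: Belém and the riverfront."

def solve(user_need: str) -> str:
    # 1. Feedforward: translate the real-world need into the model's domain, i.e. text.
    prompt = f"Task: {user_need}\nAnswer:"
    # 2. The model completes the document.
    completion = call_llm(prompt)
    # 3. Translate the completion back into information or action for the user.
    return completion.strip()

print(solve("Plan a two-day trip to Lisbon"))
```
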
The chapter discusses the complexity of user problem domains across dimensions like the medium of the problem, the level of abstraction, and how stateful the problem is. For instance, a proofreading application is low complexity, while a travel planning assistant is quite complex across these dimensions.
Key criteria for designing an effective prompt are introduced in Chapter 4:

1. The prompt must closely resemble content from the training set. This is referred to as the Little Red Riding Hood principle, emphasizing the importance of mimicking common patterns found in training data for predictable and stable completions.
2. The prompt must include all the information relevant to addressing the user's problem. You, as the prompt engineer, are responsible for this.
3. The prompt must lead the model to generate a completion that addresses the problem.
4. The completion must have a reasonable end point so that generation stops naturally. You must shape the prompt to ensure this.
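
A small illustration (my own, not an example from the book) of the four criteria applied to the proofreading case: the prompt mimics a familiar "original/corrected" editing pattern, carries the user's text, steers the model directly into the corrected version, and supplies a stop sequence so the completion has a natural end point. The request shape with a `stop` field is an assumption about a generic completion API.

```python
def build_proofreading_prompt(user_text: str) -> dict:
    prompt = (
        "Below is a sentence followed by its corrected version.\n\n"   # 1. familiar pattern
        f"Original: {user_text}\n"                                     # 2. all relevant info
        "Corrected:"                                                   # 3. leads into the answer
    )
    # 4. A stop sequence so generation ends after the corrected sentence
    #    instead of continuing with further "Original:" lines.
    return {"prompt": prompt, "stop": ["\nOriginal:"]}

print(build_proofreading_prompt("Their going to the store tomorow."))
```
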
Chapter 4 zooms in on the feedforward pass, which is the part of the LLM-application loop where the user problem is converted into the domain of the model. The basic steps for this translation include:

Context retrieval: Creating or retrieving the raw text that serves as context, including direct context from the user, indirect context from relevant sources (like documentation), and boilerplate text used to shape the response.

Snippetizing context: Breaking down the retrieved context into relevant chunks, ensuring they are appropriate sizes and potentially converting information from other formats into text snippets.

Scoring and prioritizing snippets: Assigning priority tiers or floating-point scores to snippets based on their importance and relevance for the prompt.

Prompt assembly: Combining boilerplate instructions, the user's request, and supporting context while managing the token budget.
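
A compressed sketch of the snippetizing, scoring, and assembly steps above, under simplifying assumptions of my own: token counts are approximated by whitespace word counts, and "scoring" is a toy keyword-overlap heuristic standing in for a real relevance model.

```python
def snippetize(document: str, max_words: int = 40) -> list[str]:
    """Break retrieved context into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def score(snippet: str, request: str) -> float:
    """Toy relevance score: fraction of request words that appear in the snippet."""
    req = set(request.lower().split())
    snip = set(snippet.lower().split())
    return len(req & snip) / max(len(req), 1)

def assemble_prompt(boilerplate: str, request: str, snippets: list[str],
                    budget_words: int = 200) -> str:
    """Greedily add the highest-scoring snippets that still fit the word budget."""
    parts = [boilerplate]
    used = len(boilerplate.split()) + len(request.split())
    for snippet in sorted(snippets, key=lambda s: score(s, request), reverse=True):
        cost = len(snippet.split())
        if used + cost <= budget_words:
            parts.append(snippet)
            used += cost
    parts.append(f"User request: {request}\nAnswer:")
    return "\n\n".join(parts)

docs = "The refund policy allows returns within 30 days of purchase. " * 10
print(assemble_prompt("You are a helpful support agent.",
                      "What is the refund window?", snippetize(docs)))
```
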
The chapter also briefly touches on how complex prompt engineering can become, requiring state management, integration with external context, sophisticated reasoning, and interaction with external tools. These topics are introduced at a high level in Chapter 4 and detailed in later chapters. The conclusion of Chapter 4 reiterates that the LLM application is a transformation layer and highlights the concepts of collecting, extracting, and assembling context for the prompt.

Sources

Prompt Engineering for LLMs by John Berryman and Albert Ziegler