Podcast Episodes
Browse and filter all available episodes
Showing 1-12 of 16 episodes
Episode 10: The Judge – The Art and Science of Evaluating LLM Applications
The final, but perhaps most important step: evaluation. What are we actually testing? Dive into offline evaluation with example suites, how to find samples, evaluating solutions (including SOMA assessment), and online evaluation through A/B testing and various metrics. Learn to ensure quality and effectiveness in your LLM projects.
Episode 9: The Architect – Designing Basic and Advanced LLM Workflows
When is a conversational agent not enough? Discover basic and advanced LLM workflows. We look at how to define tasks, assemble workflows (with an example from Shopify plugin marketing), and explore concepts like LLM agents driving the workflow, stateful task agents, roles, and delegation.
Episode 8: The Conversation Master – Building Agents with Tools and Reasoning
Go beyond simple chats. Learn about LLMs trained for tool usage, guidelines for defining tools, and how to enable reasoning through techniques like Chain of Thought and ReAct. We also explore context for task-based interactions and how to build and manage conversational agents for a better user experience.
Episode 7: The Conductor – Guiding and Refining LLM-Generated Content
How do you ensure the LLM's output is what you intended? We look at the anatomy of an ideal "completion," including the preamble, recognizable start and end markers, and postscript. Explore logprobs, how to assess the quality of generated content, using LLMs for classification, critical points in the prompt, and model selection.
Episode 6: The Puzzle – Constructing the Perfect Prompt
Learn the anatomy of an ideal prompt. We discuss how to adapt the prompt depending on whether you're aiming for an advice conversation, an analytical report, or a structured document. Explore formatting snippets, "inertness," few-shot examples, elastic snippets, and the relationships between prompt elements like position and dependency.
Enterprise Transformation: Scaling DevOps and Secure Collaboration
Blueprint for scaling DevOps across large organizations. Covers platform engineering, security compliance automation, and team structures for cloud/ML initiatives.
Episode 5: Feeding the Beast – Crafting Effective Prompt Content
The content of your prompt is crucial. We explore different sources of content, from static to dynamic. Learn about the importance of clarifying your question, the power of "few-shot prompting," how to find dynamic context, the basics of Retrieval-Augmented Generation (RAG), and summarization techniques.
Culture of Experimentation: Risk-Taking and Organizational Learning
Strategies for fostering innovation, implementing architectural safety nets, and converting failures into improvements. Case studies from tech giants and scaling startups.
Episode 4: The Building Blocks – Designing and Evaluating LLM Applications
How do you actually build an application with an LLM at its core? We dissect "the loop" – from the user's problem to the model's output and back. Learn about the feedforward pass, the complexity of the loop, and how to evaluate the quality of LLM applications, both offline and online.
Feedback Loops: Monitoring, Observability, and Learning from Failure
Dives into monitoring architectures, incident response workflows, and creating psychological safety for blameless retrospectives. Real examples from microservices and ML pipelines.
Episode 3: From Instruction to Interaction – The Evolution of Chat Models
Explore the transition from instruction-based LLMs to today's advanced chat models. We highlight the importance of Reinforcement Learning from Human Feedback (RLHF), its benefits, and the "alignment tax." Learn the differences between "instruct" and "chat" models, how APIs have changed, and how prompt engineering can be likened to playwriting.
Flow Mastery: CI/CD, Automation, and Infrastructure-as-Code
Breaks down CI/CD pipeline design, automated testing strategies, and IaC patterns. Features examples from cloud-native stacks (AWS/GCP) and MLOps workflows.