The rise of autonomous AI agents brings exciting possibilities but also the critical need for robust evaluation. This webinar focuses on practical strategies to assess and ensure your AI agents meet the highest standards of reliability, fairness, and performance in real-world scenarios.
What You’ll Learn:
This session will take a deep dive into a real-world case study: an airline customer support agent. We’ll analyze its ability to handle complex, multi-turn interactions—including managing itinerary changes, handling ambiguous user inputs, and ensuring seamless handoffs—while applying cutting-edge evaluation strategies to measure its performance, reliability, and user experience.
Why Attend:
Gain expert insights into building and evaluating conversational agents that can leverage tools, maintain long-term coherence, and seamlessly integrate with external systems to handle complex, real-world tasks.
Who Should Attend:
AI engineers, data scientists, product managers, and researchers working on LLM-driven chatbots, virtual assistants, and multi-turn AI systems who want to improve evaluation strategies and ensure real-world reliability.
Join our expert speakers for a presentation and Q&A
Andrew’s background in system quality and AI has spanned multiple problem spaces and operational modalities, from building critical software solutions to leading teams at the forefront of AI/ML development.
His career began with working directly with stakeholders on both commercial and government use cases, which gave him perspective on engineering mission-critical software. At Synapse Technology Corporation, he combined this experience with the nascent AI/ML space by driving development of the company’s proprietary threat detection platform. At Rakuten, as Head of the AI Infrastructure team, he built on this work, scaling his ability to execute across an organization and deliver enterprise-wide operational excellence. Now, as CTO and co-founder at Kolena, he seeks to apply his experience and expertise in AI/ML quality to build a safer world as the technology becomes ever more omnipresent.