What happens when you take a working chatbot that’s already serving thousands of customers a day in four different languages, and try to ship an even better experience using Large Language Models? Good question.
It’s well known that evaluating and comparing LLMs is difficult. Benchmark datasets can be hard to come by, and metrics such as BLEU are imperfect. But these are largely academic concerns: how are industry data teams tackling these issues when incorporating LLMs into production projects?
In my work as a Conversational AI Engineer, I’m doing exactly that. And that’s how I ended up centre-stage at a recent data science conference, giving the (optimistically titled) talk, “No baseline? No benchmarks? No biggie!” Today’s post is a recap of that talk, featuring:
- The challenges of evaluating an evolving, LLM-powered PoC against a working chatbot
- How we’re using different types of testing at different stages of the PoC-to-production process
- Practical pros and cons of the various test types