Appen Launches AI Chat Feedback and Benchmarking
Appen, a provider of data for the artificial intelligence lifecycle, has released AI Chat Feedback and Benchmarking, two solutions to help companies launch high-performing large language models.
AI Chat Feedback empowers domain experts to assess multi-turn live conversations, enabling them to review, rate, and rewrite each response. It evaluates contextual understanding and coherence in complex conversations that extend over multiple turns or dialogues, mirroring real-world applications. It manages the end-to-end flow of data through multiple rounds of evaluation and provides data to help improve models.
The AI Chat Feedback tool directly connects LLM outputs with specialists so the model can learn from diverse, natural chat data. Specialists chat live with the model, whether it's a customer's model or a third party's, and rate, flag, and provide context for their evaluations.
Appen's Benchmarking tool helps determine the right LLM for a specific enterprise application. Companies can evaluate the performance of various models along commonly used or fully custom dimensions, such as accuracy or toxicity. Combined with a curated crowd of Appen's AI training specialists, the tool also evaluates performance along demographic dimensions of interest, such as gender, ethnicity, and language. A configurable dashboard helps compare multiple models across dimensions.
"As AI Chatbots grow more advanced, the stakes are higher for enterprises to get them right before they're released into the world or they risk harmful biases and dangerous responses that could have long-term impacts on the business," said Appen CEO Armughan Ahmad in a statement. "Appen's new evaluation products provide our customers with an essential trust layer that ensures they are releasing AI tools that are truly helpful and not harmful to the public. This trust layer is backed by robust datasets and processes that have proven effective in our 27 years of AI training work and a team of over a million human experts who are attending to the nuances of the data."