In her 2023 TED Talk, computer scientist Yejin Choi stated, “AI today is unbelievably intelligent and then shockingly stupid.” How could something so smart be so foolish?
AI, especially generative AI, is not designed to provide precise, context-specific knowledge geared toward a particular job on its own. Judging a model by that standard is a fool’s errand. Instead, think of these models as predicting what is most likely relevant based on what they have encountered, then generating responses from those likely hypotheses.
Because of this, generative AI frequently falls short of B2B requirements, even as it continues to astound us with its creativity. Sure, it’s clever of ChatGPT to turn social media content into a rap, but given too much freedom, generative AI can hallucinate: the model generates false information that passes for the real thing. Such flaws are bad for business, regardless of the sector a firm operates in.
Enterprise-ready generative AI relies on data that has been carefully structured to provide the right context, which can then be used to train highly honed large language models (LLMs). A carefully coordinated mix of sophisticated LLMs, actionable automation, and human checkpoints creates strong anti-hallucination frameworks, enabling generative AI to produce accurate results that genuinely add value to B2B enterprises.
Here are three essential frameworks to include in your technology stack if your company wants to take advantage of generative AI’s vast potential.
Create Effective Anti-Hallucination Systems
In one test, Got It AI, a startup that detects generative falsehoods, found that ChatGPT’s LLM gave wrong answers about 20% of the time. A failure rate that high serves no firm’s objectives. To prevent hallucinations, generative AI cannot be allowed to operate in a vacuum: the system must be trained on high-quality data and continuously monitored by people. These feedback loops help correct mistakes and improve model accuracy over time.
It’s essential to integrate generative AI’s fluent writing into a context- and outcome-driven system. Any such system begins with a clean slate that ingests data tailored to a firm and its unique objectives. The middle phase, which incorporates meticulous LLM fine-tuning, is a well-engineered system’s beating heart. Fine-tuning, according to OpenAI, is “a powerful technique to create a new model that’s specific to your use case.” It works by taking generative AI’s usual training methodology and continuing it on a large number of case-specific examples, leading to superior outcomes.
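As a concrete illustration, the case-specific examples that drive fine-tuning are typically prepared as JSONL records of example conversations. The sketch below is hypothetical (the example questions, answers, and system prompt are invented for illustration), but the chat-message record shape matches the format fine-tuning APIs such as OpenAI’s commonly expect:

```python
import json

# Hypothetical case-specific examples: customer questions paired with the
# approved, context-grounded answers a fine-tuned support model should give.
EXAMPLES = [
    ("What is your return window?",
     "Returns are accepted within 30 days of delivery."),
    ("Do you ship internationally?",
     "Yes, we ship to over 40 countries."),
]

def build_finetune_records(examples, system_prompt):
    """Convert (question, answer) pairs into chat-format fine-tuning records."""
    records = []
    for question, answer in examples:
        records.append({
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        })
    return records

def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

jsonl = to_jsonl(build_finetune_records(
    EXAMPLES, "You are a support agent for Acme Co."))
```

In practice, a file like this would be uploaded to the provider’s fine-tuning endpoint; the point is that each record encodes the firm’s own context and desired behavior.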
Companies may choose to use a combination of hard-coded automation and honed LLMs throughout this phase. Although the choreography varies from business to business, using each technology to its full potential guarantees the most context-oriented results.
Once everything on the back end has been set up, generative AI can truly shine in contact with the outside world: answers are produced quickly, are highly accurate, and offer a personal touch without being cloying.
Orchestrate Technology With Human Checkpoints
By coordinating multiple technological levers, any business can supply the structured information and context LLMs need to do what they do best. The first step for leaders is identifying jobs that are computationally challenging for people but simple for automation, and vice versa. Then consider the areas where AI outperforms both. In essence, avoid using AI when a more straightforward solution, such as automation or even human labor, would do.
John Collison, the founder of Stripe, stated during a chat with OpenAI CEO Sam Altman at Stripe Sessions in San Francisco that Stripe uses OpenAI’s GPT-4 “anywhere someone is doing manual work or working on a series of tasks.” Businesses should use automation to complete tedious tasks like information gathering and document scanning. Additionally, they can hard-code definite, binary requirements like return policies.
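The division of labor described above can be sketched as a simple router. This is a minimal, hypothetical example, not any vendor’s API: the return-window constant, the routing rule, and the `llm_answer` stand-in are all invented for illustration.

```python
# Hypothetical policy constant: a definite, binary requirement that belongs
# in hard-coded automation, not in a model.
RETURN_WINDOW_DAYS = 30

def check_return_eligibility(days_since_delivery):
    # Deterministic rule: no model needed for a yes/no policy check.
    if days_since_delivery <= RETURN_WINDOW_DAYS:
        return "eligible"
    return "ineligible"

def llm_answer(question):
    # Stand-in for a call to a fine-tuned LLM (e.g., an API request).
    return f"[LLM drafts a response to: {question!r}]"

def route(question, days_since_delivery=None):
    # Route deterministic policy checks to code; everything else to the model.
    if days_since_delivery is not None and "return" in question.lower():
        return check_return_eligibility(days_since_delivery)
    return llm_answer(question)
```

The design choice here is the point: the binary question never reaches the model, so it can never be hallucinated, while open-ended questions still benefit from generative fluency.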
A system is not generative AI-ready until this solid foundation has been established. Because inputs are well vetted before generative AI touches the data, the system is prepared to handle increased complexity appropriately. Humans in the loop remain essential to verify the correctness of model output, give the model feedback, and correct results as necessary.
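A human checkpoint like the one just described can be sketched as a triage step plus a correction log. Everything here is illustrative: the confidence scores, the threshold, and the record fields are assumptions (real systems might derive confidence from model log-probabilities or a separate verifier).

```python
# Outputs below this illustrative threshold go to a human reviewer
# instead of being sent to the customer automatically.
REVIEW_THRESHOLD = 0.8

def triage(outputs):
    """Split model outputs into auto-approved and needs-human-review."""
    approved, review_queue = [], []
    for item in outputs:
        if item["confidence"] >= REVIEW_THRESHOLD:
            approved.append(item)
        else:
            review_queue.append(item)
    return approved, review_queue

def record_correction(feedback_log, item, corrected_text):
    """Store a reviewer's fix so it can feed the next fine-tuning round."""
    feedback_log.append({
        "prompt": item["prompt"],
        "model_output": item["text"],
        "corrected": corrected_text,
    })
```

The correction log is what closes the feedback loop: each human fix becomes a candidate training example for the next round of fine-tuning.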
Measure Outcomes Via Transparency
LLMs are currently opaque entities. Upon the release of GPT-4, OpenAI stated, “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.” Although progress has been made in making models less opaque, it is still unclear how they work. Because the industry lacks standardized efficacy metrics, it is unclear not only what is happening under the hood but also how models differ from one another, beyond price and how you use them.
Some companies are now changing this by adding transparency to generative AI models, and these standardized efficacy assessments bring indirect business benefits. Because companies like Gentrace link generative AI outputs back to consumer feedback, anyone can check how well an LLM performed. Other businesses, like Paperplane.ai, go a step further, capturing generative AI data and connecting it with user input so executives can monitor the effectiveness, efficiency, and cost of deployments over time.
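The kind of monitoring these vendors offer can be approximated in-house with a simple aggregation over a deployment log. This is a minimal sketch under assumed field names (`user_rating`, `cost_usd` are invented for illustration and are not any vendor’s schema):

```python
def summarize(deployment_log):
    """Aggregate per-output records into simple efficacy and cost metrics."""
    total = len(deployment_log)
    # Count outputs that users rated positively (thumbs up).
    positive = sum(1 for r in deployment_log if r["user_rating"] == "up")
    cost = sum(r["cost_usd"] for r in deployment_log)
    return {
        "outputs": total,
        "approval_rate": positive / total if total else 0.0,
        "total_cost_usd": round(cost, 4),
    }
```

Even this crude approval rate gives executives a trend line: if it drops after a model or prompt change, the deployment needs attention.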