Google DeepMind Introduces ‘Superhuman’ AI for Fact-Checking

A recent study from Google’s DeepMind research team suggests that an artificial intelligence system can outperform human fact-checkers when assessing the factual accuracy of material generated by large language models.

The research, titled “Long-form factuality in large language models” and published on the preprint server arXiv, describes a method called the Search-Augmented Factuality Evaluator (SAFE). SAFE uses a large language model to break generated text down into individual facts, which are then checked against Google Search results to assess their accuracy.

“SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether the search results support a fact,” the authors of the paper stated.
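To make that pipeline concrete, here is a minimal Python sketch of a SAFE-style evaluator as the quote describes it. The helpers call_llm and google_search are hypothetical placeholders for whatever LLM and search APIs are wired in; this is an illustration of the approach, not DeepMind’s released implementation.

```python
# Minimal sketch of a SAFE-style pipeline: decompose a response into facts,
# gather search evidence for each fact, and ask an LLM for a verdict.
# call_llm and google_search are hypothetical placeholders, not real APIs.

def call_llm(prompt: str) -> str:
    """Placeholder for any instruction-following LLM call."""
    raise NotImplementedError("wire this to an LLM API")

def google_search(query: str) -> list[str]:
    """Placeholder returning search-result snippets for a query."""
    raise NotImplementedError("wire this to a search API")

def split_into_facts(response: str) -> list[str]:
    # Step 1: ask the LLM to decompose the long-form response into
    # self-contained individual facts, one per line.
    prompt = ("Split the following response into individual, self-contained "
              f"facts, one per line:\n\n{response}")
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def rate_fact(fact: str, max_queries: int = 3) -> str:
    # Steps 2-3: iteratively issue search queries about the fact, then ask
    # the LLM whether the accumulated results support it.
    evidence: list[str] = []
    for _ in range(max_queries):
        query = call_llm(f"Write a Google Search query to verify: {fact}")
        evidence.extend(google_search(query))
    verdict = call_llm("Given the search results below, answer SUPPORTED or "
                       f"NOT_SUPPORTED for the fact.\n\nFact: {fact}\n\n"
                       "Results:\n" + "\n".join(evidence))
    return "supported" if "SUPPORTED" in verdict.upper() else "not_supported"

def safe_evaluate(response: str) -> dict[str, str]:
    # End-to-end: decompose the response, then rate every fact.
    return {fact: rate_fact(fact) for fact in split_into_facts(response)}
```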

‘Superhuman’ Performance Raises Controversy

The researchers compared SAFE to human annotators on a dataset of around 16,000 individual facts and found that SAFE’s verdicts matched the human ratings 72% of the time. More importantly, in a sample of 100 disagreements between SAFE and the human raters, SAFE’s judgment was determined to be correct 76% of the time.
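As a rough illustration (not the paper’s code) of how those two figures can be computed, the snippet below takes per-fact verdicts from SAFE and from human annotators, plus adjudicated ground-truth labels, and returns the agreement rate and the disagreement win rate; all the data shown is made up.

```python
# Illustrative only: the verdict lists below are toy data, not the study's.

def agreement_rate(safe: list[str], human: list[str]) -> float:
    # Fraction of facts on which SAFE's verdict matches the human annotation.
    return sum(s == h for s, h in zip(safe, human)) / len(safe)

def disagreement_win_rate(safe: list[str], human: list[str],
                          ground_truth: list[str]) -> float:
    # Among facts where SAFE and humans disagree, the share where SAFE
    # matches the adjudicated ground-truth label.
    cases = [(s, g) for s, h, g in zip(safe, human, ground_truth) if s != h]
    return sum(s == g for s, g in cases) / len(cases)

# Toy example: three facts, one disagreement, SAFE correct on it.
safe = ["supported", "supported", "not_supported"]
human = ["supported", "not_supported", "not_supported"]
truth = ["supported", "supported", "not_supported"]
print(agreement_rate(safe, human))               # 0.666...
print(disagreement_win_rate(safe, human, truth)) # 1.0
```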

While the paper claims that “LLM agents can achieve superhuman rating performance,” other experts question what “superhuman” actually means in this context.

Gary Marcus, a well-known AI researcher and frequent skeptic of overstated claims, argued on Twitter that in this case, “superhuman” may just mean “better than an underpaid crowd worker, rather than a true human fact checker.”

“That makes the characterization misleading,” Marcus added. “Like saying that 1985 chess software was superhuman.”

Marcus has a fair point. To genuinely demonstrate superhuman performance, SAFE would need to be benchmarked against skilled professional fact-checkers rather than crowdsourced workers. The specifics of the human raters, such as their qualifications, pay, and fact-checking process, are critical for contextualizing the results correctly.

Cost Reductions and Benchmarking Top Models

SAFE has an obvious economic benefit – the researchers discovered that using the AI system was around 20 times less expensive than using human fact-checkers. As the volume of information created by language models grows, having an affordable and scalable method to validate claims will become increasingly important.

The DeepMind researchers used SAFE to assess the factual accuracy of 13 top language models from four families (Gemini, GPT, Claude, and PaLM-2) on a new benchmark called LongFact. Their findings suggest that larger models generally made fewer factual errors.

However, even the best-performing models produced a significant share of incorrect claims. This underscores the risk of relying too heavily on language models that can fluently present inaccurate information. Automatic fact-checking tools such as SAFE could help mitigate that risk.

Transparency and Human Baselines are Important

While the SAFE code and the LongFact dataset have been released on GitHub, allowing other researchers to examine and build on the work, greater transparency is still needed regarding the human baselines used in the study. Understanding the crowdworkers’ backgrounds and procedures is essential for evaluating SAFE’s capabilities in the proper context.

As tech firms race to build increasingly advanced language models for purposes ranging from search to virtual assistants, the ability to automatically fact-check these systems’ outputs could prove critical. Tools like SAFE represent an important step toward a new layer of trust and accountability.

However, such consequential technologies must be developed in the open, with input from a broad range of stakeholders beyond the walls of any single company. Rigorous, transparent benchmarking against human experts, not crowd workers, will be required to measure real progress. Only then can we gauge the practical impact of automated fact-checking in the battle against disinformation.

Source: VentureBeat
