
    Chatbot answers are all made up. This new tool could help you figure out which ones to trust.


    The Trustworthy Language Model draws on several techniques to calculate its scores. First, each query submitted to the tool is sent to several different large language models. Cleanlab is using five versions of DBRX, an open-source model developed by Databricks, an AI firm based in San Francisco. (But the tech will work with any model, says Northcutt, including Meta’s Llama models or OpenAI’s GPT series, the models behind ChatGPT.) If the responses from each of these models are the same or similar, that contributes to a higher score.
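
The ensemble check described above can be sketched in a few lines of Python. This is not Cleanlab's implementation: it assumes each model is a plain function from prompt to answer, uses hypothetical stub models in place of DBRX or any real API, and substitutes crude lexical similarity (`difflib.SequenceMatcher`) for whatever comparison the Trustworthy Language Model actually performs. Answers that agree push the score toward 1.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, List

# Any function mapping a prompt to an answer counts as a "model" here.
Model = Callable[[str], str]


def similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1]; a stand-in for a semantic comparison."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def agreement_score(prompt: str, models: List[Model]) -> float:
    """Send one prompt to every model and average the pairwise answer similarity."""
    answers = [model(prompt) for model in models]
    pairs = list(combinations(answers, 2))
    if not pairs:  # zero or one model: nothing to compare against
        return 1.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical stub models standing in for real LLM calls.
    agreeing = [lambda p: "Paris", lambda p: "Paris", lambda p: "paris"]
    split = [lambda p: "Paris", lambda p: "Lyon", lambda p: "Marseille"]
    question = "What is the capital of France?"
    print(agreement_score(question, agreeing))  # 1.0
    print(agreement_score(question, split))     # well below 1.0
```

A real system would compare meaning rather than characters, since "Paris" and "The capital is Paris" should count as agreeing.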

    At the same time, the Trustworthy Language Model also sends variations of the original query to each of the DBRX models, swapping in words that have the same meaning. Again, if the responses to synonymous queries are similar, that contributes to a higher score. “We mess with them in different ways to get different outputs and see if they agree,” says Northcutt.
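
A similarly hedged sketch of the synonym-swapping check follows. The tiny `SYNONYMS` table and the stub models are hypothetical, a single model stands in for the several DBRX copies, and agreement is reduced to exact matching of normalized answers. It shows only the idea: if rewording the question changes the answer, the score drops.

```python
from typing import Callable, Dict, List

# Hypothetical synonym table; a real paraphraser would be far richer.
SYNONYMS: Dict[str, List[str]] = {
    "capital": ["capital city"],
    "largest": ["biggest"],
}


def paraphrases(prompt: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return variants of the prompt, each with exactly one word swapped."""
    words = prompt.split()
    variants = []
    for i, word in enumerate(words):
        for alt in synonyms.get(word.lower(), []):
            variants.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return variants


def perturbation_consistency(prompt: str, model: Callable[[str], str],
                             synonyms: Dict[str, List[str]] = SYNONYMS) -> float:
    """Fraction of reworded prompts whose answer matches the original answer."""
    baseline = model(prompt).strip().lower()
    variants = paraphrases(prompt, synonyms)
    if not variants:  # nothing to swap: treated as fully consistent
        return 1.0
    matches = sum(model(v).strip().lower() == baseline for v in variants)
    return matches / len(variants)


if __name__ == "__main__":
    question = "What is the capital of France?"
    steady = lambda p: "Paris"                              # hypothetical stub
    fragile = lambda p: "Lyon" if "city" in p else "Paris"  # flips when reworded
    print(perturbation_consistency(question, steady))   # 1.0
    print(perturbation_consistency(question, fragile))  # 0.0
```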

    The tool can also get multiple models to bounce responses off one another: “It’s like, ‘Here’s my answer: what do you think?’ ‘Well, here’s mine: what do you think?’ And you let them talk.” These interactions are monitored, measured, and fed into the score as well.
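
One way to picture this back-and-forth is a short round-robin exchange in which each model sees its peers' latest answers and may revise its own. Everything here is an assumption for illustration: the `Debater` interface, the `stubborn` and `follower` stub models, and the use of the final majority share as the measured signal. The article does not say how Cleanlab scores these exchanges.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interface: (question, peers' latest answers) -> this model's answer.
Debater = Callable[[str, List[str]], str]


def debate(question: str, models: List[Debater], rounds: int = 2) -> Tuple[List[str], float]:
    """Let models revise their answers after seeing each other's, then report
    the final answers and the share of models backing the most common one."""
    if not models:
        raise ValueError("need at least one model")
    answers = [m(question, []) for m in models]  # opening answers, no peers yet
    for _ in range(rounds):
        # Each model sees every other model's previous answer, not its own.
        answers = [m(question, answers[:i] + answers[i + 1:]) for i, m in enumerate(models)]
    top_count = Counter(answers).most_common(1)[0][1]
    return answers, top_count / len(models)


def stubborn(answer: str) -> Debater:
    """Stub model that never changes its mind."""
    return lambda question, peers: answer


def follower(initial: str) -> Debater:
    """Stub model that adopts its peers' majority answer once it has seen any."""
    def model(question: str, peers: List[str]) -> str:
        return Counter(peers).most_common(1)[0][0] if peers else initial
    return model


if __name__ == "__main__":
    print(debate("q", [stubborn("42"), stubborn("42"), follower("41")]))  # (['42', '42', '42'], 1.0)
    print(debate("q", [stubborn("A"), stubborn("B")]))                    # (['A', 'B'], 0.5)
```

How often a model abandons its first answer could also be tracked; a model that flips easily arguably deserves less trust than one that holds firm.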

    Nick McKenna, a computer scientist at Microsoft Research in Cambridge, UK, who works on large language models for code generation, is optimistic that the approach could be useful. But he doubts it will be perfect. “One of the pitfalls we see in model hallucinations is that they can creep in very subtly,” he says.

    In a range of tests across different large language models, Cleanlab shows that its trustworthiness scores correlate well with the accuracy of those models’ responses. In other words, scores close to 1 line up with correct responses, and scores close to 0 line up with incorrect ones. In another test, the team also found that using the Trustworthy Language Model with GPT-4 produced more reliable responses than using GPT-4 by itself.
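
The article does not say which statistic Cleanlab used to show that its scores track accuracy. A common choice for this kind of check is AUROC: the probability that a randomly picked correct response receives a higher score than a randomly picked incorrect one, where 1.0 means perfect separation and 0.5 means the score is no better than chance. The sketch below computes it directly from pairs; the scores and labels in the usage lines are made-up toy values, not Cleanlab's results.

```python
from itertools import product
from typing import Sequence


def auroc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Chance that a random correct response outscores a random incorrect one.

    Ties count as half a win. 1.0 means the scores separate right from wrong
    answers perfectly; 0.5 means they carry no information.
    """
    pos = [s for s, ok in zip(scores, correct) if ok]
    neg = [s for s, ok in zip(scores, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy trust scores and correctness labels, invented for illustration.
    print(auroc([0.95, 0.90, 0.20, 0.10], [True, True, False, False]))  # 1.0
    print(auroc([0.90, 0.20, 0.80, 0.10], [True, True, False, False]))  # 0.75
```

The pairwise loop is quadratic in the number of responses, which is fine for a sketch; libraries compute the same quantity from sorted ranks.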

    Large language models generate text by predicting the most likely next word in a sequence. In future versions of its tool, Cleanlab plans to make its scores even more accurate by drawing on the probabilities that a model used to make those predictions. It also wants to access the numerical values that models assign to each word in their vocabulary, which they use to calculate those probabilities. This level of detail is provided by certain platforms, such as Amazon’s Bedrock, that businesses can use to run large language models.
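
The numerical values mentioned above are usually called logits, and a softmax turns them into next-word probabilities. The sketch below is a minimal pure-Python illustration, not anything Cleanlab or Amazon Bedrock exposes: it computes those probabilities and, as one simple choice among many, collapses them into a single confidence number for a generated answer by taking the geometric mean of the probabilities of the tokens the model actually chose. The toy logits are invented.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw per-token scores (logits) into probabilities summing to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_confidence(step_logits: Sequence[Sequence[float]],
                        chosen: Sequence[int]) -> float:
    """Geometric mean of the probabilities of the tokens actually generated.

    step_logits[t] holds the logits over the whole vocabulary at step t;
    chosen[t] is the index of the token the model emitted at that step.
    """
    if not chosen or len(step_logits) != len(chosen):
        raise ValueError("need exactly one chosen token per generation step")
    log_probs = [math.log(softmax(logits)[tok]) for logits, tok in zip(step_logits, chosen)]
    return math.exp(sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    # Toy 3-word vocabulary; real models have tens of thousands of tokens.
    confident = [[8.0, 0.0, 0.0], [0.0, 9.0, 1.0]]
    unsure = [[1.0, 0.9, 1.1], [0.5, 0.6, 0.4]]
    print(round(sequence_confidence(confident, [0, 1]), 3))
    print(round(sequence_confidence(unsure, [0, 1]), 3))
```

Averaging in log space keeps long answers from underflowing to zero, which multiplying raw probabilities would eventually do.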

    Cleanlab has tested its approach on data provided by Berkeley Research Group. The firm needed to search for references to health-care compliance problems in tens of thousands of corporate documents. Doing this by hand can take skilled staff weeks. By checking the documents with the Trustworthy Language Model, Berkeley Research Group was able to see which documents the chatbot was least confident about and check only those. This reduced the workload by around 80%, says Northcutt.
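
The Berkeley Research Group workflow amounts to triage: accept the documents the tool is confident about and send the rest to people, least trusted first. A minimal sketch, assuming each document already has a trust score between 0 and 1 and using a hypothetical cutoff of 0.8 (the article gives no threshold); the document IDs and scores are invented.

```python
from typing import Dict, List, Tuple


def triage(scores: Dict[str, float], threshold: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split document IDs into (auto_accepted, needs_review) by trust score.

    Documents below the hypothetical threshold go to a human, least trusted first.
    """
    accepted = [doc for doc, s in scores.items() if s >= threshold]
    review = sorted((doc for doc, s in scores.items() if s < threshold),
                    key=lambda doc: scores[doc])
    return accepted, review


def workload_reduction(scores: Dict[str, float], threshold: float = 0.8) -> float:
    """Fraction of documents a person no longer needs to read."""
    if not scores:
        return 0.0
    _, review = triage(scores, threshold)
    return 1 - len(review) / len(scores)


if __name__ == "__main__":
    # Invented trust scores for five documents.
    scores = {"doc-a": 0.95, "doc-b": 0.40, "doc-c": 0.90, "doc-d": 0.85, "doc-e": 0.10}
    accepted, review = triage(scores)
    print(review)                      # ['doc-e', 'doc-b']
    print(workload_reduction(scores))  # 0.6
```

Raising the threshold sends more documents to reviewers but lowers the chance of accepting a wrong answer unchecked; lowering it does the opposite.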

    In another test, Cleanlab worked with a large bank (Northcutt wouldn’t name it but says it is a competitor to Goldman Sachs). Like Berkeley Research Group, the bank needed to search for references to insurance claims in around 100,000 documents. Again, the Trustworthy Language Model cut the number of documents that needed to be hand-checked by more than half.

    Running each query multiple times through multiple models takes longer and costs a lot more than the typical back-and-forth with a single chatbot. But Cleanlab is pitching the Trustworthy Language Model as a premium service for automating high-stakes tasks that would have been off limits to large language models in the past. The idea is not for it to replace existing chatbots but to do the work of human experts. If the tool can slash the amount of time you need to employ skilled economists or lawyers at $2,000 an hour, the costs will be worth it, says Northcutt.

    In the long run, Northcutt hopes that by reducing the uncertainty around chatbots’ responses, his tech will open up the promise of large language models to a wider range of users. “The hallucination thing is not a large-language-model problem,” he says. “It’s an uncertainty problem.”
