Ztoog
    OLAPH: A Simple and Novel AI Framework that Enables the Improvement of Factuality through Automatic Evaluations

    Large Language Models (LLMs) are entering medicine and healthcare as their capabilities and versatility grow. These models offer a number of advantages, including the ability to supplement or even replace some of the work that doctors typically do. This includes providing medical information, keeping track of patient records, and holding consultations with patients.

    In the medical profession, one of the most important benefits of LLMs is their ability to produce long-form text, which is necessary for giving thorough answers to patient questions. Accurate, informative responses are essential, particularly in medical situations where false information could cause harm. For instance, when a patient asks about the causes of a white tongue, the LLM must answer truthfully about potential causes, such as bacterial accumulation, without spreading myths such as the idea that the condition is invariably dangerous and irreversible.

    In the medical domain, there are many scenarios in which producing comprehensive, long-form answers is essential. This is especially important when answering patients' questions, since the details given must be true and factual. To ensure the accuracy and consistency of these answers, an automated process for evaluating the claims made by LLMs is needed.

    To address this, in a recent study, a team of researchers has produced MedLFQA, a specialized benchmark dataset derived from existing long-form question-answering datasets in the biomedical domain. The purpose of MedLFQA is to make it easier to automatically assess the factual accuracy of responses produced by LLMs, helping to determine how accurate and reliable the information in these long responses is.
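    To make automatic factuality checking concrete, here is a minimal sketch that assumes a MedLFQA-style record pairs each question with a reference answer and two groups of key statements (the two statement types mentioned in the contributions list). The field names `must_have` and `nice_to_have`, the keyword-overlap test, and the 0.6 threshold are all hypothetical choices for illustration; a real evaluator would rely on a learned entailment or fact-checking model rather than word matching, and the example record is invented from the white-tongue scenario above, not taken from the dataset.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LongFormQA:
    """Hypothetical MedLFQA-style record; field names are illustrative only."""
    question: str
    reference_answer: str
    must_have: list[str] = field(default_factory=list)
    nice_to_have: list[str] = field(default_factory=list)


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def is_supported(statement: str, response: str, threshold: float = 0.6) -> bool:
    """Toy check: enough of the statement's content words appear in the response."""
    needed = content_words(statement)
    if not needed:
        return False
    overlap = len(needed & content_words(response)) / len(needed)
    return overlap >= threshold


def coverage(statements: list[str], response: str) -> float:
    """Fraction of statements judged as supported by the response."""
    if not statements:
        return 0.0
    return sum(is_supported(s, response) for s in statements) / len(statements)


record = LongFormQA(
    question="What causes a white tongue?",
    reference_answer="Illustrative reference answer.",
    must_have=["Bacterial accumulation on the tongue can cause a white coating."],
    nice_to_have=["A white tongue is usually harmless and temporary."],
)
answer = "A white coating is often caused by bacterial accumulation on the tongue."
print(coverage(record.must_have, answer))     # → 1.0
print(coverage(record.nice_to_have, answer))  # → 0.0
```

    Scoring the two statement groups separately lets an evaluator distinguish answers that miss essential facts from answers that merely omit helpful extras.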

    The team has proposed a novel framework called OLAPH (Optimizing Large language models’ Answers with Preferences of reducing Hallucination). OLAPH uses a series of automatic evaluations to improve the factual accuracy of LLMs. The method relies on an iterative training process that teaches the LLM to prefer the responses with the highest factuality and evaluation-metric scores.
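    Preferring the responses with the highest scores across several metrics implies collapsing those scores into a single number for ranking. The sketch below assumes each candidate already has scores in [0, 1] for a few metric families; the metric names and the equal weights are hypothetical, not the paper's configuration.

```python
from typing import Mapping

# Hypothetical metric families and weights; the real framework's choices may differ.
DEFAULT_WEIGHTS = {"word_overlap": 1.0, "semantic_similarity": 1.0, "factuality": 1.0}


def composite_score(scores: Mapping[str, float],
                    weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-metric scores (each assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores.get(name, 0.0) for name in weights) / total_weight


def rank_candidates(candidates: dict[str, Mapping[str, float]]) -> list[tuple[str, float]]:
    """Return (response, score) pairs sorted from best to worst."""
    ranked = [(text, composite_score(s)) for text, s in candidates.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


candidates = {
    "answer A": {"word_overlap": 0.4, "semantic_similarity": 0.7, "factuality": 0.9},
    "answer B": {"word_overlap": 0.6, "semantic_similarity": 0.8, "factuality": 0.3},
}
best, best_score = rank_candidates(candidates)[0]
print(best, round(best_score, 3))  # → answer A 0.667
```

    Weighting factuality more heavily than surface overlap would be one way to push the ranking toward factually correct answers; here every metric counts equally for simplicity.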

    For each question, the OLAPH framework generates multiple response samples. Then, using predetermined evaluation criteria, it selects the highest-scoring response. The LLM is then trained further on this preferred response, bringing its subsequent outputs closer to the correct, preferred answers. This iterative approach helps limit hallucinations, in which the model would otherwise produce false information.
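    The sample, score, and select loop can be sketched in a few lines. Everything here is a stand-in: `generate`, `score`, and `train` are hypothetical callables (a real system would sample from the LLM, run the automatic evaluators, and update the model weights), and pairing the best sample with the worst one as a (chosen, rejected) pair is an assumption made for illustration.

```python
from itertools import cycle
from typing import Callable, NamedTuple


class PreferencePair(NamedTuple):
    question: str
    chosen: str    # highest-scoring sample
    rejected: str  # lowest-scoring sample


def build_preference_pairs(questions: list[str],
                           generate: Callable[[str], str],
                           score: Callable[[str, str], float],
                           num_samples: int = 3) -> list[PreferencePair]:
    """Sample several responses per question and keep the best and worst as a pair."""
    pairs = []
    for q in questions:
        samples = [generate(q) for _ in range(num_samples)]
        ranked = sorted(samples, key=lambda r: score(q, r), reverse=True)
        best, worst = ranked[0], ranked[-1]
        if score(q, best) > score(q, worst):  # skip questions with no score gap
            pairs.append(PreferencePair(q, best, worst))
    return pairs


def iterative_training(questions: list[str],
                       generate: Callable[[str], str],
                       score: Callable[[str, str], float],
                       train: Callable[[list[PreferencePair]], None],
                       rounds: int = 2) -> None:
    """Alternate between sampling/scoring and a (hypothetical) training update."""
    for _ in range(rounds):
        pairs = build_preference_pairs(questions, generate, score)
        train(pairs)  # in a real system: update the model so later samples improve


# Toy stand-ins so the sketch runs end to end.
responses = cycle([
    "Often bacterial buildup on the tongue; usually harmless.",
    "A white tongue is always dangerous and irreversible.",
    "Dehydration and poor oral hygiene can contribute.",
])


def toy_generate(question: str) -> str:
    return next(responses)


def toy_score(question: str, response: str) -> float:
    # Penalize the myth from the article's example; every other answer scores 1.0.
    return 0.0 if "always dangerous" in response else 1.0


collected: list[list[PreferencePair]] = []
iterative_training(["What causes a white tongue?"], toy_generate, toy_score, collected.append)
print(len(collected), collected[0][0].rejected)
# → 2 A white tongue is always dangerous and irreversible.
```

    Because the toy generator never changes, both rounds produce the same pair; with a real model, each training step would shift the distribution of later samples.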

    The results show considerable improvements in factual accuracy for LLMs trained with the OLAPH framework, even on metrics that were not explicitly used during training. A 7-billion-parameter LLM trained with OLAPH produced long-form answers comparable in quality to those written by medical experts.

    The team has summarized its main contributions as follows.

    1. The team has introduced MedLFQA, a reorganized benchmark dataset for the automatic evaluation of long-form text generated by LLMs in the biomedical domain.
    2. To judge the veracity of medical claims made in long-form responses, the team has developed two distinct types of statements that together give a comprehensive picture of the LLMs’ ability to produce accurate information.
    3. The team has introduced the OLAPH framework, which improves LLM answers through iterative learning and automatic evaluation.
    4. The team has shown that 7-billion-parameter LLMs trained with the OLAPH framework can produce long-form answers comparable in factual accuracy to those written by medical experts.
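    The contributions above describe learning from automatically evaluated samples, but this summary does not name the training objective. One widely used way to learn from chosen/rejected pairs is Direct Preference Optimization (DPO); the sketch below computes the DPO loss for a single pair from summed sequence log-probabilities. It illustrates preference training under that assumption rather than confirming OLAPH's exact recipe, and the log-probability values and `beta` are made up.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, given summed token log-probabilities of each response.

    The policy is rewarded for raising the chosen response's log-probability
    (relative to a frozen reference model) more than the rejected one's.
    """
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return softplus(-logits)  # equals -log(sigmoid(logits))


# Policy identical to the reference: no preference yet, loss is log(2).
print(math.isclose(dpo_loss(-12.0, -13.0, -12.0, -13.0), math.log(2)))  # → True
# Policy favors the chosen answer more than the reference does: loss drops.
print(dpo_loss(-10.0, -15.0, -12.0, -13.0) < math.log(2))  # → True
```

    The stable `softplus` avoids overflow in `math.exp` when the logits are large in magnitude, which can happen with long sequences.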

    In conclusion, this study proposes the OLAPH framework to improve long-form medical answers through iterative training, and it introduces MedLFQA as a benchmark for assessing the factual accuracy of the answers LLMs produce. The findings show that OLAPH could substantially improve LLMs’ reliability in producing accurate medical information, which may be crucial for a number of medical applications.


    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

    Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
    She is a data science enthusiast with strong analytical and critical thinking skills and an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

