    Meet ImageReward: A Revolutionary Text-to-Image Model Bridging the Gap between AI Generative Capabilities and Human Values


    In machine learning, generative models that can produce images from text inputs have made significant progress in recent years, with various approaches showing promising results. While these models have attracted considerable attention and have many potential applications, aligning them with human preferences remains a major challenge because of differences between the pre-training and user-prompt distributions, which lead to known issues with the generated images.

    Several challenges arise when generating images from text prompts. These include difficulties with accurately aligning text and images, accurately depicting the human body, adhering to human aesthetic preferences, and avoiding potential toxicity and biases in the generated content. Addressing these challenges requires more than just improving model architecture and pre-training data. One approach explored in natural language processing is reinforcement learning from human feedback, where a reward model is created through expert-annotated comparisons to guide the model toward human preferences and values. However, this annotation process can take considerable time and effort.

    To deal with these challenges, a research team from China has presented a novel solution for generating images from text prompts. They introduce ImageReward, the first general-purpose text-to-image human preference reward model, trained on 137k pairs of expert comparisons based on real-world user prompts and model outputs.


    To construct ImageReward, the authors used a graph-based algorithm to select diverse prompts and provided annotators with a system consisting of prompt annotation, text-image rating, and image ranking. They also recruited annotators with at least college-level education to ensure a consensus in the ratings and rankings of generated images. The authors analyzed the performance of a text-to-image model on different types of prompts. They collected a dataset of 8,878 useful prompts and scored the generated images along three dimensions. They also identified common problems in generated images and found that body problems and repeated generation were the most severe. They studied the influence of “function” words in prompts on the model’s performance and found that proper function words improve text-image alignment.

    The experimental step involved training ImageReward, a preference model for generated images, using the annotations to model human preferences. BLIP was used as the backbone, and some transformer layers were frozen to prevent overfitting. Optimal hyperparameters were determined through a grid search using a validation set. The loss function was formulated based on the ranked images for each prompt, and the goal was to automatically select images that humans prefer.
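    The article does not spell out the loss, so the following is only a minimal sketch of one common way to turn a per-prompt ranking into a training signal: a pairwise logistic ranking loss, as often used for reward models in reinforcement learning from human feedback. The function names and the example scores are hypothetical, and the exact formulation in the paper may differ.

```python
import math
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(scores_best_first):
    """Mean pairwise logistic loss for one prompt.

    `scores_best_first` holds the reward model's scores for the images of a
    single prompt, ordered from the image annotators ranked best to the one
    they ranked worst (assumed layout; ties are not handled in this sketch).
    """
    # combinations() preserves input order, so each pair is (better, worse).
    pairs = list(combinations(scores_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked images")
    total = 0.0
    for better, worse in pairs:
        # -log(sigmoid(better - worse)) equals softplus(worse - better).
        total += softplus(worse - better)
    return total / len(pairs)


# Hypothetical scores: the first list agrees with the human ranking,
# the second list is exactly reversed.
agreeing = pairwise_ranking_loss([2.0, 1.0, 0.0])
reversed_order = pairwise_ranking_loss([0.0, 1.0, 2.0])
```

    Under this assumed loss, scores that agree with the human ranking give a lower value than scores that contradict it, which is what pushes the model toward images that humans prefer.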

    In the experiment step, the model is trained on a dataset of over 136,000 pairs of image comparisons and is compared with other models using preference accuracy, recall, and filter scores. ImageReward outperforms other models, with a preference accuracy of 65.14%. The paper also includes an agreement analysis between annotators, researchers, the annotator ensemble, and models. The model is shown to perform better than other models in terms of image fidelity, which is more complex than aesthetics, and it maximizes the difference between superior and inferior images. In addition, an ablation study was carried out to analyze the impact of removing specific components or features from the proposed ImageReward model. The main result of the ablation study is that removing any of the three branches (the transformer backbone, the image encoder, or the text encoder) would lead to a significant drop in the preference accuracy of the model. In particular, removing the transformer backbone would cause the most significant performance drop, indicating the crucial role of the transformer in the model.
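    To make the preference accuracy metric concrete, here is a small sketch under an assumed definition: the fraction of human-compared image pairs in which the model gives the human-preferred image the higher score. The function name, the tie handling, and the example scores are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def preference_accuracy(scored_pairs):
    """Fraction of pairs where the human-preferred image scores higher.

    `scored_pairs` is an iterable of (score_preferred, score_rejected)
    tuples, one per human comparison. Ties are counted as misses in this
    sketch, which is an assumption rather than the paper's stated rule.
    """
    scored_pairs = list(scored_pairs)
    if not scored_pairs:
        raise ValueError("need at least one comparison")
    correct = sum(1 for preferred, rejected in scored_pairs if preferred > rejected)
    return correct / len(scored_pairs)


# Hypothetical model scores for four human comparisons:
# two agree with the annotators, one disagrees, one is a tie.
example_pairs = [(0.9, 0.1), (0.2, 0.5), (0.7, 0.3), (0.4, 0.4)]
accuracy = preference_accuracy(example_pairs)  # 2 of 4 pairs -> 0.5
```

    On this reading, the reported 65.14% would mean the model's ordering matched the annotators' choice in roughly two out of three compared pairs.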

    In this article, we presented a new study by a Chinese team that introduced ImageReward. This general-purpose text-to-image human preference reward model addresses issues in generative models by aligning with human values. They created a pipeline for annotation and a dataset of 137k comparisons and 8,878 prompts. Experiments showed that ImageReward outperformed existing methods and could be an ideal evaluation metric. The team analyzed human assessments and plans to refine the annotation process, extend the model to cover more categories, and explore reinforcement learning to push the boundaries of text-to-image synthesis.


    Check out the Paper and GitHub.



    Mahmoud is a PhD researcher in machine learning. He also holds a bachelor’s degree in physical science and a master’s degree in telecommunications and networking systems. His current areas of research concern computer vision, stock market prediction and deep learning. He has produced several scientific articles about person re-identification and the study of the robustness and stability of deep networks.

