    OpenAI’s latest blunder shows the challenges facing Chinese AI models


    In fact, among the few long Chinese tokens in GPT-4o that aren’t either pornography or gambling nonsense, two are “socialism with Chinese characteristics” and “People’s Republic of China.” The presence of these phrases suggests that a significant part of the training data actually comes from Chinese state media writings, where formal, long expressions are extremely common.
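
    As a minimal sketch of how such an inspection might look, assuming a tokenizer's vocabulary is available as a plain list of strings, the Python below filters for long tokens made up entirely of Chinese characters. The toy vocabulary, the function names, and the four-character threshold are illustrative assumptions; this is not GPT-4o's actual token list, nor a method described in this article.

```python
def is_cjk(ch: str) -> bool:
    """Return True if ch falls in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def long_chinese_tokens(vocab, min_chars=4):
    """Return tokens made only of CJK ideographs with at least
    min_chars characters, longest first."""
    hits = [
        token
        for token in vocab
        if len(token) >= min_chars and all(is_cjk(c) for c in token)
    ]
    return sorted(hits, key=len, reverse=True)


# Toy vocabulary, made up for illustration (an assumption, not real data).
toy_vocab = ["the", "中国", "中华人民共和国", "中国特色社会主义", "hello", "数据"]

result = long_chinese_tokens(toy_vocab)
print(result)
```

    With a real vocabulary in place of the toy list, the longest entries would be the ones to review by hand for spam or formulaic phrases.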

    OpenAI has historically been very tight-lipped about the data it uses to train its models, and it probably will never tell us how much of its Chinese training database is state media and how much is spam. (OpenAI didn’t respond to MIT Technology Review’s detailed questions sent on Friday.)

    But it’s not the only company struggling with this problem. People inside China who work in its AI industry agree there’s a lack of quality Chinese text data sets for training LLMs. One reason is that the Chinese internet used to be, and largely remains, divided up by big companies like Tencent and ByteDance. They own most of the social platforms and aren’t going to share their data with competitors or third parties to train LLMs.

    In fact, this is also why search engines, including Google, kinda suck when it comes to searching in Chinese. Since WeChat content can only be searched on WeChat, and content on Douyin (the Chinese TikTok) can only be searched on Douyin, this data is not accessible to a third-party search engine, let alone an LLM. But these are the platforms where actual human conversations are happening, instead of some spam website that keeps trying to draw you into online gambling.

    The lack of quality training data is a much bigger problem than the failure to filter out the porn and general nonsense in GPT-4o’s token-training data. If there isn’t an existing data set, AI companies have to put in significant work to identify, source, and curate their own data sets and filter out inappropriate or biased content.

    It doesn’t seem OpenAI did that, which in fairness makes some sense, given that people in China can’t use its AI models anyway.

    Still, there are many people living outside China who want to use AI services in Chinese. And they deserve a product that works properly as much as speakers of any other language do.

    How can we solve the problem of the lack of good Chinese LLM training data? Tell me your idea at zeyi@technologyreview.com.
