Ztoog
    Sites scramble to block ChatGPT web crawler after instructions emerge

    Without announcement, OpenAI recently added details about its web crawler, GPTBot, to its online documentation site. GPTBot is the name of the user agent that the company uses to retrieve webpages to train the AI models behind ChatGPT, such as GPT-4. Earlier this week, some sites quickly announced their intention to block GPTBot’s access to their content.

    In the new documentation, OpenAI says that webpages crawled with GPTBot “may potentially be used to improve future models,” and that allowing GPTBot to access your site “can help AI models become more accurate and improve their general capabilities and safety.”

    OpenAI claims it has implemented filters ensuring that sources behind paywalls, those collecting personally identifiable information, or any content violating OpenAI’s policies will not be accessed by GPTBot.

    News of being able to potentially block OpenAI’s training scrapes (if they honor them) comes too late to affect ChatGPT or GPT-4’s current training data, which was scraped without announcement years ago. OpenAI collected the data ending in September 2021, which is the current “knowledge” cutoff for OpenAI’s language models.

    It’s worth noting that the new instructions may not prevent web-browsing versions of ChatGPT or ChatGPT plugins from accessing current websites to relay up-to-date information to the user. That point was not spelled out in the documentation, and we reached out to OpenAI for clarification.

    The answer lies with robots.txt

    According to OpenAI’s documentation, GPTBot will be identifiable by the user agent token “GPTBot,” with its full string being “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)”.
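As a rough illustration of how a site operator might spot the crawler in server logs, the sketch below checks a request's User-Agent header for the documented token. The function name `is_gptbot` is hypothetical, and the check is only a hint, since any client can send any User-Agent string it likes.

```python
# Token and full string as quoted from OpenAI's documentation above.
GPTBOT_TOKEN = "GPTBot"
GPTBOT_FULL_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token.

    Hypothetical helper: a case-insensitive substring match, which is
    easy to spoof and should not be treated as proof of identity.
    """
    return GPTBOT_TOKEN.lower() in user_agent.lower()


if __name__ == "__main__":
    print(is_gptbot(GPTBOT_FULL_UA))
    print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```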

    The OpenAI docs also give instructions about how to block GPTBot from crawling websites using the industry-standard robots.txt file, a text file that sits at the root directory of a website and tells web crawlers (such as those used by search engines) which parts of the site they should not crawl.

    It’s as easy as adding these two lines to a site’s robots.txt file:

    User-agent: GPTBot
    Disallow: /
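For sites that already have a robots.txt file, the two lines can be appended by a script. The following is a minimal sketch, assuming the file's contents are already loaded as a string; the helper name `ensure_gptbot_blocked` is hypothetical, and its duplicate check is deliberately naive (it only looks for an existing `User-agent: GPTBot` line).

```python
GPTBOT_BLOCK = "User-agent: GPTBot\nDisallow: /\n"


def ensure_gptbot_blocked(robots_txt: str) -> str:
    """Return robots_txt with the GPTBot block appended if none exists.

    Hypothetical helper. If a GPTBot group is already present, the text
    is returned unchanged so existing rules are not overridden.
    """
    # Normalize each line by dropping whitespace and case before comparing.
    normalized = ["".join(line.split()).lower() for line in robots_txt.splitlines()]
    if "user-agent:gptbot" in normalized:
        return robots_txt
    body = robots_txt.rstrip()
    # Separate the new group from earlier groups with a blank line.
    prefix = body + "\n\n" if body else ""
    return prefix + GPTBOT_BLOCK


if __name__ == "__main__":
    print(ensure_gptbot_blocked("User-agent: *\nAllow: /\n"))
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to call from a deploy script.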

    OpenAI also says that admins can restrict GPTBot from certain parts of the site in robots.txt with different tokens:

    User-agent: GPTBot
    Allow: /directory-1/
    Disallow: /directory-2/
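One way to sanity-check rules like these before deploying them is Python's standard `urllib.robotparser` module, which evaluates a robots.txt body against a user agent token. This is a sketch only: the `example.com` URLs and the helper name `gptbot_may_fetch` are assumptions, and the result shows how Python's parser reads the rules, not how GPTBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules quoted above.
RULES = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_may_fetch(rules: str, url: str) -> bool:
    """Return True if the given robots.txt text lets the GPTBot token fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # Pass the short token; the parser matches on the product name,
    # not on the full browser-style User-Agent string.
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    for path in ("/directory-1/page.html", "/directory-2/page.html"):
        print(path, gptbot_may_fetch(RULES, "https://example.com" + path))
```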

    Additionally, OpenAI has provided the specific IP address blocks from which GPTBot will be operating, which can be blocked by firewalls as well.
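The article does not list the address blocks, so the sketch below uses placeholder ranges from the reserved documentation networks, not OpenAI's real ones. It shows how an application-level check against a list of CIDR blocks could look using Python's standard `ipaddress` module; the names `HYPOTHETICAL_GPTBOT_RANGES` and `in_blocked_range` are illustrative.

```python
import ipaddress

# Placeholder CIDR blocks reserved for documentation use.
# These are NOT OpenAI's published ranges; substitute the real list.
HYPOTHETICAL_GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

NETWORKS = [ipaddress.ip_network(cidr) for cidr in HYPOTHETICAL_GPTBOT_RANGES]


def in_blocked_range(ip: str) -> bool:
    """Return True if ip falls inside any configured CIDR block."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in NETWORKS)


if __name__ == "__main__":
    print(in_blocked_range("192.0.2.17"))
    print(in_blocked_range("203.0.113.5"))
```

In practice the same ranges would more likely go into firewall rules, as the article suggests; the Python check is just a compact way to show the matching logic.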

    Despite this option, blocking GPTBot will not guarantee that a site’s data stays out of the training sets of all future AI models. Aside from the problem of scrapers ignoring robots.txt files, there are other large data sets of scraped websites (such as The Pile) that are not affiliated with OpenAI. These data sets are commonly used to train open source (or source-available) LLMs such as Meta’s Llama 2.

    Some websites react with haste

    While wildly successful from a tech point of view, ChatGPT has also been controversial for how it scraped copyrighted data without permission and concentrated that value into a commercial product that circumvents the typical online publication model. OpenAI has been accused of (and sued for) plagiarism along those lines.

    Accordingly, it’s not surprising to see some people react to the news of being able to potentially block their content from future GPT models with a kind of pent-up relish. For example, on Tuesday, VentureBeat noted that The Verge, Substack writer Casey Newton, and Neil Clarke of Clarkesworld all said they would block GPTBot soon after news of the bot broke.

    But for large website operators, the choice to block large language model (LLM) crawlers isn’t as easy as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge that could serve some sites very well (such as sites that don’t want to lose visitors if ChatGPT supplies their information for them), but it may also hurt others. For example, blocking content from future AI models could decrease a site’s or a brand’s cultural footprint if AI chatbots become a primary user interface in the future. As a thought experiment, imagine an online business declaring in 2002 that it didn’t want its website indexed by Google, a self-defeating move when that was the most popular on-ramp for finding information online.

    It’s still early in the generative AI game, and no matter which way technology goes, or which individual sites attempt to opt out of AI model training, at least OpenAI is providing the option.

    © 2025 Ztoog.
