    Sites scramble to block ChatGPT web crawler after instructions emerge


    Without announcement, OpenAI recently added details about its web crawler, GPTBot, to its online documentation site. GPTBot is the name of the user agent that the company uses to retrieve webpages to train the AI models behind ChatGPT, such as GPT-4. Earlier this week, some sites quickly announced their intention to block GPTBot’s access to their content.

    In the new documentation, OpenAI says that webpages crawled with GPTBot “may potentially be used to improve future models,” and that allowing GPTBot to access your site “can help AI models become more accurate and improve their general capabilities and safety.”

    OpenAI claims it has implemented filters ensuring that sources behind paywalls, those collecting personally identifiable information, or any content violating OpenAI’s policies will not be accessed by GPTBot.

    News of being able to potentially block OpenAI’s training scrapes (if OpenAI honors the blocks) comes too late to affect ChatGPT or GPT-4’s current training data, which was scraped without announcement years ago. OpenAI collected the data ending in September 2021, which is the current “knowledge” cutoff for OpenAI’s language models.

    It’s worth noting that the new instructions may not prevent web-browsing versions of ChatGPT or ChatGPT plugins from accessing current websites to relay up-to-date information to the user. That point was not spelled out in the documentation, and we reached out to OpenAI for clarification.

    The answer lies with robots.txt

    According to OpenAI’s documentation, GPTBot will be identifiable by the user agent token “GPTBot,” with its full string being “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)”.
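As a rough illustration of how a site operator might spot the crawler in request logs, the sketch below checks a User-Agent header for the GPTBot token. The helper name is_gptbot is hypothetical, not something from OpenAI's documentation. Any client can send any User-Agent value, so a match is only a hint and not proof that a request comes from OpenAI.

```python
import re

# Full user-agent string as quoted from OpenAI's documentation.
GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

# Matches the "GPTBot" token as a whole word, ignoring case.
_GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Hypothetical helper: True if the header contains the GPTBot token."""
    return bool(_GPTBOT_TOKEN.search(user_agent or ""))


if __name__ == "__main__":
    print(is_gptbot(GPTBOT_UA))
    print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```

Matching on the short token rather than the full string is a deliberate choice here: the version number in the full string could change, while the token is what robots.txt rules refer to.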

    The OpenAI docs also give instructions about how to block GPTBot from crawling websites using the industry-standard robots.txt file, which is a text file that sits in the root directory of a website and tells web crawlers (such as those used by search engines) which parts of the site they should not crawl.

    It’s as easy as adding these two lines to a site’s robots.txt file:

    User-agent: GPTBot
    Disallow: /
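For sites that manage robots.txt with a script, adding those two lines could look something like the sketch below. It works on the file contents as a string and leaves the file untouched if a GPTBot group is already present. The function name add_gptbot_block is hypothetical, and reading and writing the actual file is left to the caller.

```python
# The two-line rule group quoted above.
GPTBOT_BLOCK = "User-agent: GPTBot\nDisallow: /\n"


def add_gptbot_block(robots_txt: str) -> str:
    """Hypothetical helper: append the GPTBot block unless one already exists."""
    for line in robots_txt.splitlines():
        # Ignore trailing comments, then split "field: value".
        field, _, value = line.split("#", 1)[0].partition(":")
        if field.strip().lower() == "user-agent" and value.strip().lower() == "gptbot":
            return robots_txt  # existing GPTBot rules are left as they are
    body = robots_txt.rstrip("\n")
    # Separate the new group from earlier rules with one blank line.
    prefix = body + "\n\n" if body else ""
    return prefix + GPTBOT_BLOCK


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow: /private/\n"
    print(add_gptbot_block(existing))
```

Running the helper twice on the same contents returns the same text, so it is safe to call from a deploy step.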

    OpenAI also says that admins can restrict GPTBot from certain parts of the site in robots.txt with different tokens:

    User-agent: GPTBot
    Allow: /directory-1/
    Disallow: /directory-2/
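One way to sanity-check rules like these before publishing them is Python's standard urllib.robotparser module, which applies robots.txt rules the way a well-behaved crawler would. This is a minimal sketch: example.com and the page paths are placeholders, and it says nothing about how GPTBot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules quoted above.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_may_fetch(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules allow the GPTBot token to fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path matters for matching.
    return parser.can_fetch("GPTBot", "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/directory-1/page.html", "/directory-2/page.html", "/other/"):
        print(path, gptbot_may_fetch(path))
```

Paths that match no rule are treated as allowed, which is why a blanket "Disallow: /" is needed to block the whole site.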

    Additionally, OpenAI has provided the specific IP address blocks from which GPTBot will be operating, which could be blocked by firewalls as well.
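The same IP blocks could also be checked in application code, for example to flag requests in a log. The sketch below uses Python's standard ipaddress module. The ranges shown are placeholders taken from reserved documentation address space, NOT OpenAI's published blocks, which the article does not list; the function name in_blocked_ranges is likewise hypothetical.

```python
import ipaddress

# PLACEHOLDER ranges from reserved documentation address space (RFC 5737).
# These are NOT OpenAI's real ranges; substitute the blocks OpenAI publishes.
PLACEHOLDER_GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_blocked_ranges(ip: str, cidr_blocks=PLACEHOLDER_GPTBOT_RANGES) -> bool:
    """Hypothetical helper: True if ip falls inside any of the CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in cidr_blocks)


if __name__ == "__main__":
    print(in_blocked_ranges("192.0.2.77"))
    print(in_blocked_ranges("203.0.113.5"))
```

Unlike a User-Agent check, an address check cannot be defeated by simply changing a header, though published ranges can change and would need to be refreshed.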

    Despite this option, blocking GPTBot will not guarantee that a site’s data never ends up training some future AI model. Aside from issues of scrapers ignoring robots.txt files, there are other large data sets of scraped websites (such as The Pile) that are not affiliated with OpenAI. These data sets are commonly used to train open source (or source-available) LLMs such as Meta’s Llama 2.

    Some sites react with haste

    While wildly successful from a tech perspective, ChatGPT has also been controversial for how it scraped copyrighted data without permission and concentrated that value into a commercial product that circumvents the typical online publication model. OpenAI has been accused of (and sued for) plagiarism along these lines.

    Accordingly, it’s not surprising to see some people react to the news of being able to potentially block their content from future GPT models with a kind of pent-up relish. For example, on Tuesday, VentureBeat noted that The Verge, Substack writer Casey Newton, and Neil Clarke of Clarkesworld all said they would block GPTBot soon after news of the bot broke.

    But for large website operators, the choice to block large language model (LLM) crawlers isn’t as easy as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge that could serve some sites very well (such as sites that don’t want to lose visitors if ChatGPT supplies their information for them), but it may also hurt others. For example, blocking content from future AI models could decrease a site’s or a brand’s cultural footprint if AI chatbots become a primary user interface in the future. As a thought experiment, imagine an online business declaring that it didn’t want its website indexed by Google in the year 2002, a self-defeating move when that was the most popular on-ramp for finding information online.

    It’s still early in the generative AI game, and no matter which way technology goes, or which individual sites attempt to opt out of AI model training, at least OpenAI is providing the option.
