    New method accelerates data retrieval in huge databases

    Hashing is a core operation in most online databases, like a library catalogue or an e-commerce website. A hash function generates codes that directly determine the location where data would be stored. Using these codes, it is easier to find and retrieve the data.

    However, because traditional hash functions generate codes that are effectively random, sometimes two pieces of data are hashed to the same value. This causes collisions: a search for one item points the user to many pieces of data with the same hash value. It takes much longer to find the right one, resulting in slower searches and reduced performance.
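To make the cost of collisions concrete, here is a minimal Python sketch of a chained hash table, in which every slot keeps a list of the entries that hashed to it. The class and method names are illustrative, not taken from the research, and the deliberately broken hash function in the usage lines is an assumption chosen to force every key into one slot, so finding a single item means scanning the whole chain.

```python
from collections import defaultdict


class ChainedHashTable:
    """Toy hash table: each slot keeps a list ("chain") of (key, value) pairs."""

    def __init__(self, num_slots, hash_fn=hash):
        self.num_slots = num_slots
        self.hash_fn = hash_fn
        self.slots = defaultdict(list)

    def slot_of(self, key):
        # The hash code directly determines which slot stores the key.
        return self.hash_fn(key) % self.num_slots

    def put(self, key, value):
        chain = self.slots[self.slot_of(key)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))  # a colliding key lengthens the chain

    def get(self, key):
        """Return (value, comparisons); comparisons grows with the chain length."""
        comparisons = 0
        for existing, value in self.slots.get(self.slot_of(key), []):
            comparisons += 1
            if existing == key:
                return value, comparisons
        raise KeyError(key)


# A deliberately terrible hash function sends every key to slot 0.
clashing = ChainedHashTable(num_slots=8, hash_fn=lambda key: 0)
for n in range(5):
    clashing.put(f"book-{n}", n)
print(clashing.get("book-4"))  # (4, 5): five comparisons to find one item
```

With a hash function that spreads keys out, most chains hold a single entry and a lookup needs one comparison; every collision adds work to the search.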

    Certain types of hash functions, known as perfect hash functions, are designed to place the data in a way that prevents collisions. But they are time-consuming to construct for each dataset and take more time to compute than traditional hash functions.

    Since hashing is used in so many applications, from database indexing to data compression to cryptography, fast and efficient hash functions are critical. So, researchers from MIT and elsewhere set out to see if they could use machine learning to build better hash functions.

    They found that, in certain situations, using learned models instead of traditional hash functions could result in half as many collisions. These learned models are created by running a machine-learning algorithm on a dataset to capture specific characteristics. The team’s experiments also showed that learned models were often more computationally efficient than perfect hash functions.

    “What we found in this work is that in some situations we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. In these situations, the computation time for the hash function can be increased a bit, but at the same time its collisions can be reduced very significantly,” says Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Their research, which will be presented at the 2023 International Conference on Very Large Data Bases (VLDB), demonstrates how a hash function can be designed to significantly speed up searches in a huge database. For instance, their technique could accelerate computational systems that scientists use to store and analyze DNA, amino acid sequences, or other biological information.

    Sabek is the co-lead author of the paper with Department of Electrical Engineering and Computer Science (EECS) graduate student Kapil Vaidya. They are joined by co-authors Dominik Horn, a graduate student at the Technical University of Munich; Andreas Kipf, an MIT postdoc; Michael Mitzenmacher, professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences; and senior author Tim Kraska, associate professor of EECS at MIT and co-director of the Data, Systems, and AI Lab.

    Hashing it out

    Given a data input, or key, a traditional hash function generates a pseudo-random number, or code, that corresponds to the slot where that key will be stored. To use a simple example, if there are 10 keys to be put into 10 slots, the function would generate an integer between 1 and 10 for each input. It is highly likely that at least two keys will end up in the same slot, causing collisions: if the codes behave like independent random choices, the chance that all 10 keys land in different slots is only 10!/10^10, about 0.04 percent.
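The arithmetic behind the 10-keys, 10-slots example can be checked directly. This short Python sketch assumes the hash function behaves like independent, uniformly random slot choices (the standard idealization, not a property of any particular hash function). It computes the chance of no collision, the expected number of keys that land in an already-occupied slot, and a quick simulation as a cross-check. The function names are illustrative.

```python
import math
import random


def prob_no_collision(num_keys, num_slots):
    """Chance that num_keys uniformly random slot choices are all different."""
    if num_keys > num_slots:
        return 0.0
    # Ordered choices without repeats, divided by all possible choices.
    return math.perm(num_slots, num_keys) / num_slots ** num_keys


def expected_colliding_keys(num_keys, num_slots):
    """Expected number of keys that land in an already-occupied slot."""
    # Each slot stays empty with probability (1 - 1/num_slots) ** num_keys.
    expected_occupied = num_slots * (1 - (1 - 1 / num_slots) ** num_keys)
    return num_keys - expected_occupied


def simulate(num_keys, num_slots, trials=20_000, seed=0):
    """Monte Carlo estimate of P(at least one collision), slots numbered 1..num_slots."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        slots = [rng.randint(1, num_slots) for _ in range(num_keys)]
        if len(set(slots)) < num_keys:
            hits += 1
    return hits / trials


print(f"P(no collision)         = {prob_no_collision(10, 10):.5f}")  # 0.00036
print(f"E[keys in a taken slot] = {expected_colliding_keys(10, 10):.2f}")  # 3.49
print(f"simulated P(collision)  = {simulate(10, 10):.4f}")
```

On average, about three and a half of the ten keys share a slot with another key under this idealization.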

    Perfect hash functions provide a collision-free alternative. Researchers give the function some extra knowledge, such as the number of slots the data are to be placed into. Then it can perform additional computations to figure out where to put each key to avoid collisions. However, these added computations make the function harder to create and less efficient.
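A brute-force way to build a perfect hash function shows where the extra construction cost comes from. In this Python sketch, which is not the paper's method and is far cruder than practical constructions, the function is told the keys and the number of slots. It then tries one seed after another until a seeded hash sends every key to a different slot. If the seeded hash behaves like a random function, the expected number of seeds tried is the reciprocal of the no-collision probability, close to 2,760 for 10 keys in 10 slots, and that number explodes as the key set grows. The names `seeded_hash` and `build_perfect_hash` are hypothetical.

```python
import hashlib


def seeded_hash(key, seed, num_slots):
    """Deterministic hash of `key`, varied by `seed`, reduced to a slot index."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_slots


def build_perfect_hash(keys, num_slots, max_tries=1_000_000):
    """Try seeds until one sends every key in `keys` to a different slot."""
    distinct = set(keys)
    if len(distinct) > num_slots:
        raise ValueError("a perfect hash needs at least one slot per key")
    for seed in range(max_tries):
        used = {seeded_hash(k, seed, num_slots) for k in distinct}
        if len(used) == len(distinct):  # no two keys shared a slot
            return seed
    raise RuntimeError("no collision-free seed found; add slots or raise max_tries")


keys = [f"isbn-{i}" for i in range(10)]
seed = build_perfect_hash(keys, num_slots=10)  # slow part: done once per dataset
slots = {k: seeded_hash(k, seed, 10) for k in keys}  # fast part: one hash per lookup
print(seed, sorted(slots.values()))
```

The stored seed is the "extra knowledge": lookups stay cheap, but the search for it must be redone whenever the key set changes.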

    “We were wondering, if we know more about the data — that it will come from a particular distribution — can we use learned models to build a hash function that can actually reduce collisions?” Vaidya says.

    A data distribution shows all possible values in a dataset and how often each value occurs. The distribution can be used to calculate the probability that a particular value is in a data sample.

    The researchers took a small sample from a dataset and used machine learning to approximate the shape of the data’s distribution, or how the data are spread out. The learned model then uses this approximation to predict the location of a key in the dataset.
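One way to write this idea down, as a simplified sketch rather than the paper's exact formulation: let the dataset hold a set $\mathcal{K}$ of $n$ distinct keys, let $F(k)$ be the fraction of keys smaller than $k$ (an empirical cumulative distribution function), let $\hat{F}$ be the model's approximation of $F$ learned from the sample, and let the table have $m \ge n$ slots. The learned hash function scales the predicted fraction to a slot index:

$$
F(k) = \frac{\bigl|\{\, x \in \mathcal{K} : x < k \,\}\bigr|}{n},
\qquad
h(k) = \min\Bigl(\bigl\lfloor m\,\hat{F}(k) \bigr\rfloor,\; m - 1\Bigr)
$$

If $\hat{F}$ equalled $F$ exactly, the key of rank $i$ (counting from 0) would get $F = i/n$ and land in slot $\lfloor m i / n \rfloor$. Consecutive ranks are then $m/n \ge 1$ slots apart, so no two keys would share a slot. Under this sketch, every collision a learned hash suffers comes from the gap between $\hat{F}$ and $F$, and the clamp to $m - 1$ only matters when the model overshoots.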

    They found that learned models were easier to build and faster to run than perfect hash functions, and that they led to fewer collisions than traditional hash functions when data are distributed in a predictable way. But if the data are not predictably distributed, because the gaps between data points vary too widely, using learned models might cause more collisions.
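The contrast between predictable and widely gapped data can be reproduced on synthetic keys. In this Python sketch the "learned model" is just one least-squares line fitted to the empirical distribution of a 500-key sample, a deliberately simple stand-in for the models used in the research. BLAKE2b stands in for a traditional hash function, and the colliding-key ratio (the share of keys landing in a slot an earlier key already took) is an assumed definition. All names and datasets are hypothetical, and the numbers it prints are not the paper's results.

```python
import hashlib
import random


def fit_linear_cdf(sample):
    """Least-squares line through the (key, empirical CDF) points of a sample."""
    xs = sorted(sample)
    n = len(xs)
    ys = [(i + 0.5) / n for i in range(n)]  # empirical CDF at each sampled key
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x  # (slope, intercept)


def learned_hash(key, model, num_slots):
    slope, intercept = model
    slot = int((slope * key + intercept) * num_slots)  # predicted CDF scaled to slots
    return min(max(slot, 0), num_slots - 1)  # clamp keys outside the sampled range


def random_hash(key, num_slots):
    digest = hashlib.blake2b(str(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_slots


def colliding_ratio(slots):
    """Fraction of keys that land in a slot an earlier key already took."""
    return 1 - len(set(slots)) / len(slots)


def evaluate(keys, sample_size=500, seed=1):
    rng = random.Random(seed)
    model = fit_linear_cdf(rng.sample(keys, sample_size))  # learn from a small sample
    m = len(keys)  # one slot per key
    learned = colliding_ratio([learned_hash(k, model, m) for k in keys])
    baseline = colliding_ratio([random_hash(k, m) for k in keys])
    return learned, baseline


rng = random.Random(7)
# Predictable: 10,000 keys with gaps between 80 and 120.
smooth = [i * 100 + rng.randint(0, 20) for i in range(10_000)]
# Unpredictable: 100 tight clusters scattered over a huge key space.
gappy = []
for _ in range(100):
    base = rng.randrange(10**12)
    gappy.extend(base + j for j in range(100))

results = {}
for name, keys in (("smooth", smooth), ("gappy", gappy)):
    results[name] = evaluate(keys)
    learned, baseline = results[name]
    print(f"{name}: learned {learned:.3f} vs random {baseline:.3f}")
```

On the evenly spaced keys the line predicts each key's position well, so the learned hash collides far less than the random baseline. On the clustered keys each cluster of 100 keys covers only a sliver of the line, so whole clusters land in the same slot.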

    “We may have a huge number of data inputs, and the gaps between consecutive inputs are very different, so learning a model to capture the data distribution of these inputs is quite difficult,” Sabek explains.

    Fewer collisions, faster results

    When data were predictably distributed, learned models could reduce the ratio of colliding keys in a dataset from 30 percent to 15 percent, compared with traditional hash functions. They were also able to achieve better throughput than perfect hash functions. In the best cases, learned models reduced the runtime by nearly 30 percent.

    As they explored the use of learned models for hashing, the researchers also found that throughput was affected most by the number of sub-models. Each learned model is composed of smaller linear models that approximate the data distribution for different parts of the data. With more sub-models, the learned model produces a more accurate approximation, but it takes more time.
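The sub-model tradeoff can be sketched the same way. Here, as an assumption rather than the paper's architecture, a learned model with k sub-models cuts the sorted sample into k equal-count chunks, joins the ends of each chunk with a straight line, and routes each key to its line by binary search. The keys are perfect squares, regularly spaced but with steadily growing gaps, so a single line fits them poorly. The function names and data are hypothetical.

```python
import bisect
import random


def fit_piecewise(sample, num_submodels):
    """Cut the sorted sample into equal-count chunks and join their ends with lines.

    Returns knot coordinates; segment j (between knots j and j + 1) is one linear
    sub-model of the empirical CDF.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= num_submodels < n:
        raise ValueError("need at least 1 and fewer than len(sample) sub-models")
    idx = [j * n // num_submodels for j in range(num_submodels)] + [n - 1]
    return [xs[i] for i in idx], [i / (n - 1) for i in idx]


def piecewise_hash(key, knots_x, knots_y, num_slots):
    if key <= knots_x[0]:
        cdf = 0.0
    elif key >= knots_x[-1]:
        cdf = 1.0
    else:
        j = bisect.bisect_right(knots_x, key) - 1  # route the key to one sub-model
        x0, x1 = knots_x[j], knots_x[j + 1]
        y0, y1 = knots_y[j], knots_y[j + 1]
        cdf = y0 + (y1 - y0) * (key - x0) / (x1 - x0)
    return min(int(cdf * num_slots), num_slots - 1)


def colliding_ratio(slots):
    return 1 - len(set(slots)) / len(slots)


# Regular but non-linear keys: gaps grow steadily (1, 3, 5, ...).
keys = [i * i for i in range(10_000)]
sample = random.Random(3).sample(keys, 1_000)

ratios = {}
for k in (1, 4, 16, 64):
    knots_x, knots_y = fit_piecewise(sample, k)
    slots = [piecewise_hash(key, knots_x, knots_y, len(keys)) for key in keys]
    ratios[k] = colliding_ratio(slots)
    print(f"{k:>2} sub-models: colliding-key ratio {ratios[k]:.3f}")
```

Going from one sub-model to a handful should cut collisions sharply. Because each sub-model here is fitted from an ever smaller slice of a fixed 1,000-key sample, adding many more mostly trades approximation error for sampling noise, so in runs like this the improvement can level off.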

    “At a certain threshold of sub-models, you get enough information to build the approximation that you need for the hash function. But after that, it won’t lead to more improvement in collision reduction,” Sabek says.

    Building on this analysis, the researchers want to use learned models to design hash functions for other types of data. They also plan to explore learned hashing for databases in which data can be inserted or deleted. When data are updated in this way, the model needs to change accordingly, but changing the model while maintaining accuracy is a difficult problem.

    “We want to encourage the community to use machine learning inside more fundamental data structures and algorithms. Any kind of core data structure presents us with an opportunity to use machine learning to capture data properties and get better performance. There is still a lot we can explore,” Sabek says.

    “Hashing and indexing functions are core to a lot of database functionality. Given the variety of users and use cases, there is no one size fits all hashing, and learned models help adapt the database to a specific user. This paper is a great balanced analysis of the feasibility of these new techniques and does a good job of talking rigorously about the pros and cons, and helps us build our understanding of when such methods can be expected to work well,” says Murali Narayanaswamy, a principal machine learning scientist at Amazon, who was not involved in this work. “Exploring these kinds of enhancements is an exciting area of research both in academia and industry, and the kind of rigor shown in this work is critical for these methods to have large impact.”

    This work was supported, in part, by Google, Intel, Microsoft, the U.S. National Science Foundation, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.
