    How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution


    In this tutorial, we show how to use Ibis to build a portable, in-database feature engineering pipeline that looks and feels like Pandas but executes entirely inside the database. We show how to connect to DuckDB, register data safely inside the backend, and define complex transformations using window functions and aggregations without ever pulling raw data into local memory. By keeping all transformations lazy and backend-agnostic, we demonstrate how to write analytics code once in Python and rely on Ibis to translate it into efficient SQL. Check out the full code here.

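    # Notebook shell command: install Ibis with the DuckDB backend and example datasets
    !pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas
    
    
    import ibis
    from ibis import _
    
    
    print("Ibis version:", ibis.__version__)
    
    
    # Connect to an in-memory DuckDB database
    con = ibis.duckdb.connect()
    # Show a preview of an expression whenever it is displayed in the notebook
    ibis.options.interactive = True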

    We install the required libraries and initialize the Ibis environment. We establish an in-memory DuckDB connection and enable interactive mode, which runs an expression when it is displayed in a notebook; the expressions we build remain lazy and backend-driven until then.
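
To make "lazy" concrete, here is a toy sketch of the idea in plain Python: an expression records which operation was requested and is only rendered to SQL on demand. This is not how Ibis is implemented; `Column`, `Ratio`, and `compile_select` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A reference to a column. It holds a name, never any data."""
    name: str

    def __truediv__(self, other: "Column") -> "Ratio":
        # Dividing two columns records the operation instead of computing it.
        return Ratio(self, other)


@dataclass(frozen=True)
class Ratio:
    """A recorded (not yet evaluated) division of two columns."""
    left: Column
    right: Column

    def to_sql(self) -> str:
        return f"{self.left.name} / {self.right.name}"


def compile_select(table: str, **derived: Ratio) -> str:
    """Render the recorded expressions as a single SELECT statement."""
    cols = ", ".join(f"{expr.to_sql()} AS {alias}" for alias, expr in derived.items())
    return f"SELECT {cols} FROM {table}"


# Building the expression touches no data; it only describes the computation.
bill_ratio = Column("bill_length_mm") / Column("bill_depth_mm")
sql = compile_select("penguins", bill_ratio=bill_ratio)
print(sql)
```

The point of the sketch is the separation of concerns: Python code describes the computation, and a backend decides how and when to run it.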

    # Load the example dataset, preferring to fetch it straight into our DuckDB connection
    try:
       base_expr = ibis.examples.penguins.fetch(backend=con)
    except TypeError:
       base_expr = ibis.examples.penguins.fetch()
    
    
    # Register the data as a table in the DuckDB catalog if it is not there yet
    if "penguins" not in con.list_tables():
       try:
           con.create_table("penguins", base_expr, overwrite=True)
       except Exception:
           con.create_table("penguins", base_expr.execute(), overwrite=True)
    
    
    t = con.table("penguins")
    print(t.schema())

    We load the Penguins dataset and explicitly register it inside the DuckDB catalog to ensure it is available for SQL execution. We verify the table schema and confirm that the data now lives inside the database rather than in local memory.

    def penguin_feature_pipeline(penguins):
       base = penguins.mutate(
           bill_ratio=_.bill_length_mm / _.bill_depth_mm,
           is_male=(_.sex == "male").ifelse(1, 0),
       )
    
    
       cleaned = base.filter(
           _.bill_length_mm.notnull()
           & _.bill_depth_mm.notnull()
           & _.body_mass_g.notnull()
           & _.flipper_length_mm.notnull()
           & _.species.notnull()
           & _.island.notnull()
           & _.year.notnull()
       )
    
    
       w_species = ibis.window(group_by=[cleaned.species])
       w_island_year = ibis.window(
           group_by=[cleaned.island],
           order_by=[cleaned.year],
           preceding=2,
           following=0,
       )
    
    
       feat = cleaned.mutate(
           species_avg_mass=cleaned.body_mass_g.mean().over(w_species),
           species_std_mass=cleaned.body_mass_g.std().over(w_species),
           mass_z=(
               cleaned.body_mass_g
               - cleaned.body_mass_g.mean().over(w_species)
           ) / cleaned.body_mass_g.std().over(w_species),
           island_mass_rank=cleaned.body_mass_g.rank().over(
               ibis.window(group_by=[cleaned.island])
           ),
           rolling_3yr_island_avg_mass=cleaned.body_mass_g.mean().over(
               w_island_year
           ),
       )
    
    
       return feat.group_by(["species", "island", "year"]).agg(
           n=feat.count(),
           avg_mass=feat.body_mass_g.mean(),
           avg_flipper=feat.flipper_length_mm.mean(),
           avg_bill_ratio=feat.bill_ratio.mean(),
           avg_mass_z=feat.mass_z.mean(),
           avg_rolling_3yr_mass=feat.rolling_3yr_island_avg_mass.mean(),
           pct_male=feat.is_male.mean(),
       ).order_by(["species", "island", "year"])

    We define a reusable feature engineering pipeline using pure Ibis expressions. We compute derived features, apply data cleaning, and use window functions and grouped aggregations to build advanced, database-native features while keeping the entire pipeline lazy.
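
To see what the `mass_z` feature represents, here is a minimal plain-Python sketch of a per-species z-score on made-up numbers. It assumes the sample standard deviation (n - 1 denominator), which we believe matches the default behavior of `std()` in Ibis; `group_z_scores` and the sample rows are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_z_scores(rows):
    """Return one z-score per row, standardized within the row's species."""
    by_species = defaultdict(list)
    for species, mass in rows:
        by_species[species].append(mass)

    # Per-group mean and sample standard deviation (assumed n - 1 denominator).
    # Note: stdev() raises for a group with a single row, whereas a database
    # would typically return NULL for that group.
    stats = {s: (mean(m), stdev(m)) for s, m in by_species.items()}

    return [
        (mass - stats[species][0]) / stats[species][1]
        for species, mass in rows
    ]


# Made-up body masses in grams, not values from the real dataset.
rows = [
    ("Adelie", 3000), ("Adelie", 4000), ("Adelie", 5000),
    ("Gentoo", 5000), ("Gentoo", 6000),
]
z = group_z_scores(rows)
print([round(value, 3) for value in z])
```

Each value says how many standard deviations a penguin's mass lies from the mean of its own species, which is why the window in the pipeline is partitioned by species and carries no ordering.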

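    features = penguin_feature_pipeline(t)
    # Inspect the SQL that Ibis generates for the whole pipeline
    print(con.compile(features))
    
    
    # Execute in DuckDB and bring back only the aggregated result
    try:
       df = features.to_pandas()
    except Exception:
       df = features.execute()
    
    
    # display() is provided by the notebook environment
    display(df.head())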

    We invoke the feature pipeline and compile it into DuckDB SQL to verify that all transformations are pushed down to the database. We then run the pipeline and return only the final aggregated results for inspection.
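
When reading the compiled SQL, the `w_island_year` window is worth a close look. The sketch below shows the kind of frame clause we would expect for it, assuming Ibis renders that window as a row-based frame covering the current row and the two rows before it. It uses SQLite from the Python standard library purely as a stand-in engine with made-up rows; it is not the output of `con.compile`.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for this illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE penguins (island TEXT, year INTEGER, body_mass_g REAL)")
db.executemany(
    "INSERT INTO penguins VALUES (?, ?, ?)",
    [
        ("A", 2007, 3000), ("A", 2008, 4000), ("A", 2009, 5000),
        ("A", 2010, 6000), ("B", 2007, 1000),
    ],
)

# A trailing frame: the current row plus the two rows before it,
# within each island, ordered by year.
query = """
SELECT island, year,
       AVG(body_mass_g) OVER (
           PARTITION BY island
           ORDER BY year
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_mass
FROM penguins
ORDER BY island, year
"""
rolling = db.execute(query).fetchall()
for row in rolling:
    print(row)
```

Under that assumption the frame counts rows, not calendar years. With one row per island and year, as in this toy table, the two coincide; in a table with many penguins per island and year, the frame would span three rows rather than three years.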

    # Materialize the pipeline result as a table inside DuckDB
    con.create_table("penguin_features", features, overwrite=True)
    
    
    feat_tbl = con.table("penguin_features")
    
    
    try:
       preview = feat_tbl.limit(10).to_pandas()
    except Exception:
       preview = feat_tbl.limit(10).execute()
    
    
    display(preview)
    
    
    # Export the materialized table to Parquet using DuckDB's COPY statement
    out_path = "/content/penguin_features.parquet"
    con.raw_sql(f"COPY penguin_features TO '{out_path}' (FORMAT PARQUET);")
    print(out_path)

    We materialize the engineered features as a table directly inside DuckDB and query it lazily for verification. We also export the results to a Parquet file, demonstrating how we can hand off database-computed features to downstream analytics or machine learning workflows.
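
The export step interpolates the output path into a SQL string, which would break if the path ever contained a single quote. A small helper along these lines could build the statement more defensively; `copy_to_parquet_sql` is a hypothetical name, and the sketch assumes standard SQL quoting rules (doubling embedded quotes), which DuckDB follows.

```python
def copy_to_parquet_sql(table: str, path: str) -> str:
    """Build a COPY ... TO statement with quoted identifier and path."""
    # Identifiers are wrapped in double quotes; embedded double quotes are doubled.
    quoted_table = '"' + table.replace('"', '""') + '"'
    # String literals are wrapped in single quotes; embedded single quotes are doubled.
    quoted_path = "'" + path.replace("'", "''") + "'"
    return f"COPY {quoted_table} TO {quoted_path} (FORMAT PARQUET);"


stmt = copy_to_parquet_sql("penguin_features", "/content/penguin_features.parquet")
print(stmt)

# A path containing a single quote no longer terminates the literal early.
tricky = copy_to_parquet_sql("penguin_features", "/tmp/o'brien.parquet")
print(tricky)
```

The resulting string could then be passed to `con.raw_sql(...)` in place of the inline f-string.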

    In conclusion, we built, compiled, and executed an advanced feature engineering workflow entirely inside DuckDB using Ibis. We demonstrated how to inspect the generated SQL, materialize results directly in the database, and export them for downstream use while preserving portability across analytical backends. This approach reinforces the core idea behind Ibis: we keep computation close to the data, minimize unnecessary data movement, and maintain a single, reusable Python codebase that scales from local experimentation to production databases.



    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
