    How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution


    In this tutorial, we use Ibis to build a portable, in-database feature engineering pipeline that looks and feels like Pandas but executes entirely inside the database. We show how to connect to DuckDB, register data safely inside the backend, and define complex transformations using window functions and aggregations without ever pulling raw data into local memory. By keeping all transformations lazy and backend-agnostic, we demonstrate how to write analytics code once in Python and rely on Ibis to translate it into efficient SQL.

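    !pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas


    import ibis
    from ibis import _  # deferred-expression placeholder used in the pipeline


    print("Ibis version:", ibis.__version__)


    # In-memory DuckDB connection; interactive mode runs expressions when displayed
    con = ibis.duckdb.connect()
    ibis.options.interactive = True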

    We install the required libraries and initialize the Ibis environment. We open an in-memory DuckDB connection and enable interactive mode, which executes an expression only when it is displayed; the expressions themselves remain lazy and backend-driven.

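    # Fetch the example table; older Ibis versions do not accept backend=
    try:
       base_expr = ibis.examples.penguins.fetch(backend=con)
    except TypeError:
       base_expr = ibis.examples.penguins.fetch()


    # Register the data under a stable name in the DuckDB catalog
    if "penguins" not in con.list_tables():
       try:
           con.create_table("penguins", base_expr, overwrite=True)
       except Exception:
           # Fallback: materialize locally, then load the result into DuckDB
           con.create_table("penguins", base_expr.execute(), overwrite=True)


    t = con.table("penguins")
    print(t.schema())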

    We load the Penguins dataset and explicitly register it in the DuckDB catalog so that it is available for SQL execution. We print the table schema to confirm that the data now lives inside the database rather than in local memory.

    def penguin_feature_pipeline(penguins):
       base = penguins.mutate(
           bill_ratio=_.bill_length_mm / _.bill_depth_mm,
           is_male=(_.sex == "male").ifelse(1, 0),
       )
    
    
       cleaned = base.filter(
           _.bill_length_mm.notnull()
           & _.bill_depth_mm.notnull()
           & _.body_mass_g.notnull()
           & _.flipper_length_mm.notnull()
           & _.species.notnull()
           & _.island.notnull()
           & _.year.notnull()
       )
    
    
       w_species = ibis.window(group_by=[cleaned.species])
       w_island_year = ibis.window(
           group_by=[cleaned.island],
           order_by=[cleaned.year],
           preceding=2,
           following=0,
       )
    
    
       feat = cleaned.mutate(
           species_avg_mass=cleaned.body_mass_g.mean().over(w_species),
           species_std_mass=cleaned.body_mass_g.std().over(w_species),
           mass_z=(
               cleaned.body_mass_g
               - cleaned.body_mass_g.mean().over(w_species)
           ) / cleaned.body_mass_g.std().over(w_species),
           island_mass_rank=cleaned.body_mass_g.rank().over(
               ibis.window(group_by=[cleaned.island])
           ),
           rolling_3yr_island_avg_mass=cleaned.body_mass_g.mean().over(
               w_island_year
           ),
       )
    
    
       return feat.group_by(["species", "island", "year"]).agg(
           n=feat.count(),
           avg_mass=feat.body_mass_g.mean(),
           avg_flipper=feat.flipper_length_mm.mean(),
           avg_bill_ratio=feat.bill_ratio.mean(),
           avg_mass_z=feat.mass_z.mean(),
           avg_rolling_3yr_mass=feat.rolling_3yr_island_avg_mass.mean(),
           pct_male=feat.is_male.mean(),
       ).order_by(["species", "island", "year"])

    We define a reusable feature engineering pipeline using pure Ibis expressions. We compute derived features, apply data cleaning, and use window functions and grouped aggregations to build advanced, database-native features while keeping the entire pipeline lazy.
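To make the window semantics concrete, here is a plain-Python sketch of what two of the windowed features compute. It is a reference illustration, not how Ibis or DuckDB evaluate them. The helper names `group_z_scores` and `rolling_mean` and the toy rows are hypothetical, and the sketch assumes that `std()` is the sample standard deviation and that `preceding=2, following=0` describes a row-based frame.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy rows: (species, island, year, body_mass_g)
rows = [
    ("Adelie", "Dream", 2007, 3000.0),
    ("Adelie", "Dream", 2008, 4000.0),
    ("Adelie", "Dream", 2009, 5000.0),
    ("Gentoo", "Biscoe", 2007, 5000.0),
    ("Gentoo", "Biscoe", 2008, 6000.0),
]


def group_z_scores(rows):
    """Z-score of each mass within its species (partition by species)."""
    by_species = defaultdict(list)
    for species, _island, _year, mass in rows:
        by_species[species].append(mass)
    # Sample standard deviation; each group needs at least two rows.
    stats = {s: (mean(v), stdev(v)) for s, v in by_species.items()}
    return [
        (mass - stats[species][0]) / stats[species][1]
        for species, _island, _year, mass in rows
    ]


def rolling_mean(rows, preceding=2):
    """Mean of the current row and up to `preceding` earlier rows,
    partitioned by island and ordered by year."""
    by_island = defaultdict(list)
    for idx, (_species, island, year, mass) in enumerate(rows):
        by_island[island].append((year, idx, mass))
    out = [None] * len(rows)
    for items in by_island.values():
        items.sort()  # order by year, then by original position
        masses = [m for _year, _idx, m in items]
        for pos, (_year, idx, _mass) in enumerate(items):
            frame = masses[max(0, pos - preceding): pos + 1]
            out[idx] = mean(frame)
    return out


z = group_z_scores(rows)
rolling = rolling_mean(rows)
print(z)
print(rolling)
```

Under the row-based assumption, the frame spans three rows rather than three calendar years. With many penguins per island and year, the feature named `rolling_3yr_island_avg_mass` would therefore average the current penguin and the two rows before it in year order.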

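    features = penguin_feature_pipeline(t)
    # Inspect the SQL that Ibis generates for DuckDB
    print(con.compile(features))


    # Execute in DuckDB; only the aggregated result is returned to Python
    try:
       df = features.to_pandas()
    except Exception:
       df = features.execute()


    display(df.head())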

    We invoke the feature pipeline and compile it into DuckDB SQL to verify that all transformations are pushed down to the database. We then run the pipeline and return only the final aggregated results for inspection.

    # Materialize the feature table inside DuckDB
    con.create_table("penguin_features", features, overwrite=True)


    feat_tbl = con.table("penguin_features")


    try:
       preview = feat_tbl.limit(10).to_pandas()
    except Exception:
       preview = feat_tbl.limit(10).execute()


    display(preview)


    # /content is the Colab working directory; adjust the path elsewhere
    out_path = "/content/penguin_features.parquet"
    con.raw_sql(f"COPY penguin_features TO '{out_path}' (FORMAT PARQUET);")
    print(out_path)

    We materialize the engineered features as a table directly inside DuckDB and query it lazily for verification. We also export the results to a Parquet file, demonstrating how we can hand off database-computed features to downstream analytics or machine learning workflows.

    In conclusion, we built, compiled, and executed an advanced feature engineering workflow entirely inside DuckDB using Ibis. We showed how to inspect the generated SQL, materialize results directly in the database, and export them for downstream use while preserving portability across analytical backends. This approach reinforces the core idea behind Ibis: we keep computation close to the data, minimize unnecessary data movement, and maintain a single, reusable Python codebase that scales from local experimentation to production databases.




    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
