    How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution


    In this tutorial, we use Ibis to build a portable, in-database feature engineering pipeline that looks and feels like Pandas but executes entirely inside the database. We connect to DuckDB, register data safely inside the backend, and define complex transformations using window functions and aggregations without ever pulling raw data into local memory. By keeping all transformations lazy and backend-agnostic, we show how to write analytics code once in Python and rely on Ibis to translate it into efficient SQL.

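    # Notebook cell: install Ibis with the DuckDB backend and example datasets.
    !pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas

    import ibis
    from ibis import _  # deferred column reference used inside expressions

    print("Ibis version:", ibis.__version__)

    # Open a DuckDB connection through Ibis.
    con = ibis.duckdb.connect()
    # Interactive mode executes an expression when it is displayed.
    ibis.options.interactive = True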

    We install the required libraries and initialize the Ibis environment. We establish a DuckDB connection and enable interactive mode, so expressions are executed only when they are displayed; until then, every operation stays lazy and backend-driven.
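
    To make "lazy" concrete, here is a toy sketch of deferred evaluation in plain Python. The `LazyColumn` class and `compile_select` helper are hypothetical names invented for this illustration; this is not how Ibis is implemented. It only shows the general idea that building an expression records operations, and that a query string can be produced without touching any data.

```python
# Toy illustration of deferred evaluation; NOT Ibis internals.
# LazyColumn and compile_select are hypothetical names for this sketch.


class LazyColumn:
    """Records operations as a SQL-like string instead of computing values."""

    def __init__(self, sql):
        self.sql = sql

    def __truediv__(self, other):
        # Build a new expression node; no arithmetic happens here.
        return LazyColumn(f"({self.sql} / {other.sql})")

    def mean(self):
        # Wrap the recorded expression in an aggregate; still nothing runs.
        return LazyColumn(f"AVG({self.sql})")


def compile_select(expr, table):
    """Render the recorded expression as a SQL string (no execution)."""
    return f"SELECT {expr.sql} FROM {table}"


bill_length = LazyColumn("bill_length_mm")
bill_depth = LazyColumn("bill_depth_mm")

ratio_mean = (bill_length / bill_depth).mean()
query = compile_select(ratio_mean, "penguins")
print(query)
```

    In this sketch, the Python code only describes the computation. A database would do the actual work once the generated query is sent to it.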

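    # Load the example dataset, preferring to fetch it directly into DuckDB.
    try:
        base_expr = ibis.examples.penguins.fetch(backend=con)
    except TypeError:
        # Fallback if this Ibis version's fetch() does not accept `backend`.
        base_expr = ibis.examples.penguins.fetch()

    # Register the data under a stable name in the DuckDB catalog.
    if "penguins" not in con.list_tables():
        try:
            con.create_table("penguins", base_expr, overwrite=True)
        except Exception:
            # Fallback: materialize locally, then load the result into DuckDB.
            con.create_table("penguins", base_expr.execute(), overwrite=True)

    t = con.table("penguins")
    print(t.schema())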

    We load the Penguins dataset and explicitly register it inside the DuckDB catalog to ensure it is available for SQL execution. We verify the table schema and confirm that the data now lives inside the database rather than in local memory.

    def penguin_feature_pipeline(penguins):
        # Row-level derived features.
        base = penguins.mutate(
            bill_ratio=_.bill_length_mm / _.bill_depth_mm,
            is_male=(_.sex == "male").ifelse(1, 0),
        )

        # Drop rows with missing values in any column used downstream.
        cleaned = base.filter(
            _.bill_length_mm.notnull()
            & _.bill_depth_mm.notnull()
            & _.body_mass_g.notnull()
            & _.flipper_length_mm.notnull()
            & _.species.notnull()
            & _.island.notnull()
            & _.year.notnull()
        )

        # One window per species, and a trailing window per island ordered by year.
        w_species = ibis.window(group_by=[cleaned.species])
        w_island_year = ibis.window(
            group_by=[cleaned.island],
            order_by=[cleaned.year],
            preceding=2,
            following=0,
        )

        # Window features computed per row, inside the database.
        feat = cleaned.mutate(
            species_avg_mass=cleaned.body_mass_g.mean().over(w_species),
            species_std_mass=cleaned.body_mass_g.std().over(w_species),
            mass_z=(
                cleaned.body_mass_g
                - cleaned.body_mass_g.mean().over(w_species)
            ) / cleaned.body_mass_g.std().over(w_species),
            island_mass_rank=cleaned.body_mass_g.rank().over(
                ibis.window(group_by=[cleaned.island])
            ),
            rolling_3yr_island_avg_mass=cleaned.body_mass_g.mean().over(
                w_island_year
            ),
        )

        # Aggregate to one row per (species, island, year).
        return feat.group_by(["species", "island", "year"]).agg(
            n=feat.count(),
            avg_mass=feat.body_mass_g.mean(),
            avg_flipper=feat.flipper_length_mm.mean(),
            avg_bill_ratio=feat.bill_ratio.mean(),
            avg_mass_z=feat.mass_z.mean(),
            avg_rolling_3yr_mass=feat.rolling_3yr_island_avg_mass.mean(),
            pct_male=feat.is_male.mean(),
        ).order_by(["species", "island", "year"])

    We define a reusable feature engineering pipeline using pure Ibis expressions. We compute derived features, apply data cleaning, and use window functions and grouped aggregations to build advanced, database-native features while keeping the entire pipeline lazy.
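
    To see what two of these window features compute, here is a small standard-library sketch on made-up rows. It rests on two assumptions: that `std()` uses the sample standard deviation, and that `preceding=2, following=0` defines a row-based frame (the current row plus up to two earlier rows in year order, rather than a span of calendar years). The toy data and the helper names `species_z_scores` and `rolling_mean_rows` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy rows: (species, island, year, body_mass_g).
rows = [
    ("Adelie", "Dream", 2007, 3000.0),
    ("Adelie", "Dream", 2008, 3400.0),
    ("Adelie", "Dream", 2009, 3800.0),
    ("Gentoo", "Biscoe", 2007, 5000.0),
    ("Gentoo", "Biscoe", 2008, 5200.0),
]


def species_z_scores(rows):
    """Per-row z-score of body mass within its species (like mass_z)."""
    by_species = defaultdict(list)
    for species, _, _, mass in rows:
        by_species[species].append(mass)
    # Assumption: sample standard deviation, matching statistics.stdev.
    stats = {s: (mean(m), stdev(m)) for s, m in by_species.items()}
    return [(mass - stats[sp][0]) / stats[sp][1] for sp, _, _, mass in rows]


def rolling_mean_rows(values, preceding=2):
    """Mean over the current row and up to `preceding` earlier rows."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]
        out.append(mean(frame))
    return out


print(species_z_scores(rows))
# Values are assumed to be already sorted by year within one island.
print(rolling_mean_rows([3000.0, 3400.0, 3800.0, 4200.0]))
# [3000.0, 3200.0, 3400.0, 3800.0]
```

    If the frame is indeed row-based, an island with several penguins per year averages over neighboring rows rather than over three full years, which is worth keeping in mind when interpreting `rolling_3yr_island_avg_mass`.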

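    features = penguin_feature_pipeline(t)
    # Inspect the SQL that Ibis generates for DuckDB.
    print(con.compile(features))

    # Execute in DuckDB and bring back only the aggregated result.
    try:
        df = features.to_pandas()
    except Exception:
        df = features.execute()

    # `display` is available in notebook environments.
    display(df.head())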

    We invoke the feature pipeline and compile it into DuckDB SQL to validate that all transformations are pushed down to the database. We then run the pipeline and return only the final aggregated results for inspection.
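
    To illustrate what pushdown buys us, the sketch below runs a hand-written window-plus-aggregation query. It uses Python's built-in `sqlite3` module as a stand-in for DuckDB so that it runs without extra installs; this is an assumption for illustration only and requires SQLite 3.25 or newer for window function support. The SQL is written by hand and is not the query Ibis generates, and the toy rows are made up.

```python
import sqlite3

# Stand-in database (assumption: sqlite3 instead of DuckDB, for portability).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, body_mass_g REAL)")
con.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [
        ("Adelie", 3000.0),
        ("Adelie", 3400.0),
        ("Adelie", 3800.0),
        ("Gentoo", 5000.0),
        ("Gentoo", 5200.0),
    ],
)

# The window function runs per row inside the database; the outer query
# then aggregates, so only one row per species is returned to Python.
sql = """
SELECT species,
       COUNT(*) AS n,
       MAX(mass_delta) AS max_mass_delta
FROM (
    SELECT species,
           body_mass_g - AVG(body_mass_g) OVER (PARTITION BY species)
               AS mass_delta
    FROM penguins
)
GROUP BY species
ORDER BY species
"""
result = con.execute(sql).fetchall()
print(result)
con.close()
```

    Five rows are stored, but only two aggregated rows cross the database boundary, which is the same pattern the compiled pipeline follows.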

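    # Materialize the engineered features as a table inside DuckDB.
    con.create_table("penguin_features", features, overwrite=True)

    feat_tbl = con.table("penguin_features")

    # Preview a few rows; only these 10 rows leave the database.
    try:
        preview = feat_tbl.limit(10).to_pandas()
    except Exception:
        preview = feat_tbl.limit(10).execute()

    display(preview)

    # Export to Parquet with DuckDB's COPY (the path assumes a Colab-style /content directory).
    out_path = "/content/penguin_features.parquet"
    con.raw_sql(f"COPY penguin_features TO '{out_path}' (FORMAT PARQUET);")
    print(out_path)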

    We materialize the engineered features as a table directly inside DuckDB and query it lazily for verification. We also export the results to a Parquet file, demonstrating how we can hand off database-computed features to downstream analytics or machine learning workflows.

    In conclusion, we built, compiled, and executed an advanced feature engineering workflow fully inside DuckDB using Ibis. We demonstrated how to inspect the generated SQL, materialized results directly in the database, and exported them for downstream use while preserving portability across analytical backends. This approach reinforces the core idea behind Ibis: we keep computation close to the data, minimize unnecessary data movement, and maintain a single, reusable Python codebase that scales from local experimentation to production databases.




    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
