    How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution


    In this tutorial, we show how to use Ibis to build a portable, in-database feature engineering pipeline that looks and feels like pandas but executes entirely inside the database. We connect to DuckDB, register data safely inside the backend, and define complex transformations using window functions and aggregations without ever pulling raw data into local memory. By keeping all transformations lazy and backend-agnostic, we write analytics code once in Python and rely on Ibis to translate it into efficient SQL.

    !pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas

    import ibis
    from ibis import _

    print("Ibis version:", ibis.__version__)

    # In-memory DuckDB connection; expressions are compiled to SQL and run here.
    con = ibis.duckdb.connect()
    ibis.options.interactive = True

    We install the required libraries and initialize the Ibis environment. We establish a DuckDB connection and enable interactive mode, which evaluates an expression only when we display it; all other operations remain lazy and backend-driven.
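
    To make the idea of a lazy expression concrete, here is a toy sketch in plain Python. It is not how Ibis is implemented, and the names `Expr`, `col`, and `compile_select` are hypothetical helpers invented for this illustration. It only shows the general pattern: operators record what should be computed, and SQL text is produced when we ask for it.

```python
# Toy illustration only: this is NOT how Ibis is implemented.
class Expr:
    """A node that records an operation instead of computing it."""

    def __init__(self, sql):
        self.sql = sql

    def __truediv__(self, other):
        # Build a larger expression; no data is read or divided here.
        return Expr(f"({self.sql} / {other.sql})")


def col(name):
    """Refer to a column by name without touching any data."""
    return Expr(f'"{name}"')


def compile_select(table, **named_exprs):
    """Render the recorded expressions as one SELECT statement."""
    cols = ", ".join(f'{e.sql} AS "{n}"' for n, e in named_exprs.items())
    return f'SELECT {cols} FROM "{table}"'


# Defining the feature does no work; it only describes the computation.
bill_ratio = col("bill_length_mm") / col("bill_depth_mm")
query = compile_select("penguins", bill_ratio=bill_ratio)
print(query)
```

    Because the expression is only a description, the same Python object could in principle be rendered for different SQL dialects, which is the property the tutorial relies on for backend portability.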

    # Load the example dataset; older Ibis versions may not accept `backend=`.
    try:
        base_expr = ibis.examples.penguins.fetch(backend=con)
    except TypeError:
        base_expr = ibis.examples.penguins.fetch()

    # Register the data as a real table in the DuckDB catalog.
    if "penguins" not in con.list_tables():
        try:
            con.create_table("penguins", base_expr, overwrite=True)
        except Exception:
            con.create_table("penguins", base_expr.execute(), overwrite=True)

    t = con.table("penguins")
    print(t.schema())

    We load the Penguins dataset and explicitly register it inside the DuckDB catalog to ensure it is available for SQL execution. We verify the table schema and confirm that the data now lives inside the database rather than in local memory.

    def penguin_feature_pipeline(penguins):
        # Row-level derived features.
        base = penguins.mutate(
            bill_ratio=_.bill_length_mm / _.bill_depth_mm,
            is_male=(_.sex == "male").ifelse(1, 0),
        )

        # Drop rows with missing values in any column used downstream.
        cleaned = base.filter(
            _.bill_length_mm.notnull()
            & _.bill_depth_mm.notnull()
            & _.body_mass_g.notnull()
            & _.flipper_length_mm.notnull()
            & _.species.notnull()
            & _.island.notnull()
            & _.year.notnull()
        )

        # Window definitions: per species, and a trailing frame per island.
        w_species = ibis.window(group_by=[cleaned.species])
        w_island_year = ibis.window(
            group_by=[cleaned.island],
            order_by=[cleaned.year],
            preceding=2,
            following=0,
        )

        # Window features computed inside the database.
        feat = cleaned.mutate(
            species_avg_mass=cleaned.body_mass_g.mean().over(w_species),
            species_std_mass=cleaned.body_mass_g.std().over(w_species),
            mass_z=(
                cleaned.body_mass_g
                - cleaned.body_mass_g.mean().over(w_species)
            ) / cleaned.body_mass_g.std().over(w_species),
            island_mass_rank=cleaned.body_mass_g.rank().over(
                ibis.window(group_by=[cleaned.island])
            ),
            rolling_3yr_island_avg_mass=cleaned.body_mass_g.mean().over(
                w_island_year
            ),
        )

        # Aggregate to one row per (species, island, year).
        return feat.group_by(["species", "island", "year"]).agg(
            n=feat.count(),
            avg_mass=feat.body_mass_g.mean(),
            avg_flipper=feat.flipper_length_mm.mean(),
            avg_bill_ratio=feat.bill_ratio.mean(),
            avg_mass_z=feat.mass_z.mean(),
            avg_rolling_3yr_mass=feat.rolling_3yr_island_avg_mass.mean(),
            pct_male=feat.is_male.mean(),
        ).order_by(["species", "island", "year"])

    We define a reusable feature engineering pipeline using pure Ibis expressions. We compute derived features, apply data cleaning, and use window functions and grouped aggregations to build advanced, database-native features while keeping the entire pipeline lazy.
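
    As a sanity check on what `mass_z` means, the following plain-Python sketch computes the same per-species z-score on a few made-up rows. The sample values are hypothetical, and we assume the sample standard deviation (`statistics.stdev`), which we believe matches the default behavior of `std()` in Ibis; if the backend is configured differently, the numbers would differ.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical sample rows: (species, body_mass_g). Values are made up.
rows = [
    ("Adelie", 3500.0),
    ("Adelie", 3700.0),
    ("Adelie", 3900.0),
    ("Gentoo", 5000.0),
    ("Gentoo", 5400.0),
]

# Group masses by species, mirroring a window partitioned by species.
by_species = defaultdict(list)
for species, mass in rows:
    by_species[species].append(mass)

# Per-species mean and sample standard deviation.
stats = {s: (mean(m), stdev(m)) for s, m in by_species.items()}

# z-score of each row relative to its own species.
mass_z = [(s, m, (m - stats[s][0]) / stats[s][1]) for s, m in rows]

for species, mass, z in mass_z:
    print(species, mass, round(z, 4))
```

    Each row keeps its identity and only gains a new column, which is the difference between a window function and a grouped aggregation that collapses rows.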

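    features = penguin_feature_pipeline(t)

    # Inspect the SQL that DuckDB will run; nothing has executed yet.
    print(con.compile(features))

    # Execute in the database and fetch only the aggregated result.
    try:
        df = features.to_pandas()
    except Exception:
        df = features.execute()

    display(df.head())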

    We invoke the feature pipeline and compile it into DuckDB SQL to validate that all transformations are pushed down to the database. We then run the pipeline and return only the final aggregated results for inspection.
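
    When reading the compiled SQL, the frame clause of the rolling feature is worth a close look. The window `w_island_year` pairs each row with the two rows before it, so we would expect a row-based frame along the lines of `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`; the exact text DuckDB emits may differ. The sketch below uses the standard-library `sqlite3` module (not DuckDB) and made-up rows to show what such a frame computes. The extra tie-breaker in `ORDER BY` is our addition to keep the output deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE p (island TEXT, year INTEGER, body_mass_g REAL)")
db.executemany(
    "INSERT INTO p VALUES (?, ?, ?)",
    [
        # Hypothetical rows; note two penguins share the year 2007.
        ("Biscoe", 2007, 3000.0),
        ("Biscoe", 2007, 4000.0),
        ("Biscoe", 2008, 5000.0),
        ("Biscoe", 2009, 6000.0),
    ],
)

rolling = db.execute(
    """
    SELECT year, body_mass_g,
           AVG(body_mass_g) OVER (
               PARTITION BY island
               ORDER BY year, body_mass_g
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM p
    ORDER BY year, body_mass_g
    """
).fetchall()

for row in rolling:
    print(row)
```

    The 2009 row averages only the three most recent rows and leaves out one of the 2007 penguins. A frame counted in rows is therefore not the same as a three-year window when a year contains several rows, which is useful to keep in mind when interpreting `rolling_3yr_island_avg_mass`.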

    # Materialize the features as a table inside DuckDB.
    con.create_table("penguin_features", features, overwrite=True)

    feat_tbl = con.table("penguin_features")

    # Preview a few rows of the stored table.
    try:
        preview = feat_tbl.limit(10).to_pandas()
    except Exception:
        preview = feat_tbl.limit(10).execute()

    display(preview)

    # Export the table to Parquet (the path assumes a Colab-style /content directory).
    out_path = "/content/penguin_features.parquet"
    con.raw_sql(f"COPY penguin_features TO '{out_path}' (FORMAT PARQUET);")
    print(out_path)

    We materialize the engineered features as a table directly inside DuckDB and query it lazily for verification. We also export the results to a Parquet file, demonstrating how we can hand off database-computed features to downstream analytics or machine learning workflows.
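
    The export step interpolates `out_path` directly into the `COPY` statement, which works for the path used here but would break if a path contained a single quote. A minimal sketch of a safer pattern follows; `sql_string_literal` is a hypothetical helper written for this illustration, relying on the standard SQL convention of doubling single quotes inside a string literal.

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


out_path = "/content/penguin_features.parquet"

# Build the COPY statement with the path quoted as a SQL literal.
copy_sql = (
    f"COPY penguin_features TO {sql_string_literal(out_path)} (FORMAT PARQUET);"
)
print(copy_sql)
```

    The resulting string can be passed to `con.raw_sql` in place of the hand-written f-string.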

    In conclusion, we built, compiled, and executed an advanced feature engineering workflow fully inside DuckDB using Ibis. We inspected the generated SQL, materialized results directly in the database, and exported them for downstream use while preserving portability across analytical backends. This approach reinforces the core idea behind Ibis: we keep computation close to the data, minimize unnecessary data movement, and maintain a single, reusable Python codebase that scales from local experimentation to production databases.




    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
