
    How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution


    In this tutorial, we show how to use Ibis to build a portable, in-database feature engineering pipeline that looks and feels like Pandas but executes entirely inside the database. We show how to connect to DuckDB, register data safely inside the backend, and define complex transformations using window functions and aggregations without ever pulling raw data into local memory. By keeping all transformations lazy and backend-agnostic, we demonstrate how to write analytics code once in Python and rely on Ibis to translate it into efficient SQL.

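    # Notebook cell: install Ibis with the DuckDB backend and the example datasets
    !pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas

    import ibis
    from ibis import _

    print("Ibis version:", ibis.__version__)

    # In-memory DuckDB connection; all later expressions run against it
    con = ibis.duckdb.connect()
    ibis.options.interactive = True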

    We install the required libraries and initialize the Ibis environment. We establish a DuckDB connection and enable interactive mode, so expressions are evaluated only when we display them in the notebook, while every operation we define stays lazy and backend-driven.

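    # Fetch the example dataset; fall back for Ibis versions whose fetch()
    # does not accept a backend argument
    try:
        base_expr = ibis.examples.penguins.fetch(backend=con)
    except TypeError:
        base_expr = ibis.examples.penguins.fetch()

    # Register the data under a stable name inside the DuckDB catalog
    if "penguins" not in con.list_tables():
        try:
            con.create_table("penguins", base_expr, overwrite=True)
        except Exception:
            # Last resort: materialize locally, then load into DuckDB
            con.create_table("penguins", base_expr.execute(), overwrite=True)

    t = con.table("penguins")
    print(t.schema())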

    We load the Penguins dataset and explicitly register it inside the DuckDB catalog to ensure it is available for SQL execution. We verify the table schema and confirm that the data now lives inside the database rather than in local memory.

    def penguin_feature_pipeline(penguins):
        # Row-level derived features
        base = penguins.mutate(
            bill_ratio=_.bill_length_mm / _.bill_depth_mm,
            # 1 for male, 0 otherwise (a missing sex also maps to 0)
            is_male=(_.sex == "male").ifelse(1, 0),
        )

        # Drop rows with missing values in any column used below
        cleaned = base.filter(
            _.bill_length_mm.notnull()
            & _.bill_depth_mm.notnull()
            & _.body_mass_g.notnull()
            & _.flipper_length_mm.notnull()
            & _.species.notnull()
            & _.island.notnull()
            & _.year.notnull()
        )

        # Window over all rows of a species
        w_species = ibis.window(group_by=[cleaned.species])
        # Trailing window per island: current row plus the two preceding rows
        w_island_year = ibis.window(
            group_by=[cleaned.island],
            order_by=[cleaned.year],
            preceding=2,
            following=0,
        )

        feat = cleaned.mutate(
            species_avg_mass=cleaned.body_mass_g.mean().over(w_species),
            species_std_mass=cleaned.body_mass_g.std().over(w_species),
            # Per-species z-score of body mass
            mass_z=(
                cleaned.body_mass_g
                - cleaned.body_mass_g.mean().over(w_species)
            ) / cleaned.body_mass_g.std().over(w_species),
            island_mass_rank=cleaned.body_mass_g.rank().over(
                ibis.window(group_by=[cleaned.island])
            ),
            rolling_3yr_island_avg_mass=cleaned.body_mass_g.mean().over(
                w_island_year
            ),
        )

        # Aggregate to one row per (species, island, year)
        return feat.group_by(["species", "island", "year"]).agg(
            n=feat.count(),
            avg_mass=feat.body_mass_g.mean(),
            avg_flipper=feat.flipper_length_mm.mean(),
            avg_bill_ratio=feat.bill_ratio.mean(),
            avg_mass_z=feat.mass_z.mean(),
            avg_rolling_3yr_mass=feat.rolling_3yr_island_avg_mass.mean(),
            pct_male=feat.is_male.mean(),
        ).order_by(["species", "island", "year"])
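The trailing window in the pipeline is worth a closer look. As far as we can tell, passing `preceding=2, following=0` to `ibis.window` builds a row-based frame, so each value averages the current row and the two rows before it within an island, ordered by year. Because many penguins share a year, that frame covers three rows rather than three calendar years, and the order among rows of the same year is left to the backend. The plain-Python sketch below mimics that frame on made-up numbers; `trailing_row_means` is a hypothetical helper, not part of Ibis.

```python
def trailing_row_means(values, preceding=2):
    """Mean over the current row and up to `preceding` rows before it.

    `values` must already be sorted the way the window orders them.
    """
    means = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]
        means.append(sum(frame) / len(frame))
    return means


# Made-up rows for one island, already sorted by year.
years = [2007, 2007, 2007, 2008]
masses = [3000, 3600, 4200, 5000]

rolling = trailing_row_means(masses)
for year, mass, avg in zip(years, masses, rolling):
    print(year, mass, round(avg, 2))
```

In this toy data the frame for the 2008 row contains two 2007 rows plus itself. If a true three-year window is the goal, one option is to aggregate to one row per island and year first and apply the trailing window afterwards.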

    We define a reusable feature engineering pipeline using pure Ibis expressions. We compute derived features, apply data cleaning, and use window functions and grouped aggregations to build advanced, database-native features while keeping the entire pipeline lazy.
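To make the `mass_z` feature concrete, here is a small plain-Python reference for a per-group z-score that we could use to spot-check the database output. It is a sketch under stated assumptions: the rows are made up, `group_zscores` is a hypothetical helper, and we assume the sample standard deviation, which we believe is what Ibis's `std()` computes by default.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_zscores(rows, group_key, value_key):
    """Return (value - group mean) / group sample std for every row."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row[value_key])

    # stdev needs at least two values per group; a group with zero
    # spread would divide by zero, so real data needs a guard here.
    stats = {g: (mean(v), stdev(v)) for g, v in grouped.items()}

    return [
        (row[value_key] - stats[row[group_key]][0]) / stats[row[group_key]][1]
        for row in rows
    ]


# Made-up toy rows, not real penguin measurements.
toy_rows = [
    {"species": "A", "body_mass_g": 3000},
    {"species": "A", "body_mass_g": 4000},
    {"species": "A", "body_mass_g": 5000},
    {"species": "B", "body_mass_g": 5000},
    {"species": "B", "body_mass_g": 6000},
]

z = group_zscores(toy_rows, "species", "body_mass_g")
print([round(v, 3) for v in z])
```

Each species is standardized against its own mean and spread, so a heavy penguin of a light species and a heavy penguin of a heavy species can end up with similar scores.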

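    features = penguin_feature_pipeline(t)
    # Inspect the SQL that Ibis generates for DuckDB
    print(con.compile(features))

    # Execute in DuckDB and bring back only the aggregated result
    try:
        df = features.to_pandas()
    except Exception:
        df = features.execute()

    display(df.head())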

    We invoke the feature pipeline and compile it into DuckDB SQL to validate that all transformations are pushed down to the database. We then run the pipeline and return only the final aggregated results for inspection.

    # Materialize the feature expression as a table inside DuckDB
    con.create_table("penguin_features", features, overwrite=True)

    feat_tbl = con.table("penguin_features")

    # Preview a few rows of the materialized table
    try:
        preview = feat_tbl.limit(10).to_pandas()
    except Exception:
        preview = feat_tbl.limit(10).execute()

    display(preview)

    # Export the table to Parquet (the path assumes a Colab-style /content directory)
    out_path = "/content/penguin_features.parquet"
    con.raw_sql(f"COPY penguin_features TO '{out_path}' (FORMAT PARQUET);")
    print(out_path)

    We materialize the engineered features as a table directly inside DuckDB and query it lazily for verification. We also export the results to a Parquet file, demonstrating how we can hand off database-computed features to downstream analytics or machine learning workflows.

    In conclusion, we built, compiled, and executed an advanced feature engineering workflow entirely inside DuckDB using Ibis. We inspected the generated SQL, materialized results directly in the database, and exported them for downstream use while preserving portability across analytical backends. This approach reinforces the core idea behind Ibis: we keep computation close to the data, minimize unnecessary data movement, and maintain a single, reusable Python codebase that scales from local experimentation to production databases.




    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
