

    On-device real-time few-shot face stylization – Google Research Blog


    Posted by Haolin Jia, Software Engineer, and Qifei Wang, Senior Software Engineer, Core ML

    In recent years, we have witnessed increasing interest across consumers and researchers in integrated augmented reality (AR) experiences using real-time face feature generation and editing functions in mobile applications, including short videos, virtual reality, and gaming. As a result, there is a growing demand for lightweight, yet high-quality face generation and editing models, which are often based on generative adversarial network (GAN) techniques. However, the majority of GAN models suffer from high computational complexity and the need for a large training dataset. In addition, it is also important to employ GAN models responsibly.

    In this post, we introduce MediaPipe FaceStylizer, an efficient design for few-shot face stylization that addresses the aforementioned model complexity and data efficiency challenges while being guided by Google’s responsible AI Principles. The model consists of a face generator and a face encoder used as GAN inversion to map the image into latent code for the generator. We introduce a mobile-friendly synthesis network for the face generator with an auxiliary head that converts features to RGB at each level of the generator to generate high-quality images from coarse to fine granularities. We also carefully designed the loss functions for the aforementioned auxiliary heads and combined them with the common GAN loss functions to distill the student generator from the teacher StyleGAN model, resulting in a lightweight model that maintains high generation quality. The proposed solution is available in open source through MediaPipe. Users can fine-tune the generator to learn a style from one or a few images using MediaPipe Model Maker, and deploy to on-device face stylization applications with the customized model using MediaPipe FaceStylizer.

    Few-shot on-device face stylization

    An end-to-end pipeline

    Our goal is to build a pipeline that supports users in adapting the MediaPipe FaceStylizer to different styles by fine-tuning the model with a few examples. To enable such a face stylization pipeline, we built the pipeline with a GAN inversion encoder and an efficient face generator model (see below). The encoder and generator pipeline can then be adapted to different styles via a few-shot learning process. The user first sends a single or a few similar samples of the style images to MediaPipe Model Maker to fine-tune the model. The fine-tuning process freezes the encoder module and only fine-tunes the generator. The training process samples multiple latent codes close to the encoding output of the input style images as the input to the generator. The generator is then trained to reconstruct an image of a person’s face in the style of the input style image by optimizing a joint adversarial loss function that also accounts for style and content. With such a fine-tuning process, the MediaPipe FaceStylizer can adapt to the customized style, which approximates the user’s input. It can then be applied to stylize test images of real human faces.
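    The latent-sampling step above — drawing generator inputs in a small neighborhood of the encoded style image — can be sketched as follows. This is a minimal illustration with numpy stand-ins; the function name, latent size, and noise scale are assumptions for the sketch, not MediaPipe's actual implementation.

```python
import numpy as np

def sample_latents_near(w_style, num_samples=8, noise_std=0.05, seed=0):
    """Sample latent codes in a small neighborhood of an encoded style
    latent; these serve as generator inputs during fine-tuning."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=(num_samples,) + w_style.shape)
    return w_style[None, ...] + noise

# Toy stand-in for the encoder output of a single style image.
w_style = np.zeros(512)
latents = sample_latents_near(w_style, num_samples=4)
print(latents.shape)  # (4, 512)
```

    Sampling near (rather than exactly at) the encoded latent exposes the generator to a small region of latent space, which helps the fine-tuned model generalize beyond the handful of style exemplars.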

    Generator: BlazeStyleGAN

    The StyleGAN model family has been widely adopted for face generation and various face editing tasks. To support efficient on-device face generation, we based the design of our generator on StyleGAN. This generator, which we call BlazeStyleGAN, is similar to StyleGAN in that it also contains a mapping network and a synthesis network. However, since the synthesis network of StyleGAN is the major contributor to the model’s high computation complexity, we designed and employed a more efficient synthesis network. The improved efficiency and generation quality is achieved by:

    1. Reducing the latent feature dimension in the synthesis network to a quarter of the resolution of the counterpart layers in the teacher StyleGAN,
    2. Designing multiple auxiliary heads to transform the downscaled feature to the image domain to form a coarse-to-fine image pyramid to evaluate the perceptual quality of the reconstruction, and
    3. Skipping all but the final auxiliary head at inference time.

    With the newly designed architecture, we train the BlazeStyleGAN model by distilling it from a teacher StyleGAN model. We use a multi-scale perceptual loss and adversarial loss in the distillation to transfer the high-fidelity generation capability from the teacher model to the student BlazeStyleGAN model and also to mitigate the artifacts from the teacher model.
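    The shape of this distillation objective can be sketched as below. The per-level mean squared error stands in for a real perceptual metric, and the adversarial term uses the common non-saturating form; the weights and names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def distillation_loss(student_pyramid, teacher_pyramid, disc_scores, adv_weight=0.1):
    """Toy distillation objective: a per-pyramid-level reconstruction term
    (MSE as a stand-in for a perceptual loss) plus a non-saturating
    adversarial term, softplus(-D(G(w))), on discriminator scores."""
    perceptual = sum(float(np.mean((s - t) ** 2))
                     for s, t in zip(student_pyramid, teacher_pyramid))
    adversarial = float(np.mean(np.logaddexp(0.0, -np.asarray(disc_scores))))
    return perceptual + adv_weight * adversarial

rng = np.random.default_rng(1)
levels = [rng.normal(size=(r, r, 3)) for r in (8, 16, 32)]
# Identical pyramids: perceptual term is 0; loss = adv_weight * ln(2).
loss = distillation_loss(levels, levels, disc_scores=[0.0, 0.0])
```

    Supervising every pyramid level, rather than only the final image, is what lets the much smaller student match the teacher's coarse structure and fine detail at the same time.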

    More details of the model architecture and training scheme can be found in our paper.

    Visual comparison between face samples generated by StyleGAN and BlazeStyleGAN. The images on the first row are generated by the teacher StyleGAN. The images on the second row are generated by the student BlazeStyleGAN. The face generated by BlazeStyleGAN has similar visual quality to the image generated by the teacher model. Some results demonstrate that the student BlazeStyleGAN suppresses the artifacts from the teacher model in the distillation.

    In the above figure, we demonstrate some sample results of our BlazeStyleGAN. Compared with the face images generated by the teacher StyleGAN model (top row), the images generated by the student BlazeStyleGAN (bottom row) maintain high visual quality and further reduce artifacts produced by the teacher, thanks to the loss function design in our distillation.

    An encoder for efficient GAN inversion

    To support image-to-image stylization, we also introduced an efficient GAN inversion as the encoder to map input images to the latent space of the generator. The encoder is built on a MobileNet V2 backbone and trained with natural face images. The loss is defined as a combination of an image perceptual quality loss, which measures the content difference, style similarity, and embedding distance, as well as the L1 loss between the input images and reconstructed images.
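    The encoder's combined objective can be sketched as below. A single feature-space distance stands in for the perceptual, style-similarity, and embedding terms; the function name and weights are assumptions for illustration.

```python
import numpy as np

def encoder_loss(x, x_rec, feat_x, feat_rec, perc_weight=1.0, l1_weight=1.0):
    """Toy encoder objective: a feature-space distance (stand-in for the
    perceptual/style/embedding terms) plus pixel-wise L1 reconstruction."""
    perceptual = float(np.mean((feat_x - feat_rec) ** 2))
    l1 = float(np.mean(np.abs(x - x_rec)))
    return perc_weight * perceptual + l1_weight * l1

x = np.ones((4, 4, 3))                              # toy input image
loss = encoder_loss(x, x * 0.5, np.zeros(8), np.zeros(8))
print(loss)  # perceptual term 0.0 + L1 term 0.5 = 0.5
```

    The L1 pixel term keeps the reconstruction faithful, while the feature-space terms keep the inverted latent semantically close to the input, which is what makes the encoder usable as a GAN inversion front-end.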

    On-device performance

    We document model complexities in terms of parameter counts and computing FLOPs in the following table. Compared to the teacher StyleGAN (33.2M parameters), BlazeStyleGAN (generator) significantly reduces the model complexity, with only 2.01M parameters and 1.28G FLOPs for output resolution 256×256. Compared to StyleGAN-1024 (generating images of size 1024×1024), BlazeStyleGAN-1024 reduces both model size and computation complexity by around 95% with no notable quality difference and can even suppress the artifacts from the teacher StyleGAN model.

    Model     Image Size     #Params (M)     FLOPs (G)
    StyleGAN     1024     33.17     74.3
    BlazeStyleGAN     1024     2.07     4.70
    BlazeStyleGAN     512     2.05     1.57
    BlazeStyleGAN     256     2.01     1.28
    Encoder     256     1.44     0.60
    Model complexity measured by parameter numbers and FLOPs.
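    As a quick sanity check, the table's own numbers for the 1024 variant imply reductions of roughly 94% in both parameters and FLOPs, in line with the approximately-95% figure quoted in the text:

```python
# Reductions implied by the table: teacher StyleGAN-1024 vs BlazeStyleGAN-1024.
params_reduction = 1 - 2.07 / 33.17   # parameters: ~0.938
flops_reduction = 1 - 4.70 / 74.3     # FLOPs:      ~0.937
print(round(params_reduction, 3), round(flops_reduction, 3))  # 0.938 0.937
```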

    We benchmarked the inference time of the MediaPipe FaceStylizer on various high-end mobile devices and report the results in the table below. Both BlazeStyleGAN-256 and BlazeStyleGAN-512 achieved real-time performance on all GPU devices, running in less than 10 ms on a high-end phone’s GPU. BlazeStyleGAN-256 can also achieve real-time performance on the iOS devices’ CPU.

    Model     BlazeStyleGAN-256 (ms)     Encoder-256 (ms)
    iPhone 11     12.14     11.48
    iPhone 12     11.99     12.25
    iPhone 13 Pro     7.22     5.41
    Pixel 6     12.24     11.23
    Samsung Galaxy S10     17.01     12.70
    Samsung Galaxy S20     8.95     8.20
    Latency benchmark of the BlazeStyleGAN, face encoder, and the end-to-end pipeline on various mobile devices.

    Fairness analysis

    The model has been trained with a high-diversity dataset of human faces and is expected to be fair to different human faces. The fairness evaluation demonstrates that the model performs well and in a balanced way in terms of human gender, skin tone, and age.

    Face stylization visualization

    Some face stylization results are demonstrated in the following figure. The images in the top row (in orange boxes) represent the style images used to fine-tune the model. The images in the left column (in green boxes) are the natural face images used for testing. The 2×4 matrix of images represents the output of the MediaPipe FaceStylizer, blending outputs between the natural faces in the left-most column and the corresponding face styles in the top row. The results demonstrate that our solution can achieve high-quality face stylization for several popular styles.

    Sample results of our MediaPipe FaceStylizer.

    MediaPipe Solutions

    The MediaPipe FaceStylizer is going to be released to public users in MediaPipe Solutions. Users can leverage MediaPipe Model Maker to train a customized face stylization model using their own style images. After training, the exported bundle of TFLite model files can be deployed to applications across platforms (Android, iOS, Web, Python, etc.) using the MediaPipe Tasks FaceStylizer API in just a few lines of code.

    Acknowledgements

    This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Omer Tov, Yang Zhao, Andrey Vakunov, Fei Deng, Ariel Ephrat, Inbar Mosseri, Lu Wang, Chuo-Ling Chang, Tingbo Hou, and Matthias Grundmann.
