
PuLID: Pure and Lightning ID Customization via Contrastive Alignment

NeurIPS 2024

Abstract
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both a contrastive alignment loss and an accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (e.g., background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible. Code and models are available at https://github.com/ToTheBeginning/PuLID
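The abstract describes a training signal that combines the standard diffusion objective with a contrastive alignment loss (keeping the model's behavior with and without the ID condition close, so the base model is minimally disrupted) and an accurate ID loss (keeping the generated face close to the reference identity). A minimal sketch of how such a combined objective could be assembled, assuming cosine-similarity-based losses and illustrative function names and weights (none of these details are specified on this page):

```python
# Hedged sketch, NOT the official PuLID implementation. It only illustrates
# how a diffusion loss, a contrastive alignment loss, and an ID loss might
# be combined into one training objective, as the abstract describes.
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def alignment_loss(feats_with_id, feats_without_id):
    # Penalize drift of intermediate features when the ID branch is active,
    # so inserting an ID disrupts the base model as little as possible.
    return 1.0 - cosine(feats_with_id, feats_without_id)


def id_loss(gen_face_emb, ref_face_emb):
    # Penalize low similarity between the generated face embedding and the
    # reference identity embedding (high ID fidelity).
    return 1.0 - cosine(gen_face_emb, ref_face_emb)


def total_loss(diff_loss, feats_id, feats_plain, gen_emb, ref_emb,
               w_align=1.0, w_id=1.0):
    # Weighted sum of the three terms; the weights are illustrative.
    return (diff_loss
            + w_align * alignment_loss(feats_id, feats_plain)
            + w_id * id_loss(gen_emb, ref_emb))


# Toy example: identical features and embeddings leave only the diffusion
# term, so the extra losses vanish.
loss = total_loss(0.25, [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
print(loss)  # 0.25
```

In this toy setup, perfectly aligned features and a perfectly matched face embedding contribute zero extra loss, which mirrors the paper's stated goal: the ID branch should add fidelity without pulling the base model away from its original behavior.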
Key words: diffusion, controllable image generation, image customization
Related Papers
A. Linear-probe
2021

Cited 2135

Li Yun Chen, Mengyi Zhao, Yiheng Liu, Mingyue Ding, Yifan Song, Shizun Wang, Qianqian Wang, Yang Hao, Jing Liu, Ke-Lin Du, Min Zheng
2023

Cited 2

Chat Paper

Key points: Introduces PuLID, a novel tuning-free ID customization method for text-to-image generation that improves ID fidelity and editability by introducing a contrastive alignment loss and an accurate ID loss.

Method: Combines a Lightning T2I branch with the standard diffusion branch to realize ID customization while minimizing disruption to the original model.

Experiments: Experiments show that PuLID performs well in both ID fidelity and editability, and keeps image elements as consistent as possible before and after ID insertion. Dataset names and numerical results are not mentioned on this page.