Xiaogang Xu (Nickname: Theo)
Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Generative Distribution Distillation
LoViC: Efficient Long Video Generation with Context Compression
Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Construction
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
From Events to Enhancement: A Survey on Event-Based Imaging Technologies
Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning
Self-supervised Learning for Enhancing Geometrical Modeling in 3D-Aware Generative Adversarial Network
Video Frame Interpolation with Region-Distinguishable Priors from SAM
Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement
General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments
Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge
Boosting Image Restoration via Priors from Pre-trained Models
Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance
Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
SNR-Aware Low-light Image Enhancement
Low‑light Image Enhancement via Structure Modeling and Guidance
Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition
Deep Parametric 3D Filters for Joint Video Denoising and Illumination Enhancement in Video Super Resolution
Seeing Dynamic Scene in the Dark: A High-Quality Video Dataset with Mechatronic Alignment
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
FashionComposer: Compositional Fashion Image Generations
CFSynthesis: Controllable and free-view 3D human video synthesis
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Photo-Realistic Out-of-domain GAN inversion via Invertibility Decomposition
Conditional Temporal Variational AutoEncoder for Action Video Prediction
Hierarchical Image Generation via Transformer-Based Sequential Patch Selection
Text-Guided Human Image Manipulation via Image-Text Shared Space
View Independent Generative Adversarial Network for Novel View Synthesis
Depth Anything V2
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Parametric Linear Blend Skinning Model for Multiple-Shape 3D Garments
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
Lighting up NeRF via Unsupervised Decomposition and Enhancement
TriVol: Point Cloud Rendering Via Triple Volumes
Point2Pix: Photo‑Realistic Point Cloud Rendering via Neural Radiance Fields
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Hawk: Learning to Understand Open-World Video Anomalies
Towards Efficient Large-Scale Language-3D Representation Learning
Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey
DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately
HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding
Adversarial Captchas (Applied and deployed on Alibaba's e-commerce platform)
Universal Adaptive Data Augmentation
Densely Annotated Synthetic Images Make Stronger Semantic Segmentation Models
MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
Dynamic divide-and-conquer adversarial training for robust semantic segmentation
Towards Unified 3D Object Detection via Algorithm and Data Unification
Class Incremental Medical Image Segmentation via Prototype-Guided Calibration and Dual-Aligned Distillation
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
SEE: See Everything Every Time--Adaptive Brightness Adjustment for Broad Light Range Images via Events
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
DiffCamera: Arbitrary Refocusing on Images
PRIME: Prototype-Driven Class Incremental Learning for Medical Image Segmentation
PVDD: A Practical Video Denoising Dataset with Real-World Dynamic Scenes
Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios
Diffusion Noise Feature: Accurate and Fast Generated Image Detection
CG-FedLLM: How to Compress Gradients in Federated Fine-tuning for Large Language Models
Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge
Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey
Towards Unified 3D Object Detection via Algorithm and Data Unification
FashionComposer: Compositional Fashion Image Generations
CFSynthesis: Controllable and free-view 3D human video synthesis
DiMSOD: A Diffusion-Based Framework for Multi-Modal Salient Object Detection
DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately
Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training
Diving Deep into Regions: Exploiting Regional Information Transformer for Single Image Deraining
Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
Particle Rendering: Implicitly Aggregating Incident and Outgoing Light Fields for Novel View Synthesis
HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding
Parametric Linear Blend Skinning Model for Multiple-Shape 3D Garments
Hawk: Learning to Understand Open-World Video Anomalies
Depth Anything V2
Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
An Incremental Unified Framework for Small Defect Inspection
Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
Towards Efficient Large-Scale Language-3D Representation Learning
Generative Active Learning for Long-tailed Instance Segmentation
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
MalGNE: Enhancing the Performance and Efficiency of CFG-based Malware Detector by Graph Node Embedding in Low Dimension Space
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
UniMODE: Universal Monocular 3D Object Detection
Boosting Image Restoration via Priors from Pre-trained Models
Learning to Remove Wrinkled Transparent Film with Polarized Prior
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Densely Annotated Synthetic Images Make Stronger Semantic Segmentation Models
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
Photo-Realistic Out-of-domain GAN inversion via Invertibility Decomposition
Lighting up NeRF via Unsupervised Decomposition and Enhancement
High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation
Low‑light Image Enhancement via Structure Modeling and Guidance
TriVol: Point Cloud Rendering Via Triple Volumes
Point2Pix: Photo‑Realistic Point Cloud Rendering via Neural Radiance Fields
Deep Parametric 3D Filters for Joint Video Denoising and Illumination Enhancement in Video Super Resolution
Conditional Temporal Variational AutoEncoder for Action Video Prediction
Universal Adaptive Data Augmentation
MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation
SNR-Aware Low-light Image Enhancement
Hierarchical Image Generation via Transformer-Based Sequential Patch Selection
Dynamic divide-and-conquer adversarial training for robust semantic segmentation
Seeing Dynamic Scene in the Dark: A High-Quality Video Dataset with Mechatronic Alignment
Text-Guided Human Image Manipulation via Image-Text Shared Space
Adversarial Captchas (Applied and deployed on Alibaba's e-commerce platform)
Self-Supervised 3D Mesh Reconstruction from Single Images
Semantic-Aware Video Color Style Transfer based on Temporal Consistent Sparse Patch Constraint
Reference-based Video Colorization with Multi-scale Semantic Fusion and Temporal Augmentation
Domain Adaptive Image-to-image Translation
View Independent Generative Adversarial Network for Novel View Synthesis
Homomorphic Latent Space Interpolation for Unpaired Image-to-image Translation
Ranking Users in Social Networks with Motif-based PageRank
Ranking Users in Social Networks with Higher-Order Structures
No‑reference stereoscopic image quality assessment based on saliency‑guided binocular feature consolidation