LuxDiT: Lighting Estimation with Video Diffusion Transformer
Abstract
LuxDiT, a video diffusion transformer fine-tuned with low-rank adaptation, generates accurate HDR environment maps from visual input, outperforming existing methods.
Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation finetuning strategy using a collected dataset of HDR panoramas. Our method produces accurate lighting predictions with realistic angular high-frequency details, outperforming existing state-of-the-art techniques in both quantitative and qualitative evaluations.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Spatiotemporally Consistent Indoor Lighting Estimation with Diffusion Priors (2025)
- GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting (2025)
- IDCNet: Guided Video Diffusion for Metric-Consistent RGBD Scene Generation with Precise Camera Control (2025)
- Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models (2025)
- DualMat: PBR Material Estimation via Coherent Dual-Path Diffusion (2025)
- Generative Video Matting (2025)
- Matrix-3D: Omnidirectional Explorable 3D World Generation (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper