Accepted at CVPR 2026
A two-stage framework that estimates degradation from a restored face and uses it to guide full-scene restoration.
1University of Toronto | 2Vector Institute | 3AI Center-Toronto, Samsung Electronics | 4University Health Network | 5York University
amirhossein@cs.toronto.ca
Recent advances in image restoration have enabled high-fidelity recovery of faces from degraded inputs using reference-based face restoration models (Ref-FR). However, such methods focus solely on facial regions, neglecting degradation across the full scene, including the body and background, which limits their practical usability. Meanwhile, full-scene restorers often ignore degradation cues entirely, leading to underdetermined predictions and visual artifacts. In this work, we propose Face2Scene, a two-stage restoration framework that leverages the face as a perceptual oracle to estimate degradation and guide the restoration of the entire image. Given a degraded image and one or more identity references, we first apply a Ref-FR model to reconstruct high-quality facial details. From the restored-degraded face pair, we extract a face-derived degradation code that captures degradation attributes (e.g., noise, blur, compression), which is then transformed into multi-scale degradation-aware tokens. These tokens condition a diffusion model to restore the full scene, including the body and background, in a single step. Extensive experiments demonstrate that Face2Scene outperforms state-of-the-art restoration methods.
Face2Scene first applies a reference-based face restoration model to recover identity-faithful facial details from the degraded input. The restored-versus-degraded face pair is then encoded by FaDeX and transformed by MapNet into multi-scale degradation-aware tokens, which condition a one-step diffusion restorer to recover the full image, including face, body, clothing, and background.
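To make the idea of a face-derived degradation code concrete, here is a toy, hand-crafted stand-in for the learned FaDeX encoder described above (the real encoder is a trained network; `degradation_code`, `lap_var`, and the two attribute names below are illustrative assumptions, not the paper's implementation). It compares the degraded face against its restored counterpart to estimate coarse degradation attributes such as noise level and loss of sharpness:

```python
import numpy as np

def lap_var(img: np.ndarray) -> float:
    # Variance of the 3x3 Laplacian response, a common sharpness proxy,
    # computed with plain array slicing (no SciPy dependency).
    p = img.astype(np.float64)
    out = (-4.0 * p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
           + p[1:-1, :-2] + p[1:-1, 2:])
    return float(out.var())

def degradation_code(degraded: np.ndarray, restored: np.ndarray) -> dict:
    """Toy proxy for a face-derived degradation code (NOT the paper's
    learned FaDeX): compare the degraded face with its restored
    counterpart to estimate simple degradation attributes."""
    residual = degraded.astype(np.float64) - restored.astype(np.float64)
    return {
        # Spread of the restored-vs-degraded residual approximates noise.
        "noise_level": float(residual.std()),
        # A ratio well below 1 suggests the restorer removed blur.
        "sharpness_ratio": lap_var(degraded) / (lap_var(restored) + 1e-8),
    }
```

In the actual pipeline this code is produced by a learned encoder and then mapped by MapNet into multi-scale tokens; the sketch only illustrates why a restored-degraded pair is informative about the degradation itself.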
Stage 1 (Ref-FR): Restore a high-quality face from the degraded input using same-identity references.
Stage 2 (FaDeX + MapNet): Extract a face-derived degradation embedding and map it to multi-scale conditioning tokens.
Stage 3 (One-step diffusion): Condition a diffusion restorer to recover face, body, and background in one step.
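The three stages above can be sketched as a simple composition of callables. This is a hedged structural sketch, not the released implementation: `face2scene_pipeline` and its four model arguments (`ref_fr`, `fadex`, `mapnet`, `diffusion_restorer`) are hypothetical stand-ins for the paper's actual components.

```python
def face2scene_pipeline(degraded_image, references,
                        ref_fr, fadex, mapnet, diffusion_restorer):
    """Sketch of the three-stage Face2Scene flow; the four callables
    are placeholders for the paper's trained models."""
    # Stage 1: reference-based face restoration from same-identity refs.
    restored_face = ref_fr(degraded_image, references)
    # Stage 2: face-derived degradation code -> multi-scale tokens.
    deg_code = fadex(degraded_image, restored_face)
    tokens = mapnet(deg_code)
    # Stage 3: one-step diffusion restoration of the full scene,
    # conditioned on the degradation-aware tokens.
    return diffusion_restorer(degraded_image, tokens)
```

With stub models, the data flow can be traced end to end:

```python
out = face2scene_pipeline(
    "img", ["ref"],
    ref_fr=lambda img, refs: "face",
    fadex=lambda img, face: "code",
    mapnet=lambda code: "tokens",
    diffusion_restorer=lambda img, tokens: (img, tokens),
)
# out == ("img", "tokens")
```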
Face2Scene consistently restores sharper facial identity cues and more coherent scene structure than prior restoration baselines. The gallery below is a curated selection from Figures 11-20 of the paper; additional qualitative comparisons remain available in the full PDF.
Across the paper's quantitative results and accompanying slides, Face2Scene achieves the strongest overall performance on perceptual, fidelity, and quality metrics, while also producing more consistent restoration of the face, body, and background in side-by-side qualitative comparisons.
Curated synthetic examples from Figures 11, 13, and 16. These comparisons evaluate Face2Scene against S3Diff, InvSR, OSEDiff, PASD, SUPIR, and SinSR on the InScene synthetic benchmark.
Curated real-world examples from Figures 17, 19, and 20. These figures show Face2Scene compared with strong restoration baselines on real validation and real test samples.
Additional qualitative pages are available in Figures 12, 14, 15, and 18 of the paper.
The BibTeX entry below will be updated once the final publication metadata is available.
@inproceedings{kazerouni2026face2scene,
title = {Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration},
author = {Kazerouni, Amirhossein and Suin, Maitreya and Aumentado-Armstrong, Tristan Ty and Honari, Sina and Walia, Amanpreet and Mohomed, Iqbal and Derpanis, Konstantinos G. and Taati, Babak and Levinshtein, Alex},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}