✨ ICCV 2025 ✨
Implicit Neural Representations (INRs) are emerging as a powerful paradigm for unifying task modeling across diverse data domains, offering key advantages such as memory efficiency and resolution independence. Conventional deep learning models are typically modality-dependent, requiring custom architectures and objectives for each signal type; INRs avoid this, but existing INR frameworks frequently rely on global latent vectors or suffer computational inefficiencies that limit their broader applicability.
We introduce LIFT, a novel, high-performance framework that addresses these challenges by capturing multiscale information through meta-learning. LIFT leverages multiple parallel localized implicit functions alongside a hierarchical latent generator to produce unified latent representations that span local, intermediate, and global features. This architecture facilitates smooth transitions across local regions, enhancing expressivity while maintaining inference efficiency.
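To make the idea concrete, here is a minimal NumPy sketch of the LIFT-style design described above: a global latent is refined by a hierarchical generator into per-patch local latents, and each patch of the coordinate domain is decoded by its own small implicit function. All sizes, the 1-D domain, the tanh-based generator, and the patch-routing scheme are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights, biases):
    """Tiny MLP with sine activations on hidden layers."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.sin(h @ W + b)
    return h @ weights[-1] + biases[-1]

class ToyLIFT:
    """Sketch (assumed details): hierarchical latents (global ->
    intermediate -> local) condition parallel localized implicit
    functions, one per coordinate patch."""
    def __init__(self, n_patches=4, latent_dim=8, hidden=16, out_dim=3):
        self.n_patches = n_patches
        self.global_latent = rng.normal(size=latent_dim)
        # hierarchical latent generator: two small linear maps
        self.W_inter = rng.normal(size=(latent_dim, latent_dim)) * 0.1
        self.W_local = rng.normal(size=(n_patches, latent_dim, latent_dim)) * 0.1
        # one small implicit function per patch
        in_dim = 1 + latent_dim  # 1-D coordinate + conditioning latent
        self.funcs = []
        for _ in range(n_patches):
            weights = [rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim),
                       rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden)]
            biases = [np.zeros(hidden), np.zeros(out_dim)]
            self.funcs.append((weights, biases))

    def forward(self, coords):
        """coords: (N,) array in [0, 1); returns (N, out_dim) values."""
        inter = np.tanh(self.global_latent @ self.W_inter)
        out = np.empty((coords.shape[0], self.funcs[0][0][-1].shape[1]))
        # route each coordinate to the implicit function of its patch
        patch = np.minimum((coords * self.n_patches).astype(int),
                           self.n_patches - 1)
        for p in range(self.n_patches):
            mask = patch == p
            if not mask.any():
                continue
            local = np.tanh(inter @ self.W_local[p])  # local latent
            x = coords[mask, None]
            z = np.broadcast_to(local, (x.shape[0], local.shape[0]))
            out[mask] = mlp_forward(np.concatenate([x, z], axis=1),
                                    *self.funcs[p])
        return out

model = ToyLIFT()
values = model.forward(np.linspace(0.0, 0.999, 32))
print(values.shape)  # (32, 3)
```

Because each coordinate is handled by a patch-local function conditioned on latents derived from shared intermediate and global codes, local detail and global context are combined in one representation, which is the intuition behind the smooth local-to-global transitions described above.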
Additionally, we introduce ReLIFT, an enhanced variant of LIFT that incorporates residual connections and expressive frequency encodings. This straightforward change closes the convergence-capacity gap found in comparable methods, increasing representational capacity while accelerating convergence.
Empirical results show that LIFT achieves state-of-the-art (SOTA) performance in generative modeling and classification tasks, with notable reductions in computational costs. Moreover, in single-task settings, the streamlined ReLIFT architecture proves effective in signal representations and inverse problem tasks.
Table: Performance comparison of image and voxel reconstruction and generation on CelebA-HQ (64²) and ShapeNet (64³). Each cell is color-coded to indicate the best and second-best performance. Learn. and Inf. refer to learnable and inference parameters, respectively.
Table: Image classification performance on CIFAR-10. The table reports the Top-1 accuracy (in %) along with reconstruction performance across different numbers of augmentations. ★ denotes best classification results from Spatial Functa, while ★★ indicates the use of more sophisticated augmentations, including MixUp and CutMix.
Table: Image reconstruction performance on the ImageNet-100 dataset.
ReLIFT incorporates residual connections and expressive first-layer frequency scaling to mitigate the convergence-capacity gap often seen in INRs. These enhancements enable the model to capture finer details and represent a broader range of frequencies efficiently. The frequency scaling increases the network's capacity to learn higher-frequency components, while the residual connections reweight layers to amplify higher-order harmonics without discarding the fundamental components. This design allows ReLIFT to balance high- and low-frequency information with greater efficiency.
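The two mechanisms above can be sketched in a few lines of NumPy: a sine-activated network whose first layer is multiplied by a frequency-scaling factor (omega0, in the style of SIREN), and whose hidden layers use residual connections so the pre-activation signal (carrying lower frequencies) is preserved alongside the new harmonics. The layer sizes, omega0 value, and initialization are illustrative assumptions, not the exact ReLIFT configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out, first=False, omega0=30.0):
    """SIREN-style uniform init; hidden-layer bounds shrink with omega0."""
    if first:
        W = rng.uniform(-1.0, 1.0, size=(fan_in, fan_out)) / fan_in
    else:
        bound = np.sqrt(6.0 / fan_in) / omega0
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    return W, np.zeros(fan_out)

class ToyReLIFT:
    """Sketch of a ReLIFT-style network (assumed details): sine
    activations, first-layer frequency scaling, and residual
    connections between hidden layers."""
    def __init__(self, in_dim=2, hidden=32, depth=3, out_dim=1, omega0=30.0):
        self.omega0 = omega0
        self.first = init_layer(in_dim, hidden, first=True, omega0=omega0)
        self.blocks = [init_layer(hidden, hidden, omega0=omega0)
                       for _ in range(depth)]
        self.last = init_layer(hidden, out_dim, omega0=omega0)

    def forward(self, x):
        W, b = self.first
        h = np.sin(self.omega0 * (x @ W + b))  # frequency-scaled first layer
        for W, b in self.blocks:
            # residual: the skip path preserves the existing (lower-frequency)
            # content while the sine branch adds higher-order harmonics
            h = h + np.sin(h @ W + b)
        W, b = self.last
        return h @ W + b

coords = np.stack(np.meshgrid(np.linspace(-1, 1, 8),
                              np.linspace(-1, 1, 8)), -1).reshape(-1, 2)
pred = ToyReLIFT().forward(coords)
print(pred.shape)  # (64, 1)
```

The skip connection is what lets the fundamental components learned early in training survive later updates: each block can only add a bounded sine term on top of its input rather than overwrite it.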
(a) ReLIFT
(b) SIREN
Figure: Activation statistics comparison between ReLIFT and SIREN.
Figure: PSNR comparisons of ReLIFT with SOTA models.
Figure: PSNR and reconstruction error comparisons of ReLIFT with SOTA models.
Figure: Qualitative comparisons of ReLIFT with SOTA models.
Figure: PSNR and SSIM comparisons of a 4× single image super-resolution between ReLIFT and SOTA models.
Figure: PSNR and SSIM comparisons between ReLIFT and SOTA models.
Figure: PSNR comparison between LIFT and SOTA models on 25% of the pixels in a 572 × 582 × 3 image.
Figure: Frequency learning comparison between SIREN and ReLIFT. The x-axis shows training steps, the y-axis indicates frequency, and the color represents relative approximation error.
@misc{kazerouni2025lift,
  author        = {Amirhossein Kazerouni and Soroush Mehraban and Michael Brudno and Babak Taati},
  title         = {LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding},
  eprint        = {2503.15420},
  archivePrefix = {arXiv},
  year          = {2025},
  url           = {https://arxiv.org/abs/2503.15420},
}