✨ ICCV 2025 ✨
Implicit Neural Representations (INRs) are emerging as a powerful paradigm for unifying task modeling across diverse data domains, offering key advantages such as memory efficiency and resolution independence. Conventional deep learning models are typically modality-dependent, requiring custom architectures and objectives for each signal type; INRs avoid this, but existing INR frameworks frequently rely on global latent vectors or suffer computational inefficiencies that limit their broader applicability.
We introduce LIFT, a novel, high-performance framework that addresses these challenges by capturing multiscale information through meta-learning. LIFT leverages multiple parallel localized implicit functions alongside a hierarchical latent generator to produce unified latent representations that span local, intermediate, and global features. This architecture facilitates smooth transitions across local regions, enhancing expressivity while maintaining inference efficiency.
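To make the idea concrete, here is a minimal NumPy sketch of the LIFT-style design described above: a global latent is refined by a hierarchical generator into per-patch local latents, and each patch of the coordinate domain is decoded by its own small implicit function. All sizes, the 1-D domain, the tanh-based generator, and the patch-routing scheme are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights, biases):
    """Tiny MLP with sine activations on hidden layers."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.sin(h @ W + b)
    return h @ weights[-1] + biases[-1]

class ToyLIFT:
    """Sketch (assumed details): hierarchical latents (global ->
    intermediate -> local) condition parallel localized implicit
    functions, one per coordinate patch."""
    def __init__(self, n_patches=4, latent_dim=8, hidden=16, out_dim=3):
        self.n_patches = n_patches
        self.global_latent = rng.normal(size=latent_dim)
        # hierarchical latent generator: two small linear maps
        self.W_inter = rng.normal(size=(latent_dim, latent_dim)) * 0.1
        self.W_local = rng.normal(size=(n_patches, latent_dim, latent_dim)) * 0.1
        # one small implicit function per patch
        in_dim = 1 + latent_dim  # 1-D coordinate + conditioning latent
        self.funcs = []
        for _ in range(n_patches):
            weights = [rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim),
                       rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden)]
            biases = [np.zeros(hidden), np.zeros(out_dim)]
            self.funcs.append((weights, biases))

    def forward(self, coords):
        """coords: (N,) array in [0, 1); returns (N, out_dim) values."""
        inter = np.tanh(self.global_latent @ self.W_inter)
        out = np.empty((coords.shape[0], self.funcs[0][0][-1].shape[1]))
        # route each coordinate to the implicit function of its patch
        patch = np.minimum((coords * self.n_patches).astype(int),
                           self.n_patches - 1)
        for p in range(self.n_patches):
            mask = patch == p
            if not mask.any():
                continue
            local = np.tanh(inter @ self.W_local[p])  # local latent
            x = coords[mask, None]
            z = np.broadcast_to(local, (x.shape[0], local.shape[0]))
            out[mask] = mlp_forward(np.concatenate([x, z], axis=1),
                                    *self.funcs[p])
        return out

model = ToyLIFT()
values = model.forward(np.linspace(0.0, 0.999, 32))
print(values.shape)  # (32, 3)
```

Because each coordinate is handled by a patch-local function conditioned on latents derived from shared intermediate and global codes, local detail and global context are combined in one representation, which is the intuition behind the smooth local-to-global transitions described above.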
Additionally, we introduce ReLIFT, an enhanced variant of LIFT that incorporates residual connections and expressive frequency encodings. This straightforward change closes the convergence-capacity gap found in comparable methods, increasing representational capacity while accelerating convergence.
Empirical results show that LIFT achieves state-of-the-art (SOTA) performance in generative modeling and classification tasks, with notable reductions in computational costs. Moreover, in single-task settings, the streamlined ReLIFT architecture proves effective in signal representations and inverse problem tasks.
Table: Performance comparison of image and voxel reconstruction and generation on CelebA-HQ (64²) and ShapeNet (64³). Each cell is color-coded to indicate the best and second-best performance. Learn. and Inf. refer to learnable and inference parameters, respectively.
Table: Image classification performance on CIFAR-10. The table reports the Top-1 accuracy (in %) along with reconstruction performance across different numbers of augmentations. ★ denotes best classification results from Spatial Functa, while ★★ indicates the use of more sophisticated augmentations, including MixUp and CutMix.
Table: Image reconstruction performance on the ImageNet-100 dataset.
ReLIFT incorporates residual connections and expressive first-layer frequency scaling to mitigate the convergence-capacity gap often seen in INRs. These enhancements enable the model to capture finer details and represent a broader range of frequencies efficiently. The frequency scaling increases the network's capacity to learn higher-frequency components, while the residual connections reweight layers to amplify higher-order harmonics without discarding the fundamental components. This design allows ReLIFT to balance high- and low-frequency information with greater efficiency.
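The two mechanisms above can be sketched in a few lines of NumPy: a sine-activated network whose first layer is multiplied by a frequency-scaling factor (omega0, in the style of SIREN), and whose hidden layers use residual connections so the pre-activation signal (carrying lower frequencies) is preserved alongside the new harmonics. The layer sizes, omega0 value, and initialization are illustrative assumptions, not the exact ReLIFT configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out, first=False, omega0=30.0):
    """SIREN-style uniform init; hidden-layer bounds shrink with omega0."""
    if first:
        W = rng.uniform(-1.0, 1.0, size=(fan_in, fan_out)) / fan_in
    else:
        bound = np.sqrt(6.0 / fan_in) / omega0
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    return W, np.zeros(fan_out)

class ToyReLIFT:
    """Sketch of a ReLIFT-style network (assumed details): sine
    activations, first-layer frequency scaling, and residual
    connections between hidden layers."""
    def __init__(self, in_dim=2, hidden=32, depth=3, out_dim=1, omega0=30.0):
        self.omega0 = omega0
        self.first = init_layer(in_dim, hidden, first=True, omega0=omega0)
        self.blocks = [init_layer(hidden, hidden, omega0=omega0)
                       for _ in range(depth)]
        self.last = init_layer(hidden, out_dim, omega0=omega0)

    def forward(self, x):
        W, b = self.first
        h = np.sin(self.omega0 * (x @ W + b))  # frequency-scaled first layer
        for W, b in self.blocks:
            # residual: the skip path preserves the existing (lower-frequency)
            # content while the sine branch adds higher-order harmonics
            h = h + np.sin(h @ W + b)
        W, b = self.last
        return h @ W + b

coords = np.stack(np.meshgrid(np.linspace(-1, 1, 8),
                              np.linspace(-1, 1, 8)), -1).reshape(-1, 2)
pred = ToyReLIFT().forward(coords)
print(pred.shape)  # (64, 1)
```

The skip connection is what lets the fundamental components learned early in training survive later updates: each block can only add a bounded sine term on top of its input rather than overwrite it.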
(a) ReLIFT
(b) SIREN
Figure: Activation statistics comparison between ReLIFT and SIREN.
Figure: PSNR comparisons of ReLIFT with SOTA models.
Figure: PSNR and reconstruction error comparisons of ReLIFT with SOTA models.
Figure: Qualitative comparisons of ReLIFT with SOTA models.
Figure: PSNR and SSIM comparisons of a 4× single image super-resolution between ReLIFT and SOTA models.
Figure: PSNR and SSIM comparisons between ReLIFT and SOTA models.
Figure: PSNR comparison between LIFT and SOTA models on 25% of the pixels in a 572 × 582 × 3 image.
Figure: Frequency learning comparison between SIREN and ReLIFT. The x-axis shows training steps, the y-axis indicates frequency, and the color represents relative approximation error.
@misc{kazerouni2025lift,
  author        = {Amirhossein Kazerouni and Soroush Mehraban and Michael Brudno and Babak Taati},
  title         = {LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding},
  eprint        = {2503.15420},
  archivePrefix = {arXiv},
  year          = {2025},
  url           = {https://arxiv.org/abs/2503.15420},
}