
History of Super Resolution AI

by 에아오요이가야 2024. 7. 17.

Methodologies of Super-Resolution Techniques

Super-resolution technology has evolved through three main families of methods: Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), and Transformers.

1. CNN-based Super-Resolution

1.1. SRCNN (Super-Resolution Convolutional Neural Network): Introduced in 2014, SRCNN was one of the first CNN-based super-resolution models. It has a simple structure that effectively converts low-resolution images to high-resolution images: three convolutional layers performing patch extraction and representation, nonlinear mapping, and reconstruction.
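To make the three-stage pipeline concrete, here is a minimal PyTorch sketch of an SRCNN-style network. The 9-1-5 filter sizes, 64/32 channel widths, and the bicubic-upscaled single-channel input follow the original paper; everything else (names, the toy input) is illustrative.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-stage SRCNN: patch extraction, non-linear mapping, reconstruction."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)      # patch extraction/representation
        self.map = nn.Conv2d(64, 32, kernel_size=1)                           # non-linear mapping
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)  # reconstruction
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the low-resolution image already upscaled to the target size (e.g. bicubic)
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)

lr_upscaled = torch.rand(1, 1, 33, 33)   # toy Y-channel patch
print(SRCNN()(lr_upscaled).shape)        # torch.Size([1, 1, 33, 33])
```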

 

1.2. VDSR (Very Deep Super-Resolution): Introduced in 2016, VDSR keeps the overall idea of SRCNN but uses a much deeper network (20 layers) to improve performance. It also incorporates residual learning, achieving faster convergence and higher accuracy by predicting only the difference between the interpolated input and the high-resolution target.
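The key trick is easiest to see in code: the deep body predicts only a residual image, which is added back to the interpolated input. A sketch, assuming the paper's 3x3 convolutions with 64 channels:

```python
import torch
import torch.nn as nn

class VDSR(nn.Module):
    """20-layer network learning the residual between the interpolated input and the HR target."""
    def __init__(self, channels: int = 1, depth: int = 20, width: int = 64):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(width, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual learning: the body only has to model the missing high-frequency detail
        return x + self.body(x)

x = torch.rand(1, 1, 64, 64)
print(VDSR()(x).shape)  # torch.Size([1, 1, 64, 64])
```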

 

1.3. MDSR (Multi-scale Deep Super-Resolution): Introduced in 2017, MDSR extends the EDSR (Enhanced Deep Super-Resolution) structure to handle multiple scale factors (x2, x3, x4) with a single shared network body. EDSR, on which MDSR builds, removes batch normalization from its residual blocks and enlarges the network's depth and width for better feature extraction and image reconstruction.
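A sketch of the EDSR-style residual block that MDSR reuses: batch normalization is removed and the block output is scaled before the skip connection, as in the EDSR paper (the 0.1 factor is the paper's setting for its largest models; the toy sizes are illustrative).

```python
import torch
import torch.nn as nn

class EDSRBlock(nn.Module):
    """EDSR residual block: conv-ReLU-conv, no batch norm, scaled residual."""
    def __init__(self, width: int = 64, res_scale: float = 0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(width, width, 3, padding=1)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1)
        self.res_scale = res_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(torch.relu(self.conv1(x)))
        return x + out * self.res_scale  # scaling stabilizes training of very deep stacks

# MDSR stacks blocks like this in one shared body and attaches
# scale-specific pre-processing heads and upsampling tails (x2/x3/x4).
x = torch.rand(1, 64, 48, 48)
print(EDSRBlock()(x).shape)  # torch.Size([1, 64, 48, 48])
```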

 

Conclusion: SRCNN laid the foundation for super-resolution with its simple structure, VDSR improved performance with a deeper network and residual learning, and MDSR maximized performance with an even deeper network and multi-scale capability.

2. GAN-based Super-Resolution

2.1. SRGAN (Super-Resolution Generative Adversarial Network): Introduced in 2017, SRGAN was the first model to apply GANs to super-resolution, significantly contributing to generating realistic high-resolution images. The generator uses residual blocks to convert low-resolution images to high-resolution ones, while the discriminator is trained adversarially to distinguish real images from generated ones; a perceptual (VGG feature) loss steers the generator toward photo-realistic textures rather than overly smooth, PSNR-optimal outputs.
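A hedged sketch of how the two networks are trained against each other; `generator` and `discriminator` stand for the SRGAN networks, the discriminator is assumed to output raw logits, and the 1e-3 adversarial weight is illustrative (the paper combines a VGG-feature content loss with the adversarial term).

```python
import torch
import torch.nn.functional as F

def srgan_losses(generator, discriminator, lr, hr, adv_weight=1e-3):
    """Compute generator and discriminator losses for one batch (illustrative)."""
    sr = generator(lr)

    # Discriminator: push real HR images toward 1 and generated images toward 0.
    d_real = discriminator(hr)
    d_fake = discriminator(sr.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    # Generator: content loss (pixel MSE here; VGG-feature loss in the paper)
    # plus an adversarial term that rewards fooling the discriminator.
    content_loss = F.mse_loss(sr, hr)
    d_fake_for_g = discriminator(sr)
    adv_loss = F.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
    g_loss = content_loss + adv_weight * adv_loss
    return g_loss, d_loss
```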

 

2.2. ESRGAN (Enhanced Super-Resolution Generative Adversarial Network): Introduced in 2018, ESRGAN improves on SRGAN by adding dense blocks to the residual blocks in the generator, resulting in Residual-in-Residual Dense Blocks (RRDB). It also employs a relativistic GAN loss function, where the discriminator compares realness between pairs of images rather than judging them individually, enhancing image quality.
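The relativistic average loss is compact enough to show directly. A minimal sketch, assuming `d_real` and `d_fake` are raw discriminator logits for a batch of real and generated images:

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator: real images should score higher than the average fake, and vice versa."""
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.zeros_like(d_fake))
    return (loss_real + loss_fake) / 2

def relativistic_g_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Generator: the symmetric objective, pushing fakes above the average real score."""
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.zeros_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.ones_like(d_fake))
    return (loss_real + loss_fake) / 2
```

Because each image is judged only relative to the opposite batch, the gradient signal stays informative even when the discriminator is confident, which is part of why ESRGAN recovers sharper textures.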

 

2.3. Real-ESRGAN (Real-World Enhanced Super-Resolution Generative Adversarial Network): Introduced in 2021, Real-ESRGAN keeps the network structure of ESRGAN but trains on purely synthetic data designed to look like real-world degradation, maximizing super-resolution performance on real photographs. During training it simulates the distortions found in real low-resolution images, such as noise, compression artifacts, and blur.
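A much-simplified sketch of this kind of synthetic degradation. Real-ESRGAN's actual pipeline applies two rounds of blur, resizing, noise, and JPEG compression (its "high-order" degradation model); this single-round version, using Pillow and NumPy, only illustrates the idea, and the random ranges are illustrative.

```python
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(hr: Image.Image, scale: int = 4) -> Image.Image:
    """Turn a clean HR image into a realistic LR training input (illustrative)."""
    img = hr.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))  # blur
    img = img.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)    # downscale
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, random.uniform(1.0, 10.0), arr.shape)          # sensor-like noise
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    buf = io.BytesIO()                                                          # JPEG artifacts
    img.save(buf, format="JPEG", quality=random.randint(30, 95))
    buf.seek(0)
    return Image.open(buf).copy()  # decode immediately so the buffer can be discarded
```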

 

Conclusion: SRGAN introduced GAN to super-resolution, ESRGAN improved the network structure and loss function for better performance, and Real-ESRGAN enhanced the model to handle various real-world scenarios.

3. Transformer-based Super-Resolution

3.1. TTSR (Texture Transformer Network for Image Super-Resolution): Introduced in 2020, TTSR is one of the early models that applied the self-attention mechanism of transformers to super-resolution. It utilizes texture transfer to reproduce natural textures in high-resolution images by extracting textures from high-resolution reference images and applying them to low-resolution inputs.
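A rough sketch of the texture-transfer idea: features of the upscaled LR input act as queries, features of a degraded version of the reference act as keys, and the matching high-resolution reference textures are transferred as values. TTSR itself matches unfolded patches through a learnable texture extractor; this pixel-level version is a simplification.

```python
import torch
import torch.nn.functional as F

def texture_transfer(q_feat, k_feat, v_feat):
    """Hard attention: for each LR position, copy the most similar reference texture.
    q_feat: features of the upscaled LR input,      (B, C, H, W)
    k_feat: features of the degraded reference,     (B, C, H, W)
    v_feat: features of the original HR reference,  (B, C, H, W)
    """
    b, c, h, w = q_feat.shape
    q = F.normalize(q_feat.flatten(2), dim=1)      # (B, C, HW), unit-norm channels
    k = F.normalize(k_feat.flatten(2), dim=1)
    sim = torch.bmm(q.transpose(1, 2), k)          # (B, HW_q, HW_k) cosine similarity
    conf, idx = sim.max(dim=-1)                    # best reference position per query
    v = v_feat.flatten(2)                          # (B, C, HW_k)
    out = torch.gather(v, 2, idx.unsqueeze(1).expand(-1, c, -1))
    # transferred textures plus a confidence map for soft fusion downstream
    return out.view(b, c, h, w), conf.view(b, 1, h, w)
```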

 

3.2. HAT (Hybrid Attention Transformer): Introduced in 2023, HAT combines channel attention and self-attention mechanisms within its transformer-based network. This hybrid attention mechanism allows HAT to effectively emphasize important details and suppress unnecessary information, enhancing the extraction and transfer of critical features in high-resolution images. The model includes Residual Hybrid Attention Groups (RHAG) that integrate multiple Hybrid Attention Blocks (HAB) and Overlapping Cross-Attention Blocks (OCAB).
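A condensed sketch of the hybrid idea inside a HAB: a self-attention branch and a convolutional channel-attention branch are fused residually. Real HAT uses shifted-window attention with relative position bias and a weighting factor on the channel branch; global attention is used here only to keep the sketch short.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.net(x)  # reweight channels by global statistics

class HybridAttentionBlock(nn.Module):
    """Self-attention over pixels plus a parallel channel-attention branch."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cab = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), ChannelAttention(dim))

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, HW, C)
        sa, _ = self.attn(tokens, tokens, tokens)          # windowed in HAT; global here
        sa = sa.transpose(1, 2).view(b, c, h, w)
        return x + sa + self.cab(x)                        # fuse both branches residually

x = torch.rand(1, 64, 16, 16)
print(HybridAttentionBlock()(x).shape)  # torch.Size([1, 64, 16, 16])
```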

 

3.3. DRCT (Dense-Residual-Connected Transformer): Introduced in 2024, DRCT uses dense residual connections and shifted window transformers to minimize information loss and extract both shallow and deep features efficiently. It incorporates Residual Deep feature extraction Groups (RDG) and Swin-Dense-Residual-Connected Blocks (SDRCB), utilizing the Swin-Transformer Layer (STL) for long-range dependencies and multi-level spatial information.
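A structural sketch of the dense-residual connectivity in an SDRCB: every stage sees the concatenation of all earlier feature maps, and the block output is added back to its input. In DRCT each stage is a Swin Transformer Layer; a plain convolution stands in here purely to keep the sketch self-contained, and the growth/scale values are illustrative.

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """Dense connections (each stage reuses all earlier features) with a residual exit."""
    def __init__(self, dim: int = 64, growth: int = 32, stages: int = 4, res_scale: float = 0.2):
        super().__init__()
        # each stage would be a Swin Transformer Layer (STL) in DRCT
        self.stages = nn.ModuleList(
            nn.Conv2d(dim + i * growth, growth, 3, padding=1) for i in range(stages))
        self.fuse = nn.Conv2d(dim + stages * growth, dim, 1)  # project back to base width
        self.res_scale = res_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for stage in self.stages:
            feats.append(torch.relu(stage(torch.cat(feats, dim=1))))    # dense feature reuse
        return x + self.res_scale * self.fuse(torch.cat(feats, dim=1))  # residual exit

x = torch.rand(1, 64, 24, 24)
print(DenseResidualBlock()(x).shape)  # torch.Size([1, 64, 24, 24])
```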

 

Conclusion: TTSR emphasizes natural texture reproduction using transformers, HAT effectively reflects important details through a hybrid attention mechanism, and DRCT combines dense residual connections and shifted window transformers for realistic and high-quality image generation.

Model Complexity and Resource Comparison

EDSR (Enhanced Deep Residual Networks)

  • Model Complexity: High
  • Number of Parameters: Approximately 43M (full EDSR: 32 residual blocks, 256 filters)
  • GPU Requirements: High-performance GPU needed (e.g., NVIDIA Titan X, RTX 2080)
  • Memory Usage: High (due to deep and wide network architecture)
  • Computational Load (FLOPs): Very high (due to many layers and filters)

Real-ESRGAN (Real-World Enhanced Super-Resolution Generative Adversarial Network)

  • Model Complexity: Medium
  • Number of Parameters: Approximately 16.7M (RRDB generator)
  • GPU Requirements: Medium-performance GPU (e.g., GTX 1060, RTX 2060)
  • Memory Usage: Medium (efficient network structure)
  • Computational Load (FLOPs): Medium (efficient block structure)

HAT (Hybrid Attention Transformer)

  • Model Complexity: Very High
  • Number of Parameters: HAT-S (approx. 9.6M), HAT (approx. 20.8M), HAT-L (approx. 40.8M)
  • GPU Requirements: Very high-performance GPU needed (e.g., RTX 3090, V100)
  • Memory Usage: Very high (due to transformer architecture and hybrid attention mechanism)
  • Computational Load (FLOPs): Very high (due to transformer attention mechanism and many layers)

DRCT (Dense-Residual-Connected Transformer)

  • Model Complexity: High
  • Number of Parameters: DRCT-S (approx. 16M), DRCT (approx. 25M), DRCT-L (approx. 38M)
  • GPU Requirements: Medium to high-performance GPU (DRCT-S for medium, DRCT-L for high-performance GPU)
  • Memory Usage: High (due to dense feature reuse across transformer layers)
  • Computational Load (FLOPs): High (due to window self-attention and the deep, densely connected structure)
