Utilizing convolutional neural network and gray wolf optimization for image super-resolution
*Corresponding author E-mail address: sangkeum@hanbat.ac.kr (S. Lee)
Abstract
Image Super-Resolution (ISR) is a complex task that involves the development of high-resolution (HR) images from low-resolution (LR) inputs, posing a fascinating challenge in the realm of image processing. While deep learning models have shown promise in ISR, the presence of artifacts in images generated by these models often necessitates subsequent post-processing for refinement. This study introduces an innovative approach that combines a Convolutional Neural Network (CNN) with Gray Wolf Optimization (GWO) to tackle the obstacles encountered in ISR. The proposed model employs a CNN model for the initial estimation of the upscaled image and incorporates a secondary CNN model utilizing dense layers and hybrid pooling to segment the image and identify regions of uniformity. Simultaneously processing information from the segmented image and the magnification approximation matrix using a GWO-based strategy mitigates the detrimental impact of artifacts on the enlarged image. The GWO algorithm is utilized to dynamically adjust the color layer brightness of individual pixels in distinct regions, tailoring the enhancement process to the specific structural characteristics of each texture region. Performance evaluation of the proposed approach on the Set5, Set14, and Urban100 datasets demonstrates its superiority over existing techniques, yielding enhancements in peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) metrics by a minimum of 1% and 0.5%, respectively.
Keywords
Artifact reduction
Convolutional neural networks
Deep learning
Gray wolf optimization
Image reconstruction
Image super-resolution
1. Introduction
Image super-resolution (ISR) is one of the significant applications in machine vision, which has been the focus of many studies in recent years (Lepcha et al., 2023). An ISR system processes a low-resolution (LR) image sample, and a high-resolution (HR) image with the same visual content is generated (Ooi and Ibrahim, 2021).
This process can be performed based on one image or a set of LR images. The purpose of these models is to preserve the visual content of the image with the slightest distortion (Maral, 2022). ISR systems can be helpful in various applications, such as medical image enhancement (Qiu et al., 2023a), surveillance (Chen et al., 2022), video super-resolution (Liu et al., 2022), satellite image processing (Karwowska and Wierzbicki, 2022), etc.
Super-resolution methods based on convolutional neural networks (CNNs) depend on static feature extraction from the LR image and fixed up-sampling techniques (Wang et al., 2021, Sahito et al., 2023, Fang et al., 2020). The reconstruction process produces satisfactory results but introduces multiple artifacts that affect both low-frequency and high-frequency image details (Zhang et al., 2018). The fixed up-sampling method generates artifacts because it fails to reproduce the fine details that exist in HR images (Lai et al., 2019). Static feature maps create difficulties for the network in handling distinct image characteristics because they fail to adapt properly to different image areas (Dai et al., 2019). This performance gap, which includes artifacts, becomes a critical issue for medical imaging because it requires precise detail preservation for accurate diagnosis and analysis (Oktay et al., 2018). Research now aims at resolving super-resolution challenges by adopting adaptive up-sampling techniques and dynamic feature learning methods, motivated by the necessity of building better, more flexible ISR methods (Liang et al., 2021).
To overcome the limitations of artifacts introduced by fixed sampling and static features, this research adopts a novel ISR method that combines a CNN-based initial estimation with Gray Wolf Optimization (GWO) for localized refinement. Our approach is structured into three key steps: initial image approximation, region segmentation, and localized optimization. For the initial super-resolution image estimation, we employ a CNN architecture utilizing sub-pixel convolution layers, known for their efficiency in up-sampling (Shi et al., 2016). Subsequently, to segment the image into regions of similar texture, we employ a second CNN with an encoder-decoder structure and hybrid pooling. Encoder-decoder architectures have demonstrated strong performance in semantic segmentation tasks, effectively delineating regions of interest (Kohl et al., 2018). Furthermore, hybrid pooling allows for the capture of both local and global features, enhancing segmentation accuracy. Finally, to mitigate artifacts and enhance structural characteristics within each segmented region, we utilize GWO. GWO, a metaheuristic optimization algorithm, has been shown to be effective in various image processing applications, including image enhancement and artifact reduction, due to its ability to dynamically adapt to local image characteristics (Mirjalili et al., 2014, Rajakumar et al., 2023). By applying GWO to localized regions, we can precisely adjust pixel values to minimize artifacts and accentuate fine details, directly addressing the limitations of global, fixed-strategy approaches.
The presented hybrid mechanism has several advantages. The segmentation step enables targeted optimization within uniform regions, rather than the global artifact handling of traditional CNN-based approaches. Besides, with the aid of GWO, we can determine the brightness level in each region, thus enhancing the quality of the image and narrowing the gap between existing methods and the best achievable reconstruction. The contributions of the current article include the following: Firstly, this article presents a CNN model based on dense layers and hybrid pooling for image segmentation within the ISR framework, based on which the edges of the enlarged image can be reconstructed with higher accuracy. Secondly, this article presents a solution to reduce the destructive effects of artifacts added to magnified images by using a combination of segmentation and optimization techniques, in which the process of improving the magnified image is applied separately to each region.
The rest of this paper is organized according to the following structure: Section 2 presents the details of the proposed method. Section 3 evaluates and discusses the proposed method’s performance. Finally, section 4 summarizes the results and conclusions.
2. Methodology
Most techniques based on deep neural networks (DNNs) for ISR add unwanted artifacts to the reconstructed image. Therefore, the image resulting from the reconstruction of the DNN needs enhancement in its different regions, and this process should be done based on the structural features of the image. To solve this problem, the introduced approach performs image resolution enhancement using the following main steps:
- Extracting an approximation of image zooming by a CNN
- Image zoning to identify connected regions using an encoder-decoder neural network
- Optimizing the magnified image regions by GWO
In the first step, using a CNN, the LR image is enlarged, and in the next step, the image regions are separated by a CNN model based on the encoder-decoder structure. Since the first two steps are applied to the LR image, the first two steps of the proposed method can be implemented in parallel. In the third step, the GWO algorithm is utilized to improve each region extracted from the image separately. The purpose of this step is to eliminate the destructive effects of reconstruction caused by image enlargement. These steps have been shown in Fig. 1.

Fig. 1. Overview of the proposed ISR methodology. The process begins with an LR input image, which is simultaneously processed by two distinct CNNs. CNN₁ estimates an initial SR image, while CNN₂ segments the LR image into distinct regions based on texture and features. Subsequently, the GWO algorithm refines the estimated SR image by selectively enhancing the segmented regions, utilizing information from both CNN outputs. This optimization step aims to minimize artifacts and improve the overall perceptual quality of the final optimized SR image.
According to Fig. 1, first, an LR image is received. This image is simultaneously used as the input of two DNN models. In the first deep model, an initial estimate of the HR image is generated by a CNN. This DNN uses convolution and sub-pixel layers (Shi et al., 2016) to achieve this objective. The method combines segmentation and optimization techniques to enhance the reconstructed image and remove the artifacts produced by the CNN model in the second step. For this purpose, image regions are extracted using the second DNN, a dense CNN model based on an encoder-decoder structure. Then, GWO is used to improve each region and remove the artifacts.
2.1 Approximate the resolution-enhanced image based on CNN
The method uses a CNN model to generate an initial estimate of the magnified image. The CNN model receives an LR color image for each sample and uses a 3×3 convolution layer to transfer the image to a higher-dimensional feature space. Next, four consecutive convolution layers extract low-level features of the LR image. The operation of each layer in this CNN model can be described as follows:
$$F_l = \max\left(0,\; W_l * F_{l-1} + b_l\right) \tag{1}$$

where $F_l$ represents the output of convolution layer $l$ in the CNN model. Also, $W_l$ and $b_l$ represent this layer's weight and bias values, respectively. The max operator is applied as an activation function to the output of each convolution layer. After the fifth convolution layer extracts the feature values, a sampling component is used to enlarge the low-level features. This sampling component can be described as follows:
$$I_{SR} = U\left(F_5\right) \tag{2}$$

where $U(\cdot)$ represents the sampling component, which consists of two convolution layers and one sub-pixel layer (Shi et al., 2016). The output of $U$ represents the reconstructed features of the HR image, and the resulting reconstructed image is used as the input for the next step.
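The rearrangement performed by the sub-pixel layer can be sketched in a few lines of numpy. This is only the pixel-shuffle step of Shi et al. (2016); the convolution layers that produce the feature map are omitted, and the channels-first array layout is an assumption made for illustration:

```python
import numpy as np

def pixel_shuffle(features: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into a (C, H*r, W*r) image.

    This is the sub-pixel convolution ("pixel shuffle") rearrangement of
    Shi et al. (2016); the convolution layers that produce `features`
    are omitted here.
    """
    crr, h, w = features.shape
    assert crr % (r * r) == 0, "channel count must be divisible by r^2"
    c = crr // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = features.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)
```

For a 2× zoom, a feature map with 4 channels per output channel is folded into a single channel of twice the height and width.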
2.2 Segmentation of the magnified image
In parallel with creating the zoom approximation image, the method uses a CNN model based on a dense-layer encoder-decoder architecture (Yuan et al., 2019) to separate the regions of the image. This step improves the quality of the enlarged image by removing unwanted artifacts. For this purpose, the image is first divided into its constituent regions. In the next step, each area in the approximate image can be improved separately based on these regions.
The basic CNN model used in the proposed method for image segmentation consists of two coding and decoding parts, which end with a SoftMax layer. Each of the coding and decoding parts consists of three dense layers. The count of layers in both mentioned parts is the same, but the configuration of these layers is not symmetrical. Each dense layer contains three consecutive layers consisting of convolution, normalization, and activation operators. The structure of each dense layer has been shown in Fig. 2.

Fig. 2. The structure of each dense layer in the basic CNN model for segmentation.
According to Fig. 2, in a dense layer, the input data is applied to all internal convolution layers, and the output of each layer is transferred to the subsequent layers. The activation function of each inner layer is of the ReLU type. Thus, for each internal layer of the dense layer, if the input data is s, then the output of this layer can be described as follows:

$$y = \max\left(0,\; W * s + b\right) \tag{3}$$

where W and b are the weight and bias values of that layer.
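The dense connectivity described above, where each internal layer receives the block input together with all preceding layer outputs, can be sketched as follows. The `weight_fn` stand-in for a learned 1×1 convolution is hypothetical, used only to make the forward pass runnable:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def dense_block(x, weight_fn, n_layers=3):
    """Forward pass of a densely connected block: each internal layer
    receives the concatenation (along the channel axis) of the block
    input and all preceding layer outputs, followed by ReLU.

    `weight_fn(c_in)` is a stand-in for a learned convolution; it returns
    a (C_out, C_in) matrix acting as a 1x1 convolution (hypothetical).
    """
    feats = [x]                        # x: (C, H, W)
    for _ in range(n_layers):
        inp = np.concatenate(feats, axis=0)
        w = weight_fn(inp.shape[0])    # (C_out, C_in)
        out = relu(np.einsum('oc,chw->ohw', w, inp))
        feats.append(out)
    return np.concatenate(feats, axis=0)
```

The channel count grows with every internal layer, which is why the encoder and decoder parts can share the same layer count without being symmetric in configuration.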
Considering the described structure of dense layers, the basic CNN pattern in the proposed method has a structure according to Fig. 3.

Fig. 3. The basic CNN pattern used in the proposed method for the segmentation phase.
According to Fig. 3, the proposed CNN model for the segmentation phase includes six dense layers, each structured as shown in Fig. 2. In addition, this network contains four pooling layers as transfer functions and a SoftMax layer to determine the segmentation output. Adjustable hyperparameters in each convolution layer of the dense layers, including filter size, length, and width, are configured using the BayesOpt tool (Martinez-Cantin, 2014). These three parameters can also be adjusted for the network's last layer. Furthermore, it was found that using a hybrid pooling layer in this architecture improves the generalization of the segmentation model and prevents overfitting in the segmentation process, since the popular pooling layers, average and max pooling, have drawbacks of their own: max pooling, for instance, can cause overfitting, while average pooling combined with ReLU activations might result in sparse feature maps. To address the flaws of both of these operators, the hybrid pooling layer uses an adjustable parameter p, enabling a weighted mixture of the two pooling functions. The hybrid pooling layer function is formulated as follows (Tong and Tanaka, 2019):

$$S_{hyb} = p \cdot S_{max} + \left(1 - p\right) \cdot S_{avg} \tag{4}$$

where $S_{max}$ and $S_{avg}$ represent the results of max pooling and average pooling, respectively, and $p \in [0, 1]$ controls their mixture.
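A minimal sketch of the hybrid pooling function, assuming non-overlapping k × k windows and a scalar mixing parameter p (both illustrative choices):

```python
import numpy as np

def hybrid_pool(x, p=0.5, k=2):
    """Hybrid pooling over non-overlapping k x k windows: a convex
    combination p * max-pool + (1 - p) * average-pool
    (in the spirit of Tong and Tanaka, 2019).

    x: (H, W) array with H and W divisible by k.
    """
    h, w = x.shape
    win = x.reshape(h // k, k, w // k, k)
    mx = win.max(axis=(1, 3))      # max pooling result
    avg = win.mean(axis=(1, 3))    # average pooling result
    return p * mx + (1.0 - p) * avg
```

With p = 1 this reduces to pure max pooling and with p = 0 to pure average pooling, so a learned or tuned p interpolates between the two behaviors.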
2.3 Optimizing the reconstructed image using GWO
To refine the initial super-resolution image and reduce artifacts, we employ GWO to adjust pixel brightness intensities within the segmented image regions. GWO is chosen for its proven effectiveness in image optimization tasks, demonstrating adaptability to local image characteristics (Rajakumar et al., 2023). In this step, GWO tries to generate a high-quality image by changing the brightness intensities in the separated regions of the image. The length of each solution vector is equal to the number of areas determined for the image. The range of changes of each region in the reconstructed image is determined to be equal to [a, b]. The GWO algorithm optimizes a solution vector where each element corresponds to a brightness adjustment for a specific image region.
It should be noted that during this process, if the value of a pixel falls below zero or exceeds 255 after being changed by the optimization parameter, the value of that pixel is clipped to zero or 255, respectively. Then, the non-local means (NLM) method is used to measure the degree of fitness. NLM is based on the observation that natural images contain many repeating patterns, so a pixel can be estimated by a weighted combination of neighboring pixels (Li et al., 2021):

$$\hat{x}_i = \sum_{j \in N_i} w_{ij}\, x_j \tag{5}$$
where $\hat{x}_i$ is the estimate of the ith pixel $x_i$. Also, $N_i$ represents the set of pixels in the neighborhood of $x_i$. The weight $w_{ij}$ indicates the similarity between the neighborhood of the ith pixel and the corresponding neighborhood of the jth pixel, and is calculated as follows (Li et al., 2021):

$$w_{ij} = \frac{1}{Z_i} \exp\left(-\frac{\left\| v_i - v_j \right\|_2^2}{h^2}\right) \tag{6}$$
where $v_i$ and $v_j$ are column vectors formed by the neighborhoods of pixels i and j, $Z_i$ is a normalizing constant, and h is a smoothing parameter. Since the NLM method finds repeated patterns in the image, the amount of visual information presented in the image can be obtained from its output: after applying the NLM method to the image, a high variance of the obtained values indicates more visual information in the image. Therefore, the fitness of a solution vector whose values are applied to the regions of the approximation image is calculated using the variance of the values obtained from the NLM method. In other words, first, the values in the solution vector are applied to each region of the reconstructed image. Then, the NLM method is applied to the resulting image, and the fitness is determined based on the resulting variance value. Considering the described structure of the solution vector and the fitness evaluation, the mechanism of the GWO algorithm for improving image regions is as follows.
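The NLM-based fitness evaluation can be sketched as follows. The patch radius, search radius, and smoothing parameter h below are illustrative choices, not values from the paper:

```python
import numpy as np

def nlm_estimate(img, patch=1, search=2, h=10.0):
    """Non-local-means estimate of each pixel as a weighted average of
    pixels in a (2*search+1)^2 neighborhood, weighted by the similarity
    of (2*patch+1)^2 patches (Eqs. (5)-(6) of the text)."""
    H, W = img.shape
    pad = patch + search
    padded = np.pad(img, pad, mode='reflect')
    out = np.zeros_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci-patch:ci+patch+1, cj-patch:cj+patch+1]
            wsum, acc = 0.0, 0.0
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni-patch:ni+patch+1, nj-patch:nj+patch+1]
                    wgt = np.exp(-np.sum((ref - cand) ** 2) / h ** 2)
                    wsum += wgt
                    acc += wgt * padded[ni, nj]
            out[i, j] = acc / wsum     # weights normalized by wsum (Z_i)
    return out

def fitness(region_img):
    """Fitness of a candidate adjustment: variance of the NLM estimate
    (higher variance ~ more preserved visual information)."""
    return float(np.var(nlm_estimate(region_img)))
```

A flat region yields a fitness near zero, while a textured region yields a larger value, which is exactly the signal GWO maximizes when adjusting per-region brightness.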
GWO is an optimization method that draws inspiration from the way a wolf pack behaves collectively when hunting prey. The strategy designates the three best solutions as α, β, and δ, respectively, and models the remaining members of the pack as a set ω. Like the members α, β, and δ, every member of the set ω searches for prey in the same way and, when the prey is discovered, surrounds it. This procedure is represented in GWO using relationship (7) (Li et al., 2021):
$$D = \left| C \cdot X_p(t) - X(t) \right|, \qquad X(t+1) = X_p(t) - A \cdot D \tag{7}$$

where t is the iteration counter and $X_p$ denotes the prey's position. The position of the wolf agent in the subsequent round is indicated by $X(t+1)$. The vectors A and C are computed in the manner shown below (Li et al., 2021):

$$A = 2a \cdot r_1 - a \tag{8}$$

$$C = 2 r_2 \tag{9}$$
In the above relationships, $r_1$ and $r_2$ are random vectors in the interval [0, 1]. The components of the vector a decrease linearly from two to zero while the optimization method runs. This parameter is updated using the following relationship (Li et al., 2021):

$$a = 2 - \frac{2t}{G} \tag{10}$$
where G denotes the total number of iterations in GWO. During the hunt, each individual in the population follows individual α and occasionally follows individuals β and δ. The same procedure is employed by the core GWO model. It is feasible, therefore, to produce new solutions by employing a larger collection of suitable solution vectors found during earlier iterations. In the proposed technique, the best previous solutions are kept in a list S. In addition to this set, a vector Z stores the probability of choosing each individual in S when determining the positions of the wolves. Using these collections in the suggested strategy raises the possibility of locating the global optimum by exploring some of the best past solutions. The first three solutions in S are examined during each iteration cycle of the optimization method. If a candidate solution is absent from S and its fitness is better than the average fitness of the members of S, the solution vector is appended to the set S. Once a new member has been added to the set S, the selection vector Z is updated in the following manner:

$$Z_i = \frac{f_i}{\sum_{j=1}^{|S|} f_j} \tag{11}$$

where $f_i$ indicates the fitness of individual i in S. Additionally, |S| denotes the total number of individuals in the set S. In each iteration, fifty percent of the gray wolves in the population determine their position according to the leaders (α, β, and δ), while the remaining individuals determine their position using members of S (selected via roulette-wheel selection). This behavior enhances exploration in GWO. The termination condition for GWO is defined as either reaching the predefined iteration threshold T or failing to improve the fitness of the best observation for m consecutive iterations.
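The core GWO position update can be sketched as below. This sketch implements only the baseline update of Eqs. (7)-(10); the archive S and the roulette-wheel leader selection of the proposed variant are omitted for brevity, and the pack size and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gwo_minimize(fit, dim, bounds, G=50, pack=20):
    """Baseline Gray Wolf Optimization (Mirjalili et al., 2014).

    `fit` is minimized; to maximize a quality score such as the NLM
    variance, pass its negation. The archive S and roulette-wheel
    selection of the proposed variant are not included here.
    """
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(pack, dim))
    for t in range(G):
        f = np.array([fit(x) for x in X])
        leaders = X[np.argsort(f)[:3]]           # alpha, beta, delta
        a = 2.0 - 2.0 * t / G                    # Eq. (10): 2 -> 0
        for i in range(pack):
            pos = np.zeros(dim)
            for Xl in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2    # Eqs. (8)-(9)
                D = np.abs(C * Xl - X[i])        # Eq. (7), encircling
                pos += Xl - A * D
            X[i] = np.clip(pos / 3.0, lo, hi)    # average of 3 leaders
    f = np.array([fit(x) for x in X])
    return X[np.argmin(f)]
```

In the paper's setting, `dim` would be the number of segmented regions and `bounds` the allowed brightness-adjustment interval [a, b] for each region.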
3. Implementation and results
The presented algorithm was coded using MATLAB 2020a. Three image datasets, Set5, Set14, and Urban100, were used to evaluate the performance of the proposed method (Agustsson and Timofte, 2017). The Set5 and Set14 datasets contain 5 and 14 images, respectively, with zooming factors of 2, 3, and 4. These sets include some of the most commonly used images in the field of image processing. All of these images use the RGB color system, and their dimensions and scale ratios vary. Owing to their non-repetitive, complex patterns, the samples of these two datasets help measure the generality of the image enlargement model.
On the other hand, the Urban100 dataset includes 100 color image samples in the RGB system with different dimensions and scales. Most of the samples in this set contain textures with regular patterns, based on which the model's accuracy can be intuitively displayed by zooming in on regular patterns. Samples of these three datasets are given in Fig. 4.

Fig. 4. Example image samples from the three benchmark datasets used in this study: Set5, Set14, and Urban100. (Left column): images from Set5, which contains a limited number of HR natural pictures. (Middle column): images from Set14, which includes a broad selection of natural scenes. (Right column): images from the Urban100 dataset, which was specifically designed to measure super-resolution capabilities on urban areas with complicated structures and high-definition details.
The evaluation of the effectiveness of the proposed method in enhancing the resolution of images has been done using the following measures:
- Root Mean Squared Error (RMSE): This measure is the root of the mean squared difference between the pixels of the reconstructed and original images. RMSE represents the mean squared error of the changes resulting from zooming and is calculated by Eq. (12) (Qiu et al., 2023b):

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2} \tag{12}$$

where N is the number of pixels in the reconstructed image, $p_i$ is the value of the ith pixel in the reconstructed image, and $o_i$ is the value of this pixel in the original image. The objective is to minimize RMSE in the reconstructed image.
- Mean Absolute Error (MAE): This measure shows how much the pixel values of the reconstructed image differ from those of the original image. MAE is calculated using Eq. (13) (Qiu et al., 2023):

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|p_i - o_i\right| \tag{13}$$

In this measure, the objective is to minimize the MAE for the reconstructed image.
- Peak Signal-to-Noise Ratio (PSNR): This measure indicates the ratio between the maximum possible signal power and the power of the distorting noise. Because most signals have a wide dynamic range, PSNR is expressed on a logarithmic scale. The PSNR measure is calculated by Eq. (14) (Qiu et al., 2023):

$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right) \tag{14}$$

where MSE is the mean squared error, i.e., the square of the RMSE in Eq. (12). Image resolution enhancement algorithms aim to achieve a higher PSNR in reconstructed images.
- Structural Similarity Index Measure (SSIM): This measure describes the structural similarity between the reconstructed and original images and represents the perceptual quality of the reconstructed image relative to the original. It is calculated by Eq. (15) (Qiu et al., 2023):

$$SSIM\left(Q, Q_0\right) = \frac{\left(2\mu_Q \mu_{Q_0} + c_1\right)\left(2\sigma_{Q Q_0} + c_2\right)}{\left(\mu_Q^2 + \mu_{Q_0}^2 + c_1\right)\left(\sigma_Q^2 + \sigma_{Q_0}^2 + c_2\right)} \tag{15}$$

where $\sigma_Q$ is the standard deviation of the original image Q and $\sigma_{Q_0}$ is the standard deviation of the enlarged image Q0. Also, $\mu_Q$ and $\mu_{Q_0}$ represent the average brightness of the original and resulting images, respectively, and $\sigma_{Q Q_0}$ is the covariance of Q and Q0. Finally, $c_1 = \left(k_1 L\right)^2$ and $c_2 = \left(k_2 L\right)^2$ are the similarity index constants, with $k_1$, $k_2$, and L set to 0.01, 0.03, and 255, respectively. SSIM is a real number in the interval [0, 1], and the objective is to maximize this measure for the magnified image.
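The four metrics above can be computed as follows. Note that this SSIM is evaluated globally over the whole image rather than with a sliding window, a common simplification for a sketch like this:

```python
import numpy as np

def rmse(p, o):                     # Eq. (12)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def mae(p, o):                      # Eq. (13)
    return float(np.mean(np.abs(p - o)))

def psnr(p, o, peak=255.0):         # Eq. (14)
    m = np.mean((p - o) ** 2)
    return float(10.0 * np.log10(peak ** 2 / m))

def ssim(p, o, k1=0.01, k2=0.03, L=255.0):   # Eq. (15), global window
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_p, mu_o = p.mean(), o.mean()
    var_p, var_o = p.var(), o.var()
    cov = np.mean((p - mu_p) * (o - mu_o))
    return float(((2 * mu_p * mu_o + c1) * (2 * cov + c2)) /
                 ((mu_p ** 2 + mu_o ** 2 + c1) * (var_p + var_o + c2)))
```

For identical inputs, SSIM evaluates to 1 and RMSE/MAE to 0, while adding a constant offset of 1 gray level to every pixel yields a PSNR of about 48.13 dB for 8-bit images.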
In Fig. 5, the images magnified by the proposed method and other methods are displayed for some samples of the Set5 set. The upper row of Fig. 5 displays the result of increasing the image's resolution by each of the algorithms. The bottom row shows the difference between the enlarged and original images, obtained by subtracting individual pixels of the resulting image from the original image. Points displayed in white indicate differences between the pixels in that range, while black regions indicate the absence of differences between the corresponding pixels.

Fig. 5. Magnified images produced by our method and other methods for Set5 samples.
Also, Figs. 6 and 7 display the results of increasing the resolution for some images of Set14 and Urban100.

Fig. 6. Magnified images produced by our method and other methods for Set14 samples.

Fig. 7. Magnified images produced by our method and other methods for samples of the Urban100 set.
Based on the results shown in Figs. 5-7, the images obtained by our method have fewer differences from the original images. As a result, the visual quality of the outputs obtained by our method for the tested images is higher than that of the compared methods. Examining the results shows that most deep learning-based methods make errors when reconstructing regions with non-uniform textures. This error is more visible at the edges of partial regions, because these models use a predefined sampling approach designed on the basis of the bicubic interpolation technique, which leads to unwanted artifacts in the resulting image. However, our method can perform this process more accurately using an additional processing step, in which the effect of these artifacts is reduced using the GWO algorithm. Fig. 8 shows the impact of this process on a sample image. Fig. 8a displays the zoomed approximation image obtained from the first step of the introduced mechanism. As the zoomed region of this image shows, some regions of the approximation image, as in other models, contain unwanted changes introduced during zooming. In contrast, the result of improving each region with GWO in Fig. 8b shows the effectiveness of this approach in reducing the destructive effect of these unwanted changes. Although using the optimization process to improve each region of the image increases the processing load of the system, its significant effect on the quality of the result justifies this additional cost.

Fig. 8. An example of the effect of the region optimization step in our method: (a) the enlarged image before region optimization and (b) after region optimization.
Table 1 gives the values of RMSE, MAE, PSNR, and SSIM obtained by our method for increasing the resolution of the images used in these experiments; in this experiment, the zooming factor is set to 2. Table 2 gives these results for 4× zooming of the images. These tables compare the results obtained by this method with those of previous methods.
Table 1. Comparison of the proposed method with previous methods for 2× zooming.

| Dataset | Metric | Bicubic | Sahito et al., 2023 | Fang et al., 2020 | Proposed |
|---|---|---|---|---|---|
| Urban100 | RMSE | 9.0251 | 0.2411 | 0.2520 | 0.230 |
| Urban100 | MAE | 14.641 | 5.998 | 6.136 | 5.871 |
| Urban100 | PSNR | 25.33 | 32.72 | 32.52 | 32.89 |
| Urban100 | SSIM | 0.8341 | 0.9675 | 0.9660 | 0.9690 |
| Set14 | RMSE | 5.963 | 0.194 | 0.215 | 0.181 |
| Set14 | MAE | 9.912 | 5.362 | 5.616 | 5.187 |
| Set14 | PSNR | 28.762 | 33.737 | 33.338 | 34.037 |
| Set14 | SSIM | 0.8577 | 0.9677 | 0.9643 | 0.9696 |
| Set5 | RMSE | 3.961 | 0.068 | 0.083 | 0.061 |
| Set5 | MAE | 6.749 | 3.125 | 3.550 | 2.959 |
| Set5 | PSNR | 32.335 | 38.468 | 37.226 | 38.974 |
| Set5 | SSIM | 0.9217 | 0.9879 | 0.9849 | 0.9894 |
Table 2. Comparison of the proposed method with previous methods for 4× zooming.

| Dataset | Metric | Bicubic | Sahito et al., 2023 | Fang et al., 2020 | Proposed |
|---|---|---|---|---|---|
| Urban100 | RMSE | 13.347 | 0.919 | 0.9317 | 0.909 |
| Urban100 | MAE | 22.513 | 11.808 | 11.892 | 11.748 |
| Urban100 | PSNR | 21.627 | 26.826 | 26.761 | 26.866 |
| Urban100 | SSIM | 0.6492 | 0.8733 | 0.8725 | 0.8742 |
| Set14 | RMSE | 9.771 | 0.544 | 0.607 | 0.526 |
| Set14 | MAE | 15.923 | 8.909 | 9.426 | 8.790 |
| Set14 | PSNR | 24.552 | 29.361 | 28.838 | 29.451 |
| Set14 | SSIM | 0.6815 | 0.9143 | 0.9043 | 0.9165 |
| Set5 | RMSE | 7.546 | 0.276 | 0.289 | 0.259 |
| Set5 | MAE | 12.405 | 6.397 | 6.513 | 6.158 |
| Set5 | PSNR | 27.106 | 32.182 | 32.059 | 32.549 |
| Set5 | SSIM | 0.7916 | 0.9527 | 0.9494 | 0.9541 |
As the results in Tables 1 and 2 show, the proposed method increases image resolution in a way that reduces the error rate while improving PSNR and structural similarity. It can therefore be concluded that the images generated by the proposed method contain less noise than the outputs of the compared algorithms. This improvement in the introduced approach results from utilizing GWO to enhance image resolution.
An image resolution enhancement algorithm should maintain its efficiency under different conditions and generate acceptable output for different values of the zooming factor. To examine this aspect of the proposed method, the quality of the outputs generated for different values of the zooming factor is investigated, and the results obtained by the proposed method in these experiments are compared with previous methods. In this experiment, the zooming factor of the image is varied in the interval [2, 4], and the RMSE, MAE, PSNR, and SSIM are calculated for these changes. It should be noted that the results reported in this section represent the mean values obtained from testing the images of the three sets: Set5, Set14, and Urban100.
Figs. 9a and 9b show the RMSE and MAE graphs for changes in the zooming factor of the images, respectively. Likewise, Figs. 10a and 10b show the PSNR and SSIM graphs, respectively, for different zooming factors of the images.

Fig. 9. Error analysis for changes in zooming factor: (a) RMSE and (b) MAE.

Fig. 10. Accuracy analysis for changes in zooming factor: (a) PSNR and (b) SSIM.
The results obtained in Figs. 9 and 10 show that as the zooming factor increases, the error values increase, and in contrast, PSNR and SSIM decrease. This is because increasing the zooming factor increases the complexity of the problem and the probability of error in enhancing the resolution. Nevertheless, the proposed method can perform the resolution enhancement operation for different zooming factors on average with less error than the compared methods.
The GWO algorithm in the proposed method improves the regions in the obtained image individually. This process optimizes the integrated areas of the output image. The use of an additional optimization step by GWO in the proposed method makes the outputs generated by the proposed method of higher quality and fewer errors than the compared methods.
3.1 Discussion, limitations, and future directions
The results obtained in this work show that the proposed method for image resolution enhancement provides superior results. The method outperforms the other techniques on the RMSE, MAE, PSNR, and SSIM metrics over the benchmark datasets Set5, Set14, and Urban100 (Tables 1 and 2). This enhancement can be attributed to the GWO optimization step, which improves the removal of artifacts and increases the structural resemblance between the reconstructed image and the original image, as shown in Fig. 8. In terms of reconstruction quality, our method is comparable to the deep learning-based methods reported in the literature (e.g., Sahito et al., 2023; Fang et al., 2020), while not requiring extensive training data or high computational power.
However, certain limitations should be addressed in future work. As noted in the results section, for higher zooming factors (Figs. 9 and 10), the introduced approach yields higher error and lower PSNR and SSIM, which implies that the method may struggle with larger zooming scales. In addition, the GWO optimization step, while positively impacting image quality, increases the processing time.
Future research will aim to address these limitations. First, the fundamental image enhancement algorithm can be modified to perform better at higher zooming factors. Further, it would be helpful to explore other optimization methods with faster execution times. Lastly, evaluating the applicability of our method to other types of images beyond those used in this work, such as medical or satellite images, remains future work.
4. Conclusion
In this paper, a novel model was presented for ISR by combining deep learning and optimization techniques. The method performs this task in three steps: image approximation, segmentation, and optimization. The purpose of the first step is to provide an initial approximation of the magnified image, which a CNN produces using a sub-pixel layer. The objective of the second and third steps is to improve the image obtained in the first step: a CNN model based on dense layers and hybrid pooling first segments the image, and then each region of the image is improved using the GWO algorithm. Investigations showed that enhancing the magnified image regions by this method can reduce RMSE by 4.19%. As a result, the use of an additional optimization step for the post-processing of images magnified by CNN models makes it possible to generate outputs with higher quality and fewer errors. The performance of the method was evaluated on the color images of three sets, Set5, Set14, and Urban100, and the results were compared with previous works. The results showed that our solution enlarges the images of these sets 4 times with MAE = 8.89 and RMSE = 0.5646, a reduction of at least 2.32 and 1.88 percent, respectively, compared to the previous methods. In addition, the model developed in this work improves the PSNR and SSIM measures by at least 1 percent and 0.5 percent, respectively, achieving more accurate image zooming than the other methods; these results confirm the effectiveness of the techniques used in this method.
Acknowledgment
The authors would like to acknowledge the Deanship of Graduate Studies and Scientific Research, Taif University, for funding this work.
CRediT authorship contribution statement
Haoyu Yang: Writing - review & editing, Writing - original draft, Conceptualization, Project administration, Supervision. Entesar Gemeay: Writing - review & editing, Visualization, Writing - original draft, Methodology. Mohamad A. Alawad: Data curation, Validation, Writing - review & editing. Mohamed Alkaoud: Writing - review & editing, Software. Sangkeum Lee: Writing - original draft, Software, Writing - review & editing, Methodology. Shaimaa Ahmed Elsaid: Writing - review & editing, Validation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
All data generated or analyzed during this study are included in this published article.
Declaration of Generative AI and AI-assisted technologies in the writing process
The authors confirm that there was no use of Artificial Intelligence (AI)-Assisted Technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.
References
- Real-world single image super-resolution: A brief review. Information Fusion. 2022;79:124-145. https://doi.org/10.1016/j.inffus.2021.09.005
- Second-order attention network for single image super-resolution. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 11057-11066. https://doi.org/10.1109/cvpr.2019.01132
- Soft-edge assisted network for single image super-resolution. IEEE Trans. Image Process. 2020;29:4656-4668. https://doi.org/10.1109/TIP.2020.2973769
- Using super-resolution algorithms for small satellite imagery: A systematic review. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing. 2022;15:3292-3312. https://doi.org/10.1109/jstars.2022.3167646
- A probabilistic U-net for segmentation of ambiguous images. Adv. Neural Inf. Process. Syst. arXiv:1806.05034. https://doi.org/10.48550/arXiv.1806.05034
- Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019;41:2599-2613. https://doi.org/10.1109/TPAMI.2018.2865304
- Image super-resolution: A comprehensive review, recent trends, challenges and applications. Information Fusion. 2023;91:230-260. https://doi.org/10.1016/j.inffus.2022.10.007
- An improved gray wolf optimization algorithm to solve engineering problems. Sustainability. 2021;13:3208. https://doi.org/10.3390/su13063208
- SwinIR: Image restoration using swin transformer. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, pp. 1833-1844. https://doi.org/10.1109/iccvw54120.2021.00210
- Video super-resolution based on deep learning: A comprehensive survey. Artif. Intell. Rev. 2022;55:5981-6035. https://doi.org/10.1007/s10462-022-10147-y
- Single image super-resolution methods: A survey. arXiv preprint arXiv:2202.11763. https://arxiv.org/abs/2202.11763
- BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. J. Mach. Learn. Res. 2014;15:3735-3739. https://www.jmlr.org/papers/volume15/martinezcantin14a/martinezcantin14a.pdf
- Grey wolf optimizer. Advances in Engineering Software. 2014;69:46-61. https://doi.org/10.1016/j.advengsoft.2013.12.007
- Attention U-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999. https://arxiv.org/abs/1804.03999
- Deep learning algorithms for single image super-resolution: A systematic review. Electronics. 2021;10:867. https://doi.org/10.3390/electronics10070867
- Medical image super-resolution reconstruction algorithms based on deep learning: A survey. Comput. Methods Programs Biomed. 2023a;238:107590. https://doi.org/10.1016/j.cmpb.2023.107590
- SC-NAFSSR: Perceptual-oriented stereo image super-resolution using stereo consistency guided NAFSSR. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, pp. 1426-1435. https://doi.org/10.1109/cvprw59228.2023.00147
- Gray wolf optimization and image enhancement with NLM algorithm for multimodal medical fusion imaging system. Biomed. Signal Process. Control. 2023;85:104950. https://doi.org/10.1016/j.bspc.2023.104950
- Transpose convolution based model for super-resolution image reconstruction. Appl. Intell. 2023;53:10574-10584. https://doi.org/10.1007/s10489-022-03745-4
- Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 1874-1883. https://doi.org/10.1109/cvpr.2016.207
- Hybrid pooling for enhancement of generalization ability in deep convolutional neural networks. Neurocomputing. 2019;333:76-85. https://doi.org/10.1016/j.neucom.2018.12.036
- Deep learning for image super-resolution: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021;43:3365-3387. https://doi.org/10.1109/TPAMI.2020.2982166
- Prostate segmentation with encoder-decoder densely connected convolutional network (Ed-Densenet). 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), Venice, Italy, pp. 434-437. https://doi.org/10.1109/isbi.2019.8759498
- Image super-resolution using very deep residual channel attention networks. In: Lecture Notes in Computer Science, Computer Vision – ECCV. Cham: Springer International Publishing; 2018. pp. 294-310.
