A Multi-Scale U-Shaped Attention Network-Based GAN Method for Single Image Dehazing
• Liquan Zhao1,*, Yupeng Zhang1, and Ying Cui2

Human-centric Computing and Information Sciences volume 11, Article number: 38 (2021)
https://doi.org/10.22967/HCIS.2021.11.038

Abstract

Image dehazing can be considered a preprocessing step for high-level vision tasks, and hazy images directly affect applications such as automatic driving and traffic monitoring. To improve the quality of dehazed images, we design an end-to-end dehazing network based on a generative adversarial network (GAN). In the generator network, a U-shaped structure extracts features at multiple scales instead of a single scale, and skip connections link the shallow features with the deep features to reduce the loss of feature information. In addition, a residual module composed of attention modules replaces the plain convolution module in order to extract more useful information, and four cascaded dilated convolution modules increase the receptive field at the lowest level of the network. In the discriminator network, a multi-scale discriminator replaces the single-scale discriminator used in traditional discriminator networks so that it pays more attention to image details. A standard GAN uses only the adversarial loss as its loss function. To further improve dehazing performance, we also propose a new loss function that combines the adversarial loss, a multi-scale pixel loss, and a feature loss. The simulation results are based on the RESIDE dataset and our own transmission equipment dataset. They show that the proposed method achieves better dehazing performance than the CycleGAN, AOD-Net, GCANet, and FFA-Net methods on both synthetic and real hazy images.

Keywords

Image Dehazing, Generative Adversarial Networks, U-Shaped Network

Introduction

Image quality plays an important role in transmission equipment monitoring, satellite remote sensing monitoring, highway visual monitoring, unmanned aerial vehicle inspection, etc. [1, 2]. The quality of images acquired in outdoor environments depends to a large extent on the meteorological conditions. Images produced in hazy weather conditions are often subject to blurring, loss of fine details, reduced contrast, and color distortion. These defects are impediments to technical objectives such as target detection, tracking and visualization [3-6]. For example, if an online transmission equipment monitoring system produces images during hazy weather, it can be difficult for engineers to detect faults along the transmission line, or in the insulator, clamp, etc., by means of an artificial intelligence system or human eye observation. Image dehazing aims to render the image a closer representation of a haze-free image by removing the haze effects from the corrupted image. This can be considered as a preprocessing step in high-level vision tasks.
Many dehazing methods [7-14], which involve estimating the global atmospheric light and the medium transmission map, have been proposed. Each of these requires some prior information or assumptions. However, prior information might not easily be obtainable in real-world applications, and prior assumptions might not accurately reflect real-world conditions. For these reasons, the scope for real-world application of the above dehazing methods is limited. With the development of artificial intelligence, deep learning methods such as AOD-Net and GCANet have been applied to haze removal. Unlike the traditional dehazing methods, methods based on deep learning directly regress the intermediate transmission map or the final haze-free image.
A single image dehazing method based on generative adversarial networks (GANs) has been proposed in an effort to further improve dehazing performance [15, 16]. However, single-scale image dehazing can only learn limited image information, so the recovered image might exhibit greater distortion. To solve this problem, a U-shaped network structure has been proposed, which can extract features at multiple scales. The left side of the U-shaped network is used to extract features [17]. Different features are extracted at different resolutions via downsampling. The high-resolution stage extracts fine features, and the low-resolution stage extracts rough features, such as the global structure of the image. The right side of the U-shaped network is the feature fusion part. Therefore, the structure can learn more image information at multiple scales via both downsampling and upsampling. In the designed U-shaped network structure, skip connections are also used to connect the shallow features with the deep features to avoid loss of information. We also use four cascaded dilated convolution modules at the lowest scale of the network, which can increase the receptive field without increasing the number of parameters, so as to better extract the global information of the image. In an image, some targets contain more important information than others. For example, in transmission equipment monitoring, the insulator, transmission line and clamp contain information of greater importance than, for example, the sky. In haze removal, the clarity of these targets should be a priority. Therefore, we introduce the channel attention module to the network, which, unlike previous networks, does not treat each feature equally. This enhanced network focuses more on the features with high information content and ignores irrelevant features, thus improving the performance of the network.
In order to avoid gradient explosion and network degradation, we also introduce a residual structure into our network. Further, we also propose a new loss function on the basis of adversarial loss for improving dehazing performance.
The main contributions of this paper are as follows:

• 1. We designed a new generator with a U-shaped network structure. The traditional method only extracts features at a single scale, whereas our new network extracts features at multiple scales. It can therefore obtain more features, and both local and global information from an image at the same time. A skip connection method is also proposed to connect the shallow features with the deep features so as to reduce information loss in the U-shaped network structure. Compared with previous methods, it extracts more image details. In our simulation experiments, our method achieved the best dehazing effect among the compared methods in both synthetic and real-world scenes.
• 2. In the U-shaped network structure we designed, we propose a residual module which consists of cascaded attention modules and a residual connection block, and use it to replace the convolution module that is widely used in such networks. Compared with the normal convolution module, the new residual module makes the network pay more attention to the most important channel information. In addition, the local residual connection in the attention module allows less important information, such as thin haze regions and low-frequency content, to be bypassed, which lets the main network architecture focus on more effective information. The residual connection block in the residual module addresses the gradient explosion problem that arises as the network deepens and further improves the capacity of the network.
• 3. To further improve the dehazing performance and the stability of training, we propose a multi-scale pixel loss and a feature loss, and combine these two loss functions with the adversarial loss used in the normal GAN to construct a new loss function. The method based on the new loss function has better dehazing performance than the method based on the adversarial loss alone. Finally, we performed an ablation analysis of the proposed loss function to verify its effectiveness.
In this section we have outlined the theoretical background of haze removal and our contributions to the field. In Section 2, we review related work on haze removal. In Section 3, we explain our proposed method in greater detail. In Section 4, we illustrate and discuss our experimental results. In Section 5, we provide a summary of our experimental work and its implications.

Related Work

Dehazing methods can be divided into two broad categories: traditional prior-based methods and modern learning-based methods. The former remove haze effects by estimating the transmission map and the global atmospheric light. The dehazing method proposed by Fattal [7] is based on the assumption that the surface shading and transmission functions are, statistically, locally uncorrelated; the transmission estimation is constructed on this assumption. However, it is only suitable for images with low to moderate haze. He et al. [8] proposed the DCP method, which is based on a dark channel statistic of outdoor haze-free images. It directly estimates the thickness of the haze in the hazy image based on the low intensity of some pixels in at least one channel. It achieves an excellent dehazing effect; however, when the objects in the scene share a high degree of similarity with atmospheric light, the dark channel becomes invalid. Tarel and Hautiere [9] proposed a new algorithm based on the median filter (the fast visibility restoration [FVR] method), which introduced a new influence factor, the atmospheric veil, and inferred it with a median-based filter that preserves obtuse angles. This method has a good dehazing effect, but it is not accurate enough to restore the color of the sky. Zhu et al. [10] proposed a linear color attenuation-based method (the CAP method), which restores depth information of the haze-affected image by gauging the difference between the brightness and saturation of the pixels in the image. It can achieve a good dehazing effect, but it is not suitable for images in which there is an uneven distribution of haze. Chen et al. [11] proposed the gradient residual minimization (GRM) method, which uses an image-guided, depth-edge-aware smoothing algorithm to refine the initial atmospheric transmission map generated by local prior information.
They also proposed GRM to restore the haze-free image and minimize visual artifacts. This method can eliminate all kinds of artifacts, but its color restoration is not very accurate. Zhu et al. [12] proposed an algorithm based on image fusion to enhance the performance and robustness of image dehazing. Based on a set of gamma-corrected underexposed images, the pixel weight map is constructed by analyzing the global and local exposure to guide the fusion process. This method is superior to other traditional methods in achieving haze removal efficiently and effectively, and has achieved a good visual effect. These prior-based methods rely on prior information or underlying assumptions. However, prior information cannot easily be obtained in real-world applications, and underlying theoretical assumptions might not translate accurately into practice. For these reasons, the scope for application of prior-based methods in practice is greatly limited.
The other type of dehazing method is the learning-based method. Examples include the MSCNN method [18], the DehazeNet method [19], AOD-Net method [20], GCANet method [21], and FFA-Net method [22], etc. They are based on deep learning, and have been widely used in image dehazing due to their good dehazing performance and wide applicability. The DehazeNet method proposed by Cai et al. [19] uses a deep convolutional neural network to learn the medium transmission map, and then uses an atmospheric scattering model to realize haze removal. Compared with conventional methods, it achieves a better dehazing effect; however, its network is still not accurate enough to estimate the medium transmission map, and so the haze is not entirely eliminated. The AOD-Net method proposed by Li et al. [20], uses convolutional neural networks to directly generate clear images, without estimating medium transmission maps and atmospheric light. Its relatively lightweight network can enable good dehazing performance and speedy training process, but it also darkens the color of the dehazed image and undermines its authenticity in some cases. Zhang et al. [18] proposed an image dehazing method based on multi-level fusion and attention guided convolution neural network. They designed a multi-level fusion module, which is able to adaptively employ different levels of features and use the complementation among them to effectively recover clear images from hazy image. Besides, they also designed an efficient residual mixed-convolution attention module with an attention block. The mixed convolutional operations make this network efficient, and the attention block drives this network to focus on more important features. Chen et al. [21] introduced dilated convolution into the GCANet method, and used smooth dilation technology to eliminate gridding artifacts caused by dilated convolution. They also used a gated sub-network to fuse features at different levels, achieving very good results. 
However, with images affected by thin haze, this method could eliminate key information from the images while removing the haze. Qin et al. [22] introduced an attention mechanism into the FFA-Net network, and used a feature fusion module for fusing different levels of information. Their results are an improvement on those of earlier methods, although the dehazing effect of the network on real images is not obvious. Zhang and Dong [23] proposed a new image dehazing method based on reinforcement learning (RL) and established a deep Q-learning network to learn the value function of image dehazing. It combines the simplicity of traditional prior-based dehazing methods with the generalization ability of neural networks, and achieves a good dehazing effect. Zhang et al. [24] proposed a pyramid channel-based feature attention network (PCFAN), which uses the channel attention mechanism to remove haze by exploiting the complementarity between different levels of features in a pyramid fashion. This method has achieved a good dehazing effect, but it cannot accurately restore the color of the image. In recent years, GANs have achieved promising results in image synthesis [25-27]. Researchers have also proposed a number of methods using GANs to dehaze images [28-32]. For example, Engin et al. [28] proposed the Cycle-Dehaze method, which can be trained in an unpaired manner and which achieves a good dehazing effect; however, it sometimes causes image distortion during the dehazing process. Dong et al. [29] proposed the FD-GAN method, which is an end-to-end GAN with a fusion discriminator. It integrates the frequency information into the dehazing network as additional priors and constraints. It can produce visually good dehazing results. Shyam et al. [30] proposed an encoding-decoding GAN that integrates a spatially aware channel attention mechanism for single image dehazing and uses the high-frequency and low-frequency components as priors to determine whether a given image is real or fake.
This method can well preserve the color and structure characteristics of the image. Qu et al. [31] proposed the EPDN (enhanced pix2pix dehazing network) method, which converts the image dehazing problem into the image-to-image conversion problem, embeds a GAN in the architecture, and achieves a very good dehazing effect. However, this method does not achieve the desired results if the image is highly blurred.
Therefore, in an effort to further improve the dehazing performance, our method combines the advantages of multi-scale training and attention mechanism, and uses residual networks to solve the gradient explosion problem caused by network depth.

Multi-Scale U-Shaped Dehazing Network based on GAN

We propose a new GAN-based dehazing method that reduces the influence of hazy weather on image quality, especially for transmission equipment monitoring. The method consists of a generative network and a discriminative network. The generative network removes the haze effects and outputs the dehazed image. The discriminative network determines whether the dehazed image produced by the generative network is a haze-free image. The difference between the dehazed image and the haze-free image is measured by the loss function. We therefore introduce the proposed generative network first, then the discriminative network, and finally the loss function.

Proposed Generative Network
For the generative network, we employ a U-shaped structure, which can learn both high-level and low-level information of the image at the same time. Firstly, one convolution layer and a ReLU activation function are used to extract local features and increase the number of channels of the feature map. The extracted feature map is used as the input of a residual module. The residual module consists of three cascaded attention modules and one convolution layer. The output of the residual module is downsampled by one convolution layer, that is, the size of the feature map is halved and the number of channels is doubled. After that, it passes through two residual modules with 128 and 256 channels, respectively. Then, four cascaded dilated convolution modules are used to increase the receptive field at the lowest level of the network. These enable the network to extract more effective information without increasing the number of parameters. Next, the feature map is upsampled through a convolution, that is, its size is doubled and its number of channels is halved. Then, a convolution and a ReLU activation function are used to reduce the number of channels of the feature map, which then passes through a residual module. Similarly, a convolution, a ReLU activation function, and a residual module are also used at the next scale. Finally, a convolution and a ReLU activation function are used for feature fusion, and then a convolution and a Tanh activation function are used to convert the feature map into an RGB image.
In the U-shaped network, downsampling increases the receptive field of the network, enabling it to learn more low-frequency information, such as the overall outline of the image. Upsampling lets the network learn more high-frequency information. The low-frequency and high-frequency information are merged via the skip connections, which makes it possible to dehaze the image while retaining more of its original information. The whole structure of our proposed generative network is shown in Fig. 1.
Fig. 1. The generative network.
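The channel and size bookkeeping described above can be traced with a short, framework-free sketch. It is only an illustration under stated assumptions: the base width of 64 channels is inferred from the 128- and 256-channel stages, the skip connections are assumed to concatenate features (the text only says they "connect" them), and the dilation rates (1, 2, 4, 8) are hypothetical because the source does not list them. The function names are also illustrative.

```python
def trace_generator_shapes(height, width, base_channels=64, levels=2):
    """List (stage, channels, height, width) through a U-shaped generator."""
    if height % (2 ** levels) or width % (2 ** levels):
        raise ValueError("height and width must be divisible by 2**levels")
    c, h, w = base_channels, height, width  # base_channels=64 is an assumption
    stages = [("conv + ReLU", c, h, w), ("residual module", c, h, w)]
    skips = []
    # Encoder: each downsampling halves the size and doubles the channels.
    for _ in range(levels):
        skips.append((c, h, w))
        c, h, w = c * 2, h // 2, w // 2
        stages.append(("downsample conv", c, h, w))
        stages.append(("residual module", c, h, w))
    stages.append(("4 x dilated conv", c, h, w))
    # Decoder: each upsampling doubles the size and halves the channels.
    for _ in range(levels):
        c, h, w = c // 2, h * 2, w * 2
        stages.append(("upsample conv", c, h, w))
        skip_c, skip_h, skip_w = skips.pop()
        assert (skip_h, skip_w) == (h, w)  # skip features must match in size
        # Concatenation is an assumption; the following conv reduces channels.
        stages.append(("skip concat (assumed)", c + skip_c, h, w))
        stages.append(("conv + ReLU", c, h, w))
        stages.append(("residual module", c, h, w))
    stages.append(("fusion conv + ReLU", c, h, w))
    stages.append(("conv + Tanh", 3, h, w))  # RGB output
    return stages


def receptive_field(dilations, kernel=3):
    """Receptive field of stacked stride-1 convolutions with given dilations."""
    field = 1
    for d in dilations:
        field += (kernel - 1) * d
    return field


if __name__ == "__main__":
    for stage in trace_generator_shapes(240, 360):
        print(stage)
    # Same number of 3x3 kernels, so the same parameter count per layer.
    print(receptive_field([1, 1, 1, 1]), receptive_field([1, 2, 4, 8]))
```

Under these assumptions, four ordinary 3×3 convolutions cover a 9×9 window while the dilated stack covers 31×31 with the same kernel sizes, which is the sense in which dilation enlarges the receptive field without adding parameters.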

For the generative network, we have designed a residual module which is used to extract features in the U-shaped network, without having to rely on multiple simple convolution modules. The residual module can increase the depth of the network in order for it to learn more effective information, while also avoiding gradient disappearance or gradient explosion. The proposed residual module is shown in Fig. 2. It contains three cascaded attention modules and a convolution module. The shallow features and deep features are mixed by way of element-wise summation. The output feature of the residual module contains more information on different features.
Fig. 2. The residual module.

Fig. 3. Attention module.

The attention module used in the residual module is shown in Fig. 3. It consists of two convolution modules and a channel attention module. Firstly, it uses two convolution modules to extract features. Secondly, the channel attention module is used to make the network extract more useful information and reduce redundant information. Finally, the input features and output features are added through element-wise summation, so that the network can learn more effective information while retaining the original features. Its output can be expressed as follows:

$H_{i+1}=H_i+G(W_i^2·δ(W_i^1·H_i))$(1)

where $H_i$ and $H_{i+1}$ represent the input and the output of the attention module, respectively; $W_i^1$ and $W_i^2$ are the convolution kernels of the first and second convolution layers, respectively; $δ()$ is the ReLU activation function; and $G(∙)$ is the channel attention module.
Fig. 4. Channel attention module.

During the process of haze removal, the detailed information of the recovered image plays a more important role. In order to make the network focus more on the features which contain more relevant information, the channel attention mechanism is used to adaptively adjust channel features by considering the interrelationship between channels. The channel attention module is shown in Fig. 4. Firstly, it integrates the spatial information of the input with the assistance of the average pooling module and the maximum pooling module. Secondly, the output of maximum pooling and average pooling are inputted into the weight-sharing convolution block, and the two outputs are obtained. In the end, they are added up via element-wise summation in order to produce the attention feature map. This allows the feature that contains more information to be fully learned. The output of channel attention module can be expressed as follows:

$\begin{aligned} G(C_i) &= C_i·σ\{Conv[Avg(C_i)]+Conv[Max(C_i)]\} \\ &= C_i·σ\{W_i^b·δ[W_i^a·Avg(C_i)]+W_i^b·δ[W_i^a·Max(C_i)]\} \end{aligned}$ (2)

where $C_i$ is the input of the channel attention module; $δ()$ is the ReLU activation function; $Conv()$ is the convolution block that corresponds to the weight-sharing convolution module; $W_i^a$ and $W_i^b$ are the convolution kernels of its first and second convolution layers, respectively; and $Avg(C_i)$ and $Max(C_i)$ denote the outputs of the average pooling layer and the maximum pooling layer, respectively. The outputs of the average pooling layer and the maximum pooling layer can be calculated by the following formulas:

$Avg(x_c)=\frac{1}{H×W} \displaystyle\sum_{i=1}^{H}\displaystyle\sum_{j=1}^{W}x_c(i,j)$ (3)

$Max(x_c)=\max\limits_{i,j} x_c(i,j)$ (4)

where $x_c(i,j)$ represents the value at position $(i,j)$ of the feature map in the $c^{th}$ channel, H×W represents the size of the feature map, and $σ()$ is the sigmoid activation function that produces the attention weightings. The attention weightings therefore range from 0 to 1. Finally, the attention weighting and the input feature are multiplied element by element, which allows the features with high information content to be more fully learned.
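The attention module of (1) and the channel attention of (2) to (4) can be sketched without any deep learning framework. This is a minimal illustration, assuming that the weight-sharing convolution block acts on the pooled 1×1 descriptors as two small matrices (`w_a`, `w_b`) with a ReLU in between; the weights, the feature values, and the callables `conv1` and `conv2` are hypothetical placeholders rather than the trained layers of the paper.

```python
import math


def global_avg(channel):
    """Eq. (3): mean over all H x W positions of one channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def global_max(channel):
    """Eq. (4): maximum over all positions of one channel."""
    return max(v for row in channel for v in row)


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def shared_block(w_a, w_b, pooled):
    """Weight-sharing block: W_b * ReLU(W_a * pooled)."""
    hidden = [max(0.0, h) for h in matvec(w_a, pooled)]
    return matvec(w_b, hidden)


def channel_attention(features, w_a, w_b):
    """Eq. (2): scale each channel by a sigmoid weight in (0, 1)."""
    avg_out = shared_block(w_a, w_b, [global_avg(ch) for ch in features])
    max_out = shared_block(w_a, w_b, [global_max(ch) for ch in features])
    weights = [1.0 / (1.0 + math.exp(-(a + m))) for a, m in zip(avg_out, max_out)]
    scaled = [[[wt * v for v in row] for row in ch]
              for wt, ch in zip(weights, features)]
    return scaled, weights


def attention_module(features, conv1, conv2, w_a, w_b):
    """Eq. (1): H_{i+1} = H_i + G(conv2(ReLU(conv1(H_i))))."""
    activated = [[[max(0.0, v) for v in row] for row in ch]
                 for ch in conv1(features)]
    branch, _ = channel_attention(conv2(activated), w_a, w_b)
    return [[[h + g for h, g in zip(rh, rg)] for rh, rg in zip(ch, cg)]
            for ch, cg in zip(features, branch)]


if __name__ == "__main__":
    feats = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    w_a = [[1.0, 0.0]]       # 2 channels -> 1 hidden unit (hypothetical)
    w_b = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channels (hypothetical)
    _, weights = channel_attention(feats, w_a, w_b)
    print(weights)
```

Because the same `w_a` and `w_b` process both pooled descriptors, the two branches differ only in their input, which is what "weight-sharing" means in the description above.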
Fig. 5. Discriminative network.

Discriminative Network
We use a Markovian discriminative network to determine whether the input image is fake or real. The discriminative network is shown in Fig. 5. It consists entirely of convolution layers. Its input contains images at three scales: the original image size, half of the original size, and a quarter of the original size. Operating on three scales improves the discriminative ability of the discriminator. We feed the images at the three scales into the discriminative network, obtain one output per scale, compute the adversarial loss at each scale, and finally add the three losses to obtain the multi-scale adversarial loss. The output of the Markovian discriminative network is a single-channel matrix. The mean value of the matrix components serves as the input of the adversarial loss function for optimizing the whole network. Each component of the matrix corresponds to a receptive field in the original image. Therefore, the network can capture more detailed information than a traditional discriminative network and achieve better dehazing performance.
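The three-scale input and the averaging of the patch score matrix can be sketched as follows. This is only a toy illustration: the 2×2 average pooling used to build the half and quarter scales is an assumption (the source does not state the resampling method), and `patch_scores` is a hypothetical stand-in for the convolutional Markovian discriminator, not the network of Fig. 5.

```python
def downsample2(image):
    """Halve a 2-D image by 2x2 average pooling (assumed resampling)."""
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, len(image[0]) - 1, 2)]
            for r in range(0, len(image) - 1, 2)]


def image_pyramid(image, scales=3):
    """Return the image at full, half, and quarter size."""
    pyramid = [image]
    for _ in range(scales - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid


def patch_scores(image, patch=2):
    """Toy patch critic: one score per non-overlapping patch."""
    return [[sum(image[r + i][c + j]
                 for i in range(patch) for j in range(patch)) / patch ** 2
             for c in range(0, len(image[0]) - patch + 1, patch)]
            for r in range(0, len(image) - patch + 1, patch)]


def mean_score(score_map):
    """Mean of the single-channel score matrix, as fed to the loss."""
    values = [v for row in score_map for v in row]
    return sum(values) / len(values)


def multi_scale_scores(image, critic=patch_scores, scales=3):
    """One mean critic score per scale; per-scale losses are then summed."""
    return [mean_score(critic(level)) for level in image_pyramid(image, scales)]


if __name__ == "__main__":
    img = [[0.5] * 8 for _ in range(8)]
    print([len(level) for level in image_pyramid(img)])
    print(multi_scale_scores(img))
```

Each entry of the score matrix depends only on one patch of the input, which mirrors the statement that each matrix component corresponds to a receptive field of the original image.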

Loss Function
In order to further improve the performance of image dehazing, we propose the following loss function:

$Loss_G=\min\limits_{G}L_{adv}+L_{pix}+L_{per}$(5)

$Loss_D=\max\limits_{D}L_{adv}$(6)

where $Loss_G$ and $Loss_D$ represent the loss functions of the generator and discriminator, respectively; and $L_{adv}$, $L_{pix}$, and $L_{per}$ represent the adversarial loss, pixel loss, and feature loss of the network, respectively. In order to preserve more detailed information from the original image, the WGAN-GP loss function is used as the adversarial loss [33]. This can be expressed as follows:

$L_{adv}=E[D(I_{pre})]-E[D(I_{gt})]+λE[(||∇D(I_{pen})||_2-1)^2]$ (7)

where $I_{pre}$ represents the dehazed image, and $I_{gt}$ represents the haze-free image; $D$ represents the discriminator; $λ$ is the weight of the gradient penalty term; and $I_{pen}$ is expressed as follows:

$I_{pen}=a×I_{pre}+(1-a)×I_{gt} (a∈(0,1))$(8)
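The interpolation in (8) and the gradient penalty term of the WGAN-GP loss [33] can be illustrated with a toy linear critic, for which the gradient with respect to the input is known in closed form. This is a sketch under stated assumptions: the linear critic, its weights, and the penalty weight `lam=10.0` are hypothetical (the source does not give the value of λ), and a real implementation would obtain the gradient by automatic differentiation through the convolutional discriminator.

```python
import math
import random


def critic(weights, image):
    """Toy linear critic D(x) = w . x on a flattened image."""
    return sum(w * x for w, x in zip(weights, image))


def critic_gradient(weights, image):
    """For a linear critic, dD/dx equals the weight vector for any input."""
    return list(weights)


def interpolate(pre, gt, a):
    """Eq. (8): I_pen = a * I_pre + (1 - a) * I_gt."""
    return [a * p + (1.0 - a) * g for p, g in zip(pre, gt)]


def wgan_gp_loss(weights, pre_batch, gt_batch, lam=10.0, rng=None):
    """Eq. (7) on a batch; lam=10.0 is an assumed value, not from the source."""
    rng = rng or random.Random(0)
    n = len(pre_batch)
    fake = sum(critic(weights, p) for p in pre_batch) / n
    real = sum(critic(weights, g) for g in gt_batch) / n
    penalty = 0.0
    for p, g in zip(pre_batch, gt_batch):
        a = rng.random()  # drawn from [0, 1); the endpoint has negligible effect
        grad = critic_gradient(weights, interpolate(p, g, a))
        norm = math.sqrt(sum(v * v for v in grad))
        penalty += (norm - 1.0) ** 2
    return fake - real + lam * penalty / n


if __name__ == "__main__":
    pre, gt = [[1.0, 0.0]], [[0.0, 1.0]]
    print(wgan_gp_loss([0.6, 0.8], pre, gt))  # gradient norm 1: no penalty
    print(wgan_gp_loss([3.0, 4.0], pre, gt))  # gradient norm 5: large penalty
```

The second call shows the purpose of the penalty: a critic whose input gradient has norm far from 1 is penalized heavily, which is what keeps WGAN-GP training stable.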

For $L_{pix}$, we use the L2 loss to minimize the squared error between the dehazed image and the haze-free image, thus rendering the output image a closer representation of the haze-free image and speeding up convergence. It can be expressed as follows:

$L_{pix} = λ_0·\displaystyle\sum_{i=1}^{m}(I_{gt}^{1/4}-I_{pre}^{1/4})^2+ λ_1·\displaystyle\sum_{i=1}^{m}(I_{gt}^{1/2}-I_{pre}^{1/2})^2+ λ_2·\displaystyle\sum_{i=1}^{m}(I_{gt}-I_{pre})^2$ (9)

where $I_{pre}^{1/4}$ represents the image generated by the generator at the quarter scale, and $I_{pre}^{1/2}$ represents the image generated by the generator at the half scale. $I_{gt}^{1/4}$ represents the haze-free image with the same size as $I_{pre}^{1/4}$, and $I_{gt}^{1/2}$ represents the haze-free image with the same size as $I_{pre}^{1/2}$. $λ_0$, $λ_1$, and $λ_2$ are hyperparameters. In our experiment, $λ_0$ is set to 0.6, $λ_1$ is set to 0.8, and $λ_2$ is set to 1.0.
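As a worked illustration of (9), the sketch below sums the squared errors at the quarter, half, and full scales with the weights 0.6, 0.8, and 1.0 given above. The tiny images are made-up values, and the function names are illustrative only.

```python
LAMBDAS = (0.6, 0.8, 1.0)  # quarter, half, full scale, as stated in the text


def squared_error(pre, gt):
    """Sum of squared pixel differences between two same-sized images."""
    return sum((x - y) ** 2
               for row_pre, row_gt in zip(pre, gt)
               for x, y in zip(row_pre, row_gt))


def multi_scale_pixel_loss(pairs, lambdas=LAMBDAS):
    """Eq. (9). pairs = [(pre_1/4, gt_1/4), (pre_1/2, gt_1/2), (pre, gt)]."""
    if len(pairs) != len(lambdas):
        raise ValueError("need one (pre, gt) pair per scale weight")
    return sum(lam * squared_error(pre, gt)
               for lam, (pre, gt) in zip(lambdas, pairs))


if __name__ == "__main__":
    quarter = ([[1.0]], [[0.0]])                                  # error 1
    half = ([[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])   # error 4
    full = ([[0.5] * 4 for _ in range(4)], [[0.5] * 4 for _ in range(4)])
    print(multi_scale_pixel_loss([quarter, half, full]))
```

Here the loss is 0.6·1 + 0.8·4 + 1.0·0, so errors at the coarser scales still contribute even when the full-resolution output already matches the haze-free image.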
We also use the feature loss to measure the global difference between the features of the dehazed image and those of the haze-free image. This allows the network to better extract the global features of the image. The dehazed image and haze-free image are used as input of the pre-trained VGG16 network, and the L2 loss is then used to calculate the feature loss. It can be expressed as follows:

$L_{per}=\displaystyle\sum_{i=1}^{m}[VGG(I_{pre})-VGG(I_{gt})]^2$ (10)

where $VGG(∙)$ represents the mapping corresponding to the pre-trained VGG16 network.
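The structure of (10) and the sum in (5) can be sketched with a pluggable feature extractor. The pre-trained VGG16 network cannot be reproduced in a few lines, so `toy_features` below is a purely hypothetical stand-in (horizontal differences) that only shows where the extractor enters the loss; the numeric loss values in the example are made up.

```python
def toy_features(image):
    """Hypothetical stand-in for VGG16 features: horizontal differences."""
    return [[row[c + 1] - row[c] for c in range(len(row) - 1)]
            for row in image]


def feature_loss(pre, gt, extract=toy_features):
    """Eq. (10): squared L2 distance between feature maps of I_pre and I_gt."""
    f_pre, f_gt = extract(pre), extract(gt)
    return sum((x - y) ** 2
               for row_pre, row_gt in zip(f_pre, f_gt)
               for x, y in zip(row_pre, row_gt))


def generator_loss(l_adv, l_pix, l_per):
    """Eq. (5): the generator minimizes the sum of the three terms."""
    return l_adv + l_pix + l_per


if __name__ == "__main__":
    gt = [[0.0, 0.0, 0.0]]
    pre = [[0.0, 1.0, 0.0]]
    l_per = feature_loss(pre, gt)
    print(l_per, generator_loss(-0.2, 3.8, l_per))
```

The loss compares images in feature space rather than pixel space, so with this toy extractor a constant brightness offset leaves the feature loss at zero while a structural change does not.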

Simulation and Discussion

In this section, we test the dehazing performance of our proposed method and four other methods (CycleGAN, AOD-Net, GCANet, and FFA-Net) on the public RESIDE dataset and on our transmission equipment image dataset. Firstly, we introduce the two datasets and the metrics. Secondly, we test the dehazing performance of all methods on the RESIDE dataset and on our transmission equipment image dataset. Finally, an ablation study and a user study are given to test the dehazing performance from different aspects.

Datasets and Metrics
We employ two datasets to test our proposed method. The first dataset is the RESIDE dataset supplied by Li et al. [34]. It has 13,990 images in the training set and 500 images in the test set. The second dataset (our own compilation) contains 1,750 transmission equipment images that we selected from an online transmission line monitoring system. In the simulation we use 1,500 images as the training set and 250 images as the test set. We crop the transmission equipment images manually to a size of 360×240, and use the atmospheric scattering model to generate synthetic hazy images from the haze-free images. In this model, the atmospheric light A is randomly selected from [0.6, 0.8], and the scattering parameter $β$ is randomly selected from [0.1, 0.2]. We also use hazy images obtained in real environments to test the performance of the different methods. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used for measuring dehazing performance [35]. The PSNR can be expressed as follows:

$PSNR = 10\log_{10}\left(\frac{(MAX_I)^2}{MSE}\right)$ (11)

where $MAX_I$ is the maximum possible pixel value of the image. The MSE is the mean square error, which is expressed as follows:

$MSE=\frac{1}{mn}\displaystyle\sum_{i=0}^{m-1}\displaystyle\sum_{j=0}^{n-1}||I(i,j)-K(i,j)||^2$ (12)

where m and n represent the size of the image, $I$ represents the dehazed image, and $K$ represents the haze-free image. The SSIM [35] measures image similarity from three aspects: luminance, contrast, and structure. The luminance function, contrast function and structure function can be expressed as follows:

$l(x,y)=\frac{2μ_xμ_y+c_1}{μ_x^2+μ_y^2+c_1}$ (13)

$c(x,y)=\frac{2σ_xσ_y+c_2}{σ_x^2+σ_y^2+c_2}$ (14)

$s(x,y)=\frac{σ_{xy}+c_3}{σ_xσ_y+c_3}$(15)

where $μ_x$ and $μ_y$ represent the mean values of $x$ and $y$; $σ_x$ and $σ_y$ represent the standard deviations of $x$ and $y$; $σ_{xy}$ represents the covariance of $x$ and $y$; and $c_1$, $c_2$, and $c_3$ are small constants that keep the denominators away from zero. The SSIM can be expressed as follows:

$SSIM(x,y)=[l(x,y)]^α[c(x,y)]^β[s(x,y)]^γ$(16)

We set $α = β = γ = 1$. The simulations run on Ubuntu 18.04 with an Intel Xeon E5-2678 v3 processor. The GPU is an NVIDIA GeForce RTX 2080 Ti, and the deep learning framework is PyTorch.
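The two metrics can be sketched in plain Python as follows. This is a simplified illustration, not the evaluation code used for the reported numbers: the SSIM here is computed once over the whole image rather than over local windows, and the constants $c_1=(0.01·MAX_I)^2$, $c_2=(0.03·MAX_I)^2$, and $c_3=c_2/2$ are the commonly used defaults, which the text above does not specify.

```python
import math


def _flatten(image):
    return [float(v) for row in image for v in row]


def mse(dehazed, reference):
    """Eq. (12): mean squared error over all pixels."""
    a, b = _flatten(dehazed), _flatten(reference)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(dehazed, reference, max_value=255.0):
    """Eq. (11): peak signal-to-noise ratio in dB."""
    error = mse(dehazed, reference)
    if error == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


def ssim_global(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """Eqs. (13)-(16) with alpha = beta = gamma = 1, over the whole image."""
    xs, ys = _flatten(x), _flatten(y)
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((v - mu_x) ** 2 for v in xs) / n
    var_y = sum((v - mu_y) ** 2 for v in ys) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    c1, c2 = (k1 * max_value) ** 2, (k2 * max_value) ** 2  # assumed defaults
    c3 = c2 / 2.0
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure


if __name__ == "__main__":
    ref = [[10, 20], [30, 40]]
    shifted = [[11, 21], [31, 41]]
    print(psnr(shifted, ref), ssim_global(shifted, ref))
```

A higher PSNR means a smaller pixel-wise error, while an SSIM closer to 1 means that the luminance, contrast, and structure of the dehazed image agree more closely with those of the haze-free image.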

Simulation on the RESIDE Dataset
Firstly, we randomly select five test images from the RESIDE dataset to compare the performance of our proposed dehazing method with the following four methods: CycleGAN, AOD-Net, GCANet, and FFA-Net. The test images, dehazed images and haze-free images are shown in Fig. 6. The first (top) row contains the results for a piano image. It is obvious that the dehazed image obtained by the AOD-Net method still contains much haze. In the dehazed images obtained by the CycleGAN and FFA-Net methods, the wall color is distorted. For the second image (dining room), the colors of the wall, floor, and desktop are clearly distorted in the dehazed image obtained by CycleGAN. In the dehazed image obtained by AOD-Net, there is still haze near the door. For the third image, the color of the wall is distorted by CycleGAN and GCANet. For the fourth and fifth images, there is some haze near the wall in the CycleGAN and AOD-Net results. The images obtained by our proposed method are closer to the haze-free images than the images obtained by the other methods.
Fig. 6. Dehazing results for RESIDE dataset.

Secondly, to compare the dehazing performance of different methods for different haze thicknesses, we randomly select one image from the dataset, and change the thickness of the haze by setting the atmospheric light A to 0.7, 0.8, 0.9 and 1.0. The images with different haze thicknesses and the dehazed images obtained by different methods are shown in Fig. 7. Although the quality of the dehazed images becomes worse with increasing haze thickness for all methods, the dehazed images obtained by our proposed method still have better detail restoration and color restoration than the others. The values of PSNR and SSIM for different methods are shown in Fig. 8. From Fig. 8, we can see that our proposed method has the largest PSNR and SSIM under the same atmospheric light. This indicates that our proposed method has better dehazing performance than the others.
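Synthetic haze of the kind used in these experiments is commonly generated with the atmospheric scattering model $I(x)=J(x)t(x)+A(1-t(x))$ with $t(x)=e^{-βd(x)}$, where $J$ is the haze-free image and $d$ is the scene depth. The sketch below assumes this standard form and a known depth map; the depth values and pixel intensities are hypothetical, and only the sampling ranges for A and β come from the dataset description.

```python
import math
import random


def transmission(depth, beta):
    """t(x) = exp(-beta * d(x)) for every pixel of a depth map."""
    return [[math.exp(-beta * d) for d in row] for row in depth]


def add_haze(clear, depth, atmospheric_light, beta):
    """I = J * t + A * (1 - t), applied pixel by pixel (values in [0, 1])."""
    t_map = transmission(depth, beta)
    return [[j * t + atmospheric_light * (1.0 - t)
             for j, t in zip(row_j, row_t)]
            for row_j, row_t in zip(clear, t_map)]


def sample_haze_parameters(rng):
    """Draw A from [0.6, 0.8] and beta from [0.1, 0.2], as in the dataset."""
    return rng.uniform(0.6, 0.8), rng.uniform(0.1, 0.2)


if __name__ == "__main__":
    clear = [[0.2, 0.4]]
    depth = [[0.0, 5.0]]  # hypothetical depth values
    for a in (0.7, 0.8, 0.9, 1.0):
        print(a, add_haze(clear, depth, atmospheric_light=a, beta=0.15))
```

In this model a pixel at zero depth is unchanged, while distant pixels approach the atmospheric light, so raising A from 0.7 to 1.0 makes the haze-dominated regions progressively brighter.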
Fig. 7. Images with different haze thickness and dehazed images.

Fig. 8. Dehazing effect of different methods for different haze thickness: (a) PSNR and (b) SSIM.

Thirdly, we use all test images to quantitatively evaluate the performance of all methods. The results are shown in Fig. 9; Fig. 9(a) and 9(b) show the PSNR and SSIM of different methods, respectively. The PSNR results are 19.2854 dB, 19.4621 dB, 26.3242 dB, 32.3857 dB, and 36.6062 dB for CycleGAN, GCANet, AOD-Net, FFA-Net, and our method, respectively. The SSIM results are 0.7616, 0.8356, 0.9441, 0.9802, and 0.9879 for CycleGAN, GCANet, AOD-Net, FFA-Net, and our method, respectively. Our proposed method obtains the largest PSNR and SSIM values among all five methods. This alone demonstrates that the proposed method has superior dehazing performance.
Fig. 9. Histogram of dehazing results for RESIDE dataset: (a) PSNR and (b) SSIM.

Finally, we use real haze images as test images to test the dehazing performance for different methods. The real haze images and dehazed images obtained by different methods are shown in Fig. 10. For the first image (top row), it is evident that there is residual haze in the dehazed images obtained by CycleGAN, AOD-Net and FFA-Net, and in the dehazed images produced by CycleGAN and AOD-Net the leaves cannot be distinguished. The sky color is distorted in the GCANet image. For the second image (low-rise buildings), the finer details of the trees are lost in the CycleGAN and AOD-Net images. The color of the wall of the left low-rise building is distorted in the FFA-Net image. For the third image (skyscrapers), again, the finer details of the trees are missing in the CycleGAN and AOD-Net images. The right low-rise building is distorted in the GCANet image. There is also distortion between the right low-rise building and high-rise building in the FFA-Net image. For the fourth image (harvest), the color is distorted in the CycleGAN, AOD-Net and GCANet images. And there is still a lot of haze in the results of FFA-Net, which cannot remove all haze like our method for the fourth image.
In summary, Fig. 10 shows that the dehazed images obtained by our method preserve more information than those of the other methods. This means that our proposed method achieves the best performance in this simulation.
Fig. 10. Real haze images and dehazed images.

Fig. 11. Partial transmission equipment images for training.

Simulation on the Transmission Equipment Image Dataset
We have also used a selection of transmission equipment images sourced from an online transmission line monitoring system to test the performance of the five methods, including our proposed method. Some of the transmission equipment images used for training and testing are shown in Fig. 11. The histograms of the results are shown in Fig. 12; Fig. 12(a) shows the PSNR and Fig. 12(b) shows the SSIM of the different methods for the transmission equipment image dataset. The PSNR results are 18.9067 dB, 18.4119 dB, 32.1014 dB, 32.7859 dB, and 35.8375 dB for CycleGAN, AOD-Net, GCANet, FFA-Net, and our method, respectively. The SSIM results are 0.6322, 0.8371, 0.9608, 0.9648, and 0.9782 for CycleGAN, AOD-Net, GCANet, FFA-Net, and our method, respectively. Our proposed method has the largest PSNR and SSIM results of all the methods. This shows that the proposed method achieves the best dehazing performance among all methods considered.

Fig. 12. Histogram of dehazing results for transmission equipment image dataset: (a) PSNR and (b) SSIM.

We also use real haze images to test the performance of the different methods. The real images and dehazed images are shown in Fig. 13. There is severe distortion in the dehazed image obtained by CycleGAN. In the dehazed image obtained by AOD-Net, there is still much haze around the insulator, and the color of the tower is distorted. There is also much haze around the transmission line and the insulator in the dehazed image obtained by GCANet. Although there is still haze in the dehazed images obtained by FFA-Net and our method, the residual haze in them is less severe than in those produced by the other methods. Compared with the image produced by the FFA-Net method, the tower color is brighter and the image is clearer in the dehazed image obtained by our method.

Fig. 13. Real haze transmission equipment image and dehazed images.

Ablation Study
To analyze the performance of our proposed loss function, we use $L_{adv}$+$L_{per}$, $L_{adv}$+$L_{pix}$, and $L_{adv}$+$L_{pix}$+$L_{per}$ as the loss function in our proposed network. We test our proposed method with these different loss functions on the transmission equipment image dataset. The results are shown in Table 1. The PSNR results are 34.1683 dB, 35.0754 dB, and 35.8375 dB for the proposed method based on $L_{adv}$+$L_{per}$, $L_{adv}$+$L_{pix}$, and $L_{adv}$+$L_{pix}$+$L_{per}$, respectively. The SSIM results are 0.9705, 0.9745, and 0.9782 for our proposed method based on $L_{adv}$+$L_{per}$, $L_{adv}$+$L_{pix}$, and $L_{adv}$+$L_{pix}$+$L_{per}$, respectively. Our method based on $L_{adv}$+$L_{pix}$+$L_{per}$ attains larger PSNR and SSIM results than the other two variants. This indicates that the proposed loss function has a positive effect on haze removal performance.

Table 1. Results of different losses for transmission equipment image dataset
Metric $L_{adv}$+$L_{per}$ $L_{adv}$+$L_{pix}$ $L_{adv}$+$L_{pix}$+$L_{per}$
PSNR (dB) 34.1638 35.0754 35.8375
SSIM 0.9705 0.9745 0.9782
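The three ablation variants differ only in which terms are added to the adversarial loss. The sketch below illustrates that switching logic under stated assumptions: the function name `combined_loss`, the weights `w_pix` and `w_per` (both set to 1.0 here), and the scalar inputs are hypothetical and are not taken from the paper, whose own definition of the loss terms should be treated as authoritative.

```python
def combined_loss(l_adv, l_pix=None, l_per=None, w_pix=1.0, w_per=1.0):
    """Sum the adversarial loss with whichever optional terms are supplied.

    l_adv, l_pix, l_per are already-computed scalar loss values.
    w_pix and w_per are hypothetical weights, not values from the paper.
    """
    total = l_adv
    if l_pix is not None:
        total += w_pix * l_pix  # multi-scale pixel loss term
    if l_per is not None:
        total += w_per * l_per  # feature (perceptual) loss term
    return total


# The three configurations compared in Table 1, with made-up loss values.
adv, pix, per = 0.8, 0.3, 0.5
variant_adv_per = combined_loss(adv, l_per=per)
variant_adv_pix = combined_loss(adv, l_pix=pix)
variant_full = combined_loss(adv, l_pix=pix, l_per=per)
```

Passing `None` for a term removes it entirely, which mirrors dropping that term from the training objective rather than setting its weight to a small value.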

User Study
To verify the effectiveness of our model, we conducted a user study with 30 volunteers. Because the CycleGAN and AOD-Net methods perform significantly worse than our method, we compare only GCANet and FFA-Net with our method. We randomly selected 100 haze images from the test set of the RESIDE dataset and obtained the dehazing results of GCANet, FFA-Net, and our method. We constructed 100 groups of data, each displaying two pictures at the same time, one generated by the GCANet method and the other by our method. The same experiment was carried out for FFA-Net. Each volunteer randomly selected 30 of the 100 groups and chose the visually more satisfactory (more natural) image in each group. In this way, the satisfaction of each volunteer with each method was calculated, and the satisfaction values of all volunteers were then averaged to obtain the final satisfaction. The same procedure was used for the transmission equipment image dataset. The final results are shown in Fig. 14: Fig. 14(a) shows the comparison results on the RESIDE dataset and Fig. 14(b) shows the comparison results on the transmission equipment image dataset. In both comparisons, our method has better visual performance than the GCANet and FFA-Net methods.

Fig. 14. Results of user study for (a) RESIDE dataset and (b) transmission equipment image dataset.
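The satisfaction score described in the user study can be sketched as follows. This is one plausible reading of the procedure, assuming that a volunteer's satisfaction with a method is the fraction of that volunteer's groups in which the method's image was chosen. The function names and the method labels are hypothetical, and the votes in the usage example are invented.

```python
import random


def sample_groups(num_groups=100, per_volunteer=30, rng=None):
    """Pick the group indices one volunteer will judge, without repetition."""
    rng = rng or random.Random()
    return rng.sample(range(num_groups), per_volunteer)


def volunteer_satisfaction(choices):
    """Map each method to the fraction of groups in which it was preferred.

    choices holds one method name per judged group.
    """
    counts = {}
    for method in choices:
        counts[method] = counts.get(method, 0) + 1
    return {method: n / len(choices) for method, n in counts.items()}


def final_satisfaction(all_choices, methods):
    """Average the per-volunteer satisfaction of each method over all volunteers."""
    per_volunteer = [volunteer_satisfaction(c) for c in all_choices]
    return {
        m: sum(s.get(m, 0.0) for s in per_volunteer) / len(per_volunteer)
        for m in methods
    }


# Two volunteers, three groups each (invented votes).
votes = [["ours", "ours", "GCANet"], ["ours", "GCANet", "GCANet"]]
scores = final_satisfaction(votes, ["ours", "GCANet"])
```

Because each group contains exactly two images, the two satisfaction values in one pairwise comparison sum to 1.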

Conclusion

In this paper, we have proposed a dehazing method based on GANs. The generator is a U-shaped network that extracts features at multiple scales, with skip connections that combine the shallow features with the deep features. A new residual module containing a channel attention module and cascaded dilated convolutions is used in the U-shaped network, together with a new loss function, to extract more effective features and increase image integrity. We employ the Markovian discriminative network as the discriminator. Compared with four existing dehazing methods (CycleGAN, AOD-Net, GCANet, and FFA-Net), our proposed method attains larger PSNR and SSIM values on the different datasets and a better dehazing effect on real haze images. Therefore, our proposed method achieves the best haze removal performance among the methods compared.
In future work, we will develop a dehazing method based on lightweight GANs, which could be deployed on embedded devices with limited computing power.

Authors’ Contributions

Conceptualization, LZ. Investigation and methodology, LZ, YZ. Resources, YC. Writing of the original draft, LZ, YZ. Writing of the review and editing, LZ, YZ, YC. Data Curation, LZ, YZ.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61271115), Research Foundation of Education Bureau of Jilin Province (No. JJKH20210095KJ).

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Liquan Zhao was born in Heilongjiang province in 1982. He received the B.S. degree in Electrical & Information Engineering from Harbin University of Science and Technology, Harbin, China, in 2005 and the Ph.D. degree in Communication and Information System from Harbin Engineering University, Harbin, China, in 2009. Since 2009, he has been an associate professor at Northeast Electric Power University, Jilin, China. His research interests include deep learning and blind source separation.

Yupeng Zhang was born in Jilin province in 1997. He received the bachelor’s degree in electronic information engineering from Northeast Electric Power University, Jilin, China, in 2019. He is working toward the master’s degree in the School of Electrical Engineering, Northeast Electric Power University, Jilin, China. His research interests include deep learning and generative adversarial networks.

Ying Cui was born in Heilongjiang province in 1987. He received the B.S. degree in Electrical & Information Engineering from Daqing Normal University, Daqing, China, in 2011 and the Ph.D. degree in Electrical Engineering and Automation from Harbin Institute of Technology University, Harbin, China, in 2020. Since 2020, he has been an engineer at Guangdong Electric Power Corporation Zhuhai Power Supply Bureau, Zhuhai, China. His research interests include artificial intelligence.

References

[1] X. Tao, D. Zhang, Z. Wang, X. Liu, H. Zhang, and D. Xu, “Detection of power line insulator defects using aerial images analyzed with convolutional neural networks,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 50, no. 4, pp. 1486-1498, 2018.
[2] J. Li, Y. Pei, S. Zhao, R. Xiao, X. Sang, and C. Zhang, “A review of remote sensing for environmental monitoring in China,” Remote Sensing, vol. 12, no. 7, article no. 1130, 2020. https://doi.org/10.3390/rs12071130
[3] D. Cao, Z. Chen, and L. Gao, “An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks,” Human-centric Computing and Information Sciences, vol. 10, article no. 14, 2020. https://doi.org/10.1186/s13673-020-00219-9
[4] M. Zhang, R. Fu, Y. Guo, L. Wang, P. Wang, and H. Deng, “Cyclist detection and tracking based on multi-layer laser scanner,” Human-centric Computing and Information Sciences, vol. 10, article no. 20, 2020. https://doi.org/10.1186/s13673-020-00225-x
[5] C. Guo, B. Fan, Q. Zhang, S. Xiang, and C. Pan, “AugFPN: improving multi-scale feature learning for object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2020, pp. 12592-12601.
[6] G. Braso and L. Leal-Taixe, “Learning a neural solver for multiple object tracking,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2020, pp. 6247-6256.
[7] R. Fattal, “Single image dehazing,” ACM Transactions on Graphics (TOG), vol. 27, no. 3, pp. 1-9, 2008.
[8] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2341-2353, 2011.
[9] J. P. Tarel and N. Hautiere, “Fast visibility restoration from a single color or gray level image,” in Proceedings of 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 2201-2208.
[10] Q. Zhu, J. Mai, and L. Shao, “A fast single image haze removal algorithm using color attenuation prior,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3522-3533, 2015.
[11] C. Chen, M. N. Do, and J. Wang, “Robust image and video dehazing with visual artifact suppression via gradient residual minimization,” in Computer Vision – ECCV 2016. Cham, Switzerland: Springer, 2016, pp. 576-591.
[12] Z. Zhu, H. Wei, G. Hu, Y. Li, G. Qi, and N. Mazur, “A novel fast single image dehazing algorithm based on artificial multiexposure image fusion,” IEEE Transactions on Instrumentation and Measurement, vol. 70, 2020. https://doi.org/10.1109/TIM.2020.3024335
[13] S. C. Pei and T. Y. Lee, “Nighttime haze removal using color transfer pre-processing and dark channel prior,” in Proceedings of 2012 19th IEEE International Conference on Image Processing, Orlando, FL, 2012, pp. 957-960.
[14] G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, “Efficient image dehazing with boundary constraint and contextual regularization,” in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 2013, pp. 617-624.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, pp. 2672-2680, 2014.
[16] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, 2019, pp. 4401-4410.
[17] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Cham, Switzerland: Springer, 2015, pp. 234-241.
[18] X. Zhang, T. Wang, W. Luo, and P. Huang, “Multi-level fusion and attention-guided CNN for image dehazing,” IEEE Transactions on Circuits and Systems for Video Technology, 2020. https://doi.org/10.1109/TCSVT.2020.3046625
[19] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: an end-to-end system for single image haze removal,” IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5187-5198, 2016.
[20] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “AOD-Net: all-in-one dehazing network,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 4770-4778.
[21] D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, and G. Hua, “Gated context aggregation network for image dehazing and deraining,” in Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, 2019, pp. 1375-1383.
[22] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, “FFA-Net: feature fusion attention network for single image dehazing,” in Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, 2020, pp. 11908-11915.
[23] Y. Zhang and Y. Dong, “Single image dehazing via reinforcement learning,” in Proceedings of 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 2020, pp. 123-126.
[24] X. Zhang, T. Wang, J. Wang, G. Tang, and L. Zhao, “Pyramid channel-based feature attention network for image dehazing,” Computer Vision and Image Understanding, vol. 197-198, article no. 103003, 2020. https://doi.org/10.1016/j.cviu.2020.103003
[25] K. Lata, M. Dave, and K. N. Nishanth, “Image-to-image translation using generative adversarial network,” in Proceedings of 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 2019, pp. 186-189.
[26] T. C. Wang, M. Y. Liu, J. Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional GANs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 8798-8807.
[27] I. S. Na, C. Tran, D. Nguyen, and S. Dinh, “Facial UV map completion for pose-invariant face recognition: a novel adversarial approach based on coupled attention residual UNets,” Human-centric Computing and Information Sciences, vol. 10, article no. 45, 2020. https://doi.org/10.1186/s13673-020-00250-w
[28] D. Engin, A. Genc, and H. Kemal Ekenel, “Cycle-dehaze: enhanced CycleGan for single image dehazing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, 2018, pp. 825-833.
[29] Y. Dong, Y. Liu, H. Zhang, S. Chen, and Y. Qiao, “FD-GAN: generative adversarial networks with fusion-discriminator for single image dehazing,” in Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, 2020, pp. 10729-10736.
[30] P. Shyam, K. J. Yoon, and K. S. Kim, “Towards domain invariant single image dehazing,” in Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2021, pp. 9657-9665.
[31] Y. Qu, Y. Chen, J. Huang, and Y. Xie, “Enhanced pix2pix dehazing network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, 2019, pp. 8160-8168.
[32] H. Zhang and V. M. Patel, “Densely connected pyramid dehazing network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 3194-3203.
[33] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” Advances in Neural Information Processing Systems, vol. 30, pp. 5767-5777, 2017.
[34] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single-image dehazing and beyond,” IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 492-505, 2018.
[35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
