Intelligent Deep Learning and Improved Whale Optimization Algorithm Based Framework for Object Recognition
• Nazar Hussain1, Muhammad Attique Khan1, Seifedine Kadry2, Usman Tariq3, Reham R. Mostafa4, Jung-In Choi5,*, and Yunyoung Nam6,*

Human-centric Computing and Information Sciences volume 11, Article number: 34 (2021)
https://doi.org/10.22967/HCIS.2021.11.034

Abstract

In pattern recognition, object recognition is an important research domain due to major applications such as autonomous driving, robotics, and visual surveillance. Many computer vision techniques have been introduced in the literature, but several challenges remain, such as the similar shapes of different objects and imbalanced datasets. Existing methods also suffer from irrelevant feature extraction, which degrades recognition accuracy and increases computational time. In this article, we propose a fully automated computer vision pipeline for object recognition. In the proposed method, data augmentation is first performed to balance the object classes. In the next step, a convolutional neural network (DenseNet201) is selected and modified according to the chosen dataset (Caltech101). The modified model is trained through transfer learning and used to extract features. The extracted features contain some redundant information, which is removed using an improved whale optimization algorithm (WOA). The final features are classified using several supervised learning algorithms for the final recognition. The experimental process was carried out on the augmented Caltech101 dataset and achieved an accuracy of 93%. Comparison with benchmark methods shows that the achieved accuracy is considerably improved.

Keywords

Deep Learning, Feature Optimization, Feature Classification, Object Recognition, Transfer Learning

Introduction

Object recognition is one of the most sophisticated and challenging domains of pattern recognition and computer vision due to its wide applications in video surveillance, cognitive computing, and machine intelligence [1, 2]. Researchers have developed and adopted multiple methods and techniques for efficient object recognition that address object comparison, object shape, and minor differences between multiple objects [3, 4]. An efficient and robust technique is essential for object recognition that can overcome variations in an object's shape, texture, illumination, and color [5]. The efficiency and robustness of a video surveillance system depend on the effectiveness of object recognition [6]. Besides visual surveillance, numerous object classification applications include face recognition, biometric verification, pedestrian tracking, video watermarking, scene understanding, and, most recently, autonomous vehicles and drones [7-10]. A sustainable and robust object detection and recognition system reduces the impact of complex backgrounds, illumination, color, objects moving in different directions, and objects of different shapes with the same color on classification performance. Traditional classification models cannot withstand these complications [11].
Multiple methods in computer vision have been presented to reduce the impact of complex objects on classification performance [12]. Researchers have tried to devise an optimal technique to overcome all the challenges in object classification. Traditional classification methods such as handcrafted features (HCF) have been utilized for object classification, but the challenge of complex objects limits their applicability [13]. HCF include the local binary pattern (LBP), histogram of oriented gradients (HOG), scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and texture (Haralick) features. More recent methods present hybrid feature sets to improve object representation [14, 15].
Recently, the concept of deep learning has been presented, which has shown robustness along with a decrease in computational time [16]. Multiple convolutional neural networks (CNNs) trained on over a million images have been presented. Pretrained CNN models include AlexNet [17], VGGNet [18], ResNet (ResNet-50, ResNet-152, and ResNet-102) [19], DenseNet [20], and InceptionV3 [21]. These CNN models are trained on the challenging ImageNet image dataset. The development of CNN models has contributed a lot to achieving competitive accuracy, to a certain extent [22]. A technique called feature fusion, which combines different feature spaces into one feature map, was created to reach acceptable accuracy. Feature fusion has shown much success in pattern recognition tasks, and many fusion techniques have been proposed to combine features from multiple sources [23, 24]. The feature fusion technique achieves the desired accuracy but increases the computational cost. Recent research in pattern recognition shows that the inclusion of features irrelevant to the recognition task decreases the recognition accuracy [25]. To the best of our knowledge, removing irrelevant features from the fused feature map provides better recognition accuracy and cuts the computational time [26].
Feature optimization methods are divided into three types: filter based, embedded based, and wrapper based [27]. In the filter-based method, features are scored and selected from the feature set independently of any model. In wrapper-based selection, features are selected using the assumed predictive power of a model. In the embedded method, features are selected during the training process; this approach combines the strengths of the filter- and wrapper-based methods [28, 29]. The most common feature selection techniques include principal component analysis (PCA), the Pearson correlation coefficient (PCC), linear discriminant analysis (LDA), the whale optimization algorithm (WOA), entropy-controlled selection, particle swarm optimization (PSO), and the genetic algorithm (GA), to name a few [30-33].
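As an illustration of these three families, the following scikit-learn sketch (our own example, not part of the proposed framework; the estimators, scoring functions, and parameters are assumptions chosen for demonstration) selects features with a filter, a wrapper, and an embedded method:

```python
# Illustrative sketch of the three feature-selection families using
# scikit-learn; synthetic data stands in for real extracted features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# Filter: score each feature independently of any model (ANOVA F-test).
filt = SelectKBest(f_classif, k=5).fit(X, y)

# Wrapper: use a model's predictive power to recursively eliminate features.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded: selection happens during training via L1 regularization.
emb = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(X, y)

print(filt.transform(X).shape, wrap.transform(X).shape, emb.transform(X).shape)
```

Filter methods are cheapest but ignore feature interactions; wrappers model interactions at a higher computational cost, which is the trade-off optimization-based selectors such as the WOA also navigate.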
Many challenges exist in the recognition of objects from static images. The major complexities are variations in color, cluttered backgrounds, occlusion, and different lighting conditions. The size of the database used for model training also impacts system robustness. Moreover, imbalanced datasets reduce recognition accuracy. In this work, we propose a new automated system for the classification of objects. Our significant contributions are as follows:

Data augmentation is performed by employing four different operations to remove the imbalance challenge among object classes.

A pre-trained DenseNet201 deep CNN model is modified based on the augmented Caltech101 dataset. The modified model is trained through transfer learning, and deep features are extracted from the dense (global average pooling) layer.

The WOA is improved in terms of its fitness function, and a final step is added to remove residual feature redundancy.

Optimized features are classified using supervised learning algorithms and compared with recent state-of-the-art methods based on accuracy.

The manuscript is organized as follows: related work is discussed in Section 2. The proposed methodology, which includes data augmentation, deep learning, and feature selection using the improved WOA, is presented in Section 3. Results of the proposed framework are presented in Section 4. Finally, Section 5 comprises the conclusion of this work.

Related Work

Researchers have adopted several techniques for robust object classification in the domains of machine learning and computer vision. Object detection and recognition is one of the fastest-developing domains of computer vision and image processing. The applications of object classification include target recognition, video surveillance, autonomous vehicle frameworks, scene understanding, and many more. Various feature fusion methods are illustrated in the literature; feature fusion is the process of concatenating different feature spaces into one feature vector to achieve higher accuracy. Loussaief and Abdelkrim [34] addressed the feature extraction problem using a bag of features (BoF). The pre-trained deep CNN model AlexNet was then utilized with BoF for object recognition and attained competitive accuracy on the challenging Caltech101 dataset. Transfer learning was also employed using the pre-trained deep CNN model VGG16, achieving an accuracy of 91.66% on the challenging Caltech101 classification dataset. Cengil et al. [35] utilized a deep CNN model for object classification. An open-source library, convolutional architecture for fast feature embedding (CAFFE), was used with the CNN model as a classification application. The application utilized a GPU for robust classification and achieved competitive accuracy on the Caltech101 dataset. The pre-trained deep learning models ResNet-50 and VGG16 used associative memory blocks to extract deep features, and unsupervised clustering was performed on the associative memory banks using K-means clustering.
Guo et al. [36] proposed a novel technique based on deep CNNs for the robust classification of hyperspectral images (HSI). In the first phase, the inputs of recurrent layers are fused and used as input to the next layer in each convolution block to extract prominent features. In the second phase, spectral features are extracted from the deep layers. Later, a 1×1 convolution layer is utilized for feature map construction, and classification is performed. In [37], object recognition was performed with the help of the pre-trained deep learning models ResNet-50 and ResNet-152. PCA was applied to the deep features for feature optimization, and the challenging Caltech101 dataset was utilized to assess the performance of the presented model. A differential feature fusion convolutional neural network (DFF-Net) was presented for object detection. Detection was performed in two phases, prior detection and detection: in the prior-detection phase, boxes are created for the detection phase, which then utilizes them for robust detection.
DFF-Net performance was evaluated on a railway traffic dataset and achieved competitive accuracy. Jemilda and Baulkani [38] reduced the computational cost by utilizing the salp swarm algorithm, removing irrelevant features and enhancing accuracy. Later, a kernel-based support vector machine (K-SVM) was developed to classify high-dimensional fused feature vectors. The deep CNN models VGG16 and VGG19 were utilized for deep feature extraction, performed using the fully connected (FC) layers fc6, fc7, and fc8. PCA was applied to the extracted features for feature optimization, and the optimized features were classified using ensemble classification. The presented model was evaluated on the benchmark Caltech101 dataset and achieved competitive accuracy.
A novel deep learning architecture called the inception recurrent neural network was introduced in [39]. The technique fuses the Inception architecture [21] with a recurrent network to enhance the robustness of a deep neural network for efficient object recognition, and it attained competitive performance on challenging object recognition datasets. Nazar et al. [23] enhanced recognition performance by fusing classical features with deep CNN features extracted using InceptionV3 [21] after performing feature optimization using joint entropy. The method was employed on the benchmark Caltech101 recognition dataset and achieved competitive classification accuracy. In [2], researchers utilized the pre-trained deep CNN models InceptionV3 [21] and VGG19 [40] for object classification. Multilayer features were extracted from both pre-trained models and concatenated. The redundancy of the extracted features was removed using logistic regression-controlled entropy. Classification was performed on the selected features and achieved competitive performance on benchmark datasets. Rashid et al. [41] presented a method for object detection and recognition by fusing SIFT features with deep CNN features extracted using AlexNet [42] and VGG [40]. Fused feature optimization was performed by employing Renyi entropy, which removed the redundant features. Ensemble classification was performed on the selected features and achieved robust detection and classification performance on benchmark datasets.

Proposed Methodology

This section presents the proposed intelligent deep learning and improved WOA framework for object recognition. The main flow of the proposed framework is illustrated in Fig. 1. As shown in this figure, the original database is first enhanced using a data augmentation approach. In the next step, a pre-trained deep CNN model, DenseNet201, is selected and modified in terms of its classification layer. The modified model is trained using transfer learning, and deep features are extracted from the dense (global average pooling) layer. In the later step, robust features are selected using the improved WOA. The best selected features are finally fed into machine learning algorithms for the final classification. Each step is illustrated in Fig. 1.

Fig. 1. Proposed framework for object recognition using deep learning and improved WOA.

Database Augmentation
Data augmentation plays a vital role in machine learning applications by increasing the size of the entire dataset. The main purpose of this step is to train the deep network with more observations, since a CNN model performs better when trained on a larger number of observations. The original object classes of the Caltech101 dataset are highly imbalanced; therefore, it is essential to balance this dataset before training the model. With an imbalanced dataset, training is not reliable, and accuracy decreases. In this work, we performed four operations to enlarge the dataset: left flip, right flip, auto flip, and rotation by 90°.
After these operations, each class in our dataset grows to 700 images. This dataset is still insufficient for accurate training; therefore, we use an API to connect the proposed model to the internet. Through this API, relevant object images are searched and saved in the database, which is then called the augmented database. Using this augmented dataset, we apply a 70:30 split for the training and testing of the proposed framework. This process is shown visually in Fig. 2.
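The four augmentation operations above can be sketched with Pillow as follows; the exact flip variants are our interpretation of "left flip", "right flip", and "auto flip", and the blank input image is a placeholder for a real Caltech101 image:

```python
# Sketch of the four augmentation operations (interpretation, not the
# authors' exact code): two mirror flips, a diagonal "auto" flip, and a
# 90-degree rotation applied to each source image.
from PIL import Image

def augment(img: Image.Image) -> list:
    """Return the four augmented variants of one input image."""
    return [
        img.transpose(Image.FLIP_LEFT_RIGHT),   # left/right mirror
        img.transpose(Image.FLIP_TOP_BOTTOM),   # top/bottom mirror
        img.transpose(Image.TRANSPOSE),         # "auto" (diagonal) flip
        img.rotate(90, expand=True),            # rotation by 90 degrees
    ]

img = Image.new("RGB", (224, 224))              # placeholder input image
variants = augment(img)
print(len(variants))                            # 4 variants per source image
```

Each original image thus yields four extra samples, which is how a small class can be grown toward the 700-image target before the API-based collection step.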
Fig. 2. Proposed IoT based database augmentation process.

Deep Features Extraction
In recent times, deep learning methods have proved robust in object detection and recognition tasks [12, 43]. A simple deep CNN model has convolution, batch normalization, and pooling layers. Other layers include an activation function such as ReLU and a feature extraction layer such as the FC layer. The first layer of any CNN model serves as the input layer, followed by convolution layers that execute convolution operations on the input image; the dot product of the weights and smaller image regions is computed. ReLU layers perform the activation function by removing inactive neurons. The most active neurons help the FC layer extract features, and the extracted features are classified using the softmax layer in the last phase [44].

DenseNet201
The deep CNN model DenseNet201 has a depth of 201 layers. The network was trained extensively on the challenging ImageNet image database [45]. The input dimension of an image to the DenseNet201 input layer is 224×224. The architecture of DenseNet201 is shown in Fig. 3. All layers of the pre-trained deep CNN DenseNet201 are directly connected to decrease information loss. The input of the first layer is transferred to the second layer, and so on; the n-th layer receives as input the concatenated feature maps of all preceding layers. The information of the current layer is transferred to the M−n subsequent layers, so a network of M layers contains M(M+1)/2 direct connections. DenseNet201 has fewer parameters compared to InceptionV3 [21]. The information flow from one layer to another requires maintaining information to decrease information loss. The feature spaces at different layers are easily differentiated, and classifier prediction is based on all feature spaces. DenseNet has a lower chance of overfitting because each layer has direct access to the gradients of the loss function.
Fine-tuning: In this step, the original pre-trained model, as shown in Fig. 3, is fine-tuned. The last three layers are replaced with three new layers: a new FC layer, a new softmax layer, and a new classification layer. The Caltech101 dataset utilized in this work includes 101 object classes. After this fine-tuning process, the new modified model is trained by employing transfer learning [46]. The process of transfer learning is shown in Fig. 4. The figure shows that the source data is the ImageNet dataset, the source model is DenseNet201, and there are 1,000 source labels. The knowledge of the source model is transferred to the fine-tuned model, which is trained on the target data (augmented Caltech101). Hyperparameter selection is performed during the training process: the number of epochs is 200, the number of iterations per epoch is 10, the learning rate is 0.001, the mini-batch size is 64, the learning function is stochastic gradient descent (SGD), and the learning factor is 10. After training this fine-tuned model, activations are extracted from the global average pooling layer for feature extraction. The obtained deep feature vector is of size N×1920, which is further optimized using the improved whale optimization algorithm.
Fig. 3. Architecture of DenseNet201 pre-trained model.

Fig. 4. Transfer learning process for model training.

Whale Optimization Algorithm Based Feature Optimization

The WOA [47] is utilized for feature optimization to remove redundant and irrelevant features. The optimization process comprises two steps: in the first step, the prey is encircled and the spiral position is updated; in the second step, a random search for prey is performed. The mathematical modeling of these steps is as follows.

Encircling prey: Whales find the location of prey and surround them. In the search space, the location of the prey is unknown, so the WOA assumes that the current best candidate is the optimal prey. The remaining search agents try to become optimal agents by updating their locations toward the best agent. The behavior of the search agents is presented as:

$\overrightarrow{Y}(u+1)=\overrightarrow{Y^*}(u)- \overrightarrow{B}∙\overrightarrow{E},$(1)

$\overrightarrow{E}=|\overrightarrow{C}∙\overrightarrow{Y^*}(u)-\overrightarrow{Y}(u)|,$(2)

where $\overrightarrow{Y^*}(u)$ defines the whale's optimal location after iteration u, $\overrightarrow{Y}(u+1)$ is the updated location of the whale, and the distance between the whale and prey is presented by the distance vector $\overrightarrow{E}$. |·| represents the absolute value. The coefficient vectors $\overrightarrow{B}$ and $\overrightarrow{C}$ are calculated as:

$\overrightarrow{B}=2∙\overrightarrow{b}∙\overrightarrow{s}-\overrightarrow{b},$(3)

$\overrightarrow{C}=2∙\overrightarrow{s},$(4)

To model the shrinking encircling mechanism, the value of $\overrightarrow{b}$ is linearly decreased from 2 to 0 over the course of the iterations, which shrinks the oscillation range of $\overrightarrow{B}$ accordingly. The value of $\overrightarrow{B}$ lies in (-b, b). The new location of an agent, between its initial location and the location of the best agent, is determined by selecting a random value of $\overrightarrow{B}$ in (-1, 1).
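Equations (1)-(4) can be checked numerically; the sketch below (our own illustration, with arbitrary dimensions and a seeded random generator) computes the coefficient vectors and one encircling update:

```python
# Numerical sketch of Eqs. (1)-(4): coefficient vectors B and C and the
# encircling-prey position update, with b decreased linearly from 2 to 0.
import numpy as np

rng = np.random.default_rng(0)
dim, max_iter, u = 5, 100, 10

b = 2 - u * (2 / max_iter)           # b decreases linearly from 2 to 0
s = rng.random(dim)                  # random vector in [0, 1)
B = 2 * b * s - b                    # Eq. (3): B lies in (-b, b)
C = 2 * rng.random(dim)              # Eq. (4)

Y_best = rng.random(dim)             # current best (prey) position Y*
Y = rng.random(dim)                  # current whale position
E = np.abs(C * Y_best - Y)           # Eq. (2): distance vector
Y_next = Y_best - B * E              # Eq. (1): encircling update
print(Y_next)
```

As b shrinks across iterations, the admissible range of B contracts, so updates move the agents ever closer to the best position found so far.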

Spiral position updating
The helix-shaped movement of whales for prey tracking is defined by calculating the distance between the whale location (Y, Z) and the prey location (Y*, Z*). The spiral movement toward the prey is expressed as:

$\overrightarrow{Y}(u+1)= e^{bk}∙\cos(2πk)∙\overrightarrow{E^*}+\overrightarrow{Y^*}(u),$(5)

$\overrightarrow{E^*}=|\overrightarrow{Y^*}(u)-\overrightarrow{Y}(u)|$(6)

The shape of the logarithmic spiral is identified by the constant b, and k is a random number in [-1, 1]. The spiral movement enables the whales to change their location while shrinking the encirclement. The selection between the spiral and shrinking encircling mechanisms is made with a 50% chance each:

$\overrightarrow{Y}(u+1)= \cases{ \overrightarrow{Y^*}(u)-\overrightarrow{B}∙\overrightarrow{E}, & if p<0.5,\cr e^{bk}∙\cos(2πk)∙\overrightarrow{E^*}+\overrightarrow{Y^*}(u), & if p≥0.5, }$ (7)

where the random number p ranges from 0 to 1.

Prey search
The prey-searching stage, also known as the exploration process, depends on the variation of the vector $\overrightarrow{B}$. A whale performs a random search for the prey according to the location of other whales; the location of a reference whale makes a search agent move away from it. For this purpose, the WOA utilizes the vector $\overrightarrow{B}$ with random values of magnitude greater than 1. In the exploration process, the reference search agent is selected randomly; this random selection makes the WOA a global optimizer by reducing the local optima problem. The global search is defined as:

$\overrightarrow{Y}(u+1)=\overrightarrow{Y_{rand}}-\overrightarrow{B}∙\overrightarrow{E},$(8)

$\overrightarrow{E}=|\overrightarrow{C}∙\overrightarrow{Y_{rand}}-\overrightarrow{Y}|,$(9)

where $\overrightarrow{Y_{rand}}$ represents a random whale selected from the given population. The WOA initializes the algorithm by assigning random positions to the whale population and assumes the optimal solution of the function to be its minimum or maximum value.
The features selected in this step are analyzed by the fitness function, and it is noted that a few features are still redundant and affect the accuracy of the final classification. The ensemble subspace discriminant (ESD) classifier is used as the fitness function, and the error is computed for each iteration. Therefore, we add one new step named extra feature approval (EFA). This step is based on the standard error of the mean (SEM), whose value is passed into a threshold function for the final selection. The formulation of this function is as follows:

$SEM(y_i)=\frac{σ}{\sqrt{N}}, \quad σ=\sqrt{σ^2}, \quad σ^2=E[(y_i-μ)^2],$(10)

$Selection=F(i)= \cases{ F_i^s, & \text{if } y_i≥SEM(y_i),\cr \text{Remove}, & \text{otherwise}. }$ (11)

The final selected features F_i^s are passed into an ensemble subspace classifier for the final recognition. The process of feature selection is also given in Algorithm 1.
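A hedged NumPy reading of the EFA refinement in Eqs. (10)-(11) is sketched below; the per-feature scores and the exact form of the thresholding rule are our interpretation, since the paper does not fix every detail:

```python
# Interpretive sketch of the EFA step: compute the standard error of the
# mean over the candidate feature scores and keep only features whose
# score reaches that threshold.
import numpy as np

def efa_select(scores: np.ndarray) -> np.ndarray:
    """Return a boolean mask of features passing the SEM threshold."""
    sigma = np.sqrt(np.mean((scores - scores.mean()) ** 2))  # population std
    sem = sigma / np.sqrt(scores.size)                       # Eq. (10)
    return scores >= sem                                     # Eq. (11)

scores = np.array([0.8, 0.05, 0.6, 0.01, 0.9])  # hypothetical feature scores
mask = efa_select(scores)
print(mask)
```

Features scoring below the SEM threshold are dropped, which is how the residual redundancy left by the WOA pass is trimmed before classification.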
Algorithm 1. Improved whale optimization algorithm
Initialize the population $Y_i$, where (i=1,2,3,…,n)
Calculate the fitness of each solution
$Y^*$ = best search agent
While (u < Max_iteration)
    For every solution
        Update b, B, C, k, and p
        If1 (p < 0.5)
            If2 (|B| < 1)
                Update the current search agent location according to Eq. (1)
            Else if2 (|B| ≥ 1)
                Select a random search agent $(Y_{rand})$
                Update the current search agent location according to Eq. (8)
            End if2
        Else if1 (p ≥ 0.5)
            Update the current search agent location according to Eq. (5)
        End if1
    End For
    Inspect the movement of each search agent; if an agent goes outside the search space, correct its position
    Update $Y^*$ if a better solution is found
    u = u + 1
End While
Return $Y^*$
Refine using Eqs. (10)-(11)
Output: $F_i^s$ ← Best Feature Vector
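The WOA search loop of Algorithm 1 can be sketched in NumPy on a toy objective; this is an illustrative re-implementation, not the authors' code: the classifier-based fitness and the EFA refinement are replaced by the analytic sphere function, and the spiral constant is fixed to 1:

```python
# Compact sketch of the WOA search loop (Algorithm 1) minimizing the
# sphere function; population size, iteration count, and bounds are
# demonstration choices.
import numpy as np

def woa(fitness, dim=5, n_whales=20, max_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    Y = rng.uniform(-5, 5, (n_whales, dim))       # initial population
    best = min(Y, key=fitness).copy()             # best search agent Y*
    for u in range(max_iter):
        b = 2 - u * (2 / max_iter)                # b decreases from 2 to 0
        for i in range(n_whales):
            s, p, k = rng.random(dim), rng.random(), rng.uniform(-1, 1)
            B, C = 2 * b * s - b, 2 * rng.random(dim)
            if p < 0.5:
                if np.all(np.abs(B) < 1):         # exploit: encircle, Eq. (1)
                    Y[i] = best - B * np.abs(C * best - Y[i])
                else:                             # explore: random whale, Eq. (8)
                    Y_rand = Y[rng.integers(n_whales)]
                    Y[i] = Y_rand - B * np.abs(C * Y_rand - Y[i])
            else:                                 # spiral update, Eq. (5)
                E = np.abs(best - Y[i])
                Y[i] = E * np.exp(k) * np.cos(2 * np.pi * k) + best
            if fitness(Y[i]) < fitness(best):     # keep the better solution
                best = Y[i].copy()
    return best

best = woa(lambda y: float(np.sum(y ** 2)))
print(np.sum(best ** 2))  # should be near 0 for the sphere function
```

In the paper's setting, the fitness call would instead train and score the ESD classifier on the feature subset encoded by each whale, and the returned best vector would then pass through the EFA refinement.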

Results and Comparison

The proposed classification technique is evaluated on the challenging Caltech101 image classification dataset [48]. After the first step (data augmentation), the dataset was refined. Originally, this dataset consists of both grayscale and RGB images captured under different environmental conditions; the presence of both makes the dataset challenging for recognition tasks. Extensive experiments were performed to evaluate the proposed classification technique with different training and testing ratios (70:30, 80:20, and 50:50) at 10-fold, 15-fold, and 20-fold cross-validation. Classification was performed using various machine learning classifiers, and the top-rated ones were selected based on their accuracy. The performance of the classifiers was computed using accuracy, the false-negative rate (FNR), and computational time. A system with an 8th-generation Intel Core i7 CPU, 16 GB of RAM, and an 8 GB GPU was used for the classification task, and the simulations were performed in MATLAB 2020a.
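The evaluation measures can be computed from a confusion matrix as sketched below; macro-averaging the per-class FNR is one plausible reading, since the paper reports a single FNR figure without specifying the averaging, and the 2×2 matrix is a made-up example:

```python
# Sketch of the evaluation measures: overall accuracy and a
# macro-averaged false-negative rate from a confusion matrix.
import numpy as np

def accuracy_and_fnr(cm: np.ndarray):
    acc = np.trace(cm) / cm.sum()            # correct predictions / total
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp                 # row sums: actual class totals
    fnr = np.mean(fn / (fn + tp))            # macro-averaged FNR
    return acc, fnr

cm = np.array([[45, 5],                      # hypothetical 2-class matrix
               [10, 40]])
acc, fnr = accuracy_and_fnr(cm)
print(round(acc, 3), round(fnr, 3))          # 0.85 0.15
```

With this reading, accuracy and FNR need not sum exactly to 100% when class sizes differ, which is consistent with the slight mismatches visible in the tables below.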

Results
Several experiments were executed to validate the robustness of the proposed method. During the experiments, different training and testing ratios were considered together with different cross-validation levels. The main objective of these experiments is to examine the variations in accuracy and the overall performance of the proposed framework.

Experiment 1: In this experiment, the proposed framework employed the 70:30 approach and 10-fold cross-validation. The classification results of the different classifiers are given in Table 1. Different classifiers were employed to perform the recognition task, and a robust one was selected based on accuracy. Several performance evaluation measures, including accuracy, FNR, and computational time, were calculated to validate the proposed framework. The ESD classifier achieves the best classification accuracy of 92.9% with 7.1% FNR in 661 seconds. The accuracy of ESD is further validated using the confusion matrix shown in Fig. 5. C-KNN achieved a lower computational time than the other classifiers, classifying the data into the relevant classes in 20.968 seconds, but it attained the lowest accuracy of 82.9%. The Q-SVM classifier used the highest computational time of 3,374 seconds compared to the other classifiers.

Table 1. Classification results of proposed framework of object recognition at 10-fold cross-validation using 70:30 approach
Method Performance evaluation measures
Accuracy (%) FNR (%) Times (s)
ESD (proposed) 92.9 7.1 661
LDA 88.3 11.7 103
L-SVM 85 15 2,467
Q-SVM 86 14 3,374
C-SVM 85.9 14.1 3,328
E-KNN 84.3 15.7 218
C-KNN 82.9 18 20.968
W-KNN 83.3 16.7 36.347
G-naïve Bayes 84.5 15.5 177.02

Fig. 5. Confusion matrix of our method for ESD classifier at 10-fold cross-validation and 70:30 approach

Experiment 2: In the second experiment, the proposed framework is validated with 15-fold cross-validation using the 70:30 approach. Different cross-validation levels impact the system's robustness and computational cost. Multiple classifiers were implemented to examine the classification accuracy of the proposed framework; the classification results are given in Table 2. The results show that the best classifier is ESD, with an accuracy of 93% in 651 seconds of computation time and an FNR of 7.2%. Based on computational time, C-KNN is the fastest classifier at 23.332 seconds. According to this experiment, it is noted that a minor increase in accuracy has occurred.

Table 2. Proposed method classification results at 15-fold cross-validation using 70:30 approach
Method Performance evaluation measures
Accuracy (%) FNR (%) Times (s)
ESD (proposed)  93 7.2 651
LDA 88.3 11.7 109
L-SVM 85 15 2,317
Q-SVM 86 14 3,360
C-SVM 85.9 14.1 3,278
E-KNN 84.3 15.7 204
C-KNN 82 18 23.332
W-KNN 83.3 16.7 39.34
G-naïve Bayes 84.5 15.5 177.02
Experiment 3: In this experiment, the proposed framework is implemented with an 80:20 training and testing ratio at 10-fold cross-validation. The classification results of this approach are presented in Table 3. Different classifiers were executed to analyze the classification accuracy, and accuracy, FNR, and computational time were calculated to validate the robustness of the proposed framework. The results show that ESD is the most robust classifier, with an accuracy of 93.7%, while the worst classifier, with the lowest accuracy of 84.2%, is C-KNN. The confusion matrix of the most accurate classifier is shown in Fig. 6. In this experiment, it is noticed that the accuracy improved, but the computational time rose compared to Experiments 1 and 2.

Table 3. Classification results of the proposed framework at 10-fold cross validation using 80:20 approach
Method Performance evaluation measures
Accuracy (%) FNR (%) Times (s)
ESD (proposed)  93.7 6.3 855
LDA 91.1 8.9 97.534
L-SVM 86.3 13.7 2,649.50
Q-SVM 86.6 13.4 3,438.30
C-SVM 87.4 12.6 3,714.40
E-KNN 85.7 14.3 286.8
C-KNN 84.2 15.8 25.338
W-KNN 84.6 15.4 24.195
G-naïve Bayes 85.4 14.6 176.47

Fig. 6. Confusion matrix of the proposed framework at 10-fold cross-validation using 80:20 approach.

Experiment 4: In this experiment, the proposed framework is validated using 15-fold cross-validation and the 80:20 approach. Several classifiers are applied, and ESD achieves the best accuracy of 93.6% with 6.4% FNR, as shown in Table 4. The remaining classifiers, LDA, L-SVM, Q-SVM, C-SVM, E-KNN, C-KNN, W-KNN, and G-naïve Bayes, achieve accuracies of 90.8%, 86.3%, 87%, 86.4%, 85.6%, 84%, 84.4%, and 85%, respectively. W-KNN achieved the best classification time of 28.936 seconds. Similarly, the 50:50 approach is employed at 10-fold cross-validation for the evaluation of the proposed framework. The results are given in Table 5; ESD again achieves the best accuracy, of 90.5%. Compared with Table 4, it is noticed that the accuracy decreased. Based on computation time, W-KNN executed fastest at 16.43 seconds.

Table 4. Classification results at 15-fold cross-validation using 80:20 approach
Method Performance evaluation measures
Accuracy (%) FNR (%) Times (s)
ESD  93.6 6.3 856.8
LDA 90.8 8.9 117.19
L-SVM 86.3 13.7 2,649.50
Q-SVM 87 13.4 3,320
C-SVM 86.4 12.6 5,253.70
E-KNN 85.6 14.3 311.4
C-KNN 84 15.8 43.747
W-KNN 84.4 15.4 28.936
G-naïve Bayes 85 14.6 259.35

Table 5. Classification results of proposed framework using 50:50 approach at 10-fold cross-validation
Method Performance evaluation measures
Accuracy (%) FNR (%) Times (s)
ESD  90.5 9.5 786.64
LDA 79.1 20.9 44.334
L-SVM 81.7 18.3 2,082.70
Q-SVM 82.3 17.7 2,829.20
C-SVM 82.6 17.4 2,611.40
E-KNN 83.3 17.7 156.44
C-KNN 79.6 20.4 19.32
W-KNN 79.3 20.7 16.43
G-naïve Bayes 81.2 18.8 225.63

Discussion
The extensive experiments presented above demonstrate the performance of our proposed classification framework using different training and testing ratios at different cross-validation levels. The classification performance of the proposed framework and its comparison with state-of-the-art methods are elaborated in this section. We utilized several training and testing splits (70:30, 80:20, and 50:50) and different cross-validation levels (10-fold and 15-fold). The best accuracy of 93.7% was achieved at 10-fold cross-validation using the 80:20 approach, and the ESD classifier achieved the best results in all experiments. The performance of the ESD classifier is also compared with several relevant techniques. Hence, the proposed framework gives its best results at 10-fold cross-validation using the 80:20 approach.
The proposed method is compared with relevant techniques based on accuracy and computational time. Gopalakrishna et al. [49] presented a deep CNN layer fine-tuning method that classifies objects with 91.66% accuracy. Liu and Mukhopadhyay [50] utilized memory banks with unsupervised learning techniques for object classification and achieved an accuracy of 91%. Rashid et al. [13] presented a deep CNN and SIFT feature fusion technique to classify objects into their related classes and achieved 89.70% accuracy. Liu et al. [51] utilized mid-level CNN layer features for classification; feature fusion was performed on the middle CNN layer features to perform recognition, achieving an accuracy of 92.20%. Hussain et al. [23] presented a classification method using classical and deep CNN feature fusion, which attained an accuracy of 90.1%. Our proposed method utilizes deep CNN features optimized using the improved WOA and attains an accuracy of 93.7%, outperforming the existing techniques on the Caltech101 dataset. Moreover, an overall representation is shown in Fig. 7, which shows that the proposed method performs best with the ESD classifier. Finally, Fig. 8 shows the impact of the feature optimization step: the proposed optimization step increases the performance of each classifier by 3% on average.

Fig. 7. Comparison of results at different training and testing ratios at 10-fold cross-validation.

Fig. 8. Change in accuracy after feature optimization step.

Conclusion

An IoT-based deep learning framework is presented in this work for object recognition from static images. In the proposed framework, data augmentation is first applied to strengthen the training data. The original dataset is highly imbalanced, and the images in each class are not sufficient for successful training. The augmented dataset was used for training and showed better results compared to the original dataset. For the deep learning model, we chose the pre-trained DenseNet201 CNN model based on its smaller number of parameters. The features extracted from this model are high dimensional; therefore, we propose an improved WOA to select the best features. The features chosen by the improved WOA show improved accuracy for the 80:20 approach, and the proposed framework also shows better results for the 70:30 approach with 10-fold cross-validation. The key limitation of this work is the data augmentation step: it introduces some redundant features, which the feature selection step tries to remove, but a chance of residual redundancy still exists. In future studies, the proposed framework will be considered for real-time object recognition. Moreover, EfficientNet will be considered for feature extraction due to its lightweight architecture.

Acknowledgements

The authors are thankful to the Computer Vision Lab, HITEC University, Taxila, Pakistan.

Author’s Contributions

Conceptualization, NH, MAK. Funding acquisition, JC, YN. Investigation and methodology, MAK, SK, UT, RRM. Project administration, SK, YN. Supervision, SK, YN. Writing of the original draft, MAK, SK. Writing of the review and editing, MAK, JC, YN. Software, NH, MAK. Data curation, JC, YN. All the authors have proofread the final version.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2017R1D1A1B03035833, 2018R1D1A1B07042967) and the Soonchunhyang University Research Fund.

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name: Nazar Hussain
Affiliation: HITEC University
Biography: Nazar Hussain earned his master's degree in object recognition using deep learning from COMSATS University Islamabad, Pakistan. He is currently a research associate at the Department of Computer Science, HITEC University, Taxila, Pakistan. His primary research focus in recent years is object recognition, pedestrian identification, and agriculture plants. He has more than five publications in well-reputed journals.

Name: Muhammad Attique Khan
Affiliation: HITEC University
Biography: Muhammad Attique Khan earned his master's and Ph.D. degrees in human activity recognition for application of video surveillance and skin lesion classification using deep learning from COMSATS University Islamabad, Pakistan. He is currently a Lecturer in the Computer Science Department at HITEC University, Taxila, Pakistan. His primary research focus in recent years is medical imaging, COVID-19, MRI analysis, video surveillance, human gait recognition, and agriculture plants. He has more than 130 publications with more than 2,250 citations, a cumulative impact factor of 340+, and an h-index of 29. He is a reviewer for several reputed journals.

Name: Seifedine Kadry
Affiliation: Noroff University College
Biography: Seifedine Kadry (Senior Member, IEEE) received the bachelor's degree from Lebanese University, in 1999, the M.S. degree from Reims University, France, and EPFL (Lausanne), in 2002, the Ph.D. degree from Blaise Pascal University, France, in 2007, and the HDR degree from Rouen University, in 2017. His current research interests include data science, education using technology, system prognostics, stochastic systems, and applied mathematics. He is an ABET Program Evaluator of computing and an ABET Program Evaluator of engineering technology. He is a Fellow of IET, IETE, and IACSIT. He is a Distinguished Speaker of the IEEE Computer Society.

Name: Usman Tariq
Affiliation: Prince Sattam Bin Abdulaziz University
Biography: Usman Tariq is currently a skilled research engineer with a Ph.D. degree in information and communication technology (computer science) from Ajou University, South Korea. His strong background is in ad hoc networks and network communications. He is experienced in managing and developing projects from conception to completion, and has worked on large-scale, long-term international projects with multinational organizations. He is also attached to the College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, as an Associate Professor. His research interests span networking, object tracking, and security.

Name: Reham R Mostafa
Affiliation: Mansoura University
Biography: Reham R. Mostafa was born in Abu Dhabi, United Arab Emirates, in 1983. She received the B.Sc., M.Sc., and Ph.D. degrees in information systems from Mansoura University, Egypt, in 2005, 2009, and 2014, respectively. She is currently an Associate Professor with the Information Systems Department, Faculty of Computers and Information, Mansoura University. Her research interests include big data analytics, artificial intelligence, evolutionary algorithms, the Internet of Things, and data security.

Name: Jung-In Choi
Affiliation: Ajou University
Biography: Jung-In Choi earned her master's and Ph.D. degrees in activity recognition from Ewha Womans University, Republic of Korea, in 2012 and 2017, respectively. She is currently a Lecturer of Applied Artificial Intelligence at Ajou University, Suwon, Republic of Korea. Her current research interests include activity recognition, data mining, and V2X.

Name: Yunyoung Nam
Affiliation: Soonchunhyang University
Biography: Yunyoung Nam received the B.S., M.S., and Ph.D. degrees in computer engineering from Ajou University, Korea, in 2001, 2003, and 2007, respectively. He was a Senior Researcher with the Center of Excellence in Ubiquitous System, Stony Brook University, Stony Brook, NY, USA, from 2007 to 2010, where he was a Postdoctoral Researcher, from 2009 to 2013. He was a Research Professor with Ajou University, from 2010 to 2011. He was a Postdoctoral Fellow with the Worcester Polytechnic Institute, Worcester, MA, USA, from 2013 to 2014. His research interests include image processing, pattern recognition, biomedical signal processing, and healthcare systems.

References

[1] H. B. Ly, T. T. Le, H. L. T. Vu, V. Q. Tran, L. M. Le, and B. T. Pham, “Computational hybrid machine learning based prediction of shear capacity for steel fiber reinforced concrete beams,” Sustainability, vol. 12, no. 7, article no. 2709, 2020. https://doi.org/10.3390/su12072709
[2] M. Rashid, M. A. Khan, M. Alhaisoni, S. H. Wang, S. R. Naqvi, A. Rehman, and T. Saba, “A sustainable deep learning framework for object recognition using multi-layers deep features fusion and selection,” Sustainability, vol. 12, no. 12, article no. 5037, 2020. https://doi.org/10.3390/su12125037
[3] F. Lin, D. Zhang, Y. Huang, X. Wang, and X. Chen, “Detection of corn and weed species by the combination of spectral, shape and textural features,” Sustainability, vol. 9, no. 8, article no. 1335, 2017. https://doi.org/10.3390/su9081335
[4] M. Bansal, M. Kumar, and M. Kumar, “2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors,” Multimedia Tools and Applications, vol. 80, no. 12, pp. 18839-18857, 2021. https://doi.org/10.1007/s11042-021-10646-0
[5] M. Bansal, M. Kumar, and M. Kumar, “2D object recognition techniques: state-of-the-art work,” Archives of Computational Methods in Engineering, vol. 28, no. 3, pp. 1147-1161, 2021.
[6] A. Dhillon and G. K. Verma, “Convolutional neural network: a review of models, methodologies and applications to object detection,” Progress in Artificial Intelligence, vol. 9, no. 2, pp. 85-112, 2020.
[7] M. A. Khan, Y. D. Zhang, M. Alhusseni, S. Kadry, S. H. Wang, T. Saba, and T. Iqbal, “A fused heterogeneous deep neural network and robust feature selection framework for human actions recognition,” Arabian Journal for Science and Engineering, 2021. https://doi.org/10.1007/s13369-021-05881-4
[8] H. Masood, A. Zafar, M. U. Ali, M. A. Khan, K. Iqbal, U. Tariq, and S. Kadry, “Optimization of correlation filters using extended particle swarm optimization technique,” Computational and Mathematical Methods in Medicine, vol. 2021, article no. 6321860, 2021. https://doi.org/10.1155/2021/6321860
[9] I. M. Nasir, M. Raza, J. H. Shah, M. A. Khan, and A. Rehman, “Human action recognition using machine learning in uncontrolled environment,” in Proceedings of 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, 2021, pp. 182-187.
[10] Z. Huang, J. Zhang, and H. Shan, “When age-invariant face recognition meets face age synthesis: a multi-task learning framework,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7282-7291.
[11] A. A. Liu, H. Zhou, W. Nie, Z. Liu, W. Liu, H. Xie, Z. Mao, X. Li, and D. Song, “Hierarchical multi-view context modelling for 3D object classification and retrieval,” Information Sciences, vol. 547, pp. 984-995, 2021.
[12] L. Qiao, Z. Jing, H. Pan, H. Leung, and W. Liu, “Private and common feature learning with adversarial network for RGBD object classification,” Neurocomputing, vol. 423, pp. 190-199, 2021.
[13] M. Rashid, M. A. Khan, M. Sharif, M. Raza, M. M. Sarfraz, and F. Afza, “Object detection and classification: a joint selection and fusion strategy of deep convolutional neural network and SIFT point features,” Multimedia Tools and Applications, vol. 78, no. 12, pp. 15751-15777, 2019.
[14] S. H. Lee, W. F. Yu, and C. S. Yang, “ILBPSDNet: based on improved local binary pattern shallow deep convolutional neural network for character recognition,” IET Image Processing, 2021. https://doi.org/10.1049/ipr2.12226
[15] Q. Wu, Z. An, H. Chen, X. Qian, and L. Sun, “Small target recognition method on weak features,” Multimedia Tools and Applications, vol. 80, no. 3, pp. 4183-4201, 2021.
[16] S. Albahli, H. T. Rauf, A. Algosaibi, and V. E. Balas, “AI-driven deep CNN approach for multi-label pathology classification using chest X-Rays,” PeerJ Computer Science, vol. 7, article no. e495, 2021. https://doi.org/10.7717/peerj-cs.495
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.
[18] M. Mateen, J. Wen, S. Song, and Z. Huang, “Fundus image classification using VGG-19 architecture with PCA and SVD,” Symmetry, vol. 11, no. 1, article no. 1, 2019. https://doi.org/10.3390/sym11010001
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 770-778.
[20] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 2017, pp. 2261-2269.
[21] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 2818-2826.
[22] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-ResNet and the impact of residual connections on learning,” in Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, 2017, pp. 4278-4284.
[23] N. Hussain, M. A. Khan, M. Sharif, S. A. Khan, A. A. Albesher, T. Saba, and A. Armaghan, “A deep neural network and classical features based scheme for objects recognition: an application for machine inspection,” Multimedia Tools and Applications, 2020. https://doi.org/10.1007/s11042-020-08852-3
[24] T. Saba, M. A. Khan, A. Rehman, and S. L. Marie-Sainte, “Region extraction and classification of skin cancer: a heterogeneous framework of deep CNN features fusion and reduction,” Journal of Medical Systems, vol. 43, article no. 289, 2019. https://doi.org/10.1007/s10916-019-1413-3
[25] A. Majid, M. A. Khan, M. Yasmin, A. Rehman, A. Yousafzai, and U. Tariq, “Classification of stomach infections: a paradigm of convolutional neural network along with classical features fusion and selection,” Microscopy Research and Technique, vol. 83, no 5, pp. 562-576, 2020.
[26] M. I. Sharif, J. P. Li, M. A. Khan, and M. A. Saleem, “Active deep neural network features selection for segmentation and recognition of brain tumors using MRI images,” Pattern Recognition Letters, vol. 129, pp. 181-189, 2020.
[27] H. T. Rauf, W. H. K. Bangyal, and M. I. Lali, “An adaptive hybrid differential evolution algorithm for continuous optimization and classification problems,” Neural Computing and Applications, vol. 33, pp. 10841-10867, 2021.
[28] B. Jiang, C. Li, M. D. Rijke, X. Yao, and H. Chen, “Probabilistic feature selection and classification vector machine,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 13, no. 2, pp. 1-27, 2019.
[29] A. Almadhor, H. T. Rauf, M. A. Khan, S. Kadry, and Y. Nam, “A hybrid algorithm (BAPSO) for capacity configuration optimization in a distributed solar PV based microgrid,” Energy Reports, 2021. https://doi.org/10.1016/j.egyr.2021.01.034
[30] M. Rostami, K. Berahmand, and S. Forouzandeh, “A novel community detection based genetic algorithm for feature selection,” Journal of Big Data, vol. 8, article no. 2, 2021. https://doi.org/10.1186/s40537-020-00398-3
[31] J. Wang, M. Ye, F. Xiong, and Y. Qian, “Cross-scene hyperspectral feature selection via hybrid whale optimization algorithm with simulated annealing,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 2473-2483, 2021.
[32] D. Paul, A. Jain, S. Saha, and J. Mathew, “Multi-objective PSO based online feature selection for multi-label classification,” Knowledge-Based Systems, vol. 222, article no. 106966, 2021. https://doi.org/10.1016/j.knosys.2021.106966
[33] M. A. Khan, K. Muhammad, M. Sharif, T. Akram, and V. H. C. de Albuquerque, “Multi-class skin lesion detection and classification via teledermatology,” IEEE Journal of Biomedical and Health Informatics, 2021. https://doi.org/10.1109/JBHI.2021.3067789
[34] S. Loussaief and A. Abdelkrim, “Deep learning vs. bag of features in machine learning for image classification,” in Proceedings of 2018 International Conference on Advanced Systems and Electric Technologies (IC_ASET), Hammamet, Tunisia, 2018, pp. 6-10.
[35] E. Cengil, A. Cinar, and E. Ozbay, “Image classification with caffe deep learning framework,” in Proceedings of 2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 2017, pp. 440-444.
[36] H. Guo, J. Liu, Z. Xiao, and L. Xiao, “Deep CNN-based hyperspectral image classification using discriminative multiple spatial-spectral feature fusion,” Remote Sensing Letters, vol. 11, no. 9, pp. 827-836, 2020.
[37] A. Mahmood, M. Bennamoun, S. An, and F. Sohel, “Resfeats: residual network based features for image classification,” in Proceedings of 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017, pp. 1597-1601.
[38] G. Jemilda and S. Baulkani, “Integration of new moving object segmentation and classification techniques using optimal salp swarm-based feature fusion with linear multi k-SVM classifier,” EURASIP Journal on Image and Video Processing, vol. 2020, article no. 20, 2020. https://doi.org/10.1186/s13640-020-00511-9
[39] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Inception recurrent convolutional neural network for object recognition,” Machine Vision and Applications, vol. 32, article no. 28, 2021. https://doi.org/10.1007/s00138-020-01157-3
[40] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014 [Online]. Available: https://arxiv.org/abs/1409.1556.
[41] M. Rashid, M. A. Khan, M. Sharif, M. Raza, M. M. Sarfraz, and F. Afza, “Object detection and classification: a joint selection and fusion strategy of deep convolutional neural network and SIFT point features,” Multimedia Tools and Applications, vol. 78, no. 12, pp. 15751-15777, 2019.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105, 2012.
[43] A. Lukman and C. K. Yang, “An object recognition system based on convolutional neural networks and angular resolutions,” Multimedia Tools and Applications, vol. 80, no. 10, pp. 16059-16085, 2021.
[44] F. Saeed, M. A. Khan, M. Sharif, M. Mittal, L. M. Goyal, and S. Roy, “Deep neural network features fusion and selection based on PLS regression with an application for crops diseases classification,” Applied Soft Computing, vol. 103, article no. 107164, 2021. https://doi.org/10.1016/j.asoc.2021.107164
[45] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and F. F. Li, “ImageNet: a large-scale hierarchical image database,” in Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 2009, pp. 248-255.
[46] M. A. Khan, T. Akram, Y. D. Zhang, and M. Sharif, “Attributes based skin lesion detection and recognition: a mask RCNN and transfer learning-based deep learning framework,” Pattern Recognition Letters, vol. 143, pp. 58-66, 2021.
[47] S. Mirjalili and A. Lewis, “The whale optimization algorithm,” Advances in Engineering Software, vol. 95, pp. 51-67, 2016.
[48] J. Ponce, T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik, et al., “Dataset issues in object recognition,” in Toward Category-Level Object Recognition. Heidelberg, Germany: Springer, 2006, pp. 29-48.
[49] R. Gopalakrishnan, Y. Chua, and L. R. Iyer, “Classifying neuromorphic data using a deep learning framework for image classification,” in Proceedings of 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 2018, pp. 1520-1524.
[50] Q. Liu and S. Mukhopadhyay, “Unsupervised learning using pretrained CNN and associative memory bank,” in Proceedings of 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 2018, pp. 1-8.
[51] X. Liu, R. Zhang, Z. Meng, R. Hong, and G. Liu, “On fusing the latent deep CNN feature for image classification,” World Wide Web, vol. 22, no. 2, pp. 423-436, 2019.


• Received: 29 January 2021
• Accepted: 2 August 2021
• Published: 30 August 2021