Omar M. Salim, Khaled M. Fouad, and Basma M. Hassan

Human-centric Computing and Information Sciences volume 12, Article number: 18 (2022)
https://doi.org/10.22967/HCIS.2022.12.018

Abstract

Wireless sensor networks (WSNs) have garnered much attention in recent decades. Nowadays, networks contain sensors and have expanded into a network even more extensive than the internet. Cost is one of the main issues of WSNs, whether in the form of bandwidth, computational cost, deployment cost, or the sensors' batteries (sensor life). This paper proposes a dual-level sensor selection (DLSS) model that reduces the number of sensors forming a WSN. The sensor reduction process is performed at two consecutive levels. First, at the filter level, a combination of the Fisher score method and the ANOVA test weighs all the network sensors and produces a reduced set of sensors. Second, the grey wolf optimizer algorithm produces the optimum sensor subset. Additionally, an adaptive sensor recovery solution is proposed to extend the network lifetime even further through sensor-failure management. The proposed model's performance is evaluated using four different datasets. In comparison with other similar methods, the results indicate that the proposed model achieves a more efficient subset of sensors while preserving a high accuracy rate.

Keywords

WSNs’ Lifetime, Filter-Wrapper Methods, Sensor Selection, Machine Learning, Accuracy, Adaptive Sensor Recovery

Introduction

Wireless sensor networks (WSNs) [1, 2] can be described as a networked set of sensor nodes, small-sized devices with very restricted resources such as memory and power supply. In traditional sensor networks, each node must monitor physical environmental conditions such as sound, temperature, pressure, humidity, motion, vibration, light, and many other measurements. In WSNs, sensor nodes operate jointly to achieve a specific purpose due to the nature of the monitored parameters and the vast number of sensors deployed. The information generated from the sensor network is highly interconnected, and reporting every sensor reading is a waste of energy resources [3]. Sensor selection is equivalent to feature selection in data mining tasks, where the number of selected sensors corresponds to the number of selected features. However, the main problem with big data is the massive set of available features, of which only a limited portion helps distinguish samples belonging to different classes.
In contrast, many other sensors produce unrelated, noisy, or redundant data. Unrelated sensor readings can be considered noise in large-scale data analysis; they increase the dataset dimension and the computational complexity of classification and clustering, thereby reducing the classification accuracy rate. It is therefore vital to choose only the appropriate sensors. Redundant sensors should be dropped from the dataset during sensor selection, since another subset of sensors supplies the same information [4]. Noisy sensors that add no extra information about the labels may also be dropped to increase model efficiency. Accordingly, a method is needed to identify the various sensors, compute the relations among them, and select the relevant sensors from vast datasets. Sometimes, sensor selection aims to conserve the energy use of sensors [5] and extend the lifetime of the whole network rather than reduce the energy consumption of every single sensor. The performance of WSNs may also be improved through efficient management of network settings that do not affect sensor readings [6]. One benefit of adopting machine learning algorithms in WSNs is that they help dispose of unneeded remodeling issues. Along with this, several studies have tackled the significance of adopting machine learning algorithms in WSNs' environmental applications, and Fig. 1 shows some WSN applications together with the importance of using machine learning.

Fig. 1. Machine learning significance in WSNs.

However, a few drawbacks and limitations should be considered when using machine learning techniques in WSNs [7]. This paper assumes that sensors in a wireless network are activated only on demand by the base station (sink node) of the underlying platform. Therefore, classifier training and validation are executed on the base station, as machine-learning techniques would increase energy consumption if implemented separately in each sensor node of the network. Moreover, each sensor measures one environmental feature and returns its value to the base station, where the proposed dual-level sensor selection (DLSS) is executed.
This paper makes three contributions. The first is the DLSS model, which aims to reduce the number of sensors and minimize their energy consumption, thereby increasing the network's lifetime while maintaining a proven accuracy rate. The second demonstrates the effectiveness of the proposed model through four different and extensive experiments. The third proposes an adaptive sensor recovery (ASR) solution to manage sensor failures using an alternative set of sensors without compromising accuracy.
This paper is organized as follows: Section 2 describes the literature review and the related work in the field of study. Section 3 deals with the proposed architecture and system model. Then, Section 4 demonstrates the experimental results and compares the proposed model and other meta-heuristic algorithms, as well as explaining a solution to deal with sensor failure. Lastly, Section 5 concludes the research findings and presents future recommendations.

Related Work

Literature Review
WSNs’ sensor selection schemes
Schemes of sensor selection are applied to choose sensors based on selection criteria such as coverage, localization, target tracking, single-task, and multi-task schemas. Table 1 describes the purposes and categories of sensor selection schemes in WSNs based on usage [8–28]. Accordingly, this research deals with selection for static sensor nodes and single-task assignment.

Table 1. Purpose of sensor selection schemes in WSNs
Purpose Category Description Ref.
Coverage  Selection for static nodes -Goal: conserve energy and prolong the network lifetime.
-Solution: a subset of sensors must be active, while the rest be in sleep mode.
[8], [9], [10], [11], [12], [13]
Selection for mobile nodes  -Goal: cover a hole.
-Solution: relocate nodes to ensure coverage.
[14], [15], [16], [17]
Target tracking and localization Entropy-based solutions -Goal: measure uncertainty.
-Solution: heuristic, mutual information.
[18], [19], [20]
Dynamic information driven solutions  -Goal: improve detection quality, track quality, scalability, survivability, and resource usage.
-Solution: use optimization algorithms defined in terms of information gain and cost.
[21], [22]
Mean squared error-based solutions  -Goal: minimizing mean squared error.
-Solution: replace an active node with an inactive one
[23], [24]
Single-task mission - -Goal: select the sensor nodes that are most useful for the mission.
-Solution: select the most cost-effective sensor set.
[25], [26]
Multiple-task mission - -Goal: cover the maximum number of targets with the minimum number of active sensors.
-Solution: use greedy heuristic algorithms.
[14], [17], [27], [28]

Meta-heuristics algorithms used in WSNs
The choice of a meta-heuristic algorithm [29, 30] depends on the algorithm's behavior, the issue type, time limitations, resource availability, and the required accuracy. Although several reviews address the use of nature-inspired algorithms in WSNs, only some demonstrate their use for sensor selection. Table 2 lists some optimization algorithms used to reduce energy consumption, prolong lifetime, and select sensors in WSNs, such as particle swarm optimization (PSO) [29] and the genetic algorithm (GA) [34].

Energy conservation and lifetime extension of WSNs
Recently, approaches based on machine learning and intelligent energy-saving models have been proposed for conserving the energy use of WSNs. Table 3 shows some of the sensor selection methods in WSNs [35–40], indicating the method used to choose a subset of sensors [29–34]. Moreover, for each study, the datasets used, the implementation environment, and the proposed solution and its limitations are specifically addressed.

Table 2. Related optimization algorithms used in WSN
Algorithm Problem domain Problem solution Ref.
PSO Optimal coverage Assures maximum lifetime [29]
PSO Energy-efficient clustering and routing Minimizes the energy spent and maximizes the data transmission rate [30]
GA Energy-efficient clustering and routing Reduces the average power consumption [31]
GA Optimal coverage Proposed energy-efficient coverage control algorithm (ECCA) [32]
GA Data aggregation Maximizes the network lifetime; increases the network lifetime by balancing the data load throughout the network [33], [34]

Table 3. Some of the sensor selection techniques used in WSNs
Ref. Method Dataset Implementation Solution specification Limitation
[35] Naïve Bayes ISOLET, Ionosphere, Covertype MATLAB Increase the lifetime of WSNs and manage sensor failures Not addressed
[36] MLP Ionosphere, Covertype, Sensor discrimination MATLAB Determine classification model for efficient energy in WSNs Not addressed
[37] SVM, MLP, Naïve Bayes Sensor discrimination, Ionosphere, Covertype MATLAB Determine appropriate intelligent classification Not addressed
[38] PISAE Sensor discrimination, Ionosphere, Covertype PyTorch Prolong the lifetime of WSNs Not addressed
[39] Filter-wrapper feature selection Light, temperature, infrared motion, door sensor MATLAB Save energy using simulated annealing Lack of scalability in the real world
[40] Filter-wrapper sensor selection Sensor discrimination, Ionosphere, Covertype, Isolet Python Find the best sensor subsets High-dimensional and real dataset
MLP=multi-layer perceptron, PISAE=partly-informed sparse autoencoder.

Background
Classification methods
The classification algorithms used to evaluate the proposed model's performance are k-nearest neighbor (k-NN) [41], support vector machine (SVM) [42], random forest (RF) [43], logistic regression (LR) [44], decision tree (DT) [45], and extra tree (ET) [46]. When a sample is entered for classification, the final result is determined by voting on the classifiers' outputs, which mitigates overfitting and allows parallelism in high-dimensional data classification.

Filter methods
Filter methods are regarded as pre-processing methods and are independent of the classifier. The sensor subset is generated by calculating the association between the system input and output, and sensors are ranked according to their relevance to the target by evaluating statistical tests. The principal advantage of filter methods lies in their low computational complexity, which makes them fast and suitable for complicated, massive datasets. The filter methods adopted here are a combination of the Fisher score [47] and analysis of variance (ANOVA) [48].
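As an illustration of the two filter criteria, both scores can be computed directly from one sensor's readings and the class labels. The sketch below follows the textbook definitions of the Fisher score and the one-way ANOVA F-statistic rather than any particular library implementation; any reading vectors passed to it are assumed, not drawn from the paper's datasets.

```python
import numpy as np

def fisher_score(x, y):
    """Fisher score of one sensor's readings x against class labels y:
    between-class scatter of the class means over the pooled
    within-class variance."""
    classes = np.unique(y)
    mu = x.mean()
    num = sum((y == c).sum() * (x[y == c].mean() - mu) ** 2 for c in classes)
    den = sum((y == c).sum() * x[y == c].var() for c in classes)
    return num / den

def anova_f(x, y):
    """One-way ANOVA F-statistic: between-group mean square over
    within-group mean square."""
    classes = np.unique(y)
    mu = x.mean()
    ssb = sum((y == c).sum() * (x[y == c].mean() - mu) ** 2 for c in classes)
    ssw = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    return (ssb / (len(classes) - 1)) / (ssw / (len(x) - len(classes)))
```

A sensor whose readings separate the classes well scores far higher on both criteria than a noisy sensor, which is exactly the ranking behavior the filter level relies on.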

Grey wolf algorithm
Grey wolves are among the most impressive predators; they are highly skilled at catching prey because they live in a rigorous and orderly group. The grey wolf optimizer (GWO) [49, 50] is a recently suggested algorithm that simulates the behavior of grey wolves in searching for, encircling, and hunting their prey. The GWO's main procedures, namely initialization, evaluation, and the transformation function, are used to address the sensor selection issue.

Proposed Model

This paper addresses two main challenges as follows. The first challenge is to select the sensors in WSNs while maintaining high accuracy to extend the network’s life. The second challenge is to tolerate the faulty sensors of WSNs. This solution is adaptable by recovering faulty sensors and then replacing them with the optimal set of alternatives based on either of two scenarios. Fig. 2 shows the general structure of the proposed model, which mainly consists of two main contributions. The first contribution represents the sensor selection challenge and how to present the optimum solution through the proposed DLSS model. The second contribution illustrates how this proposed model deals with faulty sensor issues using the two scenarios to recover these faulty nodes through the proposed ASR approach.

Fig. 2. General structure of the proposed approach.

Data Pre-processing Stage
The first step of the pre-processing procedure is to obtain the related dataset, which should consist of data collected from several WSN environment resources. Four datasets were examined and obtained from the public machine learning repository of the University of California, Irvine (UCI) [51]. It is vital to identify and handle missing records correctly; otherwise, inexact or wrong conclusions may be drawn from the data. Missing values in the datasets can be calculated using modern imputation methods such as [52] to enhance the performance and accuracy of the proposed model. This imputation method integrates fuzzy c-means, k-NN, and iterative imputation algorithms to compute the missing values.
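The cited imputation method [52] integrates fuzzy c-means, k-NN, and iterative imputation; as a simpler, self-contained illustration of the k-NN component alone, the sketch below fills each missing entry (NaN) from the rows nearest on the jointly observed columns. It is an assumption-laden stand-in, not the cited method itself.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each missing entry (NaN) with the mean of that column across
    the k rows nearest to the incomplete row on its observed columns."""
    X = np.asarray(X, float)
    out = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # squared distance to every row, measured on the observed columns only
        dist = np.nansum((X[:, observed] - row[observed]) ** 2, axis=1)
        dist[i] = np.inf                      # exclude the row itself
        neighbors = np.argsort(dist)[:k]
        for j in np.flatnonzero(missing):
            vals = X[neighbors, j]
            out[i, j] = (np.nanmean(vals) if not np.isnan(vals).all()
                         else np.nanmean(X[:, j]))   # column-mean fallback
    return out
```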

First Challenge
The first challenge is to prolong the WSNs' lifetime using the proposed model, which consists of data pre-processing, DLSS, and performance evaluation, as shown in Fig. 3.

Fig. 3. Data pre-processing, DLSS approach and performance evaluation.

First level of sensor selection
The first level of sensor selection applies the filtering methods to rank the entire set of network sensors and then finds the mean value (threshold) by dividing the sum of the output matrix by the number of sensors. The filter methods are the Fisher score and the ANOVA test, each assigning an importance rank to every sensor with respect to the class label. The outputs of the two methods are normalized into the range (0, 1) using the min-max procedure to improve efficiency by decreasing variations between sensors. After normalization, the results are appended and saved in a matrix. Finally, all sensors with scores below the threshold are dropped, producing a subset of sensors as shown in Algorithm 1.
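A minimal sketch of this first level, assuming the per-sensor Fisher and ANOVA scores have already been computed: both score vectors are min-max normalized, averaged per sensor, and sensors below the mean (the threshold) are dropped. The score values in any example call are illustrative, not taken from the paper's datasets.

```python
import numpy as np

def first_level_select(fisher_scores, anova_scores):
    """Min-max normalize both score vectors into (0, 1), average them per
    sensor, and keep only sensors scoring above the mean (the threshold).
    Returns the kept sensor indices and the combined score vector."""
    def minmax(s):
        s = np.asarray(s, float)
        return (s - s.min()) / (s.max() - s.min())
    combined = (minmax(fisher_scores) + minmax(anova_scores)) / 2
    return np.flatnonzero(combined > combined.mean()), combined
```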

Second level of sensor selection
The second level of sensor selection applies the GWO algorithm to solve the classification issue based on wrapper methods. A sigmoid function converts the continuous search space into a binary space to match the binary nature of sensor selection. The GWO algorithm is utilized to enhance the exploitation ability of the sensor selection stage by reducing the number of selected sensors while maintaining high classification accuracy, as well as by adding informative sensors that improve accuracy. Fig. 4 represents a potential solution for a network containing ten sensors, coupled with an initialization of the search agents or wolves (n). Initially, each potential selection is configured with binary values (0s and 1s), so some sensors are selected (value 1) and others are rejected (value 0).

Fig. 4. Representation of the potential solution.
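The sigmoid conversion from a continuous wolf position to a binary selection mask can be sketched as follows. Thresholding the sigmoid output against a uniform random draw is one common transfer rule in binary GWO variants, assumed here rather than quoted from the paper.

```python
import numpy as np

def sigmoid_binarize(position, rng):
    """Map a wolf's continuous position to a binary selection mask:
    sensor j is selected (1) when sigmoid(position[j]) exceeds a
    uniform random draw."""
    prob = 1.0 / (1.0 + np.exp(-np.asarray(position, float)))
    return (prob > rng.random(len(prob))).astype(int)
```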

Each agent represents a potential solution with dimension $(d)$ equal to the size of the first-level output selection. The sensor selection problem for WSN classification amounts to selecting a small subset of the potential sensors so as to maximize classification accuracy and also extend the network lifetime.
Sensor selection is a multi-objective problem since it should minimize the number of selected sensors while maximizing the accuracy of a specific classifier. Therefore, the fitness function in (1) is proposed to balance these two objectives when evaluating potential solutions.

$\text{fitness} = \alpha \varepsilon_r(D) + \beta \frac{|s|}{|d|}$ (1)

$ε_r(D)$ is the classification error rate of the selected attribute group $(r)$ relative to the decision $(d)$, computed here by the k-NN classifier; $|s|$ is the size of the selected sensor set; and $|d|$ is the original number of sensors in the network. The parameters $α$ and $β$ weight the relative significance of classification accuracy and reduction: $α$ takes values in $[0, 1]$, and $β = 1 - α$. The critical weight and impact are attributed to the classification accuracy rather than to the number of selected sensors. If the fitness function considered classification accuracy alone, solutions with the same accuracy but fewer selected sensors, a significant factor in high-dimensionality reduction, would certainly be neglected.
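Equation (1) maps directly to a one-line function. The error rate $ε_r(D)$ would come from cross-validating the k-NN classifier on the candidate subset, which is not reproduced here; the function simply combines the two objectives with the caller-supplied $α$.

```python
def fitness(error_rate, n_selected, n_total, alpha):
    """Eq. (1): alpha * classification error rate + beta * |s|/|d|,
    with beta = 1 - alpha."""
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total
```

For a fixed error rate, the fitness strictly improves as fewer sensors are selected, which is what lets the optimizer break ties between equally accurate subsets.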

Performance Evaluation
This stage exhibits the proposed DLSS's performance and efficacy in terms of the lifetime extension factor (LEF) and the classification accuracy. The LEF is the ratio between the total number of network sensors and the number of sensors selected by DLSS for a given dataset, as computed in (2).

$LEF = \frac{\text{Total number of network sensors}}{\text{Number of used sensors}}$ (2)

Understandably, the minimum LEF value is 1, reflecting the case in which all network sensors are used in each classification step; this case yields the highest possible accuracy rate. Conversely, if fewer sensors are selected for the network, the LEF value is greater than 1, and the network's lifetime is extended accordingly. Additionally, one of the research assumptions is that each sensor can be used several times before it becomes unavailable; this reflects the lifetime of the sensor in terms of the energy it consumes for measurement and network connectivity. The focus here is therefore on conserving WSN energy by dropping irrelevant or redundant sensors. LEF is inversely proportional to the network energy consumption: the higher the LEF value, the lower the network energy consumption. If DLSS can increase the LEF, network energy conservation is achieved while keeping network functionality and accuracy within accepted limits.
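Equation (2) is a simple ratio; the sketch below illustrates it with sensor counts of the kind reported later for the Isolet dataset (100 selected out of 617).

```python
def lef(total_sensors, used_sensors):
    """Lifetime extension factor from Eq. (2): total network sensors
    divided by the number of sensors actually used."""
    return total_sensors / used_sensors

# e.g., lef(617, 100) == 6.17, while using every sensor gives the minimum:
# lef(617, 617) == 1.0
```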

Second Challenge
WSNs always operate in hostile or unattended environments. As a result, it is easy for sensor nodes in WSNs to fail due to energy depletion or intentional attacks and natural disasters [53]. Furthermore, failed sensor nodes would decrease the network coverage, fragment the connected network, and lead to complete paralysis of the global network. For example, if multiple sensor nodes malfunction, losing detection of volcanic fault activity with faulty readings, this would lead to undue panic or fatalities due to a lack of warning. It is fundamental to detect faulty nodes before performing the necessary recovery procedures to guarantee a high quality level of service. WSN fault detection [54, 55] is a technique that identifies an error when it occurs and identifies the fault’s type and location.
The second challenge solves the fault-tolerance problem in the proposed DLSS model. Fault tolerance is the capability of a network to provide a functionality level without interruption, even if there are network faults. Therefore, network fault tolerance is one of the essential issues in WSNs, or else sensor nodes that are faulty would affect the entire network.
Therefore, so that WSNs can run smoothly and extend their lifespans, an ASR solution is proposed to recover from faulty sensor issues in the model. The ASR is divided into two scenarios. In the proposed model, the first level of sensor selection produces a reduced set of sensors (M), and after the second level of sensor selection, a suboptimal set of sensors (N) is selected. If faulty sensors are detected for any reason, the proposed ASR is operated to replace the failed nodes for more stable performance. The following sections introduce the two proposed solutions to ensure fault tolerance.

First scenario
As shown in Fig. 5, Scenario 1 helps solve the fault-tolerance problem. When the fault detection algorithm detects an error, the following steps retrieve and re-operate the network with a new suboptimal alternative of standby sensors. Step 1 defines the number of faulty sensors (K). Step 2 determines the number of healthy sensors, as defined by (3). Step 3 drops the suboptimal set of sensors (N). Step 4 selects (K) standby nodes to replace the faulted ones from the available pool in the subset (M); the recovery sensors are those with the highest scores after dropping (N). Step 5 then retrieves the new suboptimal sensors (N) after the ASR solution; this selected group is the new suboptimal node set used as the best sensor subset.
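The Scenario 1 steps can be sketched as a simple replacement routine. The sensor names below follow Table 7, but the first-level scores are hypothetical (the paper does not list them), chosen only so that f11, f15, and f21 rank highest among the standby pool, matching the table's outcome.

```python
def asr_scenario1(M_scores, N, faulty):
    """Scenario 1: drop the faulty members of the suboptimal set N and
    replace them with the highest-scoring standby sensors from M \\ N."""
    healthy = [s for s in N if s not in faulty]
    standby = sorted((s for s in M_scores if s not in N),
                     key=M_scores.get, reverse=True)
    return sorted(healthy + standby[:len(faulty)])

# Hypothetical first-level scores for the reduced set M of Table 7:
M_scores = {"f11": 0.90, "f13": 0.70, "f15": 0.88, "f17": 0.60, "f19": 0.72,
            "f20": 0.65, "f21": 0.86, "f22": 0.40, "f23": 0.75, "f24": 0.70,
            "f25": 0.30, "f26": 0.55, "f27": 0.68, "f28": 0.50, "f32": 0.62}
N = ["f13", "f17", "f19", "f20", "f23", "f24", "f26", "f28", "f27", "f32"]
recovered = asr_scenario1(M_scores, N, faulty=["f17", "f26", "f28"])
```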

Fig. 5. Scenario 1 of the proposed ASR solution.

Fig. 6. Scenario 2 of the proposed ASR solution.

Second scenario
In this paper, a second solution is proposed for recovering the faulty nodes while maintaining and extending the network's lifetime. This scenario employs the K-means algorithm to cluster the node output of the proposed DLSS model. Fig. 6 explains the steps to retrieve the new suboptimal sensors after the ASR solution. Steps 1 and 2 are the same as in the first scenario. Step 3 clusters the reduced set of sensors (M) and the suboptimal set of sensors (N) using the K-means algorithm. Step 4 selects the available nodes found in the M clusters but not in the N clusters, using cluster dissimilarity to compare each output cluster from (M) with the corresponding cluster in (N) and thereby obtain the standby nodes.
Step 5 then retrieves the new suboptimal sensors (N) after the ASR solution; after substitution, this group of selected sensors becomes the nodes in each cluster.
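The clustering-based recovery can be sketched as follows, under two assumptions not fixed by the paper: sensors are clustered by their reading profiles, and k-means is implemented as plain Lloyd iterations with a deterministic farthest-point initialization (so the sketch is reproducible). Each faulty node is replaced by a standby sensor, one that is in M but not in N, from the same cluster.

```python
import numpy as np

def kmeans_labels(X, k, iters=20):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def asr_scenario2(readings, M, N, faulty, k=3):
    """Scenario 2 sketch: cluster the reduced set M by each sensor's reading
    profile (rows of `readings`), then replace each faulty node with a
    standby sensor (in M but not in N) from the same cluster."""
    labels = kmeans_labels(readings[M], k)
    cluster_of = dict(zip(M, labels))
    recovered = [s for s in N if s not in faulty]
    for f in faulty:
        pool = [s for s in M if s not in N and cluster_of[s] == cluster_of[f]]
        if pool:
            recovered.append(pool[0])
    return sorted(recovered)
```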

Experimental Results and Discussion

All experiments were performed on an AMD A10-8700P Radeon R6 at 1.80 GHz with 8 GB of RAM, running the 64-bit Windows 10 operating system. The software implementation was performed on Anaconda 2019 using Python version 3.7.

Dataset Description
The four datasets used in this paper were downloaded from the UCI machine learning repository [51]. Sensor Discrimination is a labeled dataset with three classes: "group A," "group B," and "false alarm." The Ionosphere dataset is a radar dataset gathered at Goose Bay, Labrador; it comprises 34 real-valued sensor attributes and has two classes of radar signal. The Covertype dataset was developed at the University of Colorado and is used for predicting the forest cover type of unknown regions. Isolet is a large dataset divided into five parts, specified as Isolet 1 to 5. In this paper, only Isolet 5 was considered, with 7,797 instances and 617 features, due to memory limitations in the simulation. Table 4 gives a brief description of the datasets.

Table 4. Brief description of datasets
Datasets Sensors Instances Classes
Sensor Discrimination 12 2,211 3
Ionosphere 34 351 2
Covertype 55 581,012 7
Isolet 5 617 7,797 26

Results of the First Challenge
Parameter setting
The performance of different algorithms in solving the sensor selection issue is investigated in comparison to the proposed DLSS. Every algorithm was executed for twenty separate runs with random seeds, and all parameters were taken from the literature to ensure a fair comparison between the algorithms. Table 5 shows the GWO parameters used in the proposed DLSS model.
Table 6 compares the DLSS model and GWO on the complete datasets and demonstrates the superiority of DLSS, with a considerable difference in the number of selected sensors and in LEF and only a slight difference in classification accuracy. GWO uses 6 of 12 sensors on Sensor Discrimination and achieves a 99% accuracy rate; DLSS uses only five sensors and achieves 98.8%, nearly matching the GWO result. Furthermore, on the Ionosphere dataset, DLSS operates 3 of 33 sensors with an accuracy rate of 90.6%, slightly below GWO, which operates 11 of 33 sensors with a 91% accuracy rate. For the Covertype dataset, the proposed DLSS uses half the number of sensors selected by GWO. Finally, DLSS reduces the number of network sensors from 617 to 100 with acceptable accuracy.

Table 5. GWO parameters used in the proposed DLSS model
Parameter Value Comment
ITmax 30 Total Iteration number
n 5 Number of search agents
Tr 20 Number of independent runs
k 10 k-value in k-fold cross validation
α 0.01 Alpha value ∈ [0, 1]
β 0.99 Beta value= 1- α
d # Sensors Dimension
K 5 k-value in KNN
U 1 Upper value
L 0 Lower value
A [0,2] Coefficient
Mp 0.5 Mutation probability

Table 6. Comparison based on the classification accuracy and the number of selected sensors
Dataset  FULL (ACC %, #Sensors, LEF)  GWO (ACC %, #Sensors, LEF)  DLSS (ACC %, #Sensors, LEF)
Sensor Discrimination  99, 12, 1  99, 6, 2  98.8, 5, 2.4
Ionosphere  92, 33, 1  91, 11, 3  90.6, 3, 11
Covertype  98, 55, 1  96.5, 12, 4.5  94, 7, 7.85
Isolet  90, 617, 1  85, 120, 5.14  78, 100, 6.17
Fig. 7 compares the six machine-learning techniques used in this paper to evaluate the model performance. A Taylor chart represents the geometric relationship among the standard deviation (STD), the correlation, and the centered RMS difference, allowing all three to be plotted together. Taylor charts visually summarize how closely a set of models agrees with observations: the STD of the model values should be as close as possible to that of the observations; the correlation coefficient, which measures the strength of the linear dependence between two variables, should be close to 1; and the RMSE, which measures the differences between the values expected and observed by the model, is an ideal measure of accuracy and should be as small as possible.
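The three statistics a Taylor diagram plots can be computed directly from a model's predictions and the observations, as sketched below with NumPy; the geometric relationship among them (the law-of-cosines identity underlying the diagram) holds exactly.

```python
import numpy as np

def taylor_stats(model, obs):
    """The quantities plotted on a Taylor diagram for one model:
    both standard deviations, the correlation coefficient, and the
    centered RMS difference between model and observations."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    corr = np.corrcoef(model, obs)[0, 1]
    crms = np.sqrt(np.mean(((model - model.mean())
                            - (obs - obs.mean())) ** 2))
    return model.std(), obs.std(), corr, crms
```

These four numbers satisfy crms² = std_model² + std_obs² − 2·std_model·std_obs·corr, which is what lets a single point on the diagram encode all three comparisons at once.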
In Fig. 7(a), the k-NN algorithm yields the lowest RMSE (below 0.008) with the highest correlation coefficient (close to 0.99), and ET yields the lowest STD value (0.012). DT has the highest STD with respect to the observed value (the STD of the actual target output). These results suggest that the k-NN algorithm is closest to the reference STD for the Sensor Discrimination dataset. Fig. 7(b) shows that the LR algorithm has the highest STD (0.095), ET the lowest STD, and k-NN the lowest RMSE with respect to the observed value. These results suggest that the ET, k-NN, and RF algorithms are the closest to the reference and better than the others on the Ionosphere dataset.
Fig. 7(c) indicates that RF and DT are closest to the observed value in STD (0.035 for DT and 0.029 for RF), correlation (0.74 for DT and 0.75 for RF), and RMSE (below 0.038). These results favor RF and DT over the other algorithms for the Covertype dataset. Fig. 7(d) shows that LR and SVM are the classifiers nearest to the observed STD, correlation, and RMSE values, and they are the suggested algorithms for the Isolet dataset.

Fig. 7. Taylor diagram compares the performance of different machine-learning techniques: (a) Sensor Discrimination, (b) Ionosphere, (c) Covertype, and (d) Isolet.

Comparison with other meta-heuristics
The performance of the proposed DLSS model is compared to other well-known optimization algorithms from the literature: GWO, PSO, GA, the differential evolution (DE) algorithm [56], the cuckoo search (CS) algorithm [57], the whale optimization algorithm (WOA) [58], and the salp swarm algorithm (SSA) [59]. The initial parameter settings are the same for these optimization algorithms to ensure a fair comparison. Three performance measures are used to evaluate the algorithms: the classification accuracy, the LEF, and the number of selected sensors.

Fig. 8. Convergence curve for datasets: (a) Sensor Discrimination, (b) Ionosphere, (c) Covertype, and (d) Isolet.

Fig. 8 shows the convergence comparison between the proposed DLSS, GWO, PSO, GA, DE, CS, WOA, and SSA. The convergence speed can be estimated from how the fitness function decreases over the number of iterations (n = 30). From these graphs, DLSS determines the best solutions in fewer than 15 iterations for Sensor Discrimination, 20 iterations for Ionosphere, 10 iterations for Covertype, and 15 iterations for Isolet. This demonstrates the accepted performance of DLSS over the other algorithms.
Fig. 9 shows the accuracy comparison between the proposed DLSS, the other algorithms (GWO, PSO, GA, DE, CS, WOA, and SSA), and the original accuracy on the same datasets. The best accuracy obtained for the Sensor Discrimination dataset among the previously mentioned optimization algorithms is SSA = 0.996, and the worst is PSO = 0.984. Although reduction occurs in sensor selection, the proposed DLSS reaches 0.988, ranking fourth in accuracy. On the Ionosphere dataset, the best accuracy is PSO and DE = 0.915, the worst is SSA = 0.886, and the proposed DLSS reaches 0.906, again ranking fourth. In addition, the best accuracy for the Covertype dataset is obtained by WOA = 0.967, the worst is PSO = 0.984, and the proposed DLSS is 0.885. For Isolet, the best accuracy is obtained by GWO = 0.90.

Fig. 9. Compared results for DLSS versus other algorithms.

Fig. 10 represents the number of selected sensors for the different meta-heuristic algorithms. Notably, the proposed DLSS selects the lowest number of sensors in comparison to the other algorithms, which yields the highest possible LEF with acceptable accuracy. The worst performer is SSA, with eight out of twelve sensors for the Sensor Discrimination dataset. The proposed DLSS selects 3 sensors for the Ionosphere dataset, while the worst, PSO, selects 16 out of 34. For Covertype, DLSS chooses 7 and SSA chooses 27 out of 55 sensors. Furthermore, DLSS selects 100 and WOA selects 334 out of 617 on the Isolet dataset, which indicates that DLSS selects the fewest sensors overall.

Fig. 10. Compared results of DLSS versus other algorithms based on the number of selected sensors.

The proposed DLSS model is compared to other meta-heuristic algorithms based on LEF, computed using (2), on the four datasets, as shown in Fig. 11. If all network nodes are used, the LEF equals one, and the WSN operates with 100% of its sensors. When the meta-heuristic algorithms use a sensor selection scheme to select the most informative sensors, the WSN operates with fewer sensors, increasing the LEF and reducing the network's energy consumption. The proposed DLSS model achieves a better LEF than the other algorithms, increasing the LEF by 2.4, 11.3, 7.8, and 6.2 times for the Sensor Discrimination, Ionosphere, Covertype, and Isolet datasets, respectively.

Fig. 11. LEF comparison for DLSS versus other algorithms.

Results of scenario 1
As illustrated in Fig. 5, Scenario 1 solves the ASR challenge through direct faulty-sensor replacement. The Ionosphere dataset has been selected as a case study for both scenarios, and Table 7 shows the Scenario 1 output and the number of sensors at each step of the proposed ASR solution. The DLSS model selected 15 of the 34 nodes at the first level of sensor selection (filter methods) and then ten nodes at the second level (optimization algorithm). For instance, if nodes f17, f26, and f28 fail, seven healthy nodes remain. Based on Scenario 1, ASR then selected three alternative nodes, namely f11, f15, and f21; these substitute nodes have the highest scores from the first selection stage. Finally, ASR combines the healthy nodes with the newly selected K nodes to retrieve the suboptimal sensor set (N). This scenario is a straightforward solution for maintaining the network LEF and performance: the classification accuracy was 91.5% before the sensors' failure, dropped to 83% during the failure, and was restored to 90.4% after applying ASR.

Table 7. ASR solution based on Scenario 1 for Ionosphere dataset
Selection Selected sensor # sensors
First level selection (M) [f11, f13, f15, f17, f19, f20, f21, f22, f23, f24, f25, f26, f27, f28, f32] 15
Second level selection (N) [f13, f17, f19, f20, f23, f24, f26, f27, f28, f32] 10
Healthy sensors [f13, f19, f20, f23, f24, f27, f32] 7
Faulty sensors (K) [f17, f26, f28] 3
Selected K [f11, f15, f21] 3
Sensor recovery [f11, f13, f15, f19, f20, f21, f23, f24, f27, f32] 10
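The Scenario 1 recovery step can be sketched directly from Table 7: faulty nodes in the optimized subset N are replaced by the highest-scoring first-level nodes not already in N. The candidate ordering below is an assumption for illustration (the actual filter scores are not listed here), chosen so that the top candidates are the ones Table 7 reports.

```python
def recover_direct(n_subset, faulty, candidates_by_score):
    """Scenario 1 (direct replacement): drop the faulty sensors from the
    optimized subset and top it back up with the best-scoring unused
    first-level sensors."""
    healthy = [s for s in n_subset if s not in faulty]
    replacements = candidates_by_score[:len(faulty)]
    return sorted(healthy + replacements)

# Values from Table 7 (Ionosphere dataset):
N = ["f13", "f17", "f19", "f20", "f23", "f24", "f26", "f27", "f28", "f32"]
faulty = ["f17", "f26", "f28"]
# M \ N, assumed listed best filter score first:
candidates = ["f11", "f15", "f21", "f22", "f25"]

print(recover_direct(N, faulty, candidates))
```

With these inputs the function returns the "Sensor recovery" row of Table 7, keeping the recovered subset at the original size of ten sensors.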

Results of scenario 2
As illustrated in Fig. 6, Scenario 2 solves the ASR challenge using K-means clustering for faulty-sensor replacement; Table 8 shows the Scenario 2 output and the number of sensors at each selection step. Upon fault detection, the proposed ASR clusters the selected nodes (M), the output of the first selection level. This step groups the network into three clusters: cluster 0 contains three sensors (f23, f25, f27); cluster 1 contains six sensors (f20, f22, f24, f26, f28, f32); and cluster 2 contains six sensors (f11, f13, f15, f17, f19, f21), for a total of 15 selected nodes. The second sensor selection level then selected all nodes in cluster 0, only two nodes (f26, f28) in cluster 1, and five of the six nodes in cluster 2. This clustering solution maintains the network LEF and performance: the classification accuracy was 91.5% before ASR and 91% after it, although it dropped to 83% during the sensors' failure.

Table 8. ASR based on Scenario 2 for Ionosphere dataset
Selection Selected sensor # sensors
First level selection (M) Cluster0 = 3 [f23, f25, f27] 15
Cluster1 = 6 [f20, f22, f24, f26, f28, f32]
Cluster2 = 6 [f11, f13, f15, f17, f19, f21]
Second level selection (N) Cluster0 = 3 [f23, f25, f27] 10
Cluster1 = 2 [f26, f28]
Cluster2 = 5 [f11 ,f15, f17, f19, f21]
Healthy sensors Cluster0 = 3 [f23, f25, f27] 7
Cluster1 = 0
Cluster2 = 4 [f11, f15, f19, f21]
Faulty sensors (K) Cluster0 = 0 3
Cluster1 = 2 [f26, f28]
Cluster2 = 1 [f17]
Selected K Cluster0 = 0 3
Cluster1 = 2 [f22, f32]
Cluster2 = 1 [f13]
Sensor recovery [f11, f13, f15, f19, f21, f22, f23, f25, f27, f32] 10
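The Scenario 2 logic differs from Scenario 1 only in that each faulty sensor is replaced from within its own cluster, so the recovered subset keeps the same per-cluster balance. The sketch below hard-codes the Table 8 cluster assignments (in the model they come from K-means over M); within each cluster, members are assumed listed in descending filter score so that the spares picked first match Table 8.

```python
def recover_clustered(clusters, n_subset, faulty):
    """Scenario 2 (cluster-based replacement): for each cluster, replace as
    many sensors as failed there with that cluster's unused first-level
    sensors, then merge with the surviving sensors of the optimized subset."""
    recovered = [s for s in n_subset if s not in faulty]
    for members in clusters.values():
        n_failed = sum(1 for s in members if s in faulty)
        spares = [s for s in members if s not in n_subset]  # unused first-level nodes
        recovered += spares[:n_failed]
    return sorted(recovered)

# Values from Table 8 (Ionosphere dataset); member order within clusters is an
# assumed score ranking for illustration:
clusters = {
    0: ["f23", "f25", "f27"],
    1: ["f26", "f28", "f22", "f32", "f20", "f24"],
    2: ["f11", "f13", "f15", "f17", "f19", "f21"],
}
N = ["f23", "f25", "f27", "f26", "f28", "f11", "f15", "f17", "f19", "f21"]
faulty = ["f26", "f28", "f17"]

print(recover_clustered(clusters, N, faulty))
```

With these inputs the function reproduces the "Sensor recovery" row of Table 8: two spares from cluster 1 (f22, f32) and one from cluster 2 (f13) restore the subset to ten sensors.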

Conclusions and Follow-Up Work

In many applications, WSNs share critical data delivered to the sink node in an energy-efficient way as the sensors may have limited battery resources. Accordingly, the main challenge in WSN is to monitor the rate of energy consumption. Additionally, reducing energy consumption can effectively increase the network lifetime duration. Based on this, a solution for the two main challenges to extend the lifetime duration of WSNs is introduced in this paper.
The first solution solves the issue of the sensor selection challenge to improve the rate of energy consumption between the sensor nodes. In this vein, this paper involves a DLSS model, which mainly focuses on optimal energy utilization using sensor nodes selection. This proposed DLSS model increases the network lifetime duration by utilizing two levels for sensor selection, wherein the first level produces a reduced sub-optimal set of sensors and the second level selects the best subset of sensors.
This DLSS model is evaluated on four different WSN datasets from the UCI machine learning repository. The results indicate that the DLSS model increases the network LEF by 6.17 times using 100 out of 617 sensors with a 78 percent accuracy rate for the Isolet dataset. In the Covertype dataset, LEF increased 7.8 times using 7 out of 55 sensors with an accuracy rate of 94 percent, and by 11.3 and 2.4 times for the Ionosphere and sensor discrimination datasets, respectively.
The second challenge is the fault tolerance issue, which deals with node failures in the DLSS model through an ASR solution. This solution is presented in two scenarios for recovering faulty sensors with greater energy efficiency: the first scenario is a straightforward solution with direct sensor replacement, and the second uses K-means clustering to select the alternative recovery sensors. After applying the ASR solution, the achieved accuracy rates are 90.4 and 91 percent for the first and second scenarios, respectively; during the sensors' failure, accuracy dropped to 83 percent from the 91.5 percent achieved before the failure. These results indicate that the second scenario outperforms the first, although it requires more extensive computation at the base station to select the best-fit replacement sensors and maintain high accuracy.
For the follow-up work, the proposed sensor selection model can be extended to internet-based applications by adding regulations and security protocols. Additionally, a fault detection technique can be proposed to identify errors as they occur, including diagnosis of the fault type and location. Eventually, the proposed model can be modified to handle big data by using more computational and memory resources.

Author’s Contributions

Conceptualization, KMF, OMS. Supervision, KMF, OMS. Writing the original draft, OMS, BMH. Writing of the review and editing, KMF. Validation, KMF, BMH. Formal analysis, OMS, KMF, BMH. Data curation, KMF, OMS.

Funding

None.

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Omar M. Salim
Omar M. Salim was born in Cairo, Egypt in 1978. He received his B.S. and M.S. degrees in Electrical Engineering from BHIT, Benha University, Egypt in 2000 and 2006, respectively. He received his Ph.D. degree in Electrical Engineering through a joint supervision program between Cairo University, Egypt, and Oakland University, USA. He is currently an Associate Professor of Computers and Systems in the Electrical Engineering Department, Benha University. His research interests are in control, soft computing, and renewable energy.

Khaled M. Fouad
Khaled M. Fouad obtained his B.Sc. in 1995, M.Sc. in 2003, and Ph.D. in 2012 from the Department of Systems and Computers Engineering, Faculty of Engineering. He is currently an associate professor at the Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha University, Egypt.
His current research interests focus on Intelligent systems, Text Mining, Machine Learning, Data Mining, Big Data processing and analytics, Semantic Web, and Expert Systems.

Basma M. Hassan
Basma M. Hassan obtained her B.Sc. in 2008 and M.Sc. in 2016 from the Department of Computers Engineering, Benha Faculty of Engineering, Benha University, Egypt. She is currently an assistant lecturer at the Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, Egypt. Her current research interests include machine learning, decision support systems, data science, big data processing, artificial intelligence, and pattern recognition.

References

[1] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292-2330, 2008.
[2] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,” Computer Networks, vol. 38, no. 4, pp. 393-422, 2002.
[3] Z. Luo and P. S. Min, “Survey of sensor selection methods in wireless sensor networks,” in Proceedings of 2013 19th IEEE International Conference on Networks (ICON), Singapore, 2013, pp. 1-5.
[4] J. Pirgazi, M. Alimoradi, T. EsmaeiliAbharian, and M. H. Olyaee, “An efficient hybrid filter-wrapper metaheuristic-based gene selection method for high dimensional datasets,” Scientific Reports, vol. 9, article no. 18580, 2019. https://doi.org/10.1038/s41598-019-54987-1
[5] S. Jannu and P. K. Jana, “Energy efficient algorithms to maximize lifetime of wireless sensor networks,” in Proceedings of 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, 2016, pp. 63-68.
[6] M. Iqbal, M. Naeem, A. Anpalagan, A. Ahmed, and M. Azam, “Wireless sensor network optimization: multi-objective paradigm,” Sensors, vol. 15, no. 7, pp. 17572-17620, 2015.
[7] A. G. Hoffmann, “General limitations on machine learning,” in Proceedings of the 9th European Conference on Artificial Intelligence (ECAI), Stockholm, Sweden, 1990, pp. 345-347.
[8] M. Cardei and D. Z. Du, “Improving wireless sensor network lifetime through power aware organization,” Wireless Networks, vol. 11, no. 3, pp. 333-340, 2005.
[9] M. X. Cheng, L. Ruan, and W. Wu, “Achieving minimum coverage breach under bandwidth constraints in wireless sensor networks,” in Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, Miami, FL, 2005, pp. 2638-2645.
[10] J. Lu, L. Bao, and T. Suda, “Coverage-aware sensor engagement in dense sensor networks,” in Embedded and Ubiquitous Computing – EUC 2005. Heidelberg, Germany: Springer, 2005, pp. 639-650.
[11] M. A. Perillo and W. B. Heinzelman, “Optimal sensor management under energy and reliability constraints,” in Proceedings of 2003 IEEE Wireless Communications and Networking (WCNC), New Orleans, LA, 2003, pp. 1621-1626.
[12] K. P. Shih, Y. D. Chen, C. W. Chiang, and B. J. Liu, “A distributed active sensor selection scheme for wireless sensor networks,” in Proceedings of the 11th IEEE Symposium on Computers and Communications (ISCC), Cagliari, Italy, 2006, pp. 923-928.
[13] T. Yan, T. He, and J. A. Stankovic, “Differentiated surveillance for sensor networks,” in Proceedings of the 1st International Conference on Embedded Networked Sensor Systems (SenSys), Los Angeles, CA, 2003, pp. 51-62.
[14] K. S. Kwok, B. J. Driessen, C. A. Phillips, and C. A. Tovey, “Analyzing the multiple-target-multiple-agent scenario using optimal assignment algorithms,” Journal of Intelligent and Robotic Systems: Theory and Applications, vol. 35, no. 1, pp. 111-122, 2002.
[15] A. Sekhar, B. S. Manoj, and C. Siva Ram Murthy, “Dynamic coverage maintenance algorithms for sensor networks with limited mobility,” in Proceedings of the 3rd IEEE International Conference on Pervasive Computing and Communications, Kauai, HI, 2005, pp. 51-60.
[16] G. Wang, G. Cao, and T. F. La Porta, “Movement-assisted sensor deployment,” IEEE Transactions on Mobile Computing, vol. 5, no. 6, pp. 640-652, 2006.
[17] G. Wang, G. Cao, and T. La Porta, “A bidding protocol for deploying mobile sensors,” in Proceedings of the 11th IEEE International Conference on Network Protocols (ICNP), Atlanta, GA, 2003, pp. 315-324.
[18] E. Ertin, J. W. Fisher, and L. C. Potter, “Maximum mutual information principle for dynamic sensor query problems,” in Information Processing in Sensor Networks. Heidelberg, Germany: Springer, 2003, pp. 405-416.
[19] J. Liu, J. Reich, and F. Zhao, “Collaborative in-network processing for target tracking,” EURASIP Journal on Applied Signal Processing, vol. 2003, article no. 616720, 2003. https://doi.org/10.1155/S111086570321204X
[20] H. Wang, G. Pottie, K. Yao, and D. Estrin, “Entropy-based sensor selection heuristic for target localization,” in Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, Berkeley, CA, 2004, pp. 36-45.
[21] P. V. Pahalawatta, T. N. Pappas, and A. K. Katsaggelos, “Optimal sensor selection for video-based target tracking in a wireless sensor network,” in Proceedings of 2004 International Conference on Image Processing (ICIP), Singapore, 2004, pp. 3073-3076.
[22] F. Zhao, J. Shin, and J. Reich, “Information-driven dynamic sensor collaboration,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 61-72, 2002.
[23] L. M. Kaplan, “Global node selection for localization in a distributed sensor network,” IEEE Transactions on Aerospace and Electronic Systems, vol. 42, no. 1, pp. 113-135, 2006.
[24] L. M. Kaplan, “Local node selection for localization in a distributed sensor network,” IEEE Transactions on Aerospace and Electronic Systems, vol. 42, no. 1, pp. 136-146, 2006.
[25] F. Bian, D. Kempe, and R. Govindan, “Utility-based sensor selection,” in Proceedings of the 5th International Conference on Information Processing in Sensor Networks, Nashville, TN, 2006, pp. 11-18.
[26] R. Govindan, E. Kohler, D. Estrin, F. Bian, K. Chintalapudi, O. Gnawali, R. Gummadi, S. Rangwala, and T. Stathopoulos, "Tenet: an architecture for tiered embedded networks," University of Southern California, Los Angeles, CA, 2005.
[27] J. Ai and A. A. Abouzeid, “Coverage by directional sensors in randomly deployed wireless sensor networks,” Journal of Combinatorial Optimization, vol. 11, no. 1, pp. 21-41, 2006.
[28] J. Ostwald, V. Lesser, and S. Abdallah, “Combinatorial auctions for resource allocation in a distributed sensor network,” in Proceedings of the 26th IEEE International Real-Time Systems Symposium (RTSS), Miami, FL, 2005.
[29] T. P. Hong and G. N. Shiu, “Allocating multiple base stations under general power consumption by the particle swarm optimization,” in Proceedings of the 2007 IEEE Swarm Intelligence Symposium (SIS), Honolulu, HI, 2007, pp. 23-28.
[30] J. C. Tillett, R. M. Rao, F. Sahin, and T. M. Rao, “Particle swarm optimization for the clustering of wireless sensors,” in Proceedings of SPIE 5100: Digital Wireless Communications V. Bellingham, WA: International Society for Optics and Photonics, 2003, pp. 73-83.
[31] G. H. Ekbatanifard, R. Monsefi, M. R. Akbarzadeh-T., and M. H. Yaghmaee, “A multi-objective genetic algorithm based approach for energy efficient QoS-routing in two-tiered wireless sensor networks,” in Proceedings of IEEE 5th International Symposium on Wireless Pervasive Computing, Modena, Italy, 2010, pp. 80-85.
[32] J. Jia, J. Chen, G. Chang, and Z. Tan, “Energy efficient coverage control in wireless sensor networks based on multi-objective genetic algorithm,” Computers and Mathematics with Applications, vol. 57, no. 11-12, pp. 1756-1766, 2009.
[33] J. N. Al-Karaki, R. Ul-Mustafa, and A. E. Kamal, “Data aggregation and routing in wireless sensor networks: Optimal and heuristic algorithms,” Computer Networks, vol. 53, no. 7, pp. 945-960, 2009.
[34] A. Norouzi, F. S. Babamir, and A. H. Zaim, “A new clustering protocol for wireless sensor networks using genetic algorithm approach,” Wireless Sensor Network, vol. 3, no. 11, pp. 362-370, 2011.
[35] M. D. Alwadi and G. Chetty, “Feature selection and energy management for wireless sensor networks,” International Journal of Computer Science and Network Security, vol. 12, no. 6, pp. 46-51, 2012.
[36] A. Y. Barnawi and I. M. Keshta, “Energy management of wireless sensor networks based on multi-layer perceptrons,” in Proceedings of the 20th European Wireless Conference, Barcelona, Spain, 2014, pp. 1-6.
[37] A. Y. Barnawi and I. M. Keshta, “Energy management in wireless sensor networks based on naive Bayes, MLP, and SVM classifications: a comparative study,” Journal of Sensors, vol. 2016, article no. 6250319, 2016. https://doi.org/10.1155/2016/6250319
[38] B. O. Ayinde and A. Y. Barnawi, “Energy conservation in wireless sensor networks using partly-informed sparse autoencoder,” IEEE Access, vol. 7, pp. 63346-63360, 2019.
[39] J. Kang, J. Kim, M. Kim, and M. Sohn, “Machine learning-based energy-saving framework for environmental states-adaptive wireless sensor network,” IEEE Access, vol. 8, pp. 69359-69367, 2020.
[40] K. M. Fouad, B. M. Hassan, and O. M. Salim, “Hybrid sensor selection technique for lifetime extension of wireless sensor networks,” Computers, Materials & Continua, vol. 70, no. 3, pp. 4965-4985, 2022.
[41] N. S. Altman, “An introduction to kernel and nearest-neighbor nonparametric regression,” American Statistician, vol. 46, no. 3, pp. 175-185, 1992.
[42] K. M. Fouad and D. L. El-Bably, “Intelligent approach for large-scale data mining,” International Journal of Computer Applications in Technology, vol. 63, no. 1-2, pp. 93-113, 2020.
[43] X. Tan, S. Su, Z. Huang, X. Guo, Z. Zuo, X. Sun, and L. Li, “Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm,” Sensors, vol. 19, no. 1, article no. 203, 2019. https://doi.org/10.3390/s19010203
[44] A. B. Musa, “Logistic regression classification for uncertain data,” Research Journal of Mathematical and Statistical Sciences, vol. 2, no. 2, pp. 1-6. 2014.
[45] B. Charbuty and A. Abdulazeez, “Classification based on decision tree algorithm for machine learning,” Journal of Applied Science and Technology Trends, vol. 2, no. 1, pp. 20-28, 2021.
[46] E. K. Ampomah, Z. Qin, and G. Nyame, “Evaluation of tree-based ensemble machine learning models in predicting stock price direction of movement,” Information, vol. 11, no. 6, article no. 332, 2020. https://doi.org/10.3390/info11060332
[47] X. He, D. Cai, and P. Niyogi, “Laplacian score for feature selection,” Advances in Neural Information Processing Systems, vol. 18, pp. 507-514, 2005.
[48] S. F. Sawyer, “Analysis of variance: the fundamental concepts,” Journal of Manual & Manipulative Therapy, vol. 17, no. 2, pp. 27E-38E, 2009.
[49] M. Abdel-Basset, D. El-Shahat, I. El-henawy, V. H. C. de Albuquerque, and S. Mirjalili, “A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection,” Expert Systems with Applications, vol. 139, article no. 112824, 2020. https://doi.org/10.1016/j.eswa.2019.112824
[50] A. T. Azar, A. M. Anter, and K. Fouad, “Intelligent system for feature selection based on rough set and chaotic binary grey Wolf optimisation,” International Journal of Computer Applications in Technology, vol. 63, no. 1-2, pp. 4-24, 2020.
[51] UCI Machine Learning Repository (2013) [Online]. Available: https://archive.ics.uci.edu/ml/index.php.
[52] K. M. Fouad, M. M. Ismail, A. T. Azar, and M. M. Arafa, “Advanced methods for missing values imputation based on similarity learning,” PeerJ Computer Science, vol. 7, article no. e619, 2021. https://doi.org/10.7717/peerj-cs.619
[53] S. Petridou, S. Basagiannis, and M. Roumeliotis, “Survivability analysis using probabilistic model checking: a study on wireless sensor networks,” IEEE Systems Journal, vol. 7, no. 1, pp. 4-12, 2013.
[54] W. I. Gabr, M. A. Ahmed, and O. M. Salim, “Hybrid detection algorithm for online faulty sensors identification in wireless sensor networks,” IET Wireless Sensor Systems, vol. 10, no. 6, pp. 265-275, 2020.
[55] T. Muhammed and R. A. Shaikh, “An analysis of fault detection strategies in wireless sensor networks,” Journal of Network and Computer Applications, vol. 78, pp. 267-287, 2017.
[56] T. Li, H. Dong, and J. Sun, “Binary differential evolution based on individual entropy for feature subset optimization,” IEEE Access, vol. 7, pp. 24109-24121, 2019.
[57] M. Alzaqebah, K. Briki, N. Alrefai, S. Brini, S. Jawarneh, M. K. Alsmadi, et al., “Memory based cuckoo search algorithm for feature selection of gene expression dataset,” Informatics in Medicine Unlocked, vol. 24, article no. 100572, 2021. https://doi.org/10.1016/j.imu.2021.100572
[58] S. Mirjalili and A. Lewis, “The whale optimization algorithm,” Advances in Engineering Software, vol. 95, pp. 51-67, 2016.
[59] G. I. Sayed, G. Khoriba, and M. H. Haggag, “A novel chaotic salp swarm algorithm for global optimization and feature selection,” Applied Intelligence, vol. 48, no. 10, pp. 3462-3481, 2018.

Omar M. Salim, Khaled M. Fouad, and Basma M. Hassan, Dual-Level Sensor Selection with Adaptive Sensor Recovery to Extend WSNs’ Lifetime, Human-centric Computing and Information Sciences, Article number: 12:18 (2022).