An Augmented SLAM Method with a Visual-Lidar-Equipped Unmanned Vehicle for Detecting Control Measures against COVID-19
• Siyuan Liang1, Mengna Xie1, Sahil Garg2, Georges Kaddoum2, Mohammad Mehedi Hassan3, and Salman A. AlQahtani4,*

Human-centric Computing and Information Sciences volume 13, Article number: 01 (2023)
https://doi.org/10.22967/HCIS.2023.13.001

Abstract

In epidemic prevention and control, unmanned devices based on autonomous driving technology have stepped into the front lines, playing a vital role in measures such as the detection of protective measures. Autonomous positioning is one of the key technologies of autonomous driving: high-precision positioning can provide accurate location-based epidemic prevention services and a refined intelligent management system for governments and citizens. In this paper, we propose REW_SLAM, an unmanned vehicle (UV) positioning system based on lidar and a stereo camera, which realizes real-time online pose estimation of the UV by correcting the visual positioning data with high-precision lidar poses. A six-element extended Kalman filter (6-element EKF) is proposed to fuse the lidar and stereo camera information; it retains the second-order Taylor terms of the observation and state equations and effectively improves the accuracy of data fusion. Meanwhile, to improve the quality of the lidar output, a modified wavelet denoising method is introduced to preprocess the raw lidar data. Our approach was tested on the KITTI datasets and on a real UV platform. Compared with two other algorithms, the relative pose error and absolute trajectory error of our algorithm are reduced by 0.26 m and 2.36 m on average, respectively, while the CPU occupancy rate is reduced by 6.685% on average, demonstrating the robustness and effectiveness of the algorithm.

Keywords

Autonomous Driving, Autonomous Positioning, Multi-Sensor Fusion, EKF, Wavelet Transform

Introduction

With the continuous development of information technology, smart city construction has gained prominence, involving the design and implementation of transportation, safety, and other fields [1]. Similarly, artificial intelligence is gaining traction as a global cutting-edge technology [2]. Against the background of smart cities and artificial intelligence, autonomous driving technology is widely used in various fields of transportation, where it can improve safety, reduce traffic congestion, and save time [3]. With the onset of the coronavirus disease 2019 (COVID-19) pandemic, autonomous driving technology quickly joined the struggle against COVID-19 in tasks such as protection detection for medical staff, personal protection detection, body temperature measurement, and crowd density statistics. COVID-19 is caused by the severe acute respiratory syndrome coronavirus [4]. Yang et al. [5] and Sahraoui et al. [6] have conducted extensive research on COVID-19 prevention measures based on patient contact networks and the Internet of Vehicles, respectively.
Applying autonomous driving technology to epidemic prevention and control can reduce both the stress on medical staff and the probability of cross infection. Therefore, groundbreaking autonomous driving technology is of great significance to the application of unmanned vehicles (UVs) in epidemic prevention, and 5G technology is applicable to the data processing, computing, and communication of autonomous driving [7, 8]. Specifically, 5G is a new communication medium whose high speed and low delay can solve the problem of low UV work efficiency caused by large amounts of data [9]. Meanwhile, Cao et al. [10-12] have conducted in-depth research on network resource allocation based on virtual network embedding (VNE) to improve the utilization of physical resources. UVs and unmanned aerial vehicles [13, 14] are the platforms most widely used in the field of autonomous driving; this paper mainly studies the accuracy of SLAM positioning on a UV. Khanh et al. [15] proposed a wireless indoor positioning system based on Wi-Fi. Wireless sensor networks have also been successfully applied to UV positioning and navigation, but they have strict data confidentiality requirements, so information security needs to be fully considered [16]. Compared with Wi-Fi and wireless sensor networks, cameras (visual SLAM) and lidar (laser SLAM) are the two sensors more commonly used for SLAM positioning. Depending on whether features are extracted, visual odometry includes the feature point method and the direct method. In addition to feature extraction algorithms, traditional vision-based positioning and navigation systems also adopt clustering and matching algorithms [17]. Compared with visual sensors, lidar can collect the spatial and motion information of objects [18] and obtain a large number of high-resolution, accurate 3D point clouds of the surrounding environment [19, 20].
However, due to the high positioning accuracy required for autonomous driving, it is difficult to achieve high-precision positioning with a single sensor, so multi-sensor fusion [21] is mainly used to improve the robustness of system positioning. To date, however, most fusion schemes have trouble obtaining high-precision positioning information in large and complex environments, and they do not take data noise into account. To solve these problems, this paper proposes REW_SLAM, a real-time, high-precision UV positioning system based on lidar and a stereo camera, which is applicable to UV equipment in large and complex environments such as the detection of protective measures against COVID-19. The main contributions of this paper are as follows:
1) The modified wavelet transform algorithm: Unlike the soft and hard threshold functions, the proposed threshold function solves both the distortion of the soft threshold and the pseudo-Gibbs effect of the hard threshold. The modified wavelet transform method is used to preprocess the raw lidar data, which improves the quality of feature extraction from the surrounding environment.
2) The visual pose estimation system: Feature screening adopts "bucketing plus aging" to achieve more effective feature matching. Specifically, all features are sorted by age error, and the size of the age error determines the priority of feature matching. In addition, we use the PnP-ICP combination method to realize feature matching and pose estimation, which matches more features than a single PnP or ICP algorithm.
3) A more efficient fusion algorithm: The proposed six-element extended Kalman filter (6-element EKF) is more suitable for SLAM algorithm based on multi-sensor fusion, which is used for autonomous localization of UV and epidemic prevention and control. It can avoid the error caused by linearization and reduce the odometry drift. Compared to the traditional EKF, the positioning accuracy is significantly improved.
4) Performance evaluations: We evaluate our proposed algorithm REW_SLAM on KITTI datasets and real UV platform, respectively, and benchmarked it with two other advanced algorithms to verify the robustness of this algorithm.
The rest of this paper is structured as follows. The second part describes the overall system framework. The third part describes the fusion method in detail. The fourth part gives the experimental results and analysis of KITTI datasets and UV platform. Lastly, the fifth part summarizes this paper.

System Overview

Fig. 1. System overview of the proposed unmanned vehicle location algorithm, mainly including lidar odometry, visual odometry, and data fusion.

Due to defects in the lidar itself or the influence of external environmental factors, the raw output data contains noise inconsistent with the actual data, which greatly affects the accuracy of system positioning. Therefore, a modified wavelet filtering algorithm is proposed in this paper. The laser point data is preprocessed by this algorithm, and the processed data is used for lidar pose estimation. In the lidar odometry part, ICP is used for lidar pose estimation. In the visual odometry part, we extract the ORB features in each frame image and filter them using "bucketing plus aging." The filtered feature points are matched against points in the local map, and the camera pose is then estimated using a combination of PnP and ICP. In the data fusion part, the 6-element EKF method is used to fuse the lidar and camera data. Compared with the traditional EKF, it ensures the accuracy of data processing and avoids linearization error. The pose obtained by the camera is used as the system input, while the pose obtained by the lidar is used as the system measurement: the camera positioning data is corrected using the lidar position information to realize data fusion and obtain the optimal pose estimate.
The problem to be solved in this paper is to estimate the UV's motion. The sensors carried by the UV are a lidar and a camera. We calibrate the camera and lidar jointly, so the two sensors can use the same coordinate system and the laser point cloud coordinates can be projected into the camera coordinate system. A left superscript is used to denote the coordinate system, with each coordinate system defined as follows [22]:

Image projection coordinate system {I}: is set at the top left corner of the image projection plane. The right indicates the x-axis (u-axis) direction. The bottom indicates the y-axis (v-axis) direction.

Camera coordinate system {C}: is set at the optical center of the camera. The left indicates x-axis direction, and the top indicates the y-axis direction. The front indicates the z-axis direction, which coincides with the camera spindle.

Lidar coordinate system {L}: is set at the optical center of the lidar. The front indicates x-axis direction, while the left indicates the y-axis direction, and the top indicates the z-axis.

World coordinate system {W}: its origin coincides with the initial camera coordinate system.

Lidar-Vision Odometry Based on Loose Coupling

In order to estimate the pose of the UV more robustly, we preprocess the lidar data and fuse it with the stereo camera through loose coupling to obtain the visual-laser odometry.

Modified Wavelet Transform Algorithm
Due to the influence of the lidar itself and the surrounding environment, the lidar output contains noise that deviates from the true measurements. For example, complex environments, changing scenes, and obstacle occlusion lead to uneven or missing laser point cloud distributions. To solve these problems, we preprocess the laser point cloud so that the surrounding environment features can be extracted accurately, efficiently, and completely from the large-scale complex environment.
Lidar output data includes distance, angle, reflection intensity, timestamp, and other information, but in this paper we mainly process the distance values. Denoising lidar data with the wavelet transform roughly comprises three steps. Firstly, an appropriate wavelet basis and decomposition level N are selected according to the requirements, and the lidar data is decomposed into N levels by the wavelet transform; secondly, an appropriate threshold function is selected to quantize the detail coefficients of each level with different thresholds; and finally, the data is reconstructed by the inverse wavelet transform from the approximation coefficients of the N-th level and the quantized detail coefficients of all levels.
The key points of wavelet denoising are the choice of wavelet basis and threshold size. Here, the number of decomposition levels is 5, the threshold is 30, and the wavelet basis function is from the Daubechies (dbN) family. At present, wavelet denoising methods based on the soft and hard threshold functions are the most commonly used, but both have inherent shortcomings. The hard threshold function is discontinuous, so the processed signal contains new oscillations and exhibits a pseudo-Gibbs effect, losing much of the original signal's smoothness. The soft threshold function is continuous, but because of the constant bias between the estimated and true coefficients, high-frequency components are over-filtered and the processed signal is distorted. Therefore, this paper adopts a modified wavelet denoising method that overcomes the above shortcomings, with the new threshold function denoted as:

(1)

where ω and λ represent the distance values in the lidar output data and the threshold, respectively. The new threshold function lies between the soft and hard threshold functions and is nonlinear, so it combines the advantages of both while avoiding their respective disadvantages.
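The behavior of the three threshold families can be sketched as below. The exact form of the paper's Eq. (1) is not reproduced here; `modified_threshold` is an assumed compromise function of a common shape, continuous at the threshold like the soft function and approaching the hard function for large coefficients.

```python
import math

def hard_threshold(w, lam):
    # Hard threshold: keep the coefficient unchanged if above lam, else zero.
    # Discontinuous at |w| = lam, which causes the pseudo-Gibbs effect.
    return w if abs(w) >= lam else 0.0

def soft_threshold(w, lam):
    # Soft threshold: shrink every surviving coefficient by lam.
    # Continuous, but the constant bias lam distorts large coefficients.
    return math.copysign(abs(w) - lam, w) if abs(w) >= lam else 0.0

def modified_threshold(w, lam, a=2.0):
    # Hypothetical intermediate threshold (the paper's Eq. (1) may differ):
    # equals the soft threshold at |w| = lam (continuous, no Gibbs ringing)
    # and decays toward the hard threshold as |w| grows (no constant bias).
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - lam * math.exp(-a * (abs(w) - lam)), w)
```

For any coefficient above the threshold, the modified function's output lies strictly between the soft and hard outputs, which is the "distributed between the soft and hard threshold functions" property described above.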

Lidar Sensor
Before lidar pose estimation, wavelet filtering is used to remove noise from the laser point data, and the feature point extraction method is similar to the LOAM algorithm. However, due to the high sampling frequency of lidar, the number of laser-ranging points is large, which affects the accuracy and execution speed of pose estimation. Therefore, unlike LOAM, we only select points on sharp edges and planar surface patches as key points. The smoothness S of the local surface is defined as:

(2)

where k denotes the k-th scan and i, j index points; $X^{C}_{(k,i)}$ and $X^{C}_{(k,j)}$ are the coordinates of points i and j in the coordinate system {C} in the k-th scan, and the sum runs over the set of the ten points closest to point i in that scan. In each scan, all points are sorted by their S value; points with lower S values are selected as plane points and points with higher S values as edge points. In addition, the channel number of each key point is obtained. The laser point clouds of consecutive scans are matched by comparing the smoothness and channel numbers of the key points. To improve the matching accuracy, a rule is formulated to select point pairs: let d be the distance between two matched key points and d̄ the average distance over all matches. According to two preset critical values s and Δd, matched pairs whose distance d exceeds s·d̄ are deleted and d̄ is recalculated, repeating until the change in d̄ is smaller than Δd. The pose of the first scan is used as the coordinate system {W} of the map. The correct matching points between the (k-1)-th and k-th scans are denoted $P_{k-1}$ and $P_k$, respectively. Define ρ and ϕ as the translation and rotation of the transformation from scan (k-1) to scan k, which can be denoted as follows:

(3)

In order to apply most of the key point pairs to Equation (3), the ICP method is used.

(4)

The ICP problem is solved by Levenberg-Marquardt (LM) nonlinear optimization. Different components of the pose transformation are computed separately: the components of the 6-DoF transformation are solved from the plane and edge features of consecutive scans, with the plane features extracted from the ground used to obtain the vertical, roll, and pitch components, and the edge features extracted from the segmented point cloud used to obtain the remaining components by matching.
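For reference, a single ICP alignment step with known correspondences has a closed-form SVD (Kabsch) solution; this is a simplified stand-in sketch, since the paper solves the full ICP objective with Levenberg-Marquardt.

```python
import numpy as np

def align_svd(P, Q):
    """One ICP step with known correspondences: find R, t minimizing
    sum ||R @ P[i] + t - Q[i]||^2 via the SVD (Kabsch) closed form."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)              # centroids
    H = (P - cp).T @ (Q - cq)                            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

# Usage: recover a known rotation/translation from matched key points.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true.T + t_true
R, t = align_svd(P, Q)
```

In practice the correspondences are themselves re-estimated each iteration, which is why an iterative solver such as LM is used in the paper.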

Camera Sensor
We select the widely used Oriented FAST and Rotated BRIEF (ORB) method to realize feature extraction and matching, and consider the spatial distribution and temporal evolution of feature points, screening the extracted features to reduce the drift of the visual odometry. Because the measurement range of the stereo camera is short, some extracted features may not provide depth values. Therefore, the feature extraction condition is relaxed so that more features can be obtained in the near region of the input image.

Initialize local map
Before estimating the motion, we initialize the local map. The ORB features in each input image are extracted. Let $p^{I}_{(k,i)}$ and $P^{C}_{(k,i)}$ denote the coordinates of point i in the coordinate systems {I} and {C} in the k-th frame, respectively, and $P^{W}_{(k,i)}$ its coordinates in {W}. The relationship between them can be represented by (5):

(5)

where F is the camera internal parameter matrix, which can be obtained during calibration. Rearranging (5) gives the following:

(6)

where π represents the transformation from homogeneous to non-homogeneous coordinates. ρ and ϕ represent the translation and rotation, respectively, which together form the homogeneous transformation matrix $T_k$. Based on the properties of homogeneous coordinates, Equation (6) can be rewritten as:

(7)

Since the world coordinate system is initialized by the first frame, we have $T_0 = I_4$, and the first local map points can be obtained according to (7).
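The map initialization step above amounts to back-projecting pixels through the pinhole model and transforming with $T_0 = I_4$. A minimal sketch, assuming standard pinhole intrinsics (the KITTI-like numbers below are illustrative, not from the paper):

```python
import numpy as np

def back_project(u, v, z, K):
    """Back-project pixel (u, v) with depth z from the image frame {I}
    to the camera frame {C} using the pinhole intrinsic matrix K
    (the camera internal parameter F in the text)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def to_world(P_c, T):
    """Transform a point from {C} to {W} with a homogeneous 4x4 pose T."""
    return (T @ np.append(P_c, 1.0))[:3]

# Usage: with T_0 = I_4 (the first frame defines {W}), the first
# local-map point equals the back-projected camera-frame point.
K = np.array([[718.856, 0.0, 607.1928],
              [0.0, 718.856, 185.2157],
              [0.0, 0.0, 1.0]])
P_c = back_project(640.0, 190.0, 10.0, K)
P_w = to_world(P_c, np.eye(4))
```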

Feature screening
To achieve more effective matching, the feature points in the current frame are screened. The tracked features are distributed uniformly across the image by the bucketing method, and all points used for feature matching are drawn from these buckets.
In addition, the age of a feature also has a significant impact on the quality of pose estimation, so we further screen features by age. The age of every newly tracked feature is zero and is incremented by one each time the feature is successfully detected in a subsequent frame. The measured positions of all features in the k-th frame are compared with their predicted positions, where the predicted positions are obtained from the ground-truth motion parameters provided by KITTI. If $P_{k-1}$ is a point in the (k-1)-th frame and its corresponding measured feature in the next frame is $P_k$, the average interframe error of feature point $P_{k-1}$ is denoted as:

(8)

where Age is the survival age of the point, the motion matrix is taken from the ground truth, and the error term represents the interframe error between the measured and predicted positions. The interframe error depends on the tracking quality. Therefore, features are selected according to the error size, and features with a smaller error have higher motion estimation priority. The "bucketing plus aging" method selects only the better features for pose estimation according to the interframe error and does not filter out too many features. Moreover, since the extraction conditions were relaxed during feature extraction, the pose estimation accuracy of the camera is not affected by over-filtering.
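The "bucketing plus aging" screening can be sketched as follows. This is an assumed implementation: each feature carries its pixel position and interframe error, features are binned into a spatial grid, and within each bucket the lowest-error features win priority; the grid size and per-bucket quota are illustrative parameters, not values from the paper.

```python
from collections import defaultdict

def bucket_and_age(features, img_w, img_h, grid=4, per_bucket=2):
    """Screen features: distribute them over a grid of buckets for
    spatial uniformity, then keep the lowest-interframe-error features
    in each bucket (smaller error = higher matching priority)."""
    buckets = defaultdict(list)
    for f in features:
        bx = min(int(f["u"] * grid / img_w), grid - 1)
        by = min(int(f["v"] * grid / img_h), grid - 1)
        buckets[(bx, by)].append(f)
    selected = []
    for fs in buckets.values():
        fs.sort(key=lambda f: f["err"])   # best-tracked features first
        selected.extend(fs[:per_bucket])  # keep a few per bucket
    return selected
```

Capping each bucket keeps the selected set spread across the image, while the per-bucket sort enforces the age/error priority described above.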

Error function
The feature points for pose estimation are obtained by "bucketing plus aging." Due to the limited detection distance of the stereo camera, the depth of a pixel may or may not be available. Therefore, we use the PnP-ICP combination algorithm to estimate the camera pose; compared with a single PnP or ICP algorithm, this combined method can match more features and improve the accuracy of pose estimation. Since noise leads to inaccurate estimation results and data loss, we use bundle adjustment, a nonlinear optimization method, to estimate the camera pose. Assuming that $P_d$ and $P_n$ are features with and without accurate depth values, respectively, and $Q_d$ and $Q_n$ are their matching points in the local map, the following can be obtained:

(9)

To obtain a relatively accurate $T_k$, the LM method is again used to solve the function. After each estimation, the local map is updated and optimized according to the matched feature points.
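The combined objective can be sketched as below. This is an assumed form of the PnP-ICP cost (the paper's exact Eq. (9) is not reproduced): features with depth contribute a 3D-3D (ICP-style) residual, while features without depth contribute a 2D reprojection (PnP-style) residual.

```python
import numpy as np

def project(P_c, K):
    # Pinhole projection of a camera-frame point to pixel coordinates.
    p = K @ P_c
    return p[:2] / p[2]

def combined_cost(R, t, K, feats_depth, feats_nodepth):
    """Sketch of the combined PnP-ICP objective minimized over the pose.
    feats_depth pairs a measured 3D point with its local-map match
    (ICP term); feats_nodepth pairs a measured pixel with its local-map
    match (PnP reprojection term)."""
    cost = 0.0
    for P_meas, Q_map in feats_depth:          # 3D-3D residual
        cost += np.sum((R @ Q_map + t - P_meas) ** 2)
    for p_meas, Q_map in feats_nodepth:        # 2D reprojection residual
        cost += np.sum((project(R @ Q_map + t, K) - p_meas) ** 2)
    return cost
```

In the paper this cost is minimized over $T_k$ with LM; at the true pose and with noise-free matches, both residual groups vanish.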

Fusion Method
According to the previous sections, we have obtained the lidar and camera poses. The complete pose of the UV includes the translation part ρ and the rotation part ϕ. Our fusion algorithm is the 6-element EKF, which mainly includes two processes: the nonlinear transformation and the data fusion.

Nonlinear transformation
In [23], the system has a linearization error because the higher-order terms in the Taylor series are ignored. The accumulation of this error over time makes the estimation results diverge and the positioning inaccurate. Therefore, in our method, we retain the second-order Taylor term. Compared with the traditional EKF algorithm, this avoids the error attributed to linearization and improves data accuracy. Firstly, we expand the state and observation equations in a Taylor series, retaining the second-order term $d^*$:

(10)

where $F_k$ can be expressed as follows:

(11)

Also,

(12)

In the EKF method, the state and observation equations are expressed as follows:

(13)

where $x_k$ is the state vector, $z_k$ is the observation vector, h is the observation function, and f is the system transfer function; w and v represent the state and observation noises. Substituting Equation (10) into Equation (13) yields the following state equation:

(14)

In the above formula, to account for the linearization error, the second-order term $d^*$ and the system noise $w_k$ are combined into $ε_k$, which represents the virtual noise.
Similarly, the observation equation can be expressed as follows:

(15)

In the above formula, $d^*$ in the linearization process is combined with the observation noise $v_k$, with $λ_k$ representing virtual observation noise.
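For reference, the retained second-order expansion can be written schematically as follows; the symbols and the exact forms of Eqs. (10)-(15) are assumptions here, following the standard second-order EKF construction rather than the paper's own notation.

```latex
% Second-order Taylor expansion of the state function f about the
% estimate \hat{x}_k, retaining the quadratic remainder d^*_k:
f(x_k) \approx f(\hat{x}_k) + F_k\,(x_k - \hat{x}_k) + d^*_k,
\qquad
d^*_k = \tfrac{1}{2}\,(x_k - \hat{x}_k)^{\top} H_f\,(x_k - \hat{x}_k),
% where F_k is the Jacobian and H_f the Hessian of f. Folding the
% retained remainders of f and h into the noises gives the virtual
% state and observation noises:
\varepsilon_k = d^*_k + w_k, \qquad \lambda_k = d^*_{h,k} + v_k .
```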

Data fusion
Through the above analysis, the state equation (14) and observation equation (15) for 6-element EKF positioning have been obtained. The camera pose is taken as the system input $u_k$. The position information obtained by the lidar serves as the actual observation value $z_{k+1}$ and is matched with the environmental features corresponding to the predicted observation. Assuming that there are m pairs of matched environmental features, the gain of each pair is denoted as:

(16)

where P is the prediction covariance and S is the innovation covariance based on the pose estimation obtained by the lidar.
Status update is as follows:

(17)

The prediction estimation covariance matrix update is as follows:

(18)
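The gain, state update, and covariance update steps above follow the familiar Kalman pattern. The sketch below shows the standard (first-order) EKF measurement update for a 6-element pose; the paper's variant additionally carries the second-order terms through the virtual noises, which is not reproduced here.

```python
import numpy as np

def ekf_update(x, P, z, H, R_cov):
    """Standard EKF measurement update. x: predicted state (camera-side
    pose), P: prediction covariance, z: lidar pose observation,
    H: observation Jacobian, R_cov: observation noise covariance."""
    S = H @ P @ H.T + R_cov              # innovation covariance (the S in Eq. (16))
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ (z - H @ x)          # state update (cf. Eq. (17))
    P_new = (np.eye(len(x)) - K @ H) @ P # covariance update (cf. Eq. (18))
    return x_new, P_new

# Usage: fuse a direct pose observation (H = I) of a 6-element pose
# [x, y, z, roll, pitch, yaw]; the lidar measurement pulls the state
# most of the way toward z because it is trusted more than the prior.
x = np.zeros(6)
P = np.eye(6)
z = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.1])   # lidar-derived pose
x_new, P_new = ekf_update(x, P, z, np.eye(6), np.eye(6) * 0.1)
```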

Experimental Results and Analysis

This part mainly evaluates the positioning accuracy of the algorithm. Our algorithm REW_SLAM is compared with the ORB_SLAM2 and VISO2_LOAM_TF algorithms on the KITTI datasets, and with ORB_SLAM2 and LOAM on the UV platform.

KITTI Datasets Test
In the experiment, the KITTI dataset is used for testing, and the relative pose error (RPE) and absolute trajectory error (ATE) are used to evaluate the accuracy of the algorithm. The following algorithms are compared:

REW_SLAM: our proposed algorithm.

Laser_visual: REW_SLAM without a wavelet transform.

Laser: single-laser odometry.

Visual: single-visual odometry.

ORB_SLAM2: an open source visual SLAM system.

VISO2_LOAM_TF: a vision-lidar loosely coupled algorithm proposed in [24].
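The RPE and ATE metrics used in this evaluation can be sketched as below. This is a translation-only simplification under the assumption of pre-aligned trajectories; the full metrics compare SE(3) poses.

```python
import numpy as np

def ate_rmse(gt, est):
    """Absolute trajectory error (translation only): RMSE of the
    positional differences over all frames, trajectories pre-aligned."""
    d = gt - est
    return float(np.sqrt(np.mean(np.sum(d * d, axis=1))))

def rpe_rmse(gt, est, delta=1):
    """Relative pose error (translation only): RMSE of the difference
    between ground-truth and estimated motion over a fixed frame gap."""
    dg = gt[delta:] - gt[:-delta]
    de = est[delta:] - est[:-delta]
    d = dg - de
    return float(np.sqrt(np.mean(np.sum(d * d, axis=1))))
```

Note the complementary behavior: a constant offset between trajectories shows up fully in the ATE but cancels in the RPE, which is why both are reported.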

Table 1 shows the RPE of the three algorithms. Compared with ORB_SLAM2, the REW_SLAM and VISO2_LOAM_TF algorithms have smaller errors on all sequences. The error of VISO2_LOAM_TF on sequences 02, 09, and 10 is lower than that of our algorithm, but our algorithm performs better on the other sequences. The average RPE of REW_SLAM is only 0.31, the smallest of the three algorithms, and its smallest RPE of 0.054 is attained on sequence 05.
As shown in Table 2, REW_SLAM also achieved good results with respect to ATE. The error of VISO2_LOAM_TF on sequences 00, 01, and 02 is very large, especially on sequence 01, where the maximum error reaches 25.389. The error of ORB_SLAM2 is smaller than that of VISO2_LOAM_TF, and its minimum ATE of 0.293 is attained on sequence 04. The maximum ATE of ORB_SLAM2 is 10.402, while the maximum ATE of REW_SLAM is only 7.547 and its average ATE is only 2.467, better than ORB_SLAM2.

Table 1. Relative pose error of REW_SLAM, ORB_SLAM2, and VISO2_LOAM_TF
Sequence        00     01     02     04     05     06     07     08     09     10     Average
REW_SLAM        0.162  0.248  0.257  0.076  0.054  0.103  0.095  0.309  1.545  0.254  0.31
ORB_SLAM2       0.702  1.391  0.764  0.489  0.407  0.51   0.502  1.057  0.872  0.603  0.73
VISO2_LOAM_TF   0.21   2.457  0.205  0.249  0.112  0.124  0.13   0.344  0.153  0.142  0.413
Bold font indicates the best performance of the three methods in each test.

Table 2. Absolute trajectory error of REW_SLAM, ORB_SLAM2, and VISO2_LOAM_TF
Sequence        00      01      02      04     05     06     07     08     09     10     Average
REW_SLAM        1.023   4.359   5.103   0.267  0.621  0.348  0.435  4.049  7.547  0.918  2.467
ORB_SLAM2       1.344   10.402  5.719   0.293  0.851  0.82   0.524  3.607  3.291  1.037  2.789
VISO2_LOAM_TF   13.305  25.389  17.115  0.792  2.691  0.832  1.005  4.703  1.367  1.889  6.909
Bold font indicates the best performance of the three methods in each test.

Fig. 2 shows the distribution of the translation and rotation errors along the trajectory of sequence 00. The error peaks are usually caused by vehicle turns, which produce a large amount of drift. The translation error is always less than 1.25 m, while the rotation error is less than 4.5°. The average translation and rotation errors are 0.16 m and 0.34°, respectively, both lower than the results reported in [25].

Fig. 2. Percentage positioning error of our proposed algorithm relative to the real trajectory on sequence 00: (a) translation error and (b) rotation error.

Fig. 3 shows the RPEs of the three algorithms on sequences 00, 04, 07 and 10. Blue represents our algorithm REW_SLAM, green represents algorithm ORB_SLAM2, and red represents algorithm VISO2_LOAM_TF. ORB_SLAM2 has the largest error range on these four datasets, especially on sequence 07, with the maximum error being more than 1.3 m. The effect of VISO2_LOAM_TF is better than ORB_SLAM2. The maximum error of VISO2_LOAM_TF on sequences 00 and 10 is the smallest of the three algorithms, while the median error on sequence 10 is also the best of the same three. Our algorithm achieves the best results on sequences 00, 04, and 07. Based on the above analysis, it can be concluded that REW_SLAM has good positioning accuracy and minimum drift.
The predicted trajectories on sequences 01, 07, and 10 are shown in Fig. 4; Fig. 4(a), 4(b), and 4(c) compare the REW_SLAM, laser, and visual algorithms. The black dotted line is the real trajectory of the dataset, the green and red lines are the trajectories of the laser and visual algorithms, respectively, and the blue line is the trajectory of REW_SLAM. Our algorithm significantly reduces the error and estimates the UV's pose more accurately than the laser and visual algorithms, and the trajectory it generates almost overlaps the real trajectory in each sequence. Fig. 4(d), 4(e), and 4(f) compare REW_SLAM with the laser_visual algorithm, where the blue line is the trajectory of REW_SLAM and the green line is the trajectory of the laser_visual algorithm. Although the laser_visual algorithm can estimate the pose of the UV, its accuracy is poor compared with REW_SLAM. This shows that the modified wavelet denoising method has a good optimization effect on the positioning accuracy of the UV, and verifies its effectiveness and robustness.

Fig. 3. Box plot of REW_SLAM, ORB_SLAM2, and VISO2_LOAM_TF on (a) sequence 00, (b) sequence 04, (c) sequence 07, and (d) sequence 10.
Fig. 4. Trajectory estimated and the corresponding ground truth. (a–c) Predicted trajectories of REW_SLAM, laser, and visual; the trajectories of sequences 01, 07, and 10, respectively. (d–f) Predicted trajectories of REW_SLAM and Laser_Visual; the trajectories of sequences 01, 07, and 10, respectively.
Robot Platform Test
The camera and lidar are an Orbbec Astra Pro and a C16 lidar, respectively, mounted on a ROS UV. All algorithms run on a Lenovo laptop with an Intel Core i5 processor and 8 GB of memory. The onboard system runs 64-bit Ubuntu 16.04, and the algorithms are implemented within the ROS framework.
Due to the limited equipment, we could not obtain the ground truth of the test trajectories, so the RPE and ATE could not be calculated. Instead, we evaluate our algorithm with three methods: a static positioning experiment, the CPU occupancy rate, and the Euclidean distance error.
In the static positioning experiment, the UV is kept stationary and the fluctuation of the pose data is analyzed. The camera and lidar sensors form a rigid body, so the transformation between them is constant. We analyze the position fluctuations in the x, y, and z directions. The pose fluctuation curves of static positioning are shown in Fig. 5, where the green curve represents the position fluctuation before fusion (camera sensor only) and the blue curve the fluctuation after fusion. It is obvious from Fig. 5 that the position fluctuation after fusion is smoother. The variance of the position fluctuation in each direction is also calculated, as shown in Table 3; the fluctuation variance of the fused system is smaller than that of the non-fused system in every direction. These results show that the proposed fusion localization algorithm improves the stability of localization.
Fig. 5. Position fluctuation curves of static positioning in the x, y, and z directions before and after fusion.

Table 3. Variance of position fluctuations in the x, y, and z directions
                x          y          z
Before fusion   0.005745   0.002531   0.000318
After fusion    0.005734   0.002511   0.00031

Besides the localization module, an autonomous driving system also needs target detection, path planning, and other modules. Therefore, under limited computing resources, the computational load of the algorithm should be reduced as much as possible while its performance is maintained. We thus evaluate the CPU occupancy rate of the three algorithms on four different trajectories, one of which is shown in Fig. 6. In the four tests, the running speed of the UV is 3.6 m/s, and ORB_SLAM2 and LOAM are used as comparison algorithms. The three algorithms run on the four trajectories, with the CPU occupancy rate recorded every 3 seconds and then averaged. The CPU occupancy rates are shown in Table 4, whose last column gives the average over all trajectories. The average CPU occupancy rate of our fusion algorithm is 25.499%, much lower than that of the other two algorithms.
In order to further evaluate the fusion location algorithm performance, the Euclidean distance error is selected as the performance index. The specific expression of the Euclidean distance error D is denoted as:

$D = \sqrt{(\hat{x}-x)^2 + (\hat{y}-y)^2 + (\hat{z}-z)^2}$    (19)

where $(\hat{x}, \hat{y}, \hat{z})$ is the estimated position of the UV and $(x, y, z)$ is the actual position. This formula measures the degree of deviation between the estimated and real positions.

Fig. 6. Trajectory estimation of algorithm REW_SLAM on trajectory 1.

Table 4. CPU occupancy rate of REW_SLAM, ORB_SLAM2, and LOAM
Sequence        1       2       3       4       Average
REW_SLAM (%)    25.703  24.71   26.231  25.354  25.499
ORB_SLAM2 (%)   29.961  29.855  30.029  28.311  29.539
LOAM (%)        34.568  34.972  35.803  33.967  34.828
Bold font indicates the best performance of the three methods in each test.

We select the position data at the turning, starting, and end points of trajectory 2 (Fig. 7(a)) as representative samples. During the experiment, the real and estimated positions are recorded and their Euclidean distance error is calculated according to formula (19); the resulting error curves are drawn in Fig. 7(b), where the blue line represents our algorithm REW_SLAM, the red line the ORB_SLAM2 algorithm, and the green line the LOAM algorithm.

Fig. 7. (a) Trajectory 2 and (b) Euclidean distance error curves of REW_SLAM, ORB_SLAM2, and LOAM on trajectory 2.

It can be seen from Fig. 7 that the Euclidean distance error of single-sensor positioning is significantly larger than that of the proposed fusion algorithm at most positions. The error curves of the single-sensor algorithms show a basically upward trend, indicating that single-sensor localization suffers from cumulative error that grows with the passage of time. In contrast, the error of the proposed fusion algorithm remains basically stable between 5 and 10 and does not increase significantly over time. These experimental results show that the proposed fusion algorithm can significantly improve the positioning performance of the UV.

Conclusion

Against the background of detecting protective measures against COVID-19, we have studied in depth how to improve the autonomous positioning accuracy of a UV equipped with lidar and a stereo camera. The proposed visual-lidar SLAM fusion algorithm achieves real-time, high-precision positioning of the UV in large, complex environments, thereby improving the efficiency of the UV in detecting protective measures against COVID-19 and providing strong support for epidemic prevention and control work. As our technical contribution, we propose a 6-element EKF method that effectively fuses the sensor information of lidar and a stereo camera and mitigates the linearization error and non-real-time problems of the standard EKF. Compared with traditional lidar-vision fusion algorithms, we introduce a wavelet-denoising algorithm for the first time: a modified wavelet filter preprocesses the raw lidar data, which further improves the positioning accuracy of the system. We evaluated the proposed method on the KITTI datasets and on a real UV platform. The experimental results show that the proposed algorithm outperforms the two comparison algorithms in accuracy, stability, and CPU occupancy. In the future, we hope to further improve the efficiency and accuracy of UV positioning by integrating IMU data more effectively, and to build a loop-closure detection module to handle relocalization when visual information is unavailable.

Author’s Contributions

Conceptualization: SL. Investigation and methodology: SL. Writing of the original draft: MX. Writing of the review and editing: SL, SG, GK. Software: MX. Validation: MX, SG. Project administration: SL, MMH, SAA. Funding acquisition: MMH, SAA. All the authors have proofread the final version.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through the Vice Deanship of Scientific Research Chairs: Research Chair of New Emerging Technologies and 5G Networks and Beyond.

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Siyuan Liang received the B.S. degree from the Xi'an University of Posts and Telecommunications, Xi'an, China, in 2006, and the Ph.D. degree from Beijing University of Posts and Telecommunications, Beijing, China, in 2012. He is currently an Associate Professor with the School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications. His main research interests include visual positioning, integrated navigation, and automated mission planning of unmanned systems.

Mengna Xie received the B.E. degree from the Xi'an University of Posts and Telecommunications, Xi'an, China, in 2020. She is currently pursuing the Master of Engineering degree with the School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications. Her main research interests include laser positioning and integrated navigation.

Sahil Garg received the Ph.D. degree from the Thapar Institute of Engineering and Technology, Patiala, India, in 2018. He is currently a Research Associate at the Resilient Machine Learning Institute (ReMI) in collaboration with École de technologie supérieure (ÉTS), Montréal. Prior to this, he worked as a Postdoctoral Research Fellow at ÉTS, Montréal, and as a MITACS Researcher at Ericsson, Montréal. He has made many research contributions in the areas of machine learning, big data analytics, knowledge discovery, cloud computing, the Internet of Things, software-defined networking, and vehicular ad hoc networks. He has over 80 publications in high-ranked journals and conferences, including 50+ top-tier journal papers and 30+ reputed conference articles. He is currently a Managing Editor of Springer's Human-centric Computing and Information Sciences (HCIS) journal, and an Associate Editor of IEEE Network Magazine, IEEE Transactions on Intelligent Transportation Systems, Elsevier's Applied Soft Computing (ASoC), and Wiley's International Journal of Communication Systems (IJCS). In addition, he serves as the Workshops and Symposia Officer for the IEEE ComSoc ETI on Aerial Communications. He has guest edited a number of special issues in top-cited journals, including IEEE T-ITS, IEEE TII, IEEE IoT Journal, IEEE Network Magazine, FGCS, and NCAA. He has also served as TPC Co-Chair, Publicity Co-Chair, Special Sessions Chair, or Publication Chair for several conferences, and as workshop co-chair for different workshops at IEEE/ACM conferences including IEEE INFOCOM, IEEE GLOBECOM, and ACM MobiCom. He is a member of IEEE, the IEEE Communications Society, the IEEE Industrial Electronics Society, the IEEE Software Defined Networks Community, the IEEE Smart Grid Community, ACM, and IAENG.

Mohammad Mehedi Hassan [SM'18] received his Ph.D. degree in computer engineering from Kyung Hee University, Seoul, South Korea, in February 2011. He is currently a professor with the Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia. He has authored and coauthored around 210+ publications, including refereed IEEE/ACM/Springer/Elsevier journal papers, conference papers, books, and book chapters. His research interests include edge/cloud computing, the Internet of Things, cyber security, deep learning, artificial intelligence, body sensor networks, 5G networks, and social networks.

Salman A. AlQahtani is currently a Professor in the Department of Computer Engineering, King Saud University, Riyadh. His current research interests are in the areas of 5G networks, broadband wireless communications, radio resource management for 4G and beyond networks (call admission control, packet scheduling, and radio resource sharing techniques), cognitive and cooperative wireless networking, small cell and heterogeneous networks, self-organizing networks, SDN/NFV, 5G network slicing, smart grid, intelligent IoT solutions for smart cities, dynamic spectrum access, coexistence issues in heterogeneous 5G networks, Industry 4.0, the Internet of Everything, mobile edge and fog computing, and cyber sovereignty. His interests also include performance evaluation and analysis of high-speed packet-switched networks, system modeling and simulation, and the integration of heterogeneous wireless networks. His main focus is on the design and optimization of 5G MAC layers, closed-form mathematical performance analysis, energy efficiency, and resource allocation and sharing strategies. He has authored two scientific books and authored or co-authored around 76 journal and conference papers on these topics since 2004. He serves as a reviewer for several national and international journals.

References

[1] E. Kim, S. Jang, S. Li, and Y. Sung, “Newspaper article-based agent control in smart city simulations,” Human-centric Computing and Information Sciences, vol. 10, article no. 44, 2020. https://doi.org/10.1186/s13673-020-00252-8
[2] D. G. Lee and Y. S. Seo, “Improving bug report triage performance using artificial intelligence based document generation model,” Human-centric Computing and Information Sciences, vol. 10, article no. 26, 2020. https://doi.org/10.1186/s13673-020-00229-7
[3] M. Wen, J. Park, and K. Cho, “A scenario generation pipeline for autonomous vehicle simulators,” Human-centric Computing and Information Sciences, vol. 10, article no. 24, 2020. https://doi.org/10.1186/s13673-020-00231-z
[4] F. Hao and D. S. Park, “CoNavigator: a framework of FCA-based novel coronavirus COVID-19 domain knowledge navigation,” Human-centric Computing and Information Sciences, vol. 11, article no. 6, 2021. https://doi.org/10.22967/HCIS.2021.11.006
[5] Y. Yang, F. Hao, D. S. Park, S. Peng, H. Lee, and M. Mao, “Modelling prevention and control strategies for COVID-19 propagation with patient contact networks,” Human-centric Computing and Information Sciences, vol. 11, article no. 45, 2021. https://doi.org/10.22967/HCIS.2021.11.045
[6] Y. Sahraoui, A. Korichi, C. A. Kerrache, M. Bilal, and M. Amadeo, “Remote sensing to control respiratory viral diseases outbreaks using Internet of Vehicles,” Transactions on Emerging Telecommunications Technologies, vol. 33, no. 10, article no. e4118, 2022. https://doi.org/10.1002/ett.4118
[7] A. Irshad, S. A. Chaudhry, A. Ghani, and M. Bilal, “A secure blockchain-oriented data delivery and collection scheme for 5G-enabled IoD environment,” Computer Networks, vol. 195, article no. 108219, 2021. https://doi.org/10.1016/j.comnet.2021.108219
[8] M. Fayaz, G. Mehmood, A. Khan, S. Abbas, M. Fayaz, and J. Gwak, “Counteracting selfish nodes using reputation based system in mobile ad hoc networks,” Electronics, vol. 11, no. 2, article no. 185, 2022. https://doi.org/10.3390/electronics11020185
[9] J. H. Park, S. Rathore, S. K. Singh, M. M. Salim, A. E. Azzaoui, T. W. Kim, Y. Pan, and J. H. Park, “A comprehensive survey on core technologies and services for 5G security: taxonomies, issues, and solutions,” Human-centric Computing and Information Sciences, vol. 11, article no. 3, 2021. https://doi.org/10.22967/HCIS.2021.11.003
[10] H. Cao, J. Du, H. Zhao, D. X. Luo, N. Kumar, L. Yang, and F. R. Yu, “Resource-ability assisted service function chain embedding and scheduling for 6G networks with virtualization,” IEEE Transactions on Vehicular Technology, vol. 70, no. 4, pp. 3846-3859, 2021.
[11] H. Cao, A. Xiao, Y. Hu, P. Zhang, S. Wu, and L. Yang, “On virtual resource allocation of heterogeneous networks in virtualization environment: a service oriented perspective,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 2468-2480, 2020.
[12] H. Cao, S. Wu, G. S. Aujla, Q. Wang, L. Yang, and H. Zhu, “Dynamic embedding and quality of service-driven adjustment for cloud networks,” IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 1406-1416, 2020.
[13] R. Kumar, P. Kumar, R. Tripathi, G. P. Gupta, T. R. Gadekallu, and G. Srivastava, “SP2F: a secured privacy-preserving framework for smart agricultural unmanned aerial vehicles,” Computer Networks, vol. 187, article no. 107819, 2021. https://doi.org/10.1016/j.comnet.2021.107819
[14] C. F. Cheng, G. Srivastava, J. C. W. Lin, and Y. C. Lin, “A consensus protocol for unmanned aerial vehicle networks in the presence of Byzantine faults,” Computers and Electrical Engineering, vol. 99, article no. 107774, 2022. https://doi.org/10.1016/j.compeleceng.2022.107774
[15] T. T. Khanh, V. Nguyen, X. Q. Pham, and E. N. Huh, “Wi-Fi indoor positioning and navigation: a cloudlet-based cloud computing approach,” Human-centric Computing and Information Sciences, vol. 10, article no. 32, 2020. https://doi.org/10.1186/s13673-020-00236-8
[16] M. Bilal and S. G. Kang, “An authentication protocol for future sensor networks,” Sensors, vol. 17, no. 5, article no. 979, 2017. https://doi.org/10.3390/s17050979
[17] J. Kunhoth, A. Karkar, S. Al-Maadeed, and A. Al-Ali, “Indoor positioning and wayfinding systems: a survey,” Human-centric Computing and Information Sciences, vol. 10, article no. 18, 2020. https://doi.org/10.1186/s13673-020-00222-0
[18] M. Zhang, R. Fu, Y. Guo, L. Wang, P. Wang, and H. Deng, “Cyclist detection and tracking based on multi-layer laser scanner,” Human-centric Computing and Information Sciences, vol. 10, article no. 20, 2020. https://doi.org/10.1186/s13673-020-00225-x
[19] W. Song, L. Zhang, Y. Tian, S. Fong, J. Liu, and A. Gozho, “CNN-based 3D object classification using Hough space of LiDAR point clouds,” Human-centric Computing and Information Sciences, vol. 10, article no. 19, 2020. https://doi.org/10.1186/s13673-020-00228-8
[20] Q. Zou, Q. Sun, L. Chen, B. Nie, and Q. Li, “A comparative analysis of LiDAR SLAM-based indoor navigation for autonomous vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 6907-6921, 2022.
[21] M. Montemerlo and S. Thrun, “Simultaneous localization and mapping with unknown data association using FastSLAM,” in Proceedings of 2003 IEEE International Conference on Robotics and Automation (Cat. No. 03CH37422), Taipei, Taiwan, 2003, pp. 1985-1991.
[22] F. Wu, B. He, L. Zhang, S. Chen, and J. Zhang, “Vision-and-Lidar based real-time outdoor localization for unmanned ground vehicles without GPS,” in Proceedings of 2018 IEEE International Conference on Information and Automation (ICIA), Wuyishan, China, 2018, pp. 232-237.
[23] K. Konolige and M. Agrawal, “FrameSLAM: from bundle adjustment to real-time visual mapping,” IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1066-1077, 2008.
[24] M. Yan, J. Wang, J. Li, and C. Zhang, “Loose coupling visual-lidar odometry by combining VISO2 and LOAM,” in Proceedings of 2017 36th Chinese Control Conference (CCC), Dalian, China, 2017, pp. 6841-6846.
[25] T. Caselitz, B. Steder, M. Ruhnke, and W. Burgard, “Monocular camera localization in 3D LiDAR maps,” in Proceedings of 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, South Korea, 2016, pp. 1926-1931.
