Human-centric Computing and Information Sciences volume 13, Article number: 07 (2023)
https://doi.org/10.22967/HCIS.2023.13.007
Siamese trackers have achieved significant progress over the past few years. However, existing methods offer either high speed or high performance, and previous Siamese trackers struggle to balance both. In this work, we propose an efficient yet high-performance tracker (SiamSERPN), which utilizes MobileNetV2 as the backbone and is equipped with the proposed squeeze and excitation region proposal network (SERPN). In the SERPN block, we introduce the distance-IoU (DIoU) into the classification and regression branches to remedy the weaknesses of the traditional RPN. Benefiting from the structure of MobileNetV2, we further propose a feature aggregation architecture of multiple SERPN blocks to improve performance. Extensive experiments and comparisons on visual tracking benchmarks, including VOT2016, VOT2018, and GOT-10k, demonstrate that our SiamSERPN balances speed and performance. On the GOT-10k benchmark in particular, our tracker scores 0.604 while running at 75 frames per second (FPS), nearly 27 times the speed of the state-of-the-art tracker.
Keywords: Object Tracking, Siamese Network, MobileNetV2, SERPN, Distance-IoU
Visual object tracking is one of the most fundamental yet challenging topics in computer vision [1], with a wide range of applications [2-7]. Over the past few years, as backbone networks have evolved from shallow to deep, Siamese network-based trackers have achieved significant progress. At the same time, however, Siamese tracking models have become increasingly heavy, which severely slows them down, in some cases below the minimum real-time speed of 25 frames per second (FPS) required for computer vision and industrial applications. For instance, the recent SiamRPN++ [8] and SiamRCNN [9] trackers run at only 35 FPS and 2 FPS, respectively, to achieve state-of-the-art performance, much slower than the early SiamFC [10] and SiamRPN [11] methods that adopt shallow backbone networks, as visualized in Fig. 1. How to balance performance and speed is therefore one of the main challenges for Siamese trackers.
The main contributions of this work are summarized as follows:
We propose a fast Siamese tracker using a lightweight backbone that maintains competitive performance while running at 75 FPS.
We design the SERPN block and propose the SERPN aggregation structure to compensate for the performance loss caused by the lightweight backbone.
We introduce the DIoU metric into the classification branch and regression branch of the proposed SERPN to remedy the natural deficiencies of the standard RPN.
Visual tracking has been one of the most active research topics in computer vision in recent decades, and many excellent methods have emerged [18-23], from correlation filter-based trackers to deep learning-based ones. A comprehensive survey of trackers is beyond the scope of this paper, so we briefly review the three aspects most relevant to our work: Siamese network-based visual trackers, deep architectures, and the RPN in detection, with Siamese trackers being our main focus. To clearly review the related work, the contributions of the mainstream anchor-based Siamese trackers are listed in Table 1 [8-11, 24, 25].
Table 1. Related studies of the mainstream anchor-based Siamese trackers
Study | Proposed method | Main contributions | EAO (VOT2016) | Speed (FPS)
Bertinetto et al. [10] | SiamFC | First Siamese tracker; uses a shallow network | 0.387 | 86
Li et al. [11] | SiamRPN | Introduces the RPN into Siamese trackers | 0.393 | 160
Zhu et al. [24] | DaSiamRPN | Extends SiamRPN with more training data | 0.411 | 160
Li et al. [8] | SiamRPN++ | Introduces a very deep network into Siamese tracking | 0.464 | 35
Wang et al. [25] | SiamMask | Adds a semantic segmentation branch | 0.412 | 77
Voigtlaender et al. [9] | SiamRCNN | Proposes re-detection | 0.460 | 4.7
In this section, we illustrate the proposed SiamSERPN framework. As shown in Fig. 2, the proposed SiamSERPN consists of a Siamese network backbone and multiple SERPN blocks. The Siamese network backbone is responsible for computing the convolutional feature maps of the template patch and the search patch, which uses a lightweight convolutional network. The SERPN block includes a classification branch and a regression branch. Specifically, the classification branch performs foreground-background classification on each point of the correlation layer, and the regression branch performs bounding box regression on the corresponding position.
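To make the correlation step concrete, below is a minimal PyTorch sketch of the depth-wise cross-correlation commonly used in Siamese trackers to couple template and search features before the classification and regression branches; the shapes and function names are our illustrative assumptions, not the authors' released code.

```python
# Depth-wise cross-correlation: each channel of the search features is
# correlated with the corresponding channel of the template features.
import torch
import torch.nn.functional as F

def xcorr_depthwise(search, template):
    """search: (B, C, Hs, Ws); template: (B, C, Ht, Wt) -> (B, C, Ho, Wo)."""
    b, c, h, w = search.shape
    search = search.reshape(1, b * c, h, w)               # fold batch into channels
    kernel = template.reshape(b * c, 1, *template.shape[2:])
    out = F.conv2d(search, kernel, groups=b * c)          # per-channel correlation
    return out.reshape(b, c, out.shape[2], out.shape[3])

z = torch.randn(2, 256, 7, 7)     # template features
x = torch.randn(2, 256, 31, 31)   # search features
corr = xcorr_depthwise(x, z)      # (2, 256, 25, 25) correlation map
```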
MobileNet-driven Siamese Tracking
Modern deep neural networks [29] have proven effective as feature extraction backbones in Siamese network-based trackers [8], which has led to an increasing number of trackers [9, 25] adopting deep networks as their backbones. Although tracker performance has improved, the resulting inefficiency has been neglected. In our work, we utilize MobileNetV2 [12] as the backbone network, whose parameters are listed in Table 2.
Table 2. Parameters of the MobileNetV2 backbone
Layer (type) | Output shape | Parameters | Connected to
Input_1 (InputLayer) | (None, 255, 255, 3) | 0 |
Conv1_pad (ZeroPadding2D) | (None, 255, 255, 3) | 0 | Input_1[0][0]
Conv1 (Conv2D) | (None, 112, 112, 32) | 864 | Conv1_pad[0][0]
Bn_Conv1 (BatchNormalization) | (None, 112, 112, 32) | 128 | Conv1[0][0]
Conv1_relu (ReLU) | (None, 112, 112, 32) | 0 | Bn_Conv1[0][0]
Expanded_conv_depthwise (DepthwiseConv2D) | (None, 112, 112, 32) | 288 | Conv1_relu[0][0]
Expanded_conv_depthwise_BN (BatchNormalization) | (None, 112, 112, 32) | 128 | Expanded_conv_depthwise[0][0]
Expanded_conv_depthwise_relu (ReLU) | (None, 112, 112, 32) | 0 | Expanded_conv_depthwise_BN[0][0]
Expanded_conv_project (Conv2D) | (None, 112, 112, 16) | 512 | Expanded_conv_depthwise_relu[0][0]
Expanded_conv_project_BN (BatchNormalization) | (None, 112, 112, 16) | 64 | Expanded_conv_project[0][0]
Block_1_expand (Conv2D) | (None, 112, 112, 96) | 1536 | Expanded_conv_project_BN[0][0]
Block_1_expand_BN (BatchNormalization) | (None, 112, 112, 96) | 384 | Block_1_expand[0][0]
Block_1_expand_relu (ReLU) | (None, 112, 112, 96) | 0 | Block_1_expand_BN[0][0]
Block_1_pad (ZeroPadding2D) | (None, 113, 113, 96) | 0 | Block_1_expand_relu[0][0]
Block_1_depthwise (DepthwiseConv2D) | (None, 56, 56, 96) | 864 | Block_1_pad[0][0]
Block_1_depthwise_BN (BatchNormalization) | (None, 56, 56, 96) | 384 | Block_1_depthwise[0][0]
Block_1_depthwise_relu (ReLU) | (None, 56, 56, 96) | 0 | Block_1_depthwise_BN[0][0]
Block_1_project (Conv2D) | (None, 56, 56, 24) | 2304 | Block_1_depthwise_relu[0][0]
Block_1_project_BN (BatchNormalization) | (None, 56, 56, 24) | 96 | Block_1_project[0][0]
Block_2_expand (Conv2D) | (None, 56, 56, 144) | 3456 | Block_1_project_BN[0][0]
… | … | … | …
Block_16_project_BN (BatchNormalization) | (None, 7, 7, 320) | 1280 | Block_16_project[0][0]
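To illustrate how multi-level features can be tapped from such a backbone for later aggregation, here is a minimal sketch using torchvision's MobileNetV2; the block indices are illustrative assumptions, since the paper taps its own conv3/conv5/conv7 configuration.

```python
# Tap intermediate feature maps from a pretrained MobileNetV2 backbone.
import torch
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

backbone = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V1).features
taps = {3, 6, 13}  # example block indices for shallow/mid/deep features

def extract(x):
    feats = []
    for i, layer in enumerate(backbone):
        x = layer(x)
        if i in taps:           # collect the feature map at each tapped level
            feats.append(x)
    return feats

feats = extract(torch.randn(1, 3, 255, 255))
print([f.shape for f in feats])
```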
Training Dataset
The backbone network of our tracker is pre-trained on ImageNet [43] for image classification, because a pre-trained network converges faster, as proven in other works [8]. We train the network on the training sets of ImageNet-DET [43], ImageNet-VID, COCO [44], LaSOT [45], and GOT-10k (train split) [17] to learn a generic notion of how to measure similarity between general objects for visual tracking. In both training and testing, we use single-scale images: 127×127 pixels for template images and 255×255 pixels for search images.
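As an illustration of this fixed-size patch preparation, the following is a small sketch under SiamFC-style conventions; the helper name and the simple in-bounds crop handling are our assumptions, not the authors' code.

```python
# Crop a square region around the target center and resize it to the
# fixed template (127x127) or search (255x255) size.
import cv2
import numpy as np

def center_crop_resize(img, cx, cy, crop_size, out_size):
    """Crop a square of side crop_size centered at (cx, cy), then resize."""
    half = crop_size / 2
    x1, y1 = int(round(cx - half)), int(round(cy - half))
    patch = img[max(y1, 0):y1 + int(crop_size), max(x1, 0):x1 + int(crop_size)]
    return cv2.resize(patch, (out_size, out_size))

img = np.zeros((480, 640, 3), dtype=np.uint8)           # dummy frame
template = center_crop_resize(img, 320, 240, 128, 127)  # 127x127 template
search = center_crop_resize(img, 320, 240, 256, 255)    # 255x255 search
```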
Implementation Details
In our experiments, we follow SiamRPN++ [8] for the training and inference settings. SiamSERPN is trained with stochastic gradient descent (SGD) with a batch size of 64. We train for a total of 50 epochs: the first 10 epochs train the SERPN with a warm-up learning rate of 0.001, and the last 40 epochs train the entire network end-to-end with a learning rate that decays exponentially from 0.005 to 0.0005. Weight decay is 0.0005 and momentum is 0.9. In SiamSERPN, the classification branch uses the cross-entropy loss, and the regression branch uses the DIoU loss from Section 3.4; the training loss is the sum of the classification loss and the DIoU regression loss. Our approach is implemented in Python using PyTorch on a PC with an Intel Xeon E5-2667 v3 3.20 GHz CPU, 32 GB RAM, and an NVIDIA RTX 2080 Ti GPU. Training the model takes about a week, depending on the GPU specifications. Details of the hardware configuration and hyperparameters are given in Table 3.
Table 3. Implementation details of the hardware configuration and hyperparameters in experiments
Hardware | | Hyperparameters |
CPU | Intel Xeon E5-2667 v3 | Training epochs | 50
GPU | NVIDIA RTX 2080 Ti | Warm-up learning rate | 0.001
RAM | 32 GB | Learning rate | 0.005 to 0.0005
 | | Momentum | 0.9
 | | Classification loss | Cross-entropy loss
 | | Regression loss | DIoU loss
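The schedule in Table 3 can be expressed compactly; the following is a minimal PyTorch sketch, where the model and per-epoch training call are placeholders rather than the authors' code.

```python
# 10 warm-up epochs at lr=1e-3, then 40 epochs with lr decaying
# exponentially from 5e-3 to 5e-4 (SGD, momentum 0.9, weight decay 5e-4).
import torch

def make_lr_lambda(warmup_epochs=10, total_epochs=50,
                   warmup_lr=1e-3, start_lr=5e-3, end_lr=5e-4):
    decay_epochs = total_epochs - warmup_epochs
    gamma = (end_lr / start_lr) ** (1.0 / max(decay_epochs - 1, 1))
    def lr_lambda(epoch):  # returns a multiplier applied to the base lr
        if epoch < warmup_epochs:
            return warmup_lr / start_lr
        return gamma ** (epoch - warmup_epochs)
    return lr_lambda

model = torch.nn.Conv2d(3, 8, 3)  # placeholder for the SiamSERPN network
optimizer = torch.optim.SGD(model.parameters(), lr=5e-3,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, make_lr_lambda())
for epoch in range(50):
    # train_one_epoch(model, optimizer)  # placeholder training loop
    scheduler.step()
```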
Comparisons and Analyses
We compare the proposed SiamSERPN tracker with current mainstream trackers on three extensive tracking benchmarks: VOT2016, VOT2018, and GOT-10k (test split). Note that the evaluation methods adopted in our comparisons and analyses are those defined in the benchmarks' official papers.
VOT2016 dataset
We test our SiamSERPN tracker on the VOT2016 benchmark [15] in comparison with the current mainstream trackers. The VOT2016 public dataset is one of the most common benchmarks for evaluating single-object trackers and includes 60 public video sequences with different challenge factors. Following the evaluation protocol of VOT2016, we adopt the expected average overlap (EAO), accuracy (average overlap during successful tracking periods), and robustness (failure rate) to compare different trackers. Accuracy expresses the overlap between the predicted bounding box and the ground-truth box, and it can be calculated as follows:
$$\mathrm{Accuracy} = \frac{1}{N_{valid}} \sum_{t=1}^{N_{valid}} OS_t \qquad (11)$$
where $OS_t$ denotes the overlap at frame $t$, and $N_{valid}$ denotes the number of frames in which the target is tracked successfully. As previously mentioned, robustness represents the stability of the tracker, where a larger value indicates poorer stability; it is calculated as:
$$\mathrm{Robustness} = \frac{1}{N} \sum_{k=1}^{N} F(k) \qquad (12)$$
where $F(k)$ denotes the number of times the tracker failed and was re-initialized in sequence $k$, and $N$ is the total number of sequences. EAO combines accuracy and robustness. First, the sequences in the benchmark are grouped by their length $N_s$ (the number of frames). Next, the per-frame overlaps $\Phi_t$ are computed over the accurately tracked frames. The EAO value for sequences of length $N_s$ is obtained as:
$$\hat{\Phi}_{N_s} = \frac{1}{N_s} \sum_{t=1}^{N_s} \Phi_t \qquad (13)$$
The EAO score used for ranking trackers is the average of $\hat{\Phi}_{N_s}$ over an interval of typical sequence lengths. The detailed comparisons are listed in Table 4. In addition, we report the average FPS measured during testing.
Table 4 shows the comparison of our tracker with the current mainstream trackers, including the deep learning tracker TCNN [46], the correlation filtering tracker CCOT [47], and the mainstream Siamese network trackers SiamRPN [11], DaSiamRPN [24], SiamMask [25], and SiamRCNN [9]. The proposed SiamSERPN achieves the highest EAO score of 0.479 but ranks second in accuracy and robustness, at 0.641 and 0.191, respectively. Compared with SiamRCNN, our tracker lags 1% and 10% behind in accuracy and robustness. We attribute this to SiamRCNN being a two-stage tracker that adopts bounding box regression and re-detection strategies, which improve its performance and stability. However, these strategies severely slow SiamRCNN down owing to their heavy hyperparameters. To the best of our knowledge, SiamRCNN's average speed does not exceed 25 FPS, while our tracker runs close to 70 FPS on VOT2016 thanks to its lightweight backbone (Fig. 5).
Our tracker leads SiamMask, which uses the standard RPN, in all three metrics. In particular, our tracker achieves a substantial gain of 14% in EAO, since SiamMask introduces the RPN from detection without any modification. Benefiting from SERPN, SiamSERPN scores 0.191 in robustness, 19% ahead of SiamMask. We believe the core reason is that SERPN employs the DIoU metric and DIoU loss, both of which handle the complex scenes of visual tracking better than the original metric and loss function used in the standard RPN. It is worth noting that SiamMask runs at 55 FPS [25] on the VOT2016 benchmark, while our SiamSERPN runs close to 70 FPS, showing that a lightweight backbone can provide substantial gains in speed.
Table 4. Comparisons on VOT2016
Trackers | Accuracy | Robustness | EAO
MLDF | 0.490 | 0.233 | 0.311
SSAT | 0.577 | 0.291 | 0.321
TCNN | 0.554 | 0.268 | 0.325
CCOT | 0.539 | 0.238 | 0.331
SiamFC | 0.568 | 0.262 | 0.387
SiamRPN | 0.618 | 0.238 | 0.393
DaSiamRPN | 0.612 | 0.221 | 0.411
SiamMask | 0.623 | 0.233 | 0.412
SiamRCNN | 0.645 | 0.172 | 0.460
SiamSERPN | 0.641 | 0.191 | 0.479
The best results are highlighted in red, and the second-best in blue. SiamMask is the RPN-based version.
VOT2018 dataset
We likewise evaluate our tracker on the VOT2018 benchmark [16], which contains 60 public video sequences covering several challenging factors, including fast motion, occlusion, etc. As the successor to VOT2016, VOT2018 uses the same three official metrics: accuracy, robustness, and EAO.
As shown in Table 5, we compare against the current mainstream trackers, including the deep learning trackers LSART [48] and CPT [49] and the anchor-free method SiamFC++ [31]. Our tracker ranks second only to SiamRCNN on VOT2018: we trail SiamRCNN by 1.5% in accuracy and 1.8% in EAO, but match it in robustness. Meanwhile, we lag the anchor-free tracker SiamFC++ by 17% in robustness. We attribute this to the quality evaluation branch proposed by SiamFC++, which greatly helps it maintain stability; the lack of such a branch is a weakness of our tracker and results in a lower robustness score than SiamFC++. Nevertheless, our SiamSERPN still achieves a 7.4% improvement in accuracy over it.
We take DaSiamRPN, which uses the standard RPN, as the baseline to explore the differences between the proposed SERPN and the standard RPN. In Table 5, our method improves by more than 5.2% and 19% in accuracy and EAO, respectively. The core reason is that our proposed SERPN handles complex scenes better than the RPN used in previous trackers. Besides, our tracker yields a substantial gain of nearly 35% in robustness, which reflects the advantage of layer-wise feature aggregation over a single RPN.
Table 5. Comparisons with the mainstream trackers in terms of EAO, robustness (failure rate), and accuracy on the VOT2018 benchmark
Trackers | Accuracy | Robustness | EAO
CFCT | 0.505 | 0.258 | 0.300
SRCT | 0.520 | 0.290 | 0.310
LSART | 0.495 | 0.218 | 0.323
DLSTpp | 0.543 | 0.224 | 0.325
DaSiamRPN | 0.569 | 0.337 | 0.326
SA_Siam | 0.566 | 0.258 | 0.337
CPT | 0.506 | 0.239 | 0.339
SiamFC++ | 0.556 | 0.183 | 0.400
SiamRCNN | 0.609 | 0.220 | 0.408
SiamSERPN | 0.598 | 0.220 | 0.401
GOT-10k dataset
GOT-10k is a large, diverse dataset recently released by the Chinese Academy of Sciences (CAS) for generic object tracking in the wild. It contains more than 10,000 video sequences of real-world moving objects. Its protocol guarantees fairness across trackers: all methods use the same training data provided by the dataset, and there is zero overlap between the object classes of the training and testing sets. Authors train on the officially provided data, run their methods, and upload the results to the official website, where evaluation is performed automatically. AO denotes the average overlap between the estimated bounding boxes and the ground-truth boxes; SR0.5 denotes the rate of successfully tracked frames whose overlap exceeds 0.5, while SR0.75 denotes the rate of those whose overlap exceeds 0.75.
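For clarity, these three metrics can be computed from per-frame overlaps as in the following sketch (our helper, not the official GOT-10k evaluation code, which runs server-side):

```python
# GOT-10k-style metrics from per-frame IoU values of one sequence.
import numpy as np

def got10k_metrics(overlaps):
    """overlaps: array of per-frame IoU values for one sequence."""
    overlaps = np.asarray(overlaps, dtype=float)
    ao = overlaps.mean()             # average overlap (AO)
    sr50 = (overlaps > 0.5).mean()   # success rate at threshold 0.5
    sr75 = (overlaps > 0.75).mean()  # success rate at threshold 0.75
    return ao, sr50, sr75

print(got10k_metrics([0.8, 0.6, 0.4, 0.9]))
```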
As listed in Table 6, we focus on Siamese network trackers and deep learning ones [50, 51]. We achieve the second-best score on the GOT-10k benchmark. Compared with the two-stage tracker SiamRCNN, our tracker trails on every metric except speed. We believe the core reason is that GOT-10k contains a large number of in-the-wild scenes, which are more complex than those of other visual benchmarks. Since our proposed SERPN block uses the lightweight SENet structure, the proposed tracker is not strong enough to deal with complex scenes in the wild; although we introduce DIoU to compensate, this remains a weakness of our tracker. Despite this, our tracker still achieves the second-best result among the compared trackers while running almost 27 times faster than SiamRCNN.
Table 6. Comparison of our tracker with mainstream trackers on GOT-10k
Trackers | mAO | mSR0.5 | mSR0.75 | FPS
DSiam | 0.417 | 0.461 | 0.149 | 3.780
SiamFCv2 | 0.434 | 0.481 | 0.190 | 19.6
SA_Siam_P | 0.445 | 0.491 | 0.165 | 25.40
DeepSTRCF | 0.449 | 0.481 | 0.169 | 1.07
DaSiamRPN | 0.444 | 0.536 | 0.220 | 134.4
THOR | 0.447 | 0.538 | 0.204 | 1
SiamSERPN | 0.604 | 0.726 | 0.472 | 75.45
SiamRCNN | 0.649 | 0.728 | 0.593 | 2.790
The best results are highlighted in red, and the second-best in blue.
Moreover, compared with the baseline tracker DaSiamRPN, our SiamSERPN significantly improves the scores by 27%, 26%, and 54% for AO, SR0.5, and SR0.75, respectively. We believe this is because GOT-10k contains a large number of generic objects in the wild, and these scenarios are more complex and challenging; the RPN used by DaSiamRPN cannot handle such a large number of complex cases, which leaves it much weaker than our SiamSERPN. It is worth noting, however, that DaSiamRPN is the fastest tracker at 134 FPS. Although the proposed tracker is only about half as fast, it still achieves 75 FPS, far exceeding the 25-FPS real-time threshold. SiamSERPN is thus competitive in both performance and speed, which shows that our tracker strikes a balance between the two.
Summary of comparison experiments
After experiments on three large visual tracking benchmarks, the proposed tracker achieves competitive scores while running at approximately 70 to 75 FPS. On the VOT benchmarks, our tracker achieves competitive EAO scores of 0.479 and 0.401 on VOT2016 and VOT2018, respectively. On VOT2016 in particular, SiamSERPN obtains the best performance, which demonstrates that a tracker equipped with SERPN blocks can outperform the mainstream trackers that use the standard RPN. On the GOT-10k benchmark, SiamSERPN places second on all indicators, meaning it performs better than the high-speed trackers and is more efficient than the high-performance ones; its 75 FPS is substantially above 25 FPS, indicating that the proposed method is a real-time tracker. Combining the results on the VOT and GOT-10k benchmarks, our tracker achieves a balance of performance and speed.
Ablation Study
We conduct ablation experiments on the VOT2016 benchmark. We first explore multi-level aggregation, in which all variants use the standard RPN. After that, we test the changes brought by the IoU metric, DIoU, and the SERPN blocks, respectively.
Multi-level aggregation
We conduct ablation experiments on multi-layer aggregation to explore the role of features at different levels and the effect of aggregating them, designing multiple variants of the proposed SiamSERPN. At first, we do not output features at any level: similar to SiamFC, the two identical, parameter-shared networks output features directly through convolution for object tracking. Compared with the benchmark tracker SiamRPN, the lack of RPN assistance leads to a severe performance loss, despite the stronger feature extraction capability of the deep network MobileNetV2. For the variants that use a single SERPN block, we adopt the original IoU metric in the classification branch and the standard smooth Ln-norm loss in the regression branch for bounding box regression. When the Siamese subnetwork feeds the extracted features to the single SERPN block, the standard RPN first generates anchor boxes at five scales. The classification branch then estimates whether the anchor boxes contain the object based on the IoU value (following SiamRPN, anchors with IoU greater than 0.6 are positive samples), while the regression branch performs bounding box regression on these positive samples using the smooth Ln-norm loss to obtain the object position. Finally, the features are fed into the squeeze and excitation section for channel reweighting to improve the tracker's performance. Although single-layer features yield some performance gain, we observe that these variants perform essentially the same whether the SERPN block is placed on conv3, conv5, or conv7, so the gain from a single SERPN block is limited. Aggregating two layers of features improves performance further, with the conv3 and conv7 combination performing best, improving by 1% over the baseline tracker SiamRPN; we believe this is because the latter SERPN block further refines the features output by the previous block. Consequently, after aggregating all three layers of features, our tracker achieves the best results, as sketched below.
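One plausible reading of this multi-block aggregation is a learnable weighted fusion of the per-level outputs; the following sketch illustrates that idea, where the softmax-weighted scheme and names are our assumptions rather than the paper's exact formulation.

```python
# Fuse same-shaped output maps from several per-level blocks with
# learnable, normalized weights.
import torch
import torch.nn as nn

class WeightedAggregation(nn.Module):
    def __init__(self, num_levels=3):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_levels))

    def forward(self, maps):                 # maps: list of (B, C, H, W)
        w = torch.softmax(self.weights, 0)   # normalize the level weights
        return sum(wi * m for wi, m in zip(w, maps))

agg = WeightedAggregation()
maps = [torch.randn(1, 10, 25, 25) for _ in range(3)]  # conv3/conv5/conv7 outputs
print(agg(maps).shape)
```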
Classification and regression
The classification and regression tasks play a key role in tracker performance, but previous Siamese network trackers have not paid sufficient attention to them. We adopt the DIoU metric and the DIoU loss in the classification and regression branches of the RPN for foreground-background classification and bounding box regression, respectively. In Table 7, DIoU as both metric and loss function leads the IoU metric and the smooth Ln-norm loss in the three-layer aggregation setting. However, the improvement is not significant; we believe a single improvement is too slight to lift a tracker that otherwise lacks the ability to handle challenging scenarios.
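For reference, below is a minimal PyTorch sketch of the DIoU metric and loss following Zheng et al.'s definition, $\mathrm{DIoU} = \mathrm{IoU} - \rho^2(b, b^{gt})/c^2$; this is our illustration, not the authors' released code.

```python
# DIoU: IoU penalized by the normalized distance between box centers.
# Boxes are (x1, y1, x2, y2) tensors.
import torch

def diou(b1, b2, eps=1e-7):
    # intersection and union
    x1 = torch.max(b1[..., 0], b2[..., 0]); y1 = torch.max(b1[..., 1], b2[..., 1])
    x2 = torch.min(b1[..., 2], b2[..., 2]); y2 = torch.min(b1[..., 3], b2[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    a1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    a2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    iou = inter / (a1 + a2 - inter + eps)
    # squared distance between the two box centers
    cx1 = (b1[..., 0] + b1[..., 2]) / 2; cy1 = (b1[..., 1] + b1[..., 3]) / 2
    cx2 = (b2[..., 0] + b2[..., 2]) / 2; cy2 = (b2[..., 1] + b2[..., 3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    # squared diagonal of the smallest enclosing box
    ex1 = torch.min(b1[..., 0], b2[..., 0]); ey1 = torch.min(b1[..., 1], b2[..., 1])
    ex2 = torch.max(b1[..., 2], b2[..., 2]); ey2 = torch.max(b1[..., 3], b2[..., 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    return iou - rho2 / c2

def diou_loss(pred, target):
    return (1.0 - diou(pred, target)).mean()
```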
Table 7. Ablation study of the proposed tracker on VOT2016
L3, L5, and L7 represent conv3, conv5, and conv7, respectively. I/S and D/DL represent IoU metric/smooth Ln-norm loss and DIoU/DIoU loss. SERPN denotes the RPN using squeeze and excitation operations. We set SiamRPN as the benchmark tracker, which uses AlexNet as the backbone. Besides, MobileNet represents MobileNetV2.
Backbone | L3 | L5 | L7 | I/S | D/DL | SERPN | EAO
AlexNet | | | | ✓ | | | 0.397
MobileNet | | | | ✓ | | | 0.377
MobileNet | ✓ | | | ✓ | | | 0.383
MobileNet | | ✓ | | ✓ | | | 0.384
MobileNet | | | ✓ | ✓ | | | 0.384
MobileNet | ✓ | ✓ | | ✓ | | | 0.397
MobileNet | | ✓ | ✓ | ✓ | | | 0.399
MobileNet | ✓ | | ✓ | ✓ | | | 0.400
MobileNet | ✓ | ✓ | ✓ | ✓ | | | 0.403
MobileNet | ✓ | ✓ | ✓ | | ✓ | | 0.407
MobileNet | ✓ | ✓ | ✓ | ✓ | | ✓ | 0.411
MobileNet | ✓ | ✓ | ✓ | | ✓ | ✓ | 0.479
RPN with squeeze and excitation
In this variant we apply only the squeeze and excitation operation to the RPN, without the DIoU metric and DIoU loss, to isolate the effect of SERPN. As can be seen in Table 7, although feature aggregation using SERPN blocks alone further improves the tracker's performance, the improvement is not significant. We believe the core reason is the same as described previously: a single improvement yields only a limited gain.
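For completeness, here is a minimal sketch of the squeeze and excitation channel reweighting (Hu et al.) that this variant applies to RPN features; the reduction ratio and names are our assumptions, not the authors' implementation.

```python
# Squeeze: global average pooling per channel.
# Excitation: a small bottleneck MLP produces per-channel weights in (0, 1).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze to (B, C), then excite
        return x * w.view(b, c, 1, 1)     # reweight each channel

features = torch.randn(1, 256, 25, 25)
print(SEBlock(256)(features).shape)
```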
Summary of ablation study
Finally, we compare the proposed SiamSERPN with all variants and the benchmark tracker and find a large performance improvement, because the complete SERPN block adopts the advanced DIoU for both foreground-background classification and bounding box regression. Specifically, the DIoU metric captures the relationship between the predicted bounding box and the ground-truth box in the classification task, which the IoU metric easily ignores, and DIoU as a loss function jointly optimizes the coordinates of the predicted bounding box during regression, yielding more accurate location information. On top of the complete SERPN block, the deep architecture of MobileNetV2 allows features from multiple layers to be aggregated into final refined features for peak performance. Combining all the improvements, our tracker achieves its best performance, demonstrating that the proposed improvements are effective and synergistic.
In this paper, we propose a visual tracking framework named SiamSERPN that balances performance and speed. It consists of a Siamese subnetwork and multiple proposed SERPN blocks. The former uses two identical lightweight MobileNetV2 networks as the backbone to achieve efficiency; the latter combines the standard RPN with a squeeze and excitation section to compensate for the performance loss caused by the lightweight backbone. Specifically, the proposed SERPN block improves performance via two main strategies: it reweights the rough features extracted from the backbone by squeeze and excitation to retain valuable features and filter unnecessary ones, and it introduces DIoU for foreground-background classification and bounding box regression to fix the deficiencies of the classification metric and regression loss adopted in the standard RPN. Extensive experiments on multiple tracking benchmarks show that our tracker achieves competitive performance while operating efficiently. It scores 0.479 on the VOT2016 benchmark, 4% ahead of second place; despite placing second on both VOT2018 and GOT-10k, it runs at 70 FPS and 75 FPS, respectively, significantly exceeding the minimum real-time speed of 25 FPS. Being more efficient than other anchor-based trackers is the advantage of our SiamSERPN. Still, the proposed method is essentially an anchor-based tracker, which inherently introduces many hyperparameters and complexity; SiamSERPN therefore still has room for further speedup, leaving a gap between it and practical applications. In the future, we will focus on applying lightweight networks to anchor-free trackers, which rely only on the expressive power of fully convolutional networks, making this type of tracker more efficient and less resource-intensive than anchor-based ones. We expect such work to narrow the gap between academic methods and practical applications in the object tracking field.
Conceptualization, DC, MZ. Investigation and methodology, RD. Formal analysis, JW, BJ. Supervision, QA, AT. Writing of the original draft, RD. Writing of the review and editing, SSB, AM.
This work was funded by the National Natural Science Foundation of China (Grant No. 62272063, 62072056, 61902041, and 61801170); the Open Research Fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education; the Project of Education Department Cooperation Cultivation (Grant No. 201602011005 and 201702135098); the China Postdoctoral Science Foundation (Grant No. 2018M633351); and the National 13th Five-Year National Defense Fund (Grant No. 6140311030207). This work was also supported by the Researchers Supporting Project (No. RSP2023R102), King Saud University, Riyadh, Saudi Arabia.
The authors declare that they have no competing interests.
Name: Dun Cao
Affiliation: Department of Computer and Communication Engineering
Changsha University of Science and Technology
Changsha, China
Biography: Dun Cao (Member, IEEE) received the B.S. degree in communication engineering from Central South University, China, in 2001, the M.S. degree in information systems and communications from Hunan University, China, in 2006, and the Ph.D. degree in vehicle engineering from the Changsha University of Science and Technology, China, in 2017. She is currently a Faculty Member with the School of Computer and Communication Engineering. She was a Visiting Scholar with the National Mobile Communications Research Laboratory, Southeast University, China, from 2012 to 2013, and with The University of Texas at Arlington, from 2017 to 2018. Her research interests include vehicular networks and MIMO wireless communications.
Name: Renhua Dai
Affiliation: Changsha University of Science and Technology
School of Computer and Communication Engineering
Changsha, China
Biography: Renhua Dai received the B.S. degree in computer science and technology from Orient Science and Technology College of Hunan Agriculture University, Changsha, China, in 2017. He is currently pursuing the master’s degree in software engineering with the School of Computer and Communication Engineering at Changsha University of Science and Technology, Changsha, China. His current research interests include computer vision, visual tracking, and machine learning.
Name: Jin Wang
Affiliation: Changsha University of Science and Technology
School of Computer and Communication Engineering
Changsha, China
Name: Baofeng Ji
Affiliation: LAGEO
Institute of Atmospheric Physics
Chinese Academy of Sciences
Biography: Baofeng Ji (fengbaoji@126.com) received his Ph.D. degree in information and communication engineering from Southeast University, China, in 2014. Since 2014 he has been a postdoctoral fellow in the School of Information Science and Engineering, Southeast University. He has published over 40 peer-reviewed papers and three scholarly books. In 2009, he was invited to serve as an Associate Editor for the International Journal of Electronics and Communications, and has been a reviewer for over 20 international journals. He was selected as the Young Academic Leader of Henan University of Science and Technology in 2015.
Name: Osama Alfarraj
Affiliation: Computer Science Department
Community College
King Saud University, Riyadh, Saudi Arabia
Biography: Osama Alfarraj received the master's and Ph.D. degrees in information and communication technology from Griffith University, in 2008 and 2013, respectively. He is currently an Associate Professor of computer sciences with King Saud University, Riyadh, Saudi Arabia. His current research interests include eSystems (eGov, eHealth, and ecommerce), cloud computing, and big data. For two years, he has served as a Consultant and a member of the Saudi National Team for Measuring E-Government, Saudi Arabia.
Name: Amr Tolba
Affiliation: Mathematics and Computer Science Department
Faculty of Science
Menoufia University, Shebin El-Kom, Egypt
Biography: Amr M. Tolba received the M.Sc. and Ph.D. degrees from the Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Egypt, in 2002 and 2006, respectively. He is currently an Associate Professor with the Faculty of Science, Menoufia University. He is on leave from the Computer Science Department, Menoufia University, and the Community College, King Saud University (KSU), Saudi Arabia. He has authored or coauthored over 65 scientific papers in top-ranked (ISI) international journals and conference proceedings. His research interests include socially aware networks, vehicular ad hoc networks, the Internet of Things, intelligent systems, and cloud computing.
Name: Pradip Kumar Sharma
Affiliation: University of Aberdeen, Aberdeen, U.K.
Biography: Pradip Kumar Sharma [M'18, Sm'21] (pradip.sharma@abdn.ac.uk) is an assistant professor of cybersecurity in the Department of Computing Science at the University of Aberdeen, United Kingdom. He received his Ph.D. in CSE (August 2019) from Seoul National University of Science and Technology, South Korea. He worked as a postdoctoral research fellow in the Department of Multimedia Engineering at Dongguk University, South Korea. His research interests are in the areas of cybersecurity, blockchain, edge computing, SDN, security and privacy in AI, and IoT security.
Name: Min Zhu
Affiliation: College of Information Science and Technology
Zhejiang Shuren University
Hangzhou, China
Biography: Min Zhu received the B.S. degree from the Nanjing University of Posts and Telecommunications, China, in 2002, the M.S. degree from the Beijing University of Posts and Telecommunications, China, in 2005, and the Ph.D. degree from the Nanjing University of Posts and Telecommunications, in 2018. She is currently with the College of Information Science and Technology, Zhejiang Shuren University, China. Her research interests include routing protocol, optimization algorithm design, and key technologies in 5G.
Dun Cao1, Renhua Dai1, Jin Wang1, Baofeng Ji2, Osama Alfarraj3, Amr Tolba3, Pradip Kumar Sharma4, and Min Zhu5,*, "Fast Visual Tracking with Squeeze and Excitation Region Proposal Network," Human-centric Computing and Information Sciences, vol. 13, article no. 07, 2023.