홈으로ArticlesAll Issue
ArticlesA Spatial Coverage-Based Participant Recruitment Scheme for Mobile Crowdsourcing
  • Jian Yang.1,*, Di Zhang2, ChunMei Hu3, and KaiXuan Wang1

Human-centric Computing and Information Sciences volume 13, Article number: 20 (2023)
Cite this article 2 Accesses
https://doi.org/10.22967/HCIS.2023.13.020

Abstract

A revolutionary distributed problem-solving model is being enabled by the increasing proliferation of mobile smart devices, namely mobile crowdsourcing, which employs ubiquitous mobile users to gather and analyze data beyond what was traditionally possible, while becoming a leading area for study in both academia and industry. However, how to recruit participants to provide high quality data with a limited budget is challenging for a crowdsourcing system. In contrast to traditional fixed sensors, participants are the general population with smart devices, which boast the unique advantage of predictable mobility. To address this problem, this article first studies the relationship between predictable mobility and spatial coverage, and proposes a new strategy to recruit the optimal subset of participants by jointly considering their current and future locations with the aim of ensuring that the recruited participants can provide high coverage sensory data. Secondly, participant recruitment proves to be a NP-hard problem, while a harmony search algorithm with annealing randomness (HSA-AR) is proposed as a trade-off between the quality of problem-solving and the computation time. Finally, through extensive simulations utilizing real-world and synthetic datasets, we demonstrate that HSA-AR outperforms other baseline methods under various settings.


Keywords

Mobile Crowdsourcing, Participant Recruitment, Spatial Coverage, NP-Hard, HSA-AR


Introduction

With the pervasiveness of smart devices, mobile crowdsourcing [1, 2] has been touted by both academia and industry and become a sought-after topic in the mobile network. Mobile crowdsourcing is a new paradigm of data collection and sharing that employs mobile workers with smart devices to collect data from different locations in the physical world. Typically these smart devices are equipped with various sensors, such as GPS, cameras, accelerometers, Bluetooth, etc., which allow mobile workers to easily acquire basic data about various applications and upload data to a crowdsourcing system.
As shown in Fig. 1, a mobile crowdsourcing system is a new form of multi-agent system where two types of agents interact with each other, for example, a set of users distributed throughout a specific geographical region and a cloud-based crowdsourcing platform.

Mobile workers (participants) use smart devices equipped with various sensors for remote patient monitoring [3], road condition monitoring [4] and location-based services[5], etc., and send the sensory data to the crowdsourcing platform through a mobile network or Wi-Fi. In addition, participants will generate certain costs when performing tasks, namely mobile data usage, batteries, travel distance, time consumption, computing power, etc.

A crowdsourcing platform is the manager of the entire system. On the one hand, it is responsible for formulating effective incentive mechanisms [6, 7] and recruiting appropriate participants [8, 9] to maximize the utility of crowd work. On the other hand, it is responsible for data cleaning, desensitization, conversion, and exploring more interesting applications, including data analysis, visualization, and data trading [10, 11], etc.




Fig. 1. A mobile crowdsourcing system.


As mentioned earlier, obtaining high quality sensory data is the goal of the crowdsourcing platform. However, it is unrealistic to recruit all potential participants due to limited budget constraints. The usual approach is to select a subset of potential participants who provide high quality data and maximize the utility of crowd work. Therefore, how to make a trade-off between a limited budget and high-quality sensory data is a challenge for the system.
According to recent research [[12–[14], the quality of mobile crowdsourcing is affected by its specific regions. An effective algorithm, specifically, should record participants' concealed mobility patterns and anticipate their future locations by exploiting their prior trajectories. In other words, collecting high quality data means that a participant's movement trajectory needs to cover as many areas as possible [15]. However, the location of participants will change at different times, so the mobile sensing mode is more dynamic. Fig. 2 illustrates an example of the trajectories of the four participants.
Next, effectively predicting the trajectory of the participants is a prerequisite for solving the problem of participant recruitment. Owing to the rapid development of information technology and research of relevant scholars, trajectory prediction can be realized in various ways, such as navigation [16], periodic movement [17], based on models [18] or simply uploaded by the participants themselves. In particular, we do not attempt to solve the problem of trajectory prediction, which is based on the basic assumption that the trajectories of the participants are predictable and known in advance.
Due to the mobility of participants, the crowdsourcing platform can obtain more comprehensive and substantial data resources with a lower budget. Existing participant recruitment strategies usually only consider a participant’s current location [19, 20], while generally speaking, a participant’s trajectory can cover multiple geographic areas. Especially for tasks that take a long time to complete, such as traffic condition monitoring, the collected sensory data is easily affected by the mobility of participants, and so it cannot be ignored.

Fig. 2. An example of four participants trajectory at different times.


In this paper, predictable mobility as an important evaluation factor is used to recruit participants with the aim of obtaining high quality data within a limited budget. In summary, our work contributes the following: we studied the relationship between predictable mobility and spatial coverage, and proposed a new strategy to recruit the optimal subset of participants by jointly considering their current and future locations, and the problem of participant recruitment is formally defined in Section 2. In Section 3, we prove that participant recruitment is a NP-hard problem, and propose efficient heuristic approaches, namely harmony search algorithm with annealing randomness (HSA-AR) as a trade-off between the quality of problem-solving and the computation time. In Section 4, through extensive simulations utilizing real-world and synthetic datasets, we demonstrate that HSA-AR outperforms other baseline methods under various settings. In Section 5, the conclusions and future research proposals are discussed.


Participant Recruitment Scheme Based on Spatial Coverage

First of all, we proposed the model of participant recruitment based on spatial coverage. Secondly, different cost-benefit models were constructed with the spatial coverage as the driving factor, which creates a foundation for follow-up algorithm evaluation. Finally, an example of participant recruitment in traffic monitoring was given to illustrate the workflow of the proposed approach.
Problem Formulation
Let’s assume that the crowdsourcing platform wants to obtain the monitoring data of urban traffic conditions, it usually specifies the number of geographical areas of interest within a limited time, so as to issue the sensory task in that area. We used R={$r_1$…,$r_i$…,$r_k$} to represent the area of interest. Considering that it is a sensory task, let’s suppose there are m participants interested in the task, which can be denoted as U={$u_1$,…,$u_i$,…,$u_m$}. Generally, the tasks published by the crowdsourcing system are time-sensitive; the participants were required to collect and send sensory data within a certain period of time T={$t_1$,…,$t_i$,…,$t_n$}, where t_i (1≤i≤n) is a specific moment; and the specific granularity should be modified to accommodate the challenges of different sensory tasks. In each period, a certain time τ∈[$t_i$,$t_{(i+1)}$ ] is sampled to indicate the sensing result in the time period. Then, the following matrix can be used to characterize the participants' trajectories throughout this period T.

Definition 1 (Trajectory matrix). The trajectory of participants in time T can be denoted as an m×n matrix as follows:

(1)


where, l_ij is the location of a participant u_i at time $t_j$, i∈[1,m] and j∈[1,n]. Each row indicates a participant's movement through a continuous time series, while each column represents the location of all participants at a specific point in time.
To motivate participants to complete their task, crowdsourcing platforms should provide the participants rewards, either in cash or non-cash form, which are part of the cost of a task.

Definition 2 (Cost of a task). Let C={$c_1$,$c_2$,…,$c_m$ }, which denotes the reward that will be paid to potential participants, for example, participants $u_i$ will receive reward $c_i$ if recruited to complete a task. Then, let’s give the subset of participants S, and so the cost is defined as follows:

(2)


Definition 3 (Budget constraint). The fundamental limitation of the sensory task is that the overall cost of recruiting participants cannot exceed the allocated budget. If the total budget is denoted by δ, the constraint can be determined as follows:

(3)


To recruit the appropriate participants, it is necessary to estimate the quality of data collected by them. As mentioned earlier, the spatial coverage is an important factor, assuming that within a specific period, a participant can completely cover a geographical area. However, redundancy will be created by multiple participants in the same geographic area, a scenario that should be avoided.

Definition 4 (Spatial coverage). Spatial coverage is the ratio of the number of participants arriving in different areas to the total number of spatial areas in all periods, which can be formulated as follows:

(4)


where R is the total number of spatial regions, and w_j is the weight factor at the corresponding time, which is determined by the crowdsourcing platform.

Definition 5. (Spatial coverage-based participant recruitment model, SCPR) Given a sensory task with spatial region R and period T, a group of participants is recruited to maximize the sensory data of high spatial coverage under limited budget constraints, which can be formulated as follows:

(5)


where m is the maximum number of potential stakeholders in the crowdsourcing system, $x_i$ is a binary variable that is set to either 1 if a participant is recruited, or 0 if not so. s indicates the number of participants that have been employed by the system. Furthermore, the overall cost of all recruited participants is less than budget δ.

Spatial Coverage-based Cost-Benefit Model
To facilitate analysis and calculation, we unified the spatial coverage and corresponding costs of the participants into the same unit, and thus proposed multiple cost-benefit models [21, 22] driven by spatial coverage. In this section, we allocate a participant's cost in [5, 20] into four following types:

RandomCost generates a random integer in [5, 20];

LinearCost implies that the cost rises linearly with Cov(s) and set C(u)=15Cov(s)+5;

QuadCost implies that the cost rises quadratically with Cov(s) and set C(u)=15Co$v^2$ (s)+5;

StepCost implies that the cost satisfies the piecewise function denoted below as:


In addition, three benefit models are generated utilizing the spatial coverage as a significant heuristic component and described as follows.

LinearBenefit implies that the benefit rises linearly with Cov and set B(s) = 100Cov(s);

QuadBenefit implies that the benefit rises quadrantally with Cov and set B(s)=100Co$v^2$ (s);

StepBenefit implies that the benefit satisfies the piecewise function denoted below as:




An Example of Participant Recruitment
To demonstrate the workflow of the proposed methodology, an example of participant recruitment in traffic condition monitoring is presented in Fig. 3.
Let’s suppose the crowdsourcing platform wants to monitor traffic conditions in a city by collecting traffic information from mobile users and is interested in traffic conditions in four areas in the next 30 minutes. Fig. 3(a) shows the topology of these four regions, assuming that three participating candidates want to complete the task and report their current GPS locations as {A,B,C}, which is shown as $t_1$ in Fig. 3(b). It should be noted the crowdsourcing platform’s budget allows for only two participants to be recruited.
The traffic conditions will be different at different times. If the crowdsourcing platform wants to monitor more information over different time periods {$t_1$,$t_2$,$t_3$,$t_4$,$t_5$}, the spatial coverage of the participants needs to be calculated. Let’s take u_1 and u_2 participants as examples. For simplicity, it is assumed that each participant is ascribed the same cost. The various cost models proposed in Section 2.2 can be used to evaluate their impact on participant recruitment.
Fig. 3(c) shows the trajectories of $u_1$ and $u_2$ at different times. We observed that except for monitoring two different regions at $t_1$, $t_4$, $t_5$, only one region was monitored during the remaining two time periods, because both participants were in the same region and the collected traffic conditions were duplicated. In addition, the crowdsourcing platform may pay specific attention to traffic conditions at different times, so different weights w_i can be set for spatial coverage at different times, with the sum of the weights (n is the number of divided periods, as shown in Figure 3(d), n=5). For simplicity, we set $w_1$=$w_2$=$w_3$=$w_4$=$w_5$=0.2. According to Definition 5, the spatial coverage of {$u_1$,$u_2$} can be calculated as Similarly, we can calculate the spatial coverage of {$u_1$,$u_34 } is 0.45, {$u_2$,$u_3$} as 0.5. To maximize the benefit of the crowdsourcing platform, the combination of {$u_2$,$u_3$} is an optimal participant recruitment scheme.

Fig. 3. An example of four participants trajectory at different times: (a) the topology of the four regions, (b) participant trajectory matrix, (c) the proportions of $u_1$ and $u_2$ at different times, and (d) weighted spatial coverage.



Solution Approach

Complexity Analysis of SCPR Problem
Theorem 1. Definition 5 states that the SCPR problem is NP-hard. Proof. To illustrate that SCPR is a NP-hard problem, we demonstrated that its decision version is NP-complete. It should be demonstrated that the decision-making problem belongs to NP, and then selected another known NPC problem that can be converted in polynomial time to be reduced into a decision version of SCPR.
We considered the minimum weighted set coverage problem [23] as an example of a NPC problem, which is described as follows: Give a set of elements U={$u_1$,$u_2$,…,$u_m$}, a collections S={$s_1$,$s_2$,…,$s_n$ } of subsets of U, each $s_i$ satisfies $s_i$⊆U, and thus give weight w($s_i$ )>0 . The weighted minimum coverage problem is to find a subset of a collection of subset Q⊆S, so that is the set coverage of the element set U and requires This instance is denoted as X in the following discussion.
Then we transferred instance X to an instance of SCPR. Let’s assume that participant is the set of elements U, and the subset of participants is considered as the element $s_i$ in set S. Meanwhile, the cost of each participant is the corresponding weight w($s_i$ ) of the element $s_i$. Thus, we can construct a set ¯S={$s_1$,$s_2$,…s_n,r}, r⊂U, w(r)>k, while r can not be selected at the same time as each $s_i$. In this way, we can get an instance of a participant recruitment problem denoted by Y.
Thus, q is a solution to X, if and only if q is a solution to Y. Furthermore, the reduction from X to Y requires polynomial time, meaning that the SCPR is a NP-hard problem. ◻

Harmony Search Algorithm with Annealing Randomness
As previously demonstrated, participant recruitment is a NP-hard problem. The number of decision variables increases exponentially with an increase in the number of participants. Even if computational resources are endlessly improved, the exact solution may not be accomplished in a reasonable period. As a result, we introduced a harmonic search algorithm with annealing randomness as a trade-off between computing overhead and solution quality.
Harmony search (HS) is an emerging intelligent optimization algorithm proposed by Geem et al. [24], a Korean scholar, in 2001. It is similar to a genetic algorithm’s imitation of biological evolution [22], ant colony optimization algorithm’s imitation of ant foraging [25], etc. HS is a new heuristic search method that mimics the process by which musicians constantly alter the pitch of their instruments to generate the most beautifully harmonic result. The algorithm is well-applied in addressing various NP-hard problems and has been widely used in many fields such as civil engineering structure design [26], single machine scheduling problems [27], soil stability analysis [28], and so forth.
The pseudocode for HS is shown in Algorithm 1. Generally, HS consists of the five stages as follows.
Step 1 (Initialization parameters): Initialize the key parameters affecting algorithm performance, which are described below:

Harmony memory size (HMS): The size of the harmonic population.

Harmony memory consideration rate (HMCR): The probability of taking a harmony from an existing population (HM harmony library).

Pitch-adjusting rate (PAR): A proportion of the decision factors that will be fine-tuned by altering their pitches.

The number of improvisations (NI): The total number of iterations.



Step 2 (Initialization harmony memory): Several randomly generated harmonies are used to establish harmonic memory. The best harmony discovered during algorithm execution is stored in an HMS × n matrix called HM. Each harmony is a vector that represents the various solutions to the problem's constraints. Furthermore, each of the harmony vectors is randomly initialized to 1 or 0, and 1 if a participant is recruited and 0 if not. The HM matrix is formally represented in Equation (6).

(6)


Step 3 (Improvise a new harmony): The harmony memory consideration rule is very important for obtaining new harmonies, which reflects whether the new candidate harmonies are generated by HM or randomly. For our problem at hand, each bit of the solution has only two binary values of 1 and 0, while the parameter BW is discarded in the pitch adjustment. Therefore, we adopt the DBHS method proposed in [29], in which all elements of a solution only chose a new harmony from the current HM to generate a new one. In other words, Equations (7) and (8) can be used to define individual strategy operations.

(7)


(8)


Step 4 (Update harmony memory) The new harmony $s^{new}$ is evaluated in this stage, namely SC($s^{new}$). Thereafter, the worst solution $s^{worst}$ is replaced by the new harmony solution $s^{new}$, if it’s better, such as, (SC($s^{new}$≤$s^{worst}$). Otherwise, it is discarded.
Step 5 (Verification of stopping criterion): Steps (3) and (4) should be repeated until the number of iterations equals NI.
However, if the initially picked harmonics are near the local optimal, the harmony search may suffer from a serious limitation of slipping into the local optimal. To solve this issue, we proposed a simulated annealing approach [30] that increases the accuracy and robustness of the harmony search. The effectiveness of the simulated annealing algorithm is that it can generate sub-optimal solutions within a certain probability, which depends on the synthesis temperature in the metallurgical annealing process, and can gradually reduce the probability by decreasing the temperature parameters. Therefore, SA can explore potential solutions in a larger region, thereby improving the exploration ability of the algorithm and finding the global optimal solution.
Algorithm 2 shows the pseudocode of simulated annealing (SA). The algorithm begins with the HS generation models, while the temperature parameter is set to the baseline value $T_0$. The next procedures are then executed until the ending threshold is reached. Firstly, the neighbor function is applied to construct a new solution $s^{new}$ for "s" by modifying the value of one or more components, and then the fitness difference between the two solutions is calculated as Δ. Let’s consider the candidate solution as the current solution, and update the best solution if it is better. Alternatively, it is recognized as the new current solution with a probability of exp (-Δ/αT), which is determined by its fitness and current temperature T. Finally, the temperature steadily drops with the cooling factor, decreasing the possibility of accepting poor inferior options.
We proposed that the participant recruitment strategy should begin with a harmony search and then move on to simulated annealing. The harmony search algorithm is performed by initializing a harmony memory, with a new solution obtained from the HM. All components of the solution are based on the HMCR, while improvisation is performed with the HMCR parameter and PAR. Then choose the current best harmony vector as a candidate solution. Next, the SA is applied, where not only the optimal harmony is always accepted, but also the probability of the suboptimal harmony being accepted is determined by the fitness of the new harmony and existing temperature. The proposed harmony search with annealing randomness is illustrated in Algorithm 3.



Experimental Study

In this section, through extensive simulations utilizing real-world and synthetic datasets, we demonstrate that HSA-AR outperforms the three baseline methods under various settings. Our experiments are coded in Python under a standard server (Windows), along with an Intel Xeon Gold 5320H processor and 64 GB of main memory.

Experimental Setting
4.1.1 Datasets
Real-world data: We have used a real dataset [31] to evaluate the performance of the proposed algorithm. The dataset contains the road data of 4,000 taxis in Shanghai within 24 hours. The sampling interval of taxi-based GPS data is 1 minute and composed of the vehicle ID, time stamp, latitude, longitude, speed, etc.
To facilitate the simulation, we preprocessed the dataset as follows: (1) We randomly selected 1,000 taxis as participants in the original dataset; (2) The simulation range was limited to a rectangular area of 10 km × 10 km, which was divided into 10,000 grid cells of similar size. Also, the GPS location of each vehicle was mapped to these grid cells; (3) We divided 24 hours a day into 48-time windows with 30 minutes as the minimum time granularity, denoted as T={$t_1$,$t_2$,⋯,$t_48$}, and the location of each vehicle at each time window was recorded. (4) We randomly selected some famous buildings or tourist landmarks as the sensory task locations. The parameters of the real-world dataset are given in Table 1.

Table 1.Parameter setting for the real-world data

Parameter Description Value
n Number of participants 1000
𝑅 Range constraints for region 10 km × 10 km
g Number of grids 10,000
$T_g$ Time granularity 30 minutes


Synthetic data: We used the Gaussian-Markov movement model [32] to generate the simulated movement tracks of 10 to 3,000 participants. The randomness of the Gaussian-Markov model is 0.1. We divided the granularity of the spatial region into square blocks with even sides of 10 m to 1,000 m, and randomly assigned these participants. In addition, the time granularity is divided into 5 to 50 minutes, while the budget constraint is set to 100 to 1,000. The parameter settings of the synthetic data are given in Table 2.
Table 2.Parameter setting for the synthetic data
Parameter Name Range of values Default value
n Number of participants {100, 500, 1000, 1500, 2000,2500,3000} 1,000
𝑅 Range constraints for region {10, 100, 300, 500, 700, 900} 300
g Time granularity {5, 10, 20, 30, 40, 50} 30
$T_g$ Budget constraint {100, 200, 400, 600, 800, 1000} 400


Baseline algorithms
The three baseline algorithms have been introduced to test the performance of the proposed approach. One is particle swarm optimization (PSO) algorithm as shown in [33]. PSO is a kind of population-based stochastic optimization technology, which imitates the swarm predation behavior of insects, birds, and fish. The essential thought is to identify the best solution by collaborating and exchanging information among members of the target population. The algorithm is widely utilized in a variety of fields, including function optimization, neural network training, fuzzy system control, etc. In the experiment, the swarm size was set to 30; the inertia weight was altered linearly between 0.9 and 0.4; and the acceleration coefficients were set at 1.5. In addition, the other baseline algorithms are two simplified versions of HSA-AR, namely HS and SA.
Furthermore, we altered parameters independently and incrementally, so when one was tuned, the others were reset to their default settings. While the next parameters were being explored, the preceding ones were given the best value determined so far. The parameters and default values of the HSA-AR algorithm are indicated in Table 3.

Table 3.Parameter setting for the HSA-AR algorithm
Parameter Name Range of values Default value
HMS Harmony memory size {5, 10, 15, 20, 25, 30} 10
HMCR Harmony memory consideration rate {0.91, 0.93, 0.95, 0.97, 0.99} 0.99
PAR Pitch adjusting rate {5, 10, 20, 30, 40, 50} 0.3
$N_i$ Number of iterations {100, 200, 400, 600, 800, 1000} 150
a Cooling rete factor {0.990, 0.995, 0.999, 0.9995, 0.9999} 0.995
T Initial temperature {200, 400, 600, 800, 1000} 400
$T_low$ Termination temperature 0.1 0.1


Overall Results
It is difficult to establish and quantify the global optimal solution vector of a restricted optimization problem, making a trade-off between constraints and objective function value problematic. In this paper, we penalize infeasible solutions depending on their remoteness from the feasible region. As commonly known, the maximization of f(x) can be converted to the minimization of -f(x), and then according to Definition 5, the corresponding penalty function is defined as: min SC(s)= -Cov(s)+ α×max⁡(0,β), where
is the penalty coefficient, which is set as $10^{10}$. In addition, we set δ=B(1)/2 (B(1) corresponds to the maximum benefit).
To evaluate the performance of various algorithms, we applied the cost-benefit models proposed in Section 2.2 to the real-world dataset and randomly combined three benefit models and four cost models to generate a total of 12 different instances. In our experiments, each algorithm is tested utilizing 5 metrics. The "best," "median," and "worst" cases are selected from the total results of 30 independent runs, with the "mean," "st.d" being used to estimate the stability of all algorithms.
The execution results of four algorithms are shown in Table 4. Firstly, we observed that HSA-AR, HS, SA, and PSO obtained the best values of 7, 3, 2, and 0 in 12 instances, respectively. Therefore, HSA-AR achieved the highest share of best values at around 58.33%. Secondly, the solution obtained by using the StepBenefit model is poor, which may be caused by the benefit discontinuity with an increase of the spatial coverage, which is consistent with the results in [21]. Moreover, we used a gap measure to test the statistical characteristics of the average performance of each algorithm. A gap is a key concept for evaluating the quality of the obtained results, and is defined as the percentage of the difference between the values of the mean value and best-known value.

(9)


Secondly, a gap can be used to estimate the average performance of each approach. In other words, the closer a gap is to the x-axis, the superior the algorithm's average performance. As shown in Fig. 4, the average performance of HSA-AR is better than that of HS, SA, or PSO. Thus, it is the best among all algorithms, with HS and SA as the next best and PSO the least.
Finally, we can generate a histogram based on the value of "st.d" and use the distribution of columns to test the robustness of all algorithms. Fig. 5 shows that the stability of HAS-AR is significantly greater than the other three methods.

Table 4.Detailed all algorithms results
No. Benefit Cost Algorithm Best Mean Worst St.d
1 Linear Random HSA-AR 66,296 63,185 60,962 123.29
PSO 65,578 59,610 53,182 312.63
HS 64,732 62,136 55.221 254.21
SA 64,916 62,515 58,527 198.57
2 Linear HSA-AR 69,485 67,192 63,204 133.15
PSO 64,548 60,231 58,863 302.39
HS 65,984 63,236 59,105 235.27
SA 65,122 64,572 58,332 268.53
3 Quad HSA-AR 74,218 71,125 68,742 112.56
PSO 73,406 64,229 57,221 327.37
HS 71,256 67,773 61.968 244.22
SA 72,284 66,925 61,827 224.36
4 Step HSA-AR 70,245 67,932 63,548 126.37
PSO 66,113 62,231 59,169 278.16
HS 70,436 65,552 61,051 295.14
SA 67,938 62,818 58,862 248.32
5 Quad Random HSA-AR 35,523 33,153 30,258 145.81
PSO 33,361 29,827 26,886 298.54
HS 34,252 32,331 29,319 198.21
SA 36,228 31,241 28,433 340.13
6 Linear HSA-AR 36,192 34,862 32,824 110.85
PSO 34,521 31,258 27,011 242.1
HS 36,339 33,122 31,927 144.85
SA 35,656 32,521 29,616 168.22
7 Quad HSA-AR 37,768 35,395 33,654 107.51
PSO 35,128 32,746 28,221 198.22
HS 36,153 33,712 31.068 184.76
SA 35,994 33,475 30,327 194.58
8 Step HSA-AR 36,045 34,847 32,044 116.15
PSO 33,824 30,723 28,863 233.06
HS 34,080 32,451 30,058 165.54
SA 33,993 31,418 28,862 188.32
9 Step Random HSA-AR 95,724 85,848 67,451 217.41
PSO 84,236 68,945 60,032 478.23
HS 89,342 72,628 65,396 422.55
SA 96,156 74,863 64,150 462.25
10 Linear HSA-AR 104,124 96,547 73,113 252.18
PSO 85,423 76,326 67,916 378.87
HS 95,125 83,541 70,284 258.32
SA 93,251 80,877 68,153 436.08
11 Quad HSA-AR 115,768 97,395 83,742 282.82
PSO 99,442 83,946 75,221 337.2
HS 104,161 86,411 78,968 314.71
SA 100,562 87,752 73,827 398.28
12 Step HSA-AR 101,245 92,847 84,328 246.15
PSO 92,824 84,723 78,863 328.16
HS 93,080 88,451 82,054 295.14
SA 92,993 87,418 80,821 278.35


Fig. 4. Percentage of the gap in various instances.


Fig. 5. Standard deviation (St.d) histogram.


Performance evaluation of all algorithms using synthetic datasets
We applied LinearBenefit and LinearCost models on the synthetic dataset to compare the performance of four algorithms with different settings. Our model has some configurable parameters, namely the granularity of spatial and time, the number of participants, and budget constraints, which can be altered to meet the needs of specific applications. We analyzed the results of all algorithms in this work by adjusting these parameters, according to the values indicated in Table 2.

Different spatial differentiation granularity
Fig. 6(a) shows the maximum benefit obtained by four algorithms with different spatial granularity. In this simulation, other parameters are assigned as t = 30, $B_c$ = 400 and n = 1000. We made two observations as follows: (1) As the number of squares increases, the maximum benefit benefited by the four algorithms decreases inversely. This is because the specified budget is predetermined. Therefore, when the number of squares increases, the proportion of the selected squares in the entire square decreases inversely, and the calculated spatial coverage also decreases. (2) In contrast to the other three algorithms, the incremental advantage obtained by HAS-AR reduces steadily as the spatial granularity increases. For example, when a square measures 300 × 300, the performance of HAS-AR is improved by approximately 35.8%, 32.8%, and 18.4%, respectively in contrast to PSO, HS, and SA. However, when the square measures 700 × 700, the increments are only 23.7%, 20.7%, and 11.1%, respectively. This is due to the fact that our objective function in the experimental setup is based on square numbers.

Different time granularity
Fig. 6(b) compares the maximum benefits obtained by four algorithms with various temporal levels of granularity. In this simulation, we increased the number of the period from 5 to 50. Other parameters are assigned default values. The execution results of various algorithms increase roughly linearly with an increase in time granularity. Also, the increments of the HAS-AR value are stable based on different temporal granularity compared to HS, SA, and PSO having values of about 12%, 28%, and 34%, respectively.

Different budget constraints
We compare the maximum benefit results obtained by four algorithms with different budget constraints. In this assessment, we increased the budget from 100 to 1,000. As shown in Fig. 6(c), HAS-SA achieved maximum benefits under different budget constraints. Specifically, the performance of HAS-SA is about 28%, 18%, and 13%, which are better than that of HS, SA, and PSO, respectively. It is not surprising that such results are obtained. The efficiency of the HSA-SA algorithm depends on the diversification and enhancement in the search process. Among them, HS has a strong exploration capability and provides a good starting point solution, and SA further explores the neighbors of the solution to improve the quality of the solution.

Fig. 5. The maximum benefit results obtained by four algorithms with different settings: (a) spatial differentiation granularity, (b) time granularity, (c) budget constraints, and (d) number of participants.


Different numbers of participants
We compared the performance of the four algorithms with different numbers of participants. In this evaluation, we gradually increased the budget from 100 to 3,000. As shown in Fig. 6(d), as the number of participating candidates increases, the total benefit obtained by various algorithms also increases, with their fitting curves also getting increasingly closer. This is because the greater the number of participants in the search region, the greater the number of participating candidates with high coverage, and thus a superior subset of participants may be located. It is noteworthy that the HAS-SA algorithm continues to outperform alternative baseline algorithms. In particular, average benefit of HSA-SA is 11%, 17%, and 18% higher than that of HS, SA, and PSO, respectively.


Conclusion

This article investigates the participant recruitment problem in mobile crowdsourcing, namely how to recruit a group of participants to maximize the total benefit of the crowdsourcing platform while staying within the budget. To address this problem, we first proposed a new scheme to recruit the optimal set of participants by jointly considering the current and future locations of participants from the perspective of predictable mobility. This scheme evaluated the service quality of participants by taking the spatial coverage as a key metric. To evaluate the benefit of the crowdsourcing platform, we also put forward multiple cost-benefit models with the spatial coverage as a driving factor. Secondly, we proved that participant recruitment is a NP-hard problem, and proposed a harmony search algorithm with annealed randomness to solve the problem. The HS phase demonstrates the anticipated tendency of exploring the search space in early iterations and utilizing good solutions in later iterations, and the SA phase utilizes a linear reduction of the cooling factor to progressively shift the attention from exploring the search space to developing potential search regions. Finally, extensive experimental evaluation on real and simulated datasets is performed, with the experimental results showing that our algorithm has good performance and wide applicability. The proposed method provides a new way to solve the problem of participant recruitment in mobile crowdsourcing.
In this work, we assumed that all participants' trajectories are predetermined. However, this assumption may be impractical in other cases. In the future, we will investigate scenarios where the participant's movement trajectory information is either insufficient or inaccurate.


Author’s Contributions

Conceptualization, JY, KW. Funding acquisition, JY. Writing of the original draft, JY. Writing of the review and editing, JY, DZ. Software, JY, CH, DZ. Formal Analysis, JY, DZ ,CH, KW.


Funding

This research was supported in part by the Humanities and Social Science Fund of Ministry of Education of China (Grant No. 21YJCZH197); the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (Grant No. 2020L0252).


Competing Interests

The authors declare that they have no competing interests.


Author Biography

Author
Name:Jian Yang
Affiliation: School of information, Shanxi University of Finance and Economics
Biography: Jian Yang received the Ph.D. degree in School of Computer and Communication Engineering from the University of Science and Technology Beijing, China, in 2020. Dr. Yang is currently an assistant professor with the School of information, Shanxi University of Finance and Economics, Taiyuan, China. His current research interests include mobile computing, data management, data sharing/trading. He is a member of the IEEE and CCF.


Author
Name:Di Zhang
Affiliation: Modern Educational Technology Center, Jiangsu College of Nursing
Biography: Di Zhang received the master’s degree in school of Computer Science and Engineering from North Minzu University, Yinchuan, China, in 2015. He is currently a lecturer with the Modern Educational Technology Center, Jiangsu College of Nursing, Huaian, China. His current research interests include IoT, cloud computing, data managment.

Author
Name:Chunmei Hu
Affiliation: School of Cyber Science and Engineering, Qufu Normal University
Biography: Chunmei Hu received the master's degree in School of Computer and Communication Engineering from the University of Science and Technology Beijing, China, in 2017. She is currently a lecturer with the School of Cyber Science and Engineering, Qufu Normal University, Qufu, China. Her current research interests include machine learning, data management, data sharing/trading.

Author
Name:Kaixuan Wang
Affiliation: School of information, Shanxi University of Finance and Economics
Biography: Kaixuan Wang, received the M.S. degree in computer science from the University of Shanxi, China, and the Ph.D. degree in communication and information systems from the State Key Laboratory of Network and Switching at Beijing University of Posts and Telecommunications, China, in 2015. He is currently an assistant professor of management of networks, School of information, Shanxi University of Finance and Economics, Taiyuan, China. His research focuses on management for network communications, big data, and protocols and algorithms for smart grid applications. He has authored journal and conference papers about the communications.


References

[1] S. R. B. Gummidi, X. Xie, and T. B. Pedersen, “A survey of spatial crowdsourcing,” ACM Transactions on Database Systems, vol. 44, no. 2, pp. 1-46, 2019.
[2] R. Chen, L. Li, Y. Ma, Y. Gong, Y. Guo, T. Ohtsuki, and M. Pan, “Constructing mobile crowdsourced COVID-19 vulnerability map with geo-indistinguishability,” IEEE Internet of Things Journal, vol. 9, no. 18, pp. 17403-17416, 2022.
[3] G. Mehmood, M. Z. Khan, A. Waheed, M. Zareei, and E. M. Mohamed, “A trust-based energy-efficient and reliable communication scheme (trust-based ERCS) for remote patient monitoring in wireless body area networks,” IEEE Access, vol. 8, pp. 131397-131413, 2020.
[4] Y. Lai, Y. Xu, D. Mai, Y. Fan, and F. Yang, “Optimized large-scale road sensing through crowdsourced vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 4, pp. 3878-3889, 2022.
[5] Q. Jiang, D. Liu, H. Zhu, Y. Qiao, and B. Huang, “Quasi group role assignment with role awareness in self-service spatiotemporal crowdsourcing,” IEEE Transactions on Computational Social Systems, vol. 9, no. 5, pp. 1456-1468, 2022.
[6] J. Xu, Z. Luo, C. Guan, D. Yang, L. Liu, and Y. Zhang, “Hiring a team from social network: incentive mechanism design for two-tiered social mobile crowdsourcing,” IEEE Transactions on Mobile Computing, 2022. https://doi.org/10.1109/TMC.2022.3162108
[7] Y. Wang, Y. Gao, Y. Li, and X. Tong, “A worker-selection incentive mechanism for optimizing platform-centric mobile crowdsourcing systems,” Computer Networks, vol. 171, article no. 107144, 2022. https://doi.org/10.1016/j.comnet.2020.107144
[8] Z. Liu, K. Li, X. Zhou, N. Zhu, Y. Gao, and K. Li, “Multi-stage complex task assignment in spatial crowdsourcing,” Information Sciences, vol. 586, pp. 119-139, 2022.
[9] H. Li, T. Li, W. Wang, and T. Wang, “Dynamic participant selection for large-scale mobile crowd sensing,” IEEE Transactions on Mobile Computing, vol. 18, no. 12, pp. 2842-2855, 2019.
[10] J. Yang, C. Zhao, and C. Xing, “Big data market optimization pricing model based on data quality,” Complexity, vol. 2019, article no. 5964068, 2019. https://doi.org/10.1155/2019/5964068
[11] J. Yang and C. Xing, “Personal data market optimization pricing model based on privacy level,” Information, vol. 10, no. 4, article no. 123, 2019. https://doi.org/10.3390/info10040123
[12] H. Cai, Y. Zhu, and Z. Feng, “A truthful incentive mechanism for mobile crowd sensing with location-sensitive weighted tasks,” Computer Networks, vol. 132, pp. 1-14, 2018.
[13] X. Zhang, Z. Yang, Y. J. Gong, Y. Liu, and S. Tang, “SpatialRecruiter: maximizing sensing coverage in selecting workers for spatial crowdsourcing,” IEEE Transactions on Vehicular Technology, vol. 66, no. 6, pp. 5229-5240, 2017.
[14] N. Wang and J. Wu, “Cost-efficient heterogeneous worker recruitment under coverage requirement in spatial crowdsourcing,” IEEE Transactions on Big Data, vol. 7, no. 2, pp. 407-420, 2021.
[15] Y. Chen, P. Lv, D. Guo, T. Zhou, and M. Xu, “Trajectory segment selection with limited budget in mobile crowd sensing,” Pervasive and Mobile Computing, vol. 40, pp. 123-138, 2017.
[16] S. Liu, Y. Wang, J. Sun, and T. Mao, “An efficient spatial-temporal model based on gated linear units for trajectory prediction,” Neurocomputing, vol. 492, pp. 593-600, 2022.
[17] F. Zhou, Y. Dai, Q. Gao, P. Wang, and T. Zhong, “Self-supervised human mobility learning for next location prediction and trajectory classification,” Knowledge-Based Systems, vol. 228, article no. 107214, 2021. https://doi.org/10.1016/j.knosys.2021.107214
[18] S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, “A self-adaptive parameter selection trajectory prediction approach via hidden Markov models,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 284-296, 2015.
[19] Y. Luo and N. R. Jennings, “A budget-limited mechanism for category-aware crowdsourcing of multiple-choice tasks,” Artificial Intelligence, vol. 299, article no. 103538, 2021. https://doi.org/10.1016/j.artint.2021.103538
[20] C. Miao, H. Yu, Z. Shen, and C. Leung, “Balancing quality and budget considerations in mobile crowdsourcing,” Decision Support Systems, vol. 90, pp. 56-64, 2016.
[21] J. Yang, X. Ban, and C. Xing, “Using greedy random adaptive procedure to solve the user selection problem in mobile crowdsourcing,” Sensors, vol. 19, no. 14, article no. 3158, 2019. https://doi.org/10.3390/s19143158
[22] J. Yang and C. Xing, “Data source selection based on an improved greedy genetic algorithm,” Symmetry, vol. 11, no. 2, article no. 273, 2019. https://doi.org/10.3390/sym11020273
[23] G. Lin, “An implementation of harmony search algorithm to the set covering problem,” in Proceedings of 2015 11th International Conference on Natural Computation (ICNC), Zhangjiajie, China, 2015, pp. 200-204.
[24] Z. W. Geem, J. H. Kim, and G. V. Loganathan, “A new heuristic optimization algorithm: harmony search,” Simulation, vol. 76, no.2, pp. 60-68, 2001.
[25] J. Skackauskas, T. Kalganova, I. Dear, and M. Janakiram, “Dynamic impact for ant colony optimization algorithm,” Swarm and Evolutionary Computation, vol. 69, article no. 100993, 2022. https://doi.org/10.1016/j.swevo.2021.100993
[26] J. Naranjo-Perez, M. Infantes, J. F. Jimenez-Alonso, and A. Saez, “A collaborative machine learning-optimization algorithm to improve the finite element model updating of civil engineering structures,” Engineering Structures, vol. 225, article no. 111327, 2020. https://doi.org/10.1016/j.engstruct.2020.111327
[27] A. Costa and V. Fernandez-Viagas, “A modified harmony search for the T-single machine scheduling problem with variable and flexible maintenance,” Expert Systems with Applications, vol. 198, article no. 116897, 2022. https://doi.org/10.1016/j.eswa.2022.116897
[28] M. Vahidi Farashah, A. Etebarian, R. Azmi, and R. Ebrahimzadeh Dastjerdi, “An analytics model for TelecoVAS customers’ basket clustering using ensemble learning approach,” Journal of Big Data, vol. 8, article no. 36, 2021. https://doi.org/10.1186/s40537-021-00421-1
[29] L. Wang, Y. Xu, Y. Mao, and M. Fei, “A discrete harmony search algorithm,” in Life System Modeling and Intelligent Computing. Heidelberg, Germany: Springer, 2010. pp. 37-43. https://doi.org/10.1007/978-3-642-15859-9_6
[30] A. Assad and K. Deep, “A hybrid harmony search and simulated annealing algorithm for continuous optimization,” Information Sciences, vol. 450, pp. 246-266, 2018.
[31] Y. Wu, Y. Zhu, and B. Li, “Infrastructure-assisted routing in vehicular networks,” in Proceedings IEEE INFOCOM, Orlando, FL, 2012, pp. 1485-1493.
[32] Y. Qiao, Z. Si, Y. Zhang, F. B. Abdesslem, X. Zhang, and J. Yang, “A hybrid Markov-based model for human mobility prediction,” Neurocomputing, vol. 278, pp. 99-109, 2018.
[33] E. H. Houssein, A. G. Gad, K. Hussain, and P. N. Suganthan, “Major advances in particle swarm optimization: theory, analysis, and application,” Swarm and Evolutionary Computation, vol. 63, article no. 100868, 2021. https://doi.org/10.1016/j.swevo.2021.100868

About this article
Cite this article

Jian Yang.1,*, Di Zhang2, ChunMei Hu3, and KaiXuan Wang1, A Spatial Coverage-Based Participant Recruitment Scheme for Mobile Crowdsourcing, Article number: 13:20 (2023) Cite this article 2 Accesses

Download citation
  • Received24 January 2022
  • Accepted10 May 2022
  • Published15 May 2023
Share this article

Anyone you share the following link with will be able to read this content:

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords