- HOMEHOME
- ABOUTAims and scope Editorial Board Contact
- ARTICLESCurrent Issue All Issue
- SUBMISSIONPreparing a Manuscript Peer-review Policy Peer-Review Process Article Processing Charges (APC) Submission Track Language and Editing Services Prepare Supporting Information Editorial Policies Copyright

- Jian Yang
^{.1,*}, Di Zhang^{2}, ChunMei Hu^{3}, and KaiXuan Wang^{1}

Human-centric Computing and Information Sciences volume **13**, Article number: 20 (2023)

Cite this article **2 Accesses**

https://doi.org/10.22967/HCIS.2023.13.020

Abstract

A revolutionary distributed problem-solving model is being enabled by the increasing proliferation of mobile smart devices, namely mobile crowdsourcing, which employs ubiquitous mobile users to gather and analyze data beyond what was traditionally possible, while becoming a leading area for study in both academia and industry. However, how to recruit participants to provide high quality data with a limited budget is challenging for a crowdsourcing system. In contrast to traditional fixed sensors, participants are the general population with smart devices, which boast the unique advantage of predictable mobility. To address this problem, this article first studies the relationship between predictable mobility and spatial coverage, and proposes a new strategy to recruit the optimal subset of participants by jointly considering their current and future locations with the aim of ensuring that the recruited participants can provide high coverage sensory data. Secondly, participant recruitment proves to be a NP-hard problem, while a harmony search algorithm with annealing randomness (HSA-AR) is proposed as a trade-off between the quality of problem-solving and the computation time. Finally, through extensive simulations utilizing real-world and synthetic datasets, we demonstrate that HSA-AR outperforms other baseline methods under various settings.

Keywords

Mobile Crowdsourcing, Participant Recruitment, Spatial Coverage, NP-Hard, HSA-AR

Introduction

With the pervasiveness of smart devices, mobile crowdsourcing [1, 2] has been touted by both academia and industry and become a sought-after topic in the mobile network. Mobile crowdsourcing is a new paradigm of data collection and sharing that employs mobile workers with smart devices to collect data from different locations in the physical world. Typically these smart devices are equipped with various sensors, such as GPS, cameras, accelerometers, Bluetooth, etc., which allow mobile workers to easily acquire basic data about various applications and upload data to a crowdsourcing system.

As shown in Fig. 1, a mobile crowdsourcing system is a new form of multi-agent system where two types of agents interact with each other, for example, a set of users distributed throughout a specific geographical region and a cloud-based crowdsourcing platform.

Mobile workers (participants) use smart devices equipped with various sensors for remote patient monitoring [3], road condition monitoring [4] and location-based services[5], etc., and send the sensory data to the crowdsourcing platform through a mobile network or Wi-Fi. In addition, participants will generate certain costs when performing tasks, namely mobile data usage, batteries, travel distance, time consumption, computing power, etc.

A crowdsourcing platform is the manager of the entire system. On the one hand, it is responsible for formulating effective incentive mechanisms [6, 7] and recruiting appropriate participants [8, 9] to maximize the utility of crowd work. On the other hand, it is responsible for data cleaning, desensitization, conversion, and exploring more interesting applications, including data analysis, visualization, and data trading [10, 11], etc.

As mentioned earlier, obtaining high quality sensory data is the goal of the crowdsourcing platform. However, it is unrealistic to recruit all potential participants due to limited budget constraints. The usual approach is to select a subset of potential participants who provide high quality data and maximize the utility of crowd work. Therefore, how to make a trade-off between a limited budget and high-quality sensory data is a challenge for the system.

According to recent research [[12–[14], the quality of mobile crowdsourcing is affected by its specific regions. An effective algorithm, specifically, should record participants' concealed mobility patterns and anticipate their future locations by exploiting their prior trajectories. In other words, collecting high quality data means that a participant's movement trajectory needs to cover as many areas as possible [15]. However, the location of participants will change at different times, so the mobile sensing mode is more dynamic. Fig. 2 illustrates an example of the trajectories of the four participants.

Next, effectively predicting the trajectory of the participants is a prerequisite for solving the problem of participant recruitment. Owing to the rapid development of information technology and research of relevant scholars, trajectory prediction can be realized in various ways, such as navigation [16], periodic movement [17], based on models [18] or simply uploaded by the participants themselves. In particular, we do not attempt to solve the problem of trajectory prediction, which is based on the basic assumption that the trajectories of the participants are predictable and known in advance.

Due to the mobility of participants, the crowdsourcing platform can obtain more comprehensive and substantial data resources with a lower budget. Existing participant recruitment strategies usually only consider a participant’s current location [19, 20], while generally speaking, a participant’s trajectory can cover multiple geographic areas. Especially for tasks that take a long time to complete, such as traffic condition monitoring, the collected sensory data is easily affected by the mobility of participants, and so it cannot be ignored.

In this paper, predictable mobility as an important evaluation factor is used to recruit participants with the aim of obtaining high quality data within a limited budget. In summary, our work contributes the following: we studied the relationship between predictable mobility and spatial coverage, and proposed a new strategy to recruit the optimal subset of participants by jointly considering their current and future locations, and the problem of participant recruitment is formally defined in Section 2. In Section 3, we prove that participant recruitment is a NP-hard problem, and propose efficient heuristic approaches, namely harmony search algorithm with annealing randomness (HSA-AR) as a trade-off between the quality of problem-solving and the computation time. In Section 4, through extensive simulations utilizing real-world and synthetic datasets, we demonstrate that HSA-AR outperforms other baseline methods under various settings. In Section 5, the conclusions and future research proposals are discussed.

Participant Recruitment Scheme Based on Spatial Coverage

First of all, we proposed the model of participant recruitment based on spatial coverage. Secondly, different cost-benefit models were constructed with the spatial coverage as the driving factor, which creates a foundation for follow-up algorithm evaluation. Finally, an example of participant recruitment in traffic monitoring was given to illustrate the workflow of the proposed approach.

**Problem Formulation**

Let’s assume that the crowdsourcing platform wants to obtain the monitoring data of urban traffic conditions, it usually specifies the number of geographical areas of interest within a limited time, so as to issue the sensory task in that area. We used R={$r_1$…,$r_i$…,$r_k$} to represent the area of interest. Considering that it is a sensory task, let’s suppose there are m participants interested in the task, which can be denoted as U={$u_1$,…,$u_i$,…,$u_m$}. Generally, the tasks published by the crowdsourcing system are time-sensitive; the participants were required to collect and send sensory data within a certain period of time T={$t_1$,…,$t_i$,…,$t_n$}, where t_i (1≤i≤n) is a specific moment; and the specific granularity should be modified to accommodate the challenges of different sensory tasks. In each period, a certain time τ∈[$t_i$,$t_{(i+1)}$ ] is sampled to indicate the sensing result in the time period. Then, the following matrix can be used to characterize the participants' trajectories throughout this period T.

**Definition 1 (Trajectory matrix). **The trajectory of participants in time T can be denoted as an m×n matrix as follows:

*(1)*

where, l_ij is the location of a participant u_i at time $t_j$, i∈[1,m] and j∈[1,n]. Each row indicates a participant's movement through a continuous time series, while each column represents the location of all participants at a specific point in time.

To motivate participants to complete their task, crowdsourcing platforms should provide the participants rewards, either in cash or non-cash form, which are part of the cost of a task.

*(2)*

*(3)*

To recruit the appropriate participants, it is necessary to estimate the quality of data collected by them. As mentioned earlier, the spatial coverage is an important factor, assuming that within a specific period, a participant can completely cover a geographical area. However, redundancy will be created by multiple participants in the same geographic area, a scenario that should be avoided.

*(4)*

where R is the total number of spatial regions, and w_j is the weight factor at the corresponding time, which is determined by the crowdsourcing platform.

*(5)*

where m is the maximum number of potential stakeholders in the crowdsourcing system, $x_i$ is a binary variable that is set to either 1 if a participant is recruited, or 0 if not so. s indicates the number of participants that have been employed by the system. Furthermore, the overall cost of all recruited participants is less than budget δ.

Spatial Coverage-based Cost-Benefit Model

To facilitate analysis and calculation, we unified the spatial coverage and corresponding costs of the participants into the same unit, and thus proposed multiple cost-benefit models [21, 22] driven by spatial coverage. In this section, we allocate a participant's cost in [5, 20] into four following types:

RandomCost generates a random integer in [5, 20];

LinearCost implies that the cost rises linearly with Cov(s) and set C(u)=15Cov(s)+5;

QuadCost implies that the cost rises quadratically with Cov(s) and set C(u)=15Co$v^2$ (s)+5;

StepCost implies that the cost satisfies the piecewise function denoted below as:

In addition, three benefit models are generated utilizing the spatial coverage as a significant heuristic component and described as follows.

LinearBenefit implies that the benefit rises linearly with Cov and set B(s) = 100Cov(s);

QuadBenefit implies that the benefit rises quadrantally with Cov and set B(s)=100Co$v^2$ (s);

StepBenefit implies that the benefit satisfies the piecewise function denoted below as:

To demonstrate the workflow of the proposed methodology, an example of participant recruitment in traffic condition monitoring is presented in Fig. 3.

Let’s suppose the crowdsourcing platform wants to monitor traffic conditions in a city by collecting traffic information from mobile users and is interested in traffic conditions in four areas in the next 30 minutes. Fig. 3(a) shows the topology of these four regions, assuming that three participating candidates want to complete the task and report their current GPS locations as {A,B,C}, which is shown as $t_1$ in Fig. 3(b). It should be noted the crowdsourcing platform’s budget allows for only two participants to be recruited.

The traffic conditions will be different at different times. If the crowdsourcing platform wants to monitor more information over different time periods {$t_1$,$t_2$,$t_3$,$t_4$,$t_5$}, the spatial coverage of the participants needs to be calculated. Let’s take u_1 and u_2 participants as examples. For simplicity, it is assumed that each participant is ascribed the same cost. The various cost models proposed in Section 2.2 can be used to evaluate their impact on participant recruitment.

Fig. 3(c) shows the trajectories of $u_1$ and $u_2$ at different times. We observed that except for monitoring two different regions at $t_1$, $t_4$, $t_5$, only one region was monitored during the remaining two time periods, because both participants were in the same region and the collected traffic conditions were duplicated. In addition, the crowdsourcing platform may pay specific attention to traffic conditions at different times, so different weights w_i can be set for spatial coverage at different times, with the sum of the weights (n is the number of divided periods, as shown in Figure 3(d), n=5). For simplicity, we set $w_1$=$w_2$=$w_3$=$w_4$=$w_5$=0.2. According to Definition 5, the spatial coverage of {$u_1$,$u_2$} can be calculated as Similarly, we can calculate the spatial coverage of {$u_1$,$u_34 } is 0.45, {$u_2$,$u_3$} as 0.5. To maximize the benefit of the crowdsourcing platform, the combination of {$u_2$,$u_3$} is an optimal participant recruitment scheme.

Solution Approach

**Complexity Analysis of SCPR Problem**

Theorem 1. Definition 5 states that the SCPR problem is NP-hard.
Proof. To illustrate that SCPR is a NP-hard problem, we demonstrated that its decision version is NP-complete. It should be demonstrated that the decision-making problem belongs to NP, and then selected another known NPC problem that can be converted in polynomial time to be reduced into a decision version of SCPR.

We considered the minimum weighted set coverage problem [23] as an example of a NPC problem, which is described as follows: Give a set of elements U={$u_1$,$u_2$,…,$u_m$}, a collections S={$s_1$,$s_2$,…,$s_n$ } of subsets of U, each $s_i$ satisfies $s_i$⊆U, and thus give weight w($s_i$ )>0 . The weighted minimum coverage problem is to find a subset of a collection of subset Q⊆S, so that is the set coverage of the element set U and requires This instance is denoted as X in the following discussion.

Then we transferred instance X to an instance of SCPR. Let’s assume that participant is the set of elements U, and the subset of participants is considered as the element $s_i$ in set S. Meanwhile, the cost of each participant is the corresponding weight w($s_i$ ) of the element $s_i$. Thus, we can construct a set ¯S={$s_1$,$s_2$,…s_n,r}, r⊂U, w(r)>k, while r can not be selected at the same time as each $s_i$. In this way, we can get an instance of a participant recruitment problem denoted by Y.

Thus, q is a solution to X, if and only if q is a solution to Y. Furthermore, the reduction from X to Y requires polynomial time, meaning that the SCPR is a NP-hard problem. ◻

**Harmony Search Algorithm with Annealing Randomness**

As previously demonstrated, participant recruitment is a NP-hard problem. The number of decision variables increases exponentially with an increase in the number of participants. Even if computational resources are endlessly improved, the exact solution may not be accomplished in a reasonable period. As a result, we introduced a harmonic search algorithm with annealing randomness as a trade-off between computing overhead and solution quality.

Harmony search (HS) is an emerging intelligent optimization algorithm proposed by Geem et al. [24], a Korean scholar, in 2001. It is similar to a genetic algorithm’s imitation of biological evolution [22], ant colony optimization algorithm’s imitation of ant foraging [25], etc. HS is a new heuristic search method that mimics the process by which musicians constantly alter the pitch of their instruments to generate the most beautifully harmonic result. The algorithm is well-applied in addressing various NP-hard problems and has been widely used in many fields such as civil engineering structure design [26], single machine scheduling problems [27], soil stability analysis [28], and so forth.

The pseudocode for HS is shown in Algorithm 1. Generally, HS consists of the five stages as follows.

**Step 1 (Initialization parameters): **Initialize the key parameters affecting algorithm performance, which are described below:

Harmony memory size (HMS): The size of the harmonic population.

Harmony memory consideration rate (HMCR): The probability of taking a harmony from an existing population (HM harmony library).

Pitch-adjusting rate (PAR): A proportion of the decision factors that will be fine-tuned by altering their pitches.

The number of improvisations (NI): The total number of iterations.

*(6)*

*(7)*

*(8)*

However, if the initially picked harmonics are near the local optimal, the harmony search may suffer from a serious limitation of slipping into the local optimal. To solve this issue, we proposed a simulated annealing approach [30] that increases the accuracy and robustness of the harmony search. The effectiveness of the simulated annealing algorithm is that it can generate sub-optimal solutions within a certain probability, which depends on the synthesis temperature in the metallurgical annealing process, and can gradually reduce the probability by decreasing the temperature parameters. Therefore, SA can explore potential solutions in a larger region, thereby improving the exploration ability of the algorithm and finding the global optimal solution.

Algorithm 2 shows the pseudocode of simulated annealing (SA). The algorithm begins with the HS generation models, while the temperature parameter is set to the baseline value $T_0$. The next procedures are then executed until the ending threshold is reached. Firstly, the neighbor function is applied to construct a new solution $s^{new}$ for "s" by modifying the value of one or more components, and then the fitness difference between the two solutions is calculated as Δ. Let’s consider the candidate solution as the current solution, and update the best solution if it is better. Alternatively, it is recognized as the new current solution with a probability of exp (-Δ/αT), which is determined by its fitness and current temperature T. Finally, the temperature steadily drops with the cooling factor, decreasing the possibility of accepting poor inferior options.

We proposed that the participant recruitment strategy should begin with a harmony search and then move on to simulated annealing. The harmony search algorithm is performed by initializing a harmony memory, with a new solution obtained from the HM. All components of the solution are based on the HMCR, while improvisation is performed with the HMCR parameter and PAR. Then choose the current best harmony vector as a candidate solution. Next, the SA is applied, where not only the optimal harmony is always accepted, but also the probability of the suboptimal harmony being accepted is determined by the fitness of the new harmony and existing temperature. The proposed harmony search with annealing randomness is illustrated in Algorithm 3.

Experimental Study

In this section, through extensive simulations utilizing real-world and synthetic datasets, we demonstrate that HSA-AR outperforms the three baseline methods under various settings. Our experiments are coded in Python under a standard server (Windows), along with an Intel Xeon Gold 5320H processor and 64 GB of main memory.

**Experimental Setting**

4.1.1 Datasets

**Real-world data:** We have used a real dataset [31] to evaluate the performance of the proposed algorithm. The dataset contains the road data of 4,000 taxis in Shanghai within 24 hours. The sampling interval of taxi-based GPS data is 1 minute and composed of the vehicle ID, time stamp, latitude, longitude, speed, etc.

To facilitate the simulation, we preprocessed the dataset as follows: (1) We randomly selected 1,000 taxis as participants in the original dataset; (2) The simulation range was limited to a rectangular area of 10 km × 10 km, which was divided into 10,000 grid cells of similar size. Also, the GPS location of each vehicle was mapped to these grid cells; (3) We divided 24 hours a day into 48-time windows with 30 minutes as the minimum time granularity, denoted as T={$t_1$,$t_2$,⋯,$t_48$}, and the location of each vehicle at each time window was recorded. (4) We randomly selected some famous buildings or tourist landmarks as the sensory task locations. The parameters of the real-world dataset are given in Table 1.

**Table 1.**Parameter setting for the real-world data

Parameter | Description | Value |

n | Number of participants | 1000 |

𝑅 | Range constraints for region | 10 km × 10 km |

g | Number of grids | 10,000 |

$T_g$ | Time granularity | 30 minutes |

Parameter | Name | Range of values | Default value |

n | Number of participants | {100, 500, 1000, 1500, 2000,2500,3000} | 1,000 |

𝑅 | Range constraints for region | {10, 100, 300, 500, 700, 900} | 300 |

g | Time granularity | {5, 10, 20, 30, 40, 50} | 30 |

$T_g$ | Budget constraint | {100, 200, 400, 600, 800, 1000} | 400 |

Baseline algorithms

The three baseline algorithms have been introduced to test the performance of the proposed approach. One is particle swarm optimization (PSO) algorithm as shown in [33]. PSO is a kind of population-based stochastic optimization technology, which imitates the swarm predation behavior of insects, birds, and fish. The essential thought is to identify the best solution by collaborating and exchanging information among members of the target population. The algorithm is widely utilized in a variety of fields, including function optimization, neural network training, fuzzy system control, etc. In the experiment, the swarm size was set to 30; the inertia weight was altered linearly between 0.9 and 0.4; and the acceleration coefficients were set at 1.5. In addition, the other baseline algorithms are two simplified versions of HSA-AR, namely HS and SA.

Furthermore, we altered parameters independently and incrementally, so when one was tuned, the others were reset to their default settings. While the next parameters were being explored, the preceding ones were given the best value determined so far. The parameters and default values of the HSA-AR algorithm are indicated in Table 3.

Parameter | Name | Range of values | Default value |

HMS | Harmony memory size | {5, 10, 15, 20, 25, 30} | 10 |

HMCR | Harmony memory consideration rate | {0.91, 0.93, 0.95, 0.97, 0.99} | 0.99 |

PAR | Pitch adjusting rate | {5, 10, 20, 30, 40, 50} | 0.3 |

$N_i$ | Number of iterations | {100, 200, 400, 600, 800, 1000} | 150 |

a | Cooling rete factor | {0.990, 0.995, 0.999, 0.9995, 0.9999} | 0.995 |

T | Initial temperature | {200, 400, 600, 800, 1000} | 400 |

$T_low$ | Termination temperature | 0.1 | 0.1 |

It is difficult to establish and quantify the global optimal solution vector of a restricted optimization problem, making a trade-off between constraints and objective function value problematic. In this paper, we penalize infeasible solutions depending on their remoteness from the feasible region. As commonly known, the maximization of f(x) can be converted to the minimization of -f(x), and then according to Definition 5, the corresponding penalty function is defined as: min SC(s)= -Cov(s)+ α×max(0,β), where

is the penalty coefficient, which is set as $10^{10}$. In addition, we set δ=B(1)/2 (B(1) corresponds to the maximum benefit).

To evaluate the performance of various algorithms, we applied the cost-benefit models proposed in Section 2.2 to the real-world dataset and randomly combined three benefit models and four cost models to generate a total of 12 different instances. In our experiments, each algorithm is tested utilizing 5 metrics. The "best," "median," and "worst" cases are selected from the total results of 30 independent runs, with the "mean," "st.d" being used to estimate the stability of all algorithms.

The execution results of four algorithms are shown in Table 4. Firstly, we observed that HSA-AR, HS, SA, and PSO obtained the best values of 7, 3, 2, and 0 in 12 instances, respectively. Therefore, HSA-AR achieved the highest share of best values at around 58.33%. Secondly, the solution obtained by using the StepBenefit model is poor, which may be caused by the benefit discontinuity with an increase of the spatial coverage, which is consistent with the results in [21]. Moreover, we used a gap measure to test the statistical characteristics of the average performance of each algorithm. A gap is a key concept for evaluating the quality of the obtained results, and is defined as the percentage of the difference between the values of the mean value and best-known value.

*(9)*

Secondly, a gap can be used to estimate the average performance of each approach. In other words, the closer a gap is to the x-axis, the superior the algorithm's average performance. As shown in Fig. 4, the average performance of HSA-AR is better than that of HS, SA, or PSO. Thus, it is the best among all algorithms, with HS and SA as the next best and PSO the least.

Finally, we can generate a histogram based on the value of "st.d" and use the distribution of columns to test the robustness of all algorithms. Fig. 5 shows that the stability of HAS-AR is significantly greater than the other three methods.

No. | Benefit | Cost | Algorithm | Best | Mean | Worst | St.d |

1 | Linear | Random | HSA-AR | 66,296 | 63,185 | 60,962 | 123.29 |

PSO | 65,578 | 59,610 | 53,182 | 312.63 | |||

HS | 64,732 | 62,136 | 55.221 | 254.21 | |||

SA | 64,916 | 62,515 | 58,527 | 198.57 | |||

2 | Linear | HSA-AR | 69,485 | 67,192 | 63,204 | 133.15 | |

PSO | 64,548 | 60,231 | 58,863 | 302.39 | |||

HS | 65,984 | 63,236 | 59,105 | 235.27 | |||

SA | 65,122 | 64,572 | 58,332 | 268.53 | |||

3 | Quad | HSA-AR | 74,218 | 71,125 | 68,742 | 112.56 | |

PSO | 73,406 | 64,229 | 57,221 | 327.37 | |||

HS | 71,256 | 67,773 | 61.968 | 244.22 | |||

SA | 72,284 | 66,925 | 61,827 | 224.36 | |||

4 | Step | HSA-AR | 70,245 | 67,932 | 63,548 | 126.37 | |

PSO | 66,113 | 62,231 | 59,169 | 278.16 | |||

HS | 70,436 | 65,552 | 61,051 | 295.14 | |||

SA | 67,938 | 62,818 | 58,862 | 248.32 | |||

5 | Quad | Random | HSA-AR | 35,523 | 33,153 | 30,258 | 145.81 |

PSO | 33,361 | 29,827 | 26,886 | 298.54 | |||

HS | 34,252 | 32,331 | 29,319 | 198.21 | |||

SA | 36,228 | 31,241 | 28,433 | 340.13 | |||

6 | Linear | HSA-AR | 36,192 | 34,862 | 32,824 | 110.85 | |

PSO | 34,521 | 31,258 | 27,011 | 242.1 | |||

HS | 36,339 | 33,122 | 31,927 | 144.85 | |||

SA | 35,656 | 32,521 | 29,616 | 168.22 | |||

7 | Quad | HSA-AR | 37,768 | 35,395 | 33,654 | 107.51 | |

PSO | 35,128 | 32,746 | 28,221 | 198.22 | |||

HS | 36,153 | 33,712 | 31.068 | 184.76 | |||

SA | 35,994 | 33,475 | 30,327 | 194.58 | |||

8 | Step | HSA-AR | 36,045 | 34,847 | 32,044 | 116.15 | |

PSO | 33,824 | 30,723 | 28,863 | 233.06 | |||

HS | 34,080 | 32,451 | 30,058 | 165.54 | |||

SA | 33,993 | 31,418 | 28,862 | 188.32 | |||

9 | Step | Random | HSA-AR | 95,724 | 85,848 | 67,451 | 217.41 |

PSO | 84,236 | 68,945 | 60,032 | 478.23 | |||

HS | 89,342 | 72,628 | 65,396 | 422.55 | |||

SA | 96,156 | 74,863 | 64,150 | 462.25 | |||

10 | Linear | HSA-AR | 104,124 | 96,547 | 73,113 | 252.18 | |

PSO | 85,423 | 76,326 | 67,916 | 378.87 | |||

HS | 95,125 | 83,541 | 70,284 | 258.32 | |||

SA | 93,251 | 80,877 | 68,153 | 436.08 | |||

11 | Quad | HSA-AR | 115,768 | 97,395 | 83,742 | 282.82 | |

PSO | 99,442 | 83,946 | 75,221 | 337.2 | |||

HS | 104,161 | 86,411 | 78,968 | 314.71 | |||

SA | 100,562 | 87,752 | 73,827 | 398.28 | |||

12 | Step | HSA-AR | 101,245 | 92,847 | 84,328 | 246.15 | |

PSO | 92,824 | 84,723 | 78,863 | 328.16 | |||

HS | 93,080 | 88,451 | 82,054 | 295.14 | |||

SA | 92,993 | 87,418 | 80,821 | 278.35 |

Performance evaluation of all algorithms using synthetic datasets

We applied LinearBenefit and LinearCost models on the synthetic dataset to compare the performance of four algorithms with different settings. Our model has some configurable parameters, namely the granularity of spatial and time, the number of participants, and budget constraints, which can be altered to meet the needs of specific applications. We analyzed the results of all algorithms in this work by adjusting these parameters, according to the values indicated in Table 2.

Different spatial differentiation granularity

Fig. 6(a) shows the maximum benefit obtained by four algorithms with different spatial granularity. In this simulation, other parameters are assigned as t = 30, $B_c$ = 400 and n = 1000. We made two observations as follows: (1) As the number of squares increases, the maximum benefit benefited by the four algorithms decreases inversely. This is because the specified budget is predetermined. Therefore, when the number of squares increases, the proportion of the selected squares in the entire square decreases inversely, and the calculated spatial coverage also decreases. (2) In contrast to the other three algorithms, the incremental advantage obtained by HAS-AR reduces steadily as the spatial granularity increases. For example, when a square measures 300 × 300, the performance of HAS-AR is improved by approximately 35.8%, 32.8%, and 18.4%, respectively in contrast to PSO, HS, and SA. However, when the square measures 700 × 700, the increments are only 23.7%, 20.7%, and 11.1%, respectively. This is due to the fact that our objective function in the experimental setup is based on square numbers.

Different time granularity

Fig. 6(b) compares the maximum benefits obtained by four algorithms with various temporal levels of granularity. In this simulation, we increased the number of the period from 5 to 50. Other parameters are assigned default values. The execution results of various algorithms increase roughly linearly with an increase in time granularity. Also, the increments of the HAS-AR value are stable based on different temporal granularity compared to HS, SA, and PSO having values of about 12%, 28%, and 34%, respectively.

Different budget constraints

We compare the maximum benefit results obtained by four algorithms with different budget constraints. In this assessment, we increased the budget from 100 to 1,000. As shown in Fig. 6(c), HAS-SA achieved maximum benefits under different budget constraints. Specifically, the performance of HAS-SA is about 28%, 18%, and 13%, which are better than that of HS, SA, and PSO, respectively. It is not surprising that such results are obtained. The efficiency of the HSA-SA algorithm depends on the diversification and enhancement in the search process. Among them, HS has a strong exploration capability and provides a good starting point solution, and SA further explores the neighbors of the solution to improve the quality of the solution.

We compared the performance of the four algorithms with different numbers of participants. In this evaluation, we gradually increased the budget from 100 to 3,000. As shown in Fig. 6(d), as the number of participating candidates increases, the total benefit obtained by various algorithms also increases, with their fitting curves also getting increasingly closer. This is because the greater the number of participants in the search region, the greater the number of participating candidates with high coverage, and thus a superior subset of participants may be located. It is noteworthy that the HAS-SA algorithm continues to outperform alternative baseline algorithms. In particular, average benefit of HSA-SA is 11%, 17%, and 18% higher than that of HS, SA, and PSO, respectively.

Conclusion

This article investigates the participant recruitment problem in mobile crowdsourcing, namely how to recruit a group of participants to maximize the total benefit of the crowdsourcing platform while staying within the budget. To address this problem, we first proposed a new scheme to recruit the optimal set of participants by jointly considering the current and future locations of participants from the perspective of predictable mobility. This scheme evaluated the service quality of participants by taking the spatial coverage as a key metric. To evaluate the benefit of the crowdsourcing platform, we also put forward multiple cost-benefit models with the spatial coverage as a driving factor. Secondly, we proved that participant recruitment is a NP-hard problem, and proposed a harmony search algorithm with annealed randomness to solve the problem. The HS phase demonstrates the anticipated tendency of exploring the search space in early iterations and utilizing good solutions in later iterations, and the SA phase utilizes a linear reduction of the cooling factor to progressively shift the attention from exploring the search space to developing potential search regions. Finally, extensive experimental evaluation on real and simulated datasets is performed, with the experimental results showing that our algorithm has good performance and wide applicability. The proposed method provides a new way to solve the problem of participant recruitment in mobile crowdsourcing.

In this work, we assumed that all participants' trajectories are predetermined. However, this assumption may be impractical in other cases. In the future, we will investigate scenarios where the participant's movement trajectory information is either insufficient or inaccurate.

Author’s Contributions

Conceptualization, JY, KW. Funding acquisition, JY. Writing of the original draft, JY. Writing of the review and editing, JY, DZ. Software, JY, CH, DZ. Formal Analysis, JY, DZ ,CH, KW.

Funding

This research was supported in part by the Humanities and Social Science Fund of Ministry of Education of China (Grant No. 21YJCZH197); the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (Grant No. 2020L0252).

Competing Interests

The authors declare that they have no competing interests.

Author Biography

**Name:**Jian Yang

**Affiliation: **School of information, Shanxi University of Finance and Economics

**Biography: **Jian Yang received the Ph.D. degree in School of Computer and Communication Engineering from the University of Science and Technology Beijing, China, in 2020. Dr. Yang is currently an assistant professor with the School of information, Shanxi University of Finance and Economics, Taiyuan, China. His current research interests include mobile computing, data management, data sharing/trading. He is a member of the IEEE and CCF.

**Name:**Di Zhang

**Affiliation: **Modern Educational Technology Center, Jiangsu College of Nursing

**Biography: **Di Zhang received the master’s degree in school of Computer Science and Engineering from North Minzu University, Yinchuan, China, in 2015. He is currently a lecturer with the Modern Educational Technology Center, Jiangsu College of Nursing, Huaian, China. His current research interests include IoT, cloud computing, data managment.

**Name:**Chunmei Hu

**Affiliation: **School of Cyber Science and Engineering, Qufu Normal University

**Biography: **Chunmei Hu received the master's degree in School of Computer and Communication Engineering from the University of Science and Technology Beijing, China, in 2017. She is currently a lecturer with the School of Cyber Science and Engineering, Qufu Normal University, Qufu, China. Her current research interests include machine learning, data management, data sharing/trading.

**Name:**Kaixuan Wang

**Affiliation: **School of information, Shanxi University of Finance and Economics

**Biography: **Kaixuan Wang, received the M.S. degree in computer science from the University of Shanxi, China, and the Ph.D. degree in communication and information systems from the State Key Laboratory of Network and Switching at Beijing University of Posts and Telecommunications, China, in 2015. He is currently an assistant professor of management of networks, School of information, Shanxi University of Finance and Economics, Taiyuan, China. His research focuses on management for network communications, big data, and protocols and algorithms for smart grid applications. He has authored journal and conference papers about the communications.

References

[1]
S. R. B. Gummidi, X. Xie, and T. B. Pedersen, “A survey of spatial crowdsourcing,” ACM Transactions on Database Systems, vol. 44, no. 2, pp. 1-46, 2019.

[2]
R. Chen, L. Li, Y. Ma, Y. Gong, Y. Guo, T. Ohtsuki, and M. Pan, “Constructing mobile crowdsourced COVID-19 vulnerability map with geo-indistinguishability,” IEEE Internet of Things Journal, vol. 9, no. 18, pp. 17403-17416, 2022.

[3]
G. Mehmood, M. Z. Khan, A. Waheed, M. Zareei, and E. M. Mohamed, “A trust-based energy-efficient and reliable communication scheme (trust-based ERCS) for remote patient monitoring in wireless body area networks,” IEEE Access, vol. 8, pp. 131397-131413, 2020.

[4]
Y. Lai, Y. Xu, D. Mai, Y. Fan, and F. Yang, “Optimized large-scale road sensing through crowdsourced vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 4, pp. 3878-3889, 2022.

[5]
Q. Jiang, D. Liu, H. Zhu, Y. Qiao, and B. Huang, “Quasi group role assignment with role awareness in self-service spatiotemporal crowdsourcing,” IEEE Transactions on Computational Social Systems, vol. 9, no. 5, pp. 1456-1468, 2022.

[6]
J. Xu, Z. Luo, C. Guan, D. Yang, L. Liu, and Y. Zhang, “Hiring a team from social network: incentive mechanism design for two-tiered social mobile crowdsourcing,” IEEE Transactions on Mobile Computing, 2022.
https://doi.org/10.1109/TMC.2022.3162108

[7]
Y. Wang, Y. Gao, Y. Li, and X. Tong, “A worker-selection incentive mechanism for optimizing platform-centric mobile crowdsourcing systems,” Computer Networks, vol. 171, article no. 107144, 2022.
https://doi.org/10.1016/j.comnet.2020.107144

[8]
Z. Liu, K. Li, X. Zhou, N. Zhu, Y. Gao, and K. Li, “Multi-stage complex task assignment in spatial crowdsourcing,” Information Sciences, vol. 586, pp. 119-139, 2022.

[9]
H. Li, T. Li, W. Wang, and T. Wang, “Dynamic participant selection for large-scale mobile crowd sensing,” IEEE Transactions on Mobile Computing, vol. 18, no. 12, pp. 2842-2855, 2019.

[10]
J. Yang, C. Zhao, and C. Xing, “Big data market optimization pricing model based on data quality,” Complexity, vol. 2019, article no. 5964068, 2019.
https://doi.org/10.1155/2019/5964068

[11]
J. Yang and C. Xing, “Personal data market optimization pricing model based on privacy level,” Information, vol. 10, no. 4, article no. 123, 2019.
https://doi.org/10.3390/info10040123

[12]
H. Cai, Y. Zhu, and Z. Feng, “A truthful incentive mechanism for mobile crowd sensing with location-sensitive weighted tasks,” Computer Networks, vol. 132, pp. 1-14, 2018.

[13]
X. Zhang, Z. Yang, Y. J. Gong, Y. Liu, and S. Tang, “SpatialRecruiter: maximizing sensing coverage in selecting workers for spatial crowdsourcing,” IEEE Transactions on Vehicular Technology, vol. 66, no. 6, pp. 5229-5240, 2017.

[14]
N. Wang and J. Wu, “Cost-efficient heterogeneous worker recruitment under coverage requirement in spatial crowdsourcing,” IEEE Transactions on Big Data, vol. 7, no. 2, pp. 407-420, 2021.

[15]
Y. Chen, P. Lv, D. Guo, T. Zhou, and M. Xu, “Trajectory segment selection with limited budget in mobile crowd sensing,” Pervasive and Mobile Computing, vol. 40, pp. 123-138, 2017.

[16]
S. Liu, Y. Wang, J. Sun, and T. Mao, “An efficient spatial-temporal model based on gated linear units for trajectory prediction,” Neurocomputing, vol. 492, pp. 593-600, 2022.

[17]
F. Zhou, Y. Dai, Q. Gao, P. Wang, and T. Zhong, “Self-supervised human mobility learning for next location prediction and trajectory classification,” Knowledge-Based Systems, vol. 228, article no. 107214, 2021.
https://doi.org/10.1016/j.knosys.2021.107214

[18]
S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, “A self-adaptive parameter selection trajectory prediction approach via hidden Markov models,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 284-296, 2015.

[19]
Y. Luo and N. R. Jennings, “A budget-limited mechanism for category-aware crowdsourcing of multiple-choice tasks,” Artificial Intelligence, vol. 299, article no. 103538, 2021.
https://doi.org/10.1016/j.artint.2021.103538

[20]
C. Miao, H. Yu, Z. Shen, and C. Leung, “Balancing quality and budget considerations in mobile crowdsourcing,” Decision Support Systems, vol. 90, pp. 56-64, 2016.

[21]
J. Yang, X. Ban, and C. Xing, “Using greedy random adaptive procedure to solve the user selection problem in mobile crowdsourcing,” Sensors, vol. 19, no. 14, article no. 3158, 2019. https://doi.org/10.3390/s19143158

[22]
J. Yang and C. Xing, “Data source selection based on an improved greedy genetic algorithm,” Symmetry, vol. 11, no. 2, article no. 273, 2019. https://doi.org/10.3390/sym11020273

[23]
G. Lin, “An implementation of harmony search algorithm to the set covering problem,” in Proceedings of 2015 11th International Conference on Natural Computation (ICNC), Zhangjiajie, China, 2015, pp. 200-204.

[24]
Z. W. Geem, J. H. Kim, and G. V. Loganathan, “A new heuristic optimization algorithm: harmony search,” Simulation, vol. 76, no.2, pp. 60-68, 2001.

[25]
J. Skackauskas, T. Kalganova, I. Dear, and M. Janakiram, “Dynamic impact for ant colony optimization algorithm,” Swarm and Evolutionary Computation, vol. 69, article no. 100993, 2022.
https://doi.org/10.1016/j.swevo.2021.100993

[26]
J. Naranjo-Perez, M. Infantes, J. F. Jimenez-Alonso, and A. Saez, “A collaborative machine learning-optimization algorithm to improve the finite element model updating of civil engineering structures,” Engineering Structures, vol. 225, article no. 111327, 2020.
https://doi.org/10.1016/j.engstruct.2020.111327

[27]
A. Costa and V. Fernandez-Viagas, “A modified harmony search for the T-single machine scheduling problem with variable and flexible maintenance,” Expert Systems with Applications, vol. 198, article no. 116897, 2022.
https://doi.org/10.1016/j.eswa.2022.116897

[28]
M. Vahidi Farashah, A. Etebarian, R. Azmi, and R. Ebrahimzadeh Dastjerdi, “An analytics model for TelecoVAS customers’ basket clustering using ensemble learning approach,” Journal of Big Data, vol. 8, article no. 36, 2021.
https://doi.org/10.1186/s40537-021-00421-1

[29]
L. Wang, Y. Xu, Y. Mao, and M. Fei, “A discrete harmony search algorithm,” in Life System Modeling and Intelligent Computing. Heidelberg, Germany: Springer, 2010. pp. 37-43.
https://doi.org/10.1007/978-3-642-15859-9_6

[30]
A. Assad and K. Deep, “A hybrid harmony search and simulated annealing algorithm for continuous optimization,” Information Sciences, vol. 450, pp. 246-266, 2018.

[31]
Y. Wu, Y. Zhu, and B. Li, “Infrastructure-assisted routing in vehicular networks,” in Proceedings IEEE INFOCOM, Orlando, FL, 2012, pp. 1485-1493.

[32]
Y. Qiao, Z. Si, Y. Zhang, F. B. Abdesslem, X. Zhang, and J. Yang, “A hybrid Markov-based model for human mobility prediction,” Neurocomputing, vol. 278, pp. 99-109, 2018.

[33]
E. H. Houssein, A. G. Gad, K. Hussain, and P. N. Suganthan, “Major advances in particle swarm optimization: theory, analysis, and application,” Swarm and Evolutionary Computation, vol. 63, article no. 100868, 2021.
https://doi.org/10.1016/j.swevo.2021.100868

About this article

Cite this article

Jian Yang^{.1,*}, Di Zhang^{2}, ChunMei Hu^{3}, and KaiXuan Wang^{1}, A Spatial Coverage-Based Participant Recruitment Scheme for Mobile Crowdsourcing, Article number: 13:20 (2023) Cite this article 2 Accesses

**Received**24 January 2022**Accepted**10 May 2022**Published**15 May 2023

Share this article

Anyone you share the following link with will be able to read this content:

Get shareable link

http://hcisj.com/articles/?HCIS202313020

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords