홈으로ArticlesAll Issue
ArticlesTargeted Advertising That Protects the Privacy of SocialNetworks Users
  • Carlo Blundo1, Carmen De Maio2,*, Mimmo Parente1, and Luisa Siniscalchi3

Human-centric Computing and Information Sciences volume 11, Article number: 18 (2021)
Cite this article 3 Accesses
https://doi.org/10.22967/HCIS.2021.11.018

Abstract

Nowadays, the massive use of social media provides useful unstructured knowledge that can be used to enhance the efficacy of online brand marketing campaigns. The unstructured nature of social media content and the relevance of the contextual dimension, like time, stress the requirements for extracting users’ interests during the timeline. However, user profiling could have some unpleasant consequences for users’ privacy, thus raising the need to define methodologies capable of avoiding privacy leaks despite the exploitation of interactions over social media. This paper presents both an intelligent method of profiling social media users and a privacy protection technique that isdesigned to match users’ profiles and advertisements, and which could be used by advertising agencies. The proposed method performssemantic data analysis for extracting representations of the contents of messages exchanged by users over social media (e.g.,tweets), by exploiting rough set theory. In this way, users’ interestis obtainedby mining their daily online activity. The proposed framework investigates two-party scenarios, i.e., scenarios composed of a social network owner and an advertising agency willing to promote its client’s products through the social network. This paper presentsthree privacy-preserving matching protocols which enable targeted advertising without compromising the privacy ofeither the users or the advertisers. Starting from a recently proposed advertisement matching protocol, a private layer was addedto ensure that any sensitive information of either party is kept private. In this way, the social network and the advertiser could benefit from a system which allows them to run a matching protocol with the guarantee that sensitive user data (for the social network) and business information (for the advertiser) will not be disclosed. The first two protocols require interaction between the Advertiser and the Online Social Network, while the third one outsources to a semi-trusted service provider some of the computation done during the execution of the advertisement matching. The experimental results are also presented to illustrate the proposed system's good performance to discover potentially interested users given an advertisement as input.


Keywords

Social Media Analysis, Rough Set, Advertisement Recommendation, Private Targeted Advertisement


Introduction

Recent social media statistics relating to consumer adoption and usage show that social media diffusion is growing continuously. As of 2019, the number of social media users worldwide stood at 3.484 billion,representing an increase of 9% year-on-year [1]. Social media are increasingly being used as a customer service platform where customers and potential customers can obtain answers quickly, in realtime, and which is “free”overall. However, the free contents of the web and other related services are all based on the advertising's business conveyed to the users. Until a few years ago, advertising-based services were based on the one-size-fits-all formula essentially tailored to the frame of a website. Advertisers acquired a bunch of web spaces from publishers without really knowing whether such spaces were shown to people who were genuinely interested in their products. Nowadays, the ad business scenario has changed with the advent of intermediaries. As explained in [1], three components play the main roles in the modern online advertising infrastructure, namely, advertisers, publishers, and ad-platforms, whoseultimate goal is to display the right advertisement to the right user in the right time. Today, there is a growing “data-driven” population that is increasingly connected to the Internet, watching or posting photos and videos, and continuously texting, thereby flooding the Web with data,which isleading to ever growing profits. This has driven most companies to use social networks, like Twitter, for their advertisement campaigns (See [2, 3]), turning online advertising into the primary source of income for many publishers by providing advertisements tailored to users’ interests [4].
Personalized advertising services are the most efficient and hence the most effective form of advertising disclosure. A closer match will lead to the delivery of more useful and interesting advertising contents to users. Hence, companies will benefit from services in finding a good match between the interests of a user and a set of given advertisements. Therefore, user-related information is a fundamental asset for the efficient and effective delivery of advertising. However, one major drawback for users of this closer match is that personal and sensitive data concerning users’ preferences are traded in the name of personalized advertising. Such a scenarioinevitably raises evident privacy issues for the users. In fact, personal or sensitive data (e.g.,pertaining to a user’s habits, needs, etc.) are mined by infrastructures that operate in the shadows, with virtually no oversight.
Starting from the present authors’ previous works [57], this paper presents a privacy-preserving matching service capable of determining when and how to present an advertisement to a user of anonline social network (e.g.,a Twitter user) without disclosing their sensitive data.
Adding a privacy level to the matching protocol will benefit both the social platform and the advertiser. Indeed, the platform guarantees that sensitive information about its users will not be disclosed. Simultaneously, the advertiser will be able to keep private the characteristics of ads that might contain some business value. To compute the similarity between the topics characterizing the user’s tweets and those of the given advertisement, this study usesrough set theory [8] (See also [911]). Subsequently, a privacy-preserving matching protocolis used to identify target users potentially interested in a given advertisement. The proposed approach consists of two phases. The first is a semantic data analysis that performs natural language processing and text mining of the contents of unstructured tweets and of the advertisements. This process extracts the underlying concepts and topics (i.e., semantics) to develop the corresponding users’ interests along the timeline (i.e., time awareness). The second phase performs Private Matching between the users’ interests and the topics extracted in the first phase. This activity tries to identify at whom advertising should be targeted (i.e., what) and the hourly band at which it should be sent (i.e., when), without disclosing any sensitive data. An overview of the proposed approach is illustrated in Fig. 1.
In previous solutions, advertising personalization is not directly controlled by ad platforms. Hence, there are fewer incentives for advertisers to bid more money to place their ads. This study’s proposals are completely different from state-of-the-art ones. In both proposals, ads are tailored to the users’ profiles while keeping private both the users’ profiles and the advertisements’ characteristics. In this paper’s first proposal, there is a direct privacy-preserving interaction between the advertisers and the publisher (e.g., an Online Social Platform); while, in the second proposal, the match between advertisements and users is done by a semi-trusted third party. Such a party guarantees the privacy of both users and advertisements and naturally fits the advertising ecosystem in which the ad network can play the role of the semi-trusted third party.

Fig. 1. Overall approach, where tpc stands for the generic topic.

Experiments on the advertisement-user matching service have been conducted on tweets obtained using the Twitter API. The matching was measured using precision, recall, and F-measure, which returned good values. Here, it is worth stressing that, with respect to the privacy-less advertisement-user matching service described in Section 5.1, the privacy layer does not increase by much the overall computational complexity, since the cryptographic tools used are very efficient.
Note that, in a preliminary version of this work [12], a basic version of the privacy layer was added on top of the matching protocol. In this paper, the level of security was improved by the addition of aprivacy layer. In particular, Section 5.2.2 discusses some additional data that an adversary can deduce using the solution in [12],while Section 5.2.3 presents a way to get rid of this leak. Moreover, Section 5.2.4 describes an alternative privacy-layer solution whereby a third actor (i.e., a semi-trusted third part) interacts with both the social network and the advertiser. This solution makes it necessary to trusta third entity, but it has the advantage of being more efficient thanthe previous one.
The rest of this paper is organized as follows: Section 2 discusses some other proposals that are related to this paper’s protocols; Section 3 presents the cryptographic tools needed in the private protocol; Section 4 presentsthe semantic data analysis phase; Section 5 describes the Private Matching protocols; Section 6 presents some experimental results; and Section 7 presents the conclusion.


Related Work

In literature, privacy concerns in online advertising have received considerable attention. In [13], the authors survey the online advertising infrastructure and the most relevant privacy protection mechanisms use. They also carefully summarize the fundamental privacy risks and describe several privacy protection strategies currently adopted in online advertising. All of the described strategies are dissimilar from the one suggested in this work. Some strategies for assisting targeted advertising and, at the same time, protecting personal users’ data were delineated in [14]. The setting adoptedfor the purposes of this paper is distinct from the one assumed in [14], where a client-side agent (referred to as the negotiant) is used by the advertiser as a client-side proxy. Such a negotiant preserves a client’s sensitive information and governs the targeting of advertisements. The setting proposed in [14] imposes some constraints on the advertiser’s control of negotiants. These restrictions are applied to prevent negotiants’ manipulationsaimed at obtaining a user’s profile data. Adnostic [15] is a framework that enables targeting while taking intoconsideration the privacy of users. Such anarchitecture was developed to support online behavioral advertising (OBA) without compromising user privacy. OBA tracks users while visiting websites and presents advertisements according to the browsing patterns. Within Adnostic [15], the whole behavioral profiling process takes place within the user domain (i.e., the browser) in such a way that no private info is disclosed to third parties. To assess the efficacy of this framework, the core targeting system was implemented as a Firefox extension. Other solutions to preserving users’ privacy in online behavioral advertising can be found in [16, 17]. In [16], the authors proposed a system conciliating users’ privacy during web surfing and advertising. In such a system, usersare allowed to define their privacy requirements (i.e.,they can indicate which interests should be hidden from the advertising platforms and which ones can be revealed). The proposed system is meant to be deployed as a web-browser plugin. As a privacy-preserving measure, this system also implements simulated browsing sessions. Also, the system proposed in [17] is based on a web-browser extension. The system helps users to control their advertising viewing history. A user’s profile is built from the observed user’s click-stream. The web-browser extension, based on the user’s profile, enables selective and smart ad-blocking. User privacy in targeted advertising for social networks has also been considered in [18, 19]. In [18], the authors defined the user’s privacy as a combination of the following properties: correctness, fairness, accountability, unframeability, and mis-authentication. Moreover, they proposed PPOAd, a privacy-preserving targeted ad system which, according to the authors, guarantees similar or better revenues for all the entities involved in the advertising ecosystem. Privad [19] guarantees user privacy by maintaining its browsing profile locally. User profiling is done at the user’s computer runningthe client software. Communication between the client and the advertising broker (i.e., the entity that matches users and advertisements in a bid to optimize the effectsof and revenuesgenerated by advertisements) is done through a third-party anonymizing proxy. Regrettably, anonymizing strategies like the one used by Privad have proved to be weak [20]. In [21], advertising systems are classified as either publisher-subscriber systems or push-based ones. In the first scenario, users subscribe to the advertisers’ products, whereas, in the latter one, a server (i.e., an Online Social Platform) receives both the users’ profiles and the advertising requests. Then, the server advertises each product for a set of target users. In [21],the authors propose a privacy preserving group-based advertising (PPAD) system where each advertising request is compared to the profiles of a group of users and not to a single user. If, within the group of users, the advertisement is matched by at least a fixed threshold of users’ profiles, then the advertisement is shown to all members of the group. Matching between users and advertisements is done without requiring the users or advertisers to be online. This is possible by resorting to a trusted party (referred to as a privacy service provider [PSP]). Such a PSP assists publishers (i.e.,Online Social Platforms) in protecting the privacy of their users. In [22], the authors propose a new system called PRIVADO, which improves PPAD’s security level. In particular, PRIVADO replaces the SPS with multiple servers and does not assume trust in any servers that can, additionally, collude with one another. According to [21], this study’s second proposal fits the push-based scenario, but there are some distinctions. Unlike [21, 22], our proposal does not need to hide users’ profiles from the publisher by encrypting them and by partitioning the users into groups, as user profiling is done by the Online Social Platform itself. Moreover, in our systems, each user sees only those advertisements which match her/his profile; while in [21, 22], if an advertisement matches at least a fixed number of users’ profiles in a predefined group, then the advertisement is shown to all members of the group. Hence, users could receive advertisements they are not interested in, thus lowering the effectiveness of the advertising campaign. As in [21, 22], matching is done without requiring the users to be online. Anyway, in one of this study’s proposals the users do not interact with a third party, while in [21] the users have to interact with the PSP (or with the servers in the case of the protocol described in [22]) in order to receive some information related to the group to which they have been assigned. Moreover, in [21, 22], advertising privacy is not guaranteed as the advertiser has to send to the PSP (resp., to the servers) the set of attributes (referred to as topics in the proposed solutions) characterizing the advertisement which the advertiser would like to show to a target audience. Finally, the proposed solutions are more flexible than those presented in [21, 22]. Indeed, in the proposed systems, a user u is considered a target if the topics characterizing u are related to an advertisement’s topics according to the threshold value w (more details on the matching procedure are given in Section 5);whereas, in [21, 22], users are considered to be targets only if there is an exact match between their preferences and the advertisement, which could in effect hinder the advertising campaign in some way.
Finally, [23] exposes in some detail the privacy risks facing users when their data are used in clear to profile them and to show them advertisements. In [23], it is noticed that online advertisements usually use a system of real-time bidding (RTB), which enables advertisers to compete in real-time auctions to show their ads. The authors of [23] observe that this type of mechanism promotes an aggressive user profiling strategy,and goes on to provide some solutions to discourage this phenomenon; however, this is a different approach w.r.t. consider an appropriate level of privacy on top of the matching protocol.
A system that supports targeted mobile ad delivery was recently proposed in [24]. Such a system leverages a cryptographic primitive (i.e. private stream searching) allowing an ad network to perform accurate targeting, while guaranteeing users privacy. In [24], it is assumed that the user (the app on the mobile phone) directly interacts with the ad network,whichin turn selects relevant ads for that specific user. Moreover, the basic protocol assumes that both the user profile and the ad profile are characterized by a fixed number of attributes (e.g., user interests, age, and current location). In this paper’s proposed system, where such attributes are referred to as topics, there is no limitation on the number of attributes (topics) characterizing a user/ad profile. Moreover, unlike this system, the one in [24] does not consider ads’ privacy. Indeed, the ad network knows the advertisement’s attributes values, while the user will recover the profile of any ad that shares at least one attribute value with her/his own profile. This limitation is due to the private stream searching primitive used to construct the ad delivery system.


Cryptographic Primitives

The cryptographic primitives that are at the basis of the private protocols are summarized in the following. First, the notion of the private set intersection cardinality(PSI-CA) protocol was introduced. PSI-CA is the building block of the private matching protocol. Then, a variant of the ElGamalcryptosystem, that will be used to improve the protocol’s privacy, is recalled.

Private Set Intersection CArdinality (PSI-CA)
A PSI-CA protocol is a cryptographic protocol between two players: a Client which has a set X = {$x_1$,...,$x_n$} and a Server which has a set Y = {$y_1$,...,$y_m$}. When the protocol terminates, the Server does not acquire any information except the Client’s set size (i.e. |X|). Instead, the Client learns |X ∩Y| and the size of the set Y.
The work of [25] formally defines the security of a PSI-CA protocol. In the sequel, the security requirements of a PSI-CA protocol are informally recalled. More details and applications can be found in [26[28]]. The security requirements of a PSI-CA protocol are as follows:

Server Privacy: The only information that the Client learns is|Y | and the size of the intersection between X and Y.

Client Privacy: The Server only learns the size of X.

Unlinkability: The players (Client and Server) cannot determine if two running of the protocol are executed on the same inputs, unless this is inferred from the protocol’s output.

In the remaining sections, (⟨δ,|Y |⟩,⟨⊥,|X|⟩) ← PSI-CA(X,Y )is used to indicate that the Clientand the Server, on input, respectively, the setsX and Y, execute a PSI-CA protocol and, at the end, the Client outputs δ = |X ∩ Y | and the size of the Server’s set, while the Server has no output (i.e.,⊥) beyond the size of the Client’s set X.
In this paper, semi-honest security is considered; in other words, the players are assumed to be Honest-but-Curious. In this security setting, neither player can diverge from the protocol flow; however, all players could potentially collect sensitive data from both the execution of the protocol and the protocol’s output.

Exponential ElGamal Variant
The exponential ElGamalvariant is a cryptosystem based on the ElGamalcryptosystem [29]. It does not support computationally efficient decryption, but it does permit the private key owner to check whether a ciphertext corresponds to an encryption of 0. In such a cryptosystem, messages are elements of an additive group of integers modulo a prime p (i.e., messages belong to Zp = {0, . . . , p − 1}). The exponential ElGamalvariant cryptosystem (EGV) can be described by the three algorithms (Gen, Enc, Dec) given below.

Gen($1^λ$): Given the security parameter $λ$, the algorithm Gen randomly generates the pair of private/public keys ($sk,pk$). Such an algorithm performs the following steps:
1. Randomly generates a prime p of $λ$ bits.
2. Computes a generator g of the multiplicative group $Ζ_p^*$ of integers
modulop(i.e.,$Ζ_p^*$={1,...,p-1}).

3. Randomlychoosesκ∈{2,...,p-2}.
4. Setssk=κandpk=(p,g,y=$g^κ$ mod p).
5. Outputs (sk,pk).

Enc(pk,m):Given the public key pk = (p,g,y) and a message m ∈$∈Ζ_p$, the algorithm Enc outputs an encryption c of m. Such an algorithm performs the following steps:
1. Randomlychoosesr∈{2,...,p-2}.
2. Computesα=$g^r$ mod p and β=$y^r$•$g^m$ mod p.
3. Outputs c = ($α, β$).

Dec ($sk,c$): Given the secret key sk = κ and a ciphertext c = ($α,β$), the algorithm Dec outputs the decryption m of c. Such an algorithm performs the following steps:
1. Computesμ = β/$α^k$ modp.
2. Outputs the discrete logarithm modulo p of μ.

The security of the above cryptosystem relies on how the prime number p is chosen.Indeed, p has to be selected in such a way that it is hard (i.e. computationally unfeasible) to compute the discrete logarithmof any value in $Ζ_p^*$ (Given a generator g of $Ζ_p^*$ and a value $a∈Ζ_p^*$, the discrete logarithm of a mod p is the integer $e∈Ζ_p^*$ such that a = gemod p). State-of-the-art algorithms used to compute the discrete logarithm over $Ζ_p^*$. have a O(√p) computational complexity. Hence, p’s bitlength should be quite large; it is suggested to choose p having about 2,048 bits (i.e. the security parameter λ should be at least 2,048). Algorithms Gen and Enc could be efficiently computed, while Dec could not. Indeed, since y = $g^k$ mod p, the following relation holds

image(1)

Therefore, m can be computed in a reasonable amount of time (e.g., by trying all possible values) only if m’s bitlength is small, say about 40 bits. In any case, the private key owner can immediately verify whether a ciphertext is an encryption of 0 as, in this case, μ would be equal to one. Indeed, for some random value $r, Dec$($sk,Enc$($pk$,0))=Dec($g^r$,$y^r$ •$g^0$)=$g^0$ =1.
Note that the Exponential ElGamal Variant provides additive homomorphism. Given the encryption of two messages, say m1and m2, their encryptions can be multiplied component-wise to obtain the encryption of their sum. Indeed, for somerandomvaluesr1,r2∈{2,...,p−2}andapublickeypk,then

image(2)

Given an encryption ($𝛼,𝛽$)= $𝐸𝑛𝑐(𝑝𝑘,𝑚)$ of a message m under a public key pk and a constant value 𝑐 $∈Ζ𝑝∗$, the component-wise exponentiation of $𝐸𝑛𝑐𝑝𝑘,𝑚$ is denoted with $𝐸𝑛𝑐(𝑝𝑘,𝑚)^𝑐$, that is $𝐸𝑛𝑐(𝑝𝑘,𝑚)^𝑐$ = ($𝛼^𝑐$,$𝛽^𝑐$). The following relation𝐸𝑛𝑐(𝑝𝑘,𝑚)𝑐= $𝐸𝑛𝑐(𝑝𝑘,𝑐 • 𝑚)$ holds as

image(3)

Such homomorphic properties will be used in the proposed Private Matching Protocol with enhanced security.

Pseudorandom Permutation
Informally, pseudorandom functions are efficiently computable and deterministic functions which can return pseudorandom output that is indistinguishable from random sequences. A pseudorandom function family (PRF) is a collection of pseudorandom functions, each of which can be chosen using a key κ. More precisely, each function can be indexed by a key κ. Hence, randomly choosing κ will implicitly choose a pseudorandom function within the family. A pseudorandom function family can be described by a function F where ∶ {0,1}$^k$ ×{0,1}$^n$ → {0,1}$^m$ . For any ∈{0,1}$^k$ , the function F (κ,•) defines the mapping x⟼ F (κ,x) that is efficiently computable. Given the previous definition, it is possible to define a pseudorandom permutation family (PRP) as a collection of pseudorandom functions where m = n.


Semantic Data Analysis

Semantic data analysis extracts underlying concepts and topics that correspond to the unstructured input text. More specifically, it performs a semantic enrichment of the text through a Wikify algorithm [30]. Briefly, Wikify is typical practice for linking the most relevant keywords included in a piece of text with the corresponding definition given in Wikipedia (The Wikification service was originally provided by http://wikipedia-miner.cms.waikato.ac.nz/, but is now only available as a private local installation on one’s own hardware. The Wikification source code is available at https://github.com/dnmilne/wikipediaminer).
Wikify assigns a unique identity to entities mentioned in the text. It extracts a set of ⟨tpc, rel⟩ pairs consisting of tpc, one of the Wikipedia page corresponding to detected topics or concepts and the corresponding degree of relevance rel [30].
Following, was reported an example by considering the tweet in Fig. 2.

Fig. 2. Tweet of Barack Obama on September 17, 2019.

tweeti = “Just 16, @GretaThunberg is already one of our planet’s greatest advocates. Recognizing that her generation will bear the brunt of climate change, she’s unafraid to push for real action. She embodies our vision at the @ObamaFoundation: A future shaped by young leaders like her.”

An example of Wikification output is the set of following pairs: ⟨Climatechange, 0.81⟩, ⟨Advocate, 0.72⟩, ⟨Climate, 0.72⟩, ⟨Visualperception, 0.7⟩, . . . .
In addition to the Wikify service, the system can calculate the Semantic Relatedness sr between a pair of Wikipedia pages. Given two topics corresponding to Wikipedia pages, the Semantic Relatedness is a value in the range of [0,1] that describes the distance/closeness of the input pages. It relies on evaluating the ratio of incoming and outgoing links shared by the input Wikipedia pages.


(Private) Matching

As introduced in [7], this research work, when given an advertisement at a certain time, uncovers potentially interested users in that ad. Note that topics characterizing users and ads could contain sensible information which should not be disclosed. Hence, given the advertising matching protocol discussed in Section 5.1, the main goal here is to introduce a privacy layer on the top of it so as toensure that both the advertisements’ topics and those of the users stay private.

Ad Matching Protocol
A matching procedure is defined by taking into account the rough sets representing the user and the advertisement. More formally, given a threshold ε,the users’ posts, the corresponding timestamp, the extracted topics T, and the Semantic Relatedness sr, a binary relation $R^ε$ on T is induced as follows:

image(4)

Let U = U(u, t), the set of Wikipedia concepts (i.e., topics) extracted by all tweets posted by u at time slot t, and let A be the set of concepts extracted from a given advertisement; then consider the restriction of $R^ε$ to U setting:

image(5)

and letting $R_U^ε$(a)= {b ∈ U ┤|a $R_U^ε$ b},} the Orthopair O(A) is defined as O(A)=(L(A),E(A)),where

image(6)

image(7)

During the timeline in a specific advertisement, to identify the degree of interest of users, are determined the number of concepts of U and A that are in L(A) andE(A) (See Procedure 1).
The triple r = ($r_1$,$r_2$,$r_3$) is defined as follow:

image(8)

From definition (8), it is easy to verify that

the topics of U ∩ A are at least the r1% of all the topics of U.

at least the r2% of the topics of U are different from the topics of A;

the topics of U ∩ A are at least the r3% of all the topics of A.

Moreover, setting

image(9)

the value r4 is referred to as the value of uncertainty of u at time t. A simple algebra shows that

image(10)

whereB(A) defined as

image(11)

represents the $R_U^ε$ -boundary region of A. The values $r_1$,$r_2$,$r_3$, and $r_4$ enable the collection of valuable information to understandthe potential level of a user’s interest in a given advertisement. To establish a unique indicator for u, a weight w depending on $r_1$,$r_2$,$r_3$, and $r_4$ was evaluated as follow:

image(12)

where hm($r_1$,$r_3$) is defined as

image(13)

(i.e.,the harmonic mean of $r_1$ and $r_3$). In conclusion, at time t, the advertisement A is shown to the Twitter user u if $w ≥ λ$, for a threshold $λ$ ∈ [0, 1].
Procedure1. orthopair(U, A, ε)
L(A)←∅, E(A)←∅
for allain U do
B←∅
for allbinUdo
ifsr(a, b) ≥ εthen
B = B ∪ {b}
endif
end for
ifB ⊆ Athen
       L(A)← L(A)∪ B
end if
ifB∩ A= ∅ then
E(A)← E(A)∪ B
end if
end for
return(L(A), E(A))


Private Ad Matching
Note thatU and A can contain sensitive information and thus should not be disclosed. For instance, since the topics in U are extracted by user profiling they could contain valuable information like user habits, preferences, and lifestyles. On the other hand, the topics in A have some business value because they represent a way of promoting a product. Hence, the goal of this section is to introduce a privacy layer on top of the advertising matching protocol presented in Section 5.1.
The aim is to ensure that the topics from both the advertisements (i.e., the topics in A) and the user’s messages (i.e., the topics in U) stay private. Once U and A have been computed, the technique of Section 5.1 allows only one player to calculate the matching (either the Advertiser or, more realistically, the Online Social Platform). If only one player (e.g., the Advertiser) computes the matching (i.e., the value w given by Equation (8)), it is easy to see that the Advertiser learns the topics in U. To avoid this issue,a different path was taken: the value wwill be computeinteractively. The Advertiser interacting with the Online Social Platform will compute the value w.
Despite the interaction, the protocol will be able to hide the topics characterizing the advertisement to the Online Social Platform and, at the same time, the Advertiser will be unaware of the topics profiling a given user. In order to calculate wwhile simultaneously guaranteeing the privacy of the topics in A and U,the protocol will make use of a specific multi-party computation protocol [31, 32],namely, the PSI-CA protocol described in Section 3.1 above.

A different view of the matching protocol
As a first step, to add privacy to the advertising matching protocol as discussed in Section 3.1, some of its computations will be rearranged. Then, some computations are transformed in such a way that the topics’ privacy is preserved. Setting $s_L$= |L (A) |,$s_ℇ$ = |ℇ(A)|,$s_A$ = |A|, and $s_U$ = |U|and using Equations (5), (6), and (9), a simple algebra shows that Equation (8) can be written as follows:

image(14)

Indeed,

image(15)

Note that, if $s_L$ = 0, then r1 = r3 = 0 and, according to Equation (8), whas to be 0. This is correctly captured by the above re-writing of w. Therefore, to compute the value w, the only information required is the size of the sets L (A), ℇ(A), A, and U.
It is immediate to see that the execution of the Advertising Matching Protocol can be organized into three phases as follows:

Given U, compute $R_U^ε$ (u), for u ∈ U.

Given $R_U^ε$ ($u_1$ ),…,$R_U^ε$ ($u_{sU}$ ), and$A$, $computes_L$ and $s_ℇ$.

$Givens_U$,$s_A$,$s_L$ and $s_ℇ$, compute w=2∙$\frac{s_L(s_L+s_ℇ)}{s_U(s_A+ s_U)}$.

The Online Social Platform can compute the values occurring in the first phase as it knows the users’ posts and habits from which it is possible to extract the topics characterizing them. The second and third phases can be executed by the Advertiser,which, upon interacting with the Online Social Platform, computes the values $s_U$, $s_A$, $s_L$ and $s_ℇ$, from whichthe parameter w can be derived, according to Equation (10).
Note that all the elements in the set L(A)are elements in U, as this fulfills Relation (3),which states that an element u ∈ U belongs to L(A)if, and only if, $R_U^ε$ (u)⊆A.
In order to test whether $R_U^ε$ (u)⊆A, one could equivalently check whether |$R_U^ε$ (u) ∩ A| = |$R_U^ε$ (u)|. Hence, for any u ∈ U, is derived that u ∈ L(A)iff|$R_U^ε$ (u)∩A| = |$R_U^ε$ (u)|. Therefore, to compute $s_L$ it is enough to count how many $R_U^ε$ (u)’s, with u ∈ U, satisfy |$R_U^ε$ (u)∩ A| = |$R_U^ε$ (u)|. Previous arguments can also be applied to compute $s_ℇ$. In this case, according to Relation (4), the condition to check is whether |$R_U^ε$ (u) ∩ A| = 0.
In the next section, theaforementioned observationsare exploited in order to transform the Advertising Matching Protocol presentedin Section 3.1 into a private one by also resortingto a multiple sequential running of a PSI-CA protocol.

Private matching protocol
Fig. 3 describes the first version of the proposed private matching protocol. Such a protocol is divided into three phases. The first one is executed by the Online Social Platform,which, Given U = {$t_1^u$ ,...$t_{sU}^u$ } and ε, computes the sets $R_i$ = $R_U^ε$ ($t_i^u$), for i = 1, . . . ,sU. During the second phase, Advertiser and Online Social Platform, on input A and $R_i$ respectively, interact by
Fig. 3. Private matching protocol.

running a PSI-CA protocol. The Advertiser, after each protocol’s execution, computes the size δi of the intersection A ∩ $R_i$ and the size of the set $R_i$ , while the Online Social Platform only computes the size of the set A. The third phase is executed by the Advertiser,which computes the value w according to equation (10).
Note that, in order to compute the size of the sets comprising the orthopairO(A) = (L(A),ℇ(A)) (i.e. the values $s_L$ and $s_ℇ$, the Advertiser resorts to the trick described at the end of Section 5.2.1. The Online Social Platform, upon receiving at the end of Phase III the value w from the Advertiser, determines whether to show the advertisement represented by the topics in A to a user characterized by the topics in U.
A note on computingw:Here, the authors draw attention to the fact that, in the private matching protocol presented in Fig. 3, the Advertiser directly computes the value w by itself. Were it otherwise, some information could be leaked. Indeed, the Online Social Platform, in order to quantify the value w that will allow it to distinguish between whether to show or not show an advertisement to a user, has to compute the values $δ_i$ = |A ∩ $R_i$ |, for i = 1,…,$s_U$. The values $δ_i$ can be computed by reversing the computations executed within the PSI-CA protocol. In this way, the Advertiser computes ⟨⊥,|$R_i$ |⟩, while the Online Social Platform computes $⟨δ_i$ ,|A|⟩, or i = 1,...,$s_U$ . This could have some privacy-leakage aftereffects. Indeed, assume that the sets $R_1$ ,...,$R_{SU}$ have different sizes (i.e., |$R_i$ | ≠ |$R_j$ | when i ≠ j). Then, the Online Social Platform, upon computing $δ_i$ = |$R_i$ ∩ A|, can easily check whether $δ_i$ = |$R_i$ |, for some i ∈ {1,...,$s_U$ },discovering that each topic in $R_i$ characterizes the advertisement A. In this way, the Online Social Platform accumulates some limited findings on how A is composed, thereby potentially impairing the Advertiser’s privacy.
A note on the attained privacy level: According to the steps executed in the proposed protocol, the topics in U and A (i.e., respectively, the topics of the Advertiser and of the Online Social Platform) are concealed. Nevertheless, although the protocol preserves topics’ privacy, the Advertiser grasps, in addition to w, some additional information. It was mentioned above that, for i ∈ {1,...,$s_U$}, the Advertiser knows $δ_i$ = |$R_i$ ∩ A|. Since the Advertiser cannot determine whether a topic in A belongs to $R_i$, as well, this event is not considered to be a demanding privacy breach. In any case, Section 5.2.3 will present a technique designed to withstand such a privacy leakage. The solution will be based on the use of fake topics in both $R_i$ and A. In this paper’s improved protocol, because of the use of the EGV cryptosystem, the Advertiser will only verify either whether $R_i$ ∩ A does not contain any element (i.e., |$R_i$ ∩ A| = 0) or whether |$R_i$ ∩ A| = |$R_i$|, for i ∈{1,...,$s_U$}. From such insights, the Advertiser cannot infer any sensitive information on the topics in $R_i$.
The value w computed by the Advertiser can suggest that an unidentified user might be attracted by the advertisement characterized by A. The parameter w describes how A is close to U. Consequently, the Advertiser could infer an estimate of the number of users engaged within the advertising campaign. Knowing such an estimate is not a negative feature of our protocol, theAdvertiser could profitably use such information to infer an assessment of the advertising campaign’s effectiveness by estimating the number of target users likely to be reached. Moreover, to determine the profits to ascribe to the Online Social Platform, the Advertiser usually implements other methods (of which some examples can be found in [33-36]). Therefore, the information leaked by the proposed protocol can anyhow be readily learned by the Advertiser through different processes.

Increasing privacy of the private matching protocol
In the previous section, it is pointed out that the private matching protocol described in Fig. 3 suffers from some minor privacy leaks. Such a privacy problem can be avoided by resorting to the introduction of fake topics and by using the EGV cryptosystem.
In this paper’s amended protocol, to mask the real topics characterizing the advertisement and the users, the set of fake topics Dummywas resorted to such that Dummy ∩ T = ∅.
As such fake topics do not have to represent any particular attribute, they can simply be random strings. Both the Advertiser and the Online Social Platform know the set Dummy, but such a set is used differently by the two players. More precisely, the Advertiser, instead of using the set A, will use the set A' = A∪Dummy; while the Online Social Platform will interact with the Advertiser using the set $R_i$' = $R_i∪D_i$, for a randomly chosen subset $D_i$ $⊆_R$ Dummy. The symbol $⊆_R$ denotes the selection, by the Online Social Platform, of a random subset from Dummy.
Note, however, that knowing the set Dummy does not allow the finding of set A as the two sets Dummy and Aare disjoint sets.
Moreover, since Dummy∩T =∅, and thus|$A$'∩$R_i$' |=|A∩$R_i$ |+|$D_i$ |, the size of the intersection A ∩ $R_i$ is masked with a random value |$D_i$|. Applying the homomorphic properties of the EGV cryptosystem, the Advertiser, by interacting with the Online Social Platform, will obtain an encryption of | A ∩ $R_i$|. Such a value will be encrypted under the Advertiser’s public key.
Therefore, according to the EGV properties, the Advertiser will efficiently decrypt it if, and only if, either $R_i$⊆ A or A ∩ $R_i$ = ∅. In the protocol shownin Fig. 4, ($sk_A$,$pk_A$) denotes the Advertiser’s private/public key pair.
Fig. 4. Enhanced private matching protocol.

The protocols in Figs. 3 and 4 follow the same pattern. Both are divided into three phases,of which the first two are almost identical. The only differences are in the sets A' and $R_i$' used for the execution of the PSI-CA protocol. Anyway, there are few observations to be pointed out. This study resorts to dummy topics to hide the size of the intersection | A ∩ $R_i$|. Indeed, after running the PSI-CA(A',$R_i$'), the Advertiser learns the value |A'∩$R_i$'| = |A∩$R_i$ |+|$D_i$ | where the size of A ∩ $R_i$ is blinded by the value |$D_i$|.
In the third phase, by using the additive homomorphism of the EGV cryptosystem, the Online Social Platform (who randomly selected the set $D_i$) can be allowed to un-blind | A ∩ $R_i$| without knowing the size of the intersection A ∩ $R_i$. The Online Social Platform could accomplish this task simply by computing

image(16)


and sending this value to the Advertiser. Note, however, that this is not enough since the Advertiser, can recover$g^{|A∩R_i|}$ by decrypting Enc($pk_A$,|$A$'∩$R_i$' | ), and since | A∩$R_i$| is a small value, the Advertiser can obtain | A∩$R_i$|. This event must be avoided. To fix such a leakage, the Online Social Platform masks | A∩$R_i$| with a random value $ρ_i$ $∈_R$ $Z_p^*$ by computing

image(17)

Note that, during Phase III of the enhanced private matching protocol, computing ($α,β$) =(〖($α_i$ $α_i'$ )$〗^{ρi}$ ,〖($β_i$ $β_i'$)$〗^{ρi}$) is equivalent to computing(α,β), as shown in (11). This simple trick solves the problem as the Advertiser, by decrypting (α,β),obtains the following:

image(18)

Therefore, $γ_i$ = 1 if, and only if,A∩$R_i$ = ∅and that$γ_i$ = $g^{ρ_i•|R_i|}$ if, and only if,|A∩$R_i$|= |$R_i$| (i.e., $R_i$⊆ A). In the other cases, when A∩$R_i$≠ ∅and |A ∩ $R_i$ | ≠ |$R_i$|, due to the random nature of $ρ_i$, it results that $γ_i$ is just a random value in $Z_p^*$. Hence, the Advertiser does not learn any information on | A∩$R_i$| . Note, however, that the discrete logarithm of $γ_i$ cannot be efficiently computed as $γ_i$ is uniformly distributed in $Z_p^*$ and $ρ_i$ •|A ∩$R_i$ | is not necessarily a small value.

Outsourcing private matching
The protocols described in the previous sections are quite efficient, but they rely on asymmetric cryptographic primitives that are not as efficient as their symmetric counterparts (e.g., hash functions or block-ciphers). Moreover, each PSI-CA execution requires few interactions between the Advertiser and theOnline Social Platform. Hence, to lower the computational burden on both sides, some of the computation done during the execution of the advertisement matching was outsourced to a semi-trusted service provider. In particular, computation of the size of the intersection between the sets A and Riwas outsourced to a service provider, as in secure cloud computing. As such, the privacy of both the topics characterizing the advertisement and the topics profiling a given user would not be jeopardized. Such a service provider can be an Ad Platform,which is an entity of a modern online advertising infrastructure [37] that enables interactions between an Advertiser and an Online Social Platform. The Advertiser creates the demand for advertising services, while the Online Social Platform provides online content and generates the supply. In addition, anAd Platform constitutes the marketplace where the supply of and demand for online advertising services are matched [37].
In this new scenario, to protect topics’ privacy, the dummy topics trick is used again, as in the protocol described in Fig. 4. It is assumed that the Online Social Platform and the Advertiser share a key κ of a pseudorandom permutation F, while the service provider knows the Advertiser’s public key $pk_A$ of the EGV cryptosystem. Starting from a common input, say an n-bits all zeros string (i.e.,$k_0$ = 0), and using the pseudorandom permutation F, the Online Social Platform and the Advertiserthe temporary keys $k_i$= F (κ,$k_{i-1}$ ),for i = 1, . . . ,sU.. The new protocol is quite similar to the one shown in Fig. 4. Its execution is divided into three phases, of which the first is identical to that of the protocol shown in Fig. 4. The second phase is substituted by $s_U$ interactions with the service provider (i.e., with $s_U$ invocation of an outsourced PSI-CA protocol). An outsourced PSI-CA protocol can be devised by considering the following simple steps. For = 1,...,$s_U$, the Advertiser sends to the service provider the set$F_{A_i}$ = {F($k_i$,t) | t ∈ A∪Dummy}; while the Online Social Platform sends to the service provider the set $F_{R_i}$ ={F($k_i$,t)|t∈$R_i$∪$D_i$},forarandomlychosensubset $D_i$⊆Dummy. To be precise, both parties send to the service provider the elements in each set in a random order, i.e., they send a shuffling of the sets. This is necessary to prevent some information, e.g., the lexicographic order of the topics in the sets,being inferred by the service provider from the order of such elements in the received set. Note that the pseudorandom permutation F is defined as F : {0,1}$^k$ × {0,1}$^n$ → {0,1}$^n$, while $k_i$ is n bits long. This is not ad issue at all as, if n > k, then $k_i$ is truncated to its first n bits; while if n<k, then $k_i$ is padded with n-k zeroes.
The application of the pseudorandom permutation to the topics in A ∪ Dummy and in $R_i$∪$D_i$ protectsthe topics’privacy. The service provider computes the value $δ_i$ = |$F_{A_i}$ ∩ $F_{R_i}$ | = |A ∩ $R_i$ | + |$D_i$| and sends Enc($pk_A$,$δ_i$) to the Online Social Platform. The third and last phase is almost identical to the one shown in Fig. 4,the only difference being that the Advertiser does not send to the Online Social Platform the first message (i.e., Enc($pk_A$,$δ_i$)); rather, such a message is received by the Online Social Platform from the service provider.
Note that the protocol discussed in Section 5.2.4 is more efficient in terms of communication because there are fewer interactions between the Online Social Platform and the Advertiser. In addition, from a computational point of view, a large part of the computation is delegated to a (semi-trusted) third party. Consequently, both theOnline Social Platform and the Advertiser compute only evaluations of pseudo-random functions and, instead, the protocol presented in Section 5.2.2 requires them to compute expensive public-key operations. On the other hand, the solution proposed in Section 5.2.2 does not require the existence of a semi-honest third party.


Experiments

This section describes the data set, measures, and results of the experiment conducted to assess the quality of the methodologies and techniques used to detect a sample of target users to whom the ads are to be given.
A real data-setcovering a period of6 months (January to June 2019), and comprising some 400,000 tweets posted from about 6,000 users, was acquired by using the Twitter Streaming API (https://dev.twitter.com/streaming/overview). In order to evaluate the proposed approach, 10 tweets were selected as promoted ads on such topics as sports and games, and the activity of each user, in terms of tweets/re-tweets and likes, was captured. Table 1 shows further details of the dataset statistics that were collected in different time slots (i.e., in [midnight, noon] and [noon, midnight]).

Table 1. Data-set statistics: tweets collected in time slots [midnight, noon] and [noon, midnight]

Time slot 1 Time slot 2 Total
Re-tweets/replies 8,232 12,030 20,262
Tweets 198,323 234,432 426,755
Users 2,736 3,252 5,988
To evaluate the performance of the proposed approach in detecting a set of users interested in a specific advertisement, several measures of information retrieval systems were used: Precision, Recall, F-Measure, False Positive Rate, and Accuracy.
Given the set of all analyzed users U, an advertisement “A”, and a specific time slot t let:

$U^*$ = {$u_1^*$,$u_2^*$,...,$u_m^*$} denote the set of users interested in “A” within the fixed time slot t. Such a set represents the gold set whose elements have been handpicked by domain expert users;

U ̃ = {($u_1$) ̃ ,($u_2$) ̃ ,...,($u_n$) ̃ } denote the set of users selected by the proposed methodology (i.e., the result set).

The measure Precision represents the value of the fraction of retrieved users who are relevant. For the specific time slot t, it is defined as:

image(19)

The measure Recall represents the value of the fraction of relevant users who are actually retrieved. It shows how good the methodology is at detecting target users. For the specific time slot t, it is defined as:

image(20)

The measure F-Measure computes how much (in terms of quality and quantity) the retrieved users are interested in the advertisement. For the specific time slot t, it is referred to as Ftand defined as:

image(21)

The measure False Positive Rate (FPR) is the ratio of the number of uninterested users wrongly categorized as interested to the total number of uninterested users. For the specific time slot t, it is referred to as Ftand defined as:

image(22)

The measure Accuracy is the fraction of predictions which our model got right. For the specific time slot t, it is defined as:

image(23)


Fig. 5 shows the performances tested in terms of these measures for two different time slots (i.e., [midnight, noon] and [noon, midnight]) with a different threshold ε ∈ [0.0,1.0] (See Equation (1)).
Fig. 5. Evaluation results obtained by varying the level of the threshold ε∈ [0, 1] in two time slots [midnight, noon] and [noon, midnight].

As shown in Fig. 5, the proposed approach reveals good performances with a threshold ε∈ [0.6, 0.8], with an acceptable value of Recall and Precision and a good value of Accuracy ≈ 0.9. These metrics quantify how much and with what degree of accuracy the gold set is covered by the result set (of selected users) for a specific time slot, by varying the semantics threshold ε (See Equation (1)).
Furthermore, the graph in Fig. 5 shows that better results are obtained in the second time slot [noon, midnight]. An analysis of the data indicates that the quality of the results is improved when the intensity of the tweets posted is greater than in the other time slot. In the second time slot [noon, midnight], agreater flow is available, allowing a better characterization of the users and, consequently, a more effective correspondence. Furthermore, by comparing the Recall to FPRcurve, the proportion of uninterested users that end up being targeted can be emphasized. As can be seen, by varying the threshold, the proposed approach almost always reveals an acceptably low value for the false positive rate.


Conclusion

This paper presents a methodology for profiling social network users in a temporal window (time slot) in order to target them with advertisements most likely to be of interest to them. Since evident security risks exist, a privacy layer was designed to prevent leakages of the sensitive information of all the playersconcerned, including the users and advertisers. Considering this issue, different future directions may be taken. One may prefer to use completely different linguistics tools than“Wikify!” to extract topics from users’ tweets streams. Moreover, a comparison of the proposed approach based on rough set theory with alternative mathematical tools, like fuzzy sets [38], could also offer helpful insights. This paper does not consider real-time bidding (RTB). It would be interesting to analyze whether our techniques could be profitably used in RTB to make this process privacy friendly without resorting to client-side agents (i.e., proxies, browser’s extensions/plug-ins), as is the case in state-of-the-art works. Furthermore, in future, the authors plan to thoroughly compare the enhanced private matching protocol versus the outsource protocol, and to implement both of themso as to evaluate their performance in terms of running time and communication complexity (i.e., size of the messages exchanged by the Advertiser, the Online Social Platform, and, in the case of the outsourced version, the service provider). The authors will also assesswhether there is an optimal number of topics per ad and per user for minimizing the complexities of the protocol while maximizing the effectiveness of the advertisement campaign.


Acknowledgements

Not applicable.


Author’s Contributions

All the authors have participated in theresearch work and contributed to this manuscript. All the authors have read and approved the final manuscript.


Funding

None.


Competing Interests

The authors declare that they have no competing interests.


Availability of Data and Material

A real data-set was acquired by using the Twitter Streaming API.


Author Biography

image
Name : Carmen De Maio
Affiliation : Dipartimento di Ingegneria dell’Informazione ed Elettrica e Matematica Applicata, Università degli Studi di Salerno, 84084 Fisciano, Italy
Biography : AssistantProfessor, her research interests includeKnowledge Extraction,Social Media Analytics.

image
Name : Mimmo Parente
Affiliation : Dipartimento di Scienze Aziendali-Management and Innovation Systems (DISA-MIS), Università di Salerno, 84084 Fisciano, Italy
Biography : Professor of Computer Science, his research interests includeAutomatic Verificationof Systems and Text Mining.

image
Name : Carlo Blundo
Affiliation : Dipartimento di Scienze Aziendali-Management and Innovation Systems (DISA-MIS), Università di Salerno, 84084 Fisciano, Italy
Biography : Professor of computer science, his main research interests include cryptography and data security.

image
Name : Luisa Siniscalchi
Affiliation : Concordium Blockchain Research Center, Aarhus University, Aarhus, Denmark
Biography : Postdoctoral researcherat the Aarhus University, her research is focused on Cryptography


References

[1] J. Estrada-Jimenez, J. Parra-Arnau, A. Rodriguez-Hoyos, and J. Forne. “Online advertising: analysis of privacy threats and protection approaches,” Computer Communications, vol. 100, pp. 32-51, 2017.
[2] M. R. Solomon and T. L. Tuten, Social Media Marketing. Upper Saddle River, NJ: Pearson Education, 2012.
[3] C. W. Ho and Y. B. Wang. “Re-purchase intentions and virtual customer relationships on social media brand community,” Human-centric Computing and Information Sciences, vol. 5, no. 1, article no. 18, 2015. https://doi.org/10.1186/s13673-015-0038-x
[4] J. Deighton, L. Kornfeld, and M. Gerra, “Economic value of the advertising-supported internet ecosystem,” 2017 [Online]. Available: https://www.iab.com/wp-content/uploads/2017/03/Economic-Value-Study-2017-FINAL2.pdf.
[5] C. De Maio, G. Fenza, M. Gallo, V. Loia, and S. Senatore, “Formal and relational concept analysis for fuzzy-based automatic semantic annotation,” Applied Intelligence, vol. 40, no. 1, pp. 154-177, 2013.
[6] C. De Maio, G. Fenza, V. Loia, and M. Parente, “Time aware knowledge extraction for microblog summarization on Twitter,” Information Fusion, vol. 28, pp. 60-74, 2016.
[7] S. Boffa, C. D. Maio, B. Gerla, and M. Parente, “Context-aware advertisment recommendation on Twitter through rough sets,” in Proceedings of 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Rio de Janeiro, Brazil, 2018, pp. 1-8.
[8] Z. Pawlak, “Rough sets,” International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341-356, 1982.
[9] B. Walczak and D. L. Massart, “Rough sets theory,” Chemometrics and Intelligent Laboratory Systems, vol. 47, no. 1, pp. 1-16, 1999.
[10] J. Jarvinen, “Knowledge representation and rough sets,” 1999 [Online]. Available: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.8637&rep=rep1&type=pdf.
[11] A. Skowron and S. Dutta, “Rough sets: past, present, and future,” Natural Computing, vol. 17, no. 4, pp. 855-876, 2018.
[12] C. Blundo, C. D. Maio, M. Parente, and L. Siniscalchi, “An intelligent and private method to profile social network users,” in Proceedings of 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, 2019, pp. 1-6.
[13] J. Estrada-Jiménez, J. Parra-Arnau, A. Rodriguez-Hoyos, and J. Forne, “Online advertising: analysis of privacy threats and protection approaches,” Computer Communications, vol. 100, pp. 32-51, 2017.
[14] A. Juels, “Targeted advertising ... and privacy too,” in Topics in Cryptology - CT-RSA 2001. Berlin, Germany: Springer, 2001, pp. 408-424.
[15] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas, “Adnostic: privacy preserving targeted advertising,” in Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, 2010, pp. 1–16.
[16] D. Sanchez and A. Viejo, “Privacy-preserving and advertising-friendly web surfing,” Computer Communications, vol. 130, pp. 113-123, 2018.
[17] J. Parra-Arnau, J. P. Achara, and C. Castelluccia, “MyAdChoices: bringing transparency and control to online advertising,” ACM Transactions on the Web, vol. 11, no. 1, article no. 7, 2017. https://doi.org/10.1145/2996466
[18] E. Androulaki and S. M. Bellovin, “A secure and privacy-preserving targeted ad-system,” in Financial Cryptography and Data Security. Berlin, Germany: Springer, 2010, pp. 123-135.
[19] S. Guha, B. Cheng, and P. Francis, “Privad: practical privacy in online advertising,” in Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, Boston, MA, 2011, pp. 169-182
[20] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in Proceedings of 2008 IEEE Symposium on Security and Privacy (sp2008), Oakland, CA, 2008, pp. 111-125.
[21] S. T. Boshrooyeh, A. Küpçü, and O. Ozkasap, “PPAD: Privacy Preserving Group-Based ADvertising in online social networks,” in Proceedings of 2018 IFIP Networking Conference (IFIP Networking) and Workshops, Zurich, Switzerland, 2018, pp. 1-9.
[22] T. Boshrooyeh, A. Kupçu, and O. Ozkasap, “Privado: privacy-preserving group-based advertising using multiple independent social network providers,” ACM Transactions on Privacy and Security, vol. 23, no. 3, article no. 12, 2020. https://doi.org/10.1145/3386154
[23] J. Estrada-Jimenez, J. Parra-Arnau, A. Rodriguez-Hoyos, and J. Forne, “On the regulation of personal data distribution in online advertising platforms,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 13-29, 2019.
[24] J. Jiang, Y. Zheng, Z. Shi, X. Yuan, X. Gui, and C. Wang, “A practical system for privacy-aware targeted mobile advertising services,” IEEE Transactions on Services Computing, vol. 13, no. 3, pp. 410-424, 2020.
[25] E. De Cristofaro, P. Gasti, and G. Tsudik, “Fast and private computation of cardinality of set intersection and union,” in Cryptology and Network Security. Heidelberg, Germany: Springer-Verla, 2012, pp. 218-231.
[26] C. Blundo, E. D. Cristofaro, and P. Gasti, “EsPRESSO: efficient privacy-preserving evaluation of sa.mple set similarity,” Journal of Computer Security, vol. 22, no. 3, pp. 355-381, 2014.
[27] E. De Cristofaro and G. Tsudik, “Experimenting with Fast Private Set Intersection,” in Trust and Trustworthy Computing. Heidelberg, Germany: Springer-Verlag, 2012, pp. 55-73.
[28] M. Ciampi and C. Orlandi, “Combining private set-intersection with secure two-party computation,” in Security and Cryptography for Networks. Cham, Switzerland: Springer Nature, 2018, pp. 464-482.
[29] T. Elgamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469-472, 1985.
[30] R. Mihalcea and A. Csomai, “Wikify!: linking documents to encyclopedic knowledge,” in Proceedings of the 16th ACM Conference on Information and Knowledge Management, Lisbon, Portugal, 2007, pp. 233-242.
[31] M. Ciampi, R. Ostrovsky, L. Siniscalchi, and I. Visconti, “Round-optimal secure two-party computation from trapdoor permutations,” in Theory of Cryptography. Cham, Switzerland: Springer Nature, 2017, pp. 678-710.
[32] M. Ciampi, R. Ostrovsky, L. Siniscalchi, and I. Visconti, “Delayed-input non-malleable zero knowledge and multi-party coin tossing in four rounds,” in Theory of Cryptography. Cham, Switzerland: Springer Nature, 2017, pp. 711-742.
[33] M. Ion, B. Kreuter, E. Nergiz, S. Patel, S. Saxena, K. Seth, D. Shanahan, and M. Yung, “Private intersection-sum protocol with applications to attributing aggregate ad conversions,” 2017 [Online]. Available: https://eprint.iacr.org/2017/738.
[34] M. Ion, B. Kreuter, A. E. Nergiz, S. Patel, M. Raykova, S. Saxena, K. Seth, D. Shanahan, and M. Yung, “On deploying secure computing: private intersection-sum-with-cardinality,” 2019 [Online]. Available: https://eprint.iacr.org/2019/723.
[35] D. Liu, J. Ni, H. Li, X. Lin, and X. Shen, “Efficient and privacy-preserving ad conversion for v2x-assisted proximity marketing,” in Proceedings of 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Chengdu, China, 2018, pp. 10-18.
[36] Google, “Cookie matching,” 2020 [Online]. Available: https://developers.google.com/authorized-buyers/rtb/cookie-guide.
[37] S. Yuan, A. Z. Abidin, M. Sloan, and J. Wang, “Internet advertising: an interplay among advertisers, online publishers, ad exchanges and web users,” 2012 [Online]. Available: https://arxiv.org/abs/1206.1754.
[38] Y. Yang and C. Hinde, “A new extension of fuzzy sets using rough sets: R-fuzzy sets,” Information Sciences, vol. 180, no. 3, pp. 354–365, 2010.

About this article
Cite this article

Carlo Blundo1, Carmen De Maio2,*, Mimmo Parente1, and Luisa Siniscalchi3, Targeted Advertising That Protects the Privacy of SocialNetworks Users, Article number: 11:18 (2021) Cite this article 3 Accesses

Download citation
  • Recived20 July 2020
  • Accepted12 December 2020
  • Published30 April 2021
Share this article

Anyone you share the following link with will be able to read this content:

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords