Privacy-Aware Retrieval of Electronic Medical Records by Fuzzy Keyword Search
Chunxia Jia1, Chunyan Jia2, Lingzhen Kong3,*, Wenmin Lin4, and Lianyong Qi5

Human-centric Computing and Information Sciences volume 12, Article number: 41 (2022)
https://doi.org/10.22967/HCIS.2022.12.041

Abstract

With the rapid development of information technology, information systems have gradually penetrated all levels of the healthcare field, which is of great significance to the clinical management of hospitals and to patient information sharing. Electronic medical records emerged in this context and have become an important part of hospital informatization. An electronic medical record is a health and medical electronic file for a patient, which includes the patient’s common information and medical activities such as examination, diagnosis, and treatment. Therefore, the electronic medical record is a summary of clinical practice, as well as a legal basis for exploring the laws of disease and handling medical disputes. However, the generation of a large number of electronic medical records also causes an information overload problem, which places a heavy burden on medical staff when searching for and retrieving relevant medical records. Meanwhile, in the process of retrieving patients’ case histories, privacy preservation cannot be overlooked. To obtain more accurate retrieval results and protect patients’ privacy, we put forward in this paper a privacy-aware retrieval approach for case histories based on fuzzy keyword search (referred to as PRCHkeywords). Through text description mining of case histories and the fuzzy keyword search method, the PRCHkeywords approach can obtain accurate retrieval results. Moreover, the PRCHkeywords approach protects patients’ privacy by using the Simhash technique. Finally, a set of experiments is presented to show the effectiveness and efficiency of the PRCHkeywords approach.

Keywords

Electronic Medical Records Retrieval, Fuzzy Keywords Search, Privacy Protection, Simhash

Introduction

With the continuing development of electronic technology, traditional paper-driven medical systems have been converting to efficient electronic records that can be easily checked and transmitted [1-3]. Moreover, information systems have gradually penetrated all levels of the healthcare field, which is of great significance to the clinical management of hospitals and to patient information sharing [4-6]. Electronic medical records have come to the fore in this context, becoming an important part of hospital informatization.
In patients’ electronic medical records, medical personnel record the occurrence, development, and outcome of patients’ diseases. The electronic medical records also include common information such as the name, age, and sex of patients, as well as medical activities such as examination, diagnosis, and treatment. Therefore, the electronic medical record is a summary of clinical practice, as well as a legal basis for exploring the laws of disease and handling medical disputes, while the case history plays an important role in medical treatment, prevention, teaching, scientific research, and hospital management.
However, the generation of a large number of electronic medical records also causes an information overload problem [7, 8], which places a heavy burden on medical staff when searching for and retrieving relevant medical records. In the process of retrieving target electronic medical records, the main challenges are as follows.
(1) Existing research on keyword search tends to neglect the synonymy and word inflection problem, so the final recommendation results are often unsatisfactory. To improve the results, keyword search methods need to take synonymy and word inflection into account.
(2) In the healthcare field, patients’ case histories often involve private and sensitive information. For example, case histories contain patients’ common information as well as diagnosis and therapy information, and patients do not want this private information to be disclosed. Since only a few keyword search studies take privacy protection into real consideration, how to protect patients’ privacy while ensuring the accuracy and efficiency of retrieval results is an urgent problem.
Considering the above drawbacks, we put forward a privacy-aware retrieval approach for case histories based on fuzzy keyword search (referred to as PRCHkeywords). The PRCHkeywords method releases people from the heavy burden of searching for and selecting appropriate case histories through text description mining of case histories and the fuzzy keyword search method. In addition, because the Simhash technique transforms patients’ information into indices, the PRCHkeywords method can preserve the privacy of patients’ case histories during retrieval and recommendation.
All in all, the contributions of this paper are three-fold as follows:
(1) We transform professional keywords into more general text documents and call the transformed results text descriptions. Moreover, to improve the accuracy of retrieval results, the proposed PRCHkeywords approach takes synonymy and word inflection into account.
(2) We introduce the Simhash technology into this paper. Through Simhash, the PRCHkeywords method can protect patients’ privacy in the retrieval and recommendation process.
(3) Finally, a case study is presented to confirm the validity of our proposed PRCHkeywords approach.
The remainder of this paper is organized as follows. Section 2 reviews related work, and Section 3 motivates our paper. Section 4 describes the proposed PRCHkeywords approach in detail. Section 5 uses a case study to verify the validity of the PRCHkeywords approach, and Section 6 summarizes the approach and discusses future work.

Related Work

In relation to information overload, retrieval methods and recommendation strategies have emerged to relieve the burden on people for searching and selecting useful information. At the same time, privacy protection technology has also been in the spotlight and widely discussed. In this section, we will introduce current related research efforts.

Retrieval Technology
In recent years, retrieval technology has been applied ever more extensively. With the development of deep learning and machine learning, it has become very popular to use these technologies for retrieval and recommendation [9, 10]. For example, the authors in [11] make use of BERT (bidirectional encoder representations from transformers) for ad hoc document retrieval and address the length inconsistency problem between documents. Similar to [11], the authors in [12] also utilize BERT to solve the retrieval problem. Additionally, in [13], the authors improve the automatic query expansion approach by utilizing cuckoo search and fuzzy logic. The authors in [14] combine automatic speech recognition (ASR), deep neural networks (DNN), hidden Markov models (HMM), and Gaussian mixture models (GMM) to calculate the relevance order of documents for ranking.
Subsequently, the authors in [15] note that many deep learning models have been put forward by researchers. They therefore compare document retrieval methods with other search tasks and point out research directions for the future. As shown in [16], the forms of retrieval are becoming ever more complex, and retrieval work is significant for people. Accordingly, in [17], the authors survey the development of neural ranking models and information retrieval, and in [18], the authors put forward the LazyBM approach along with experimental results showing that it performs better. Additionally, the authors in [19] organize documents by using a logo-based identification model and adopt a two-stage optimization to obtain the desired results.

Privacy-Preservation Technology
Preserving patients’ privacy is significant because their case histories often involve sensitive information. For example, case histories contain patients’ common information as well as diagnosis and therapy information. Thus, privacy preservation should be taken into account in the retrieval process, and a variety of existing work is devoted to the privacy problem. The authors in [20] point out that private information is vital for users, and utilize machine learning techniques such as support vector machine (SVM), k-nearest neighbors (KNN), and logistic regression (LR) to address the privacy leakage problem. Lecuyer et al. [21] put forward the PixelDP approach based on differential privacy; the approach is appropriate for large datasets, and its effectiveness is demonstrated through experimental results. In [22-24], the authors address the issue of privacy disclosure through the hashing technique. In [25] and [26], differential privacy is utilized to achieve the privacy preservation goal when integrating multi-party data.
Subsequently, the authors in [27] focus on the privacy problem in people’s healthcare data and achieve strong privacy protection for such data through the access control model they propose. In relation to this, Tripathy et al. [28] strive to hide citizens’ private data while guaranteeing the availability of the original data. To this end, they implement a randomized mechanism through adversarially trained neural networks. Additionally, in [29], the authors improve the searchable symmetric encryption (SSE) technique and put forward the Khons method to improve encrypted data.
Many other researchers have also contributed to privacy protection. For example, Xu et al. [30] point out that privacy may be leaked when people utilize a generative adversarial network (GAN). To solve this problem, the authors put forward a differentially private GAN called GANobfuscator. Similar to [30], Wu et al. [31] study the generalization of GANs from the privacy-preservation perspective. The authors in [32] and [33] consider the privacy-preserving problem related to medical data and explore it using large-scale data.
From the above review of related work, we can see that many scholars have made great achievements in document retrieval and privacy protection, but these research works still have some drawbacks. Therefore, we put forward a privacy-aware retrieval approach for case histories based on the fuzzy keyword search method (i.e., PRCHkeywords). The PRCHkeywords approach can obtain accurate retrieval results through text description mining of case histories and the fuzzy keyword search method. Additionally, owing to the nature of the Simhash technique, the PRCHkeywords approach can protect patients’ privacy well.

Motivation

In this section, we illustrate the motivation of our paper through the example in Fig. 1. In Fig. 1, a user inputs the following keywords to retrieve the needed case history from electronic medical records: respiratory system (symptoms of cough and asthma), circulatory system (symptoms of high blood pressure and palpitations), nervous system (symptoms of insomnia and disturbance of consciousness), and musculoskeletal system (symptoms of limb muscle numbness and dyskinesia). Through these four keywords and the above retrieval process, the user can obtain the target case history.

Fig. 1. Keywords-driven retrieval of electronic medical records.

However, in the retrieval process of case histories, there are three problems to be solved:
(1) Users often fail to get their target case histories because some keywords in the healthcare field are professional terms.
(2) Existing research on keyword search [34-36] tends to overlook the synonymy and word inflection problem, so the final recommendation results are often unsatisfactory.
(3) Case histories often involve sensitive information such as patients’ common information and diagnosis and therapy information. Patients do not want their private information to be disclosed, but few keyword search studies take privacy protection into real consideration [37, 38], which leads to serious issues for people.
Considering these drawbacks, we put forward a privacy-aware retrieval approach for case histories based on fuzzy keyword search (i.e., PRCHkeywords). The PRCHkeywords approach releases people from the heavy burden of the searching and selecting process. In addition, because the Simhash technique transforms sensitive information into indices, the PRCHkeywords method can preserve users’ private information well. The next section introduces the proposed PRCHkeywords approach in detail.

A Privacy-Aware Retrieval of Case History by Fuzzy Keywords Search: PRCHkeywords

Based on the investigation in the previous sections, we put forward a privacy-aware retrieval approach for case histories based on fuzzy keyword search (i.e., PRCHkeywords). As shown in Fig. 2, the PRCHkeywords approach includes three steps. Firstly, by analyzing the phrases input by users, we convert the text input into vectors, and we transform the keywords of case histories into vectors as well. Secondly, we employ the Simhash technique to find similar case histories. Lastly, by calculating keyword similarity and selecting case histories, users obtain their target case histories. The remainder of Section 4 introduces the details of these steps.

Fig. 2. Three steps of PRCHkeywords approach.

Step 1

In order to find their target case histories, users often input a more general text document such as paragraphs, phrases, or sentences. Then, by mining the latent information of the text document, we transform each text document into a vector based on natural language processing approaches [39, 40]. In the word embedding process, we employ fastText [41] to transform text documents into vectors. fastText is a word embedding technique that improves on Word2Vec, and it is convenient to use because it is available as a pre-trained model. Here, we use tds to represent the text descriptions, which include n paragraphs or sentences ($K_{tds}$ = {$k_1$, …, $k_n$}). We use $V_{tds}$ to denote the vectors obtained through fastText, where $V_{tds}$ = {$v_1$, …, $v_n$}. Each case history contains several keywords that describe its content. Therefore, we use $K_{db}$ to denote the keywords of case histories, where $K_{db}$ = {$k_1$, …, $k_n$}. Then, we employ fastText to transform $K_{db}$ into vectors $V_{db}$, where $V_{db}$ = {$v_{db1}$, …, $v_{dbn}$} and $V_{tds}$ $\subset$ $V_{db}$. To make the PRCHkeywords approach easier to follow, the symbols used in this paper are listed in Table 1.
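To make the idea behind Step 1 concrete, the following minimal Python sketch mimics the subword principle that helps fastText cope with word inflection: a word is split into character n-grams, so inflected forms such as "cough" and "coughing" share most of their features. This is not the fastText model itself. The function names `char_ngrams`, `embed`, and `cosine`, the hashing of n-grams into `dim` buckets, and the n-gram range are all illustrative assumptions; in practice a pre-trained fastText model would supply the vectors.

```python
import hashlib
import math


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def embed(text, dim=256):
    """Hypothetical stand-in for an embedding lookup (NOT fastText itself):
    hash every n-gram into one of `dim` buckets, count, then L2-normalise."""
    vec = [0.0] * dim
    for word in text.split():
        for gram in char_ngrams(word):
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % dim
            vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))
```

With this sketch, `cosine(embed("coughing"), embed("cough"))` is driven by the n-grams the two words share, whereas an unrelated term such as "insomnia" shares none with "coughing". Synonyms with no shared spelling are not captured by this toy version; that part relies on the trained embedding.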

Table 1. Symbols definition
Symbol | Definition
$K_{tds}$ | All keywords of the text descriptions
$V_{tds}$ | A set of vectors of the text descriptions
$K_{db}$ | All keywords of the document database
$V_{db}$ | A set of vectors of the document database
ch | A case history
db | The document database
tds | The text descriptions
$S_{ch}$ | A set of possible selection case histories
$S_{qk}$ | A set of optimal query keywords
RL | The recommendation list

Step 2
In this step, we find similar case histories by means of the Simhash technique. The main idea is as follows. If the keywords of a certain case history are similar to the keywords that the user inputs, the retrieved results are satisfactory to the user. Additionally, for case histories that are not similar, the Simhash technique does not release their information. Therefore, the Simhash technique can protect patients’ private information well. In particular, the Simhash technique includes the following two sub-steps.

Step 2.1 Building hash indices for patients’ case histories and the text documents
Here, we use Fig. 3 to illustrate this process, with ch denoting a case history, db denoting the document database, and $K_{db}$ denoting the keywords of all case histories. Accordingly, we generate r-dimensional vectors, and each element in $V_{db}$ is 0 or 1. Here, we set r equal to 5. Then ${ch}_i$ forms a matrix T based on Equation (1).

Fig. 3. Building hash indices: an example.

Then, through replacing 0 with -1, we can acquire a new matrix $h_2$(${ch}_i$).

(1)

Next, we sum up each column of $h_2$(${ch}_i$) to obtain the matrix $h_3$(${ch}_i$). After the above operations, we obtain H(${ch}_i$), the hash indices of the case history ${ch}_i$. When Step 2.1 is finished, we have built hash indices for the case histories and the text documents. Since the hash indices hide the original data in the case histories, they protect patients’ privacy. Note that the indices H(tds) of tds are different from $V_{tds}$.
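The index-building procedure of Step 2.1 can be sketched in Python as follows. The sketch follows the operations named in the text (a 0/1 matrix, replacing 0 with -1, and summing each column). Two parts are assumptions borrowed from the usual Simhash construction rather than taken from this paper: each keyword vector is turned into r bits by testing its sign against r random hyperplanes, and the column sums are thresholded at zero to give the final binary index. All function names are hypothetical.

```python
import random


def make_hyperplanes(r, dim, seed=0):
    """Assumption: r random Gaussian hyperplanes shared by all records."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(r)]


def keyword_bits(vec, hyperplanes):
    """One row of the 0/1 matrix: bit t is 1 when vec lies on the
    positive side of hyperplane t, and 0 otherwise."""
    return [1 if sum(h * x for h, x in zip(plane, vec)) > 0 else 0
            for plane in hyperplanes]


def hash_index(keyword_vectors, hyperplanes):
    """Build the r-bit hash index H(ch_i) of one case history."""
    h1 = [keyword_bits(v, hyperplanes) for v in keyword_vectors]  # 0/1 matrix
    h2 = [[1 if b == 1 else -1 for b in row] for row in h1]       # replace 0 with -1
    h3 = [sum(col) for col in zip(*h2)]                           # sum each column
    # Assumption: a positive column sum maps to bit 1, anything else to bit 0.
    return [1 if s > 0 else 0 for s in h3]
```

For instance, with the hand-picked hyperplanes `[[1, 0], [0, 1], [-1, 0]]` and keyword vectors `[[1, 1], [1, -1], [1, 2]]`, the column sums are 3, 1, and -3, so the index is `[1, 1, 0]`. Only this short bit string, not the keyword vectors, needs to be exposed for matching.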

Step 2.2 Building a group of possible selections case histories for people
In this sub-step, we measure the distance between the hash indices H(tds) of the text descriptions and the hash indices H(${ch}_i$) of each case history using the Hamming distance. We use Dis(H(tds), H(${ch}_i$)) to denote the distance between H(tds) and H(${ch}_i$), which is calculated through Equation (2).
Here, blt is calculated through Equation (3) and its value is a Boolean value. In Equation (3), the symbol “⊕” represents the XOR operation.

(2)

(3)

When the keywords of a case history are similar to the user’s query keywords, we return the corresponding case history to the user. This requires a way to measure the degree of similarity, for which we introduce Equation (4). As shown in Equation (4), the text description tdsti is considered similar to the case history ${ch}_i$ when their Hamming distance is smaller than ⌈r/2⌉. For example, with r = 5 the threshold is 3, so two hash indices are considered similar when they differ in at most two positions. Then, ${ch}_i$ can be seen as a possible selection case history for the user, and we add ${ch}_i$ into the $S_{ch}$ set.

(4)

The above two sub-steps complete the Simhash operation and eventually find a set of similar case histories. The Simhash technology realizes the goal of privacy protection when finding a set of similar case histories, and Algorithm 1 shows the pseudocode of this process.
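The candidate selection of Step 2.2 can be sketched as follows, assuming each hash index is a list of 0/1 bits of equal length r. The Hamming distance is computed by XOR-ing the bits position by position and counting the ones, and a case history is kept when the distance is strictly smaller than ⌈r/2⌉, as described above. The names `hamming` and `candidate_set`, and the dictionary layout of the indices, are illustrative assumptions.

```python
import math


def hamming(h_a, h_b):
    """Hamming distance between two equal-length bit lists via XOR."""
    if len(h_a) != len(h_b):
        raise ValueError("hash indices must have the same length")
    return sum(a ^ b for a, b in zip(h_a, h_b))


def candidate_set(h_query, indices):
    """Return the ids of case histories whose hash index is within
    ceil(r / 2) (exclusive) of the query index.

    indices: dict mapping a case-history id to its hash index.
    """
    threshold = math.ceil(len(h_query) / 2)
    return [cid for cid, h in indices.items()
            if hamming(h_query, h) < threshold]


if __name__ == "__main__":
    query = [1, 0, 1, 1, 0]                     # r = 5, so the threshold is 3
    indices = {
        "ch1": [1, 0, 1, 0, 0],                 # distance 1
        "ch2": [0, 1, 0, 1, 0],                 # distance 3
        "ch3": [1, 1, 1, 1, 1],                 # distance 2
    }
    print(candidate_set(query, indices))        # ['ch1', 'ch3']
```

Because the comparison uses only the bit strings, case histories that fall outside the threshold are never opened, which is the privacy argument made for this step.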

Step 3
In Step 2, we obtained the $S_{ch}$ set through a series of calculation and selection processes. The documents in the $S_{ch}$ set include the query keywords. In this step, we introduce the calculation of keyword similarity and the selection of satisfactory keywords.

$Same_{ji} = \frac{V_{dbj} \cdot V_{tdsi}}{\lVert V_{dbj} \rVert \, \lVert V_{tdsi} \rVert}$ (5)

Here, we employ cosine similarity to calculate the similarity $Same_{ji}$ of $V_{dbj}$ and $V_{tdsi}$ in Equation (5). ${qk}_i$ represents the case history that is recommended to users and is more similar to the users’ query keywords.

(6)

Next, we use a new set $S_{qk}$ to store all satisfactory keywords, where $S_{qk}$ = {${qk}_1$, …, ${qk}_n$}, and then we traverse $V_{db}$ = {$v_{db1}$, …, $v_{dbn}$}. Eventually, we use the RL set to store all satisfactory and necessary case histories. Algorithm 2 shows the pseudocode of this process.
After executing the above three steps, we can return the target case histories to users for helping them choose appropriate case histories.
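A minimal sketch of Step 3 is given below. It computes the cosine similarity between query vectors and the keyword vectors of each candidate case history and returns a ranked recommendation list. The scoring rule is an assumption made for illustration: for each query vector the best-matching keyword is kept, and a case history is scored by the mean of these best matches. The paper's own selection rule may differ, and the names `cosine` and `recommend` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query_vectors, candidates, top_k=3):
    """Rank candidate case histories (the S_ch set) for a query.

    candidates: dict mapping a case-history id to its keyword vectors.
    Assumed scoring: mean over query vectors of the best keyword match.
    """
    if not query_vectors:
        return []
    scored = []
    for cid, keyword_vectors in candidates.items():
        best = [max((cosine(q, k) for k in keyword_vectors), default=0.0)
                for q in query_vectors]
        scored.append((sum(best) / len(best), cid))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:top_k]]
```

For example, with query vectors `[[1, 0], [0, 1]]`, a candidate whose keywords are exactly `[[1, 0], [0, 1]]` scores 1.0 and is ranked ahead of a candidate with the single keyword `[1, 1]`, which scores about 0.71.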

Experiments

Experimental Settings

In this section, we conduct a range of experiments to demonstrate the effectiveness and efficiency of the PRCHkeywords approach in handling the inaccuracy and privacy leakage problems in electronic medical record retrieval. As we did not find a suitable real-world dataset, we constructed the dataset used in this experiment. Specifically, we simulate a medical staff member’s input document, which contains a text phrase, sentence, or paragraph that depicts the target electronic medical records, and we also simulate 20 electronic medical records. Through the fastText technique, we analyze the input text descriptions and electronic medical records, and then convert them into 256-dimension, 512-dimension, and 1024-dimension vectors, respectively. The following two indicators are used to measure the validity and efficiency of our proposed approach.
(1) Accuracy measurement: in the verification set, we make use of the number of retrieval results that meet the medical staff’s satisfaction divided by the total number of the verification set as the accuracy measurement.
(2) Time cost: the time that the PRCHkeywords approach takes to produce the retrieved electronic medical records for medical staff.
Moreover, we compare the PRCHkeywords approach with three related methods: the Random approach, the Euclidean distance method, and the Manhattan distance method. The experiments are conducted on a Dell laptop with a 2.40 GHz processor and 4.0 GB RAM, running Windows 10 and Python 3.7.
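The two distance baselines and the two indicators can be sketched as follows. This is an illustrative harness, not the authors' experimental code. In particular, the accuracy function assumes that a query counts as satisfied when at least one of its returned records is in the set judged satisfactory, which is one possible reading of the accuracy measurement described above. All names are hypothetical.

```python
import math
import time


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_by_distance(query, records, distance, top_k):
    """Baseline ranker: return the ids of the top_k nearest records.

    records: dict mapping a record id to its vector.
    """
    ordered = sorted(records, key=lambda rid: distance(query, records[rid]))
    return ordered[:top_k]


def accuracy(retrieved_per_query, satisfactory_per_query):
    """Assumed metric: share of queries with at least one satisfactory hit."""
    if not retrieved_per_query:
        return 0.0
    hits = sum(1 for got, wanted in zip(retrieved_per_query, satisfactory_per_query)
               if set(got) & set(wanted))
    return hits / len(retrieved_per_query)


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for one call, as a time-cost probe."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

The two baselines can disagree: for the query `[0, 0]` and records `{"r1": [0, 0], "r2": [3, 0], "r3": [2, 2]}`, Euclidean distance ranks `r3` ahead of `r2`, while Manhattan distance ranks `r2` ahead of `r3`.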

Results and Analyses

Profile 1. Retrieval accuracy w.r.t. text vector dimensions
In our proposed PRCHkeywords approach, we allow medical staff to input simple and flexible descriptive text rather than professional and rigid keywords. We also address the synonymy and word inflection problem through the fastText technique, as mentioned above. Through fastText, we convert the text documents and electronic medical records into vectors. Therefore, in this profile, we take the text vector dimension as the independent variable to evaluate the retrieval accuracy. Figs. 4–7 show the experimental results.

Fig. 4. Accuracy under different dimensions and same recommend numbers-1.

Fig. 5. Accuracy under different dimensions and same recommend numbers-3.

Fig. 6. Accuracy under different dimensions and same recommend numbers-5.

Fig. 7. Accuracy under different dimensions and same recommend numbers-10.

As shown in Figs. 4–7, the x-axis represents the dimension of the text vector produced by the fastText technique, and the y-axis represents the retrieval accuracy. In this experiment, we set the text dimension to 256, 512, and 1024, respectively, and the recommended number to 1, 3, 5, and 10. From the four figures, we can see that the accuracy of our proposed PRCHkeywords approach is the highest and that of the Random method is the lowest. We also find that the more retrieval results are returned to the user, the higher the accuracy becomes.

Profile 2. Retrieval accuracy w.r.t. recommend numbers
In this profile, we have measured the retrieval accuracy among the PRCHkeywords approach, Random approach, Euclidean distance approach, and Manhattan distance approach with respect to the recommended numbers. Recommended numbers are set at 1, 5, 10, and 20, respectively. Figs. 8–10 show the results.

Fig. 8. Accuracy under same dimensions-256 and different recommend numbers.

Fig. 9. Accuracy under same dimensions-512 and different recommend numbers.

Fig. 10. Accuracy under same dimensions-1024 and different recommend numbers.

As shown in Figs. 8–10, the x-axis represents the number of electronic medical records recommended to users, and the y-axis represents the retrieval accuracy. In these experiments, the recommended number changes from 1 to 20. From the three figures, we find that the accuracy of our proposed PRCHkeywords approach is the highest. Comparing the three figures, we can see that the dimension of the text vector has little effect on the retrieval accuracy, which is also visible in Profile 1.

Conclusion

Because many existing keyword-driven document retrieval methods rely on precise keyword matching, people often fail to obtain satisfactory retrieval results. In addition, traditional retrieval operations are susceptible to leaking sensitive information such as patients’ diagnosis and therapy information. To address these challenges, we put forward a privacy-aware retrieval approach for case histories based on the fuzzy keyword search method (i.e., PRCHkeywords). Through text description mining of case histories and the fuzzy keyword search method, the PRCHkeywords approach can obtain accurate retrieval results. Moreover, the PRCHkeywords approach protects patients’ privacy by using the Simhash technique. Finally, we use a case study to confirm the validity of the PRCHkeywords approach.
In our future research, we will further refine the PRCHkeywords approach by taking the time factor into consideration, since time is crucial to most information system applications [42-45]. In addition, time efficiency is of practical significance for user query and search [46–49]. Therefore, we will consider the time cost and computational load to further optimize the efficiency of PRCHkeywords.

Author’s Contributions

Conceptualization, CXJ, CYJ, LQ. Funding acquisition, WL. Investigation and methodology, LK. Writing of the original draft, CXJ. Writing of the review and editing, LK. Validation, CXJ. Formal analysis, LQ. Data curation, CYJ. Visualization, LK.

Funding

This research was supported by the Natural Science Foundation of Zhejiang Province (No. LQ21F020021) and the Research Start-up Project funded by Hangzhou Normal University (No. 2020QD2035).

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name : Chunxia Jia
Affiliation : Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China
Biography : Chunxia Jia received her master's degree from Qingdao Ocean University in 2011, majoring in Computer Science and Technology. She is now a lecturer in the Information Technology Teaching Center of Weifang University of Science and Technology, China. Her research direction is computer application.

Name : Chunyan Jia
Affiliation : Shouguang People’s Hospital, Weifang, China
Biography : Chunyan Jia received her bachelor's degree from Shandong University in 2012. Currently, she is the head nurse of the out-patient operating room of Shouguang People's Hospital, China. Her research interests are advanced medicine and healthcare.

Name : Abolfazl Mehbodniya
Affiliation : Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha Area, 7th Ring Road, Kuwait.
Biography : Dr. Mehbodniya is an associate professor and head of the ECE department at Kuwait College of Science and Technology (KCST). Before coming to KCST, he worked as a Marie-Curie senior research fellow at University College Dublin, Ireland. Prior to that, he worked as an assistant professor at Tohoku University, Japan, and as a research scientist at Advanced Telecommunication Research (ATR) International, Kyoto, Japan. Dr. Mehbodniya received a PhD from INRS-EMT, University of Quebec, Montreal, Canada, in 2010.

Name : Lingzhen Kong
Affiliation : School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Biography : Lingzhen Kong received her bachelor's degree from Zaozhuang University in 2019. Currently, she is pursuing a graduate degree at Nanjing University of Science and Technology, China. Her research interest is health big data.

Name : Wenmin Lin
Affiliation : Institute of VR and Intelligent Systems, Hangzhou Normal University, Hangzhou, China
Biography : Wenmin Lin received her PhD degree from Nanjing University, China, in 2014. She is currently a lecturer at the Institute of VR and Intelligent Systems, Hangzhou Normal University, China. Her research interests are big data security and privacy protection.

References

[1] X. Hu, S. Peng, B. Guo, and P. Xu, “Accurate AM-FM signal demodulation and separation using nonparametric regularization method,” Signal Processing, vol. 186, article no. 108131, 2021. https://doi.org/10.1016/j.sigpro.2021.108131
[2] L. Qi, H. Song, X. Zhang, G. Srivastava, X. Xu, and S. Yu, “Compatibility-aware web API recommendation for mashup creation via textual description mining,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 17, no. 15, article no. 20, 2021. https://doi.org/10.1145/3417293
[3] X. Hu, S. Peng, and W. L. Hwang, “EMD revisited: a new understanding of the envelope and resolving the mode-mixing problem in AM-FM signals,” IEEE Transactions on Signal Processing, vol. 60, no. 3, pp. 1075-1086, 2012.
[4] K. K. Singh and A. Singh, “Diagnosis of COVID-19 from chest X-ray images using wavelets-based depthwise convolution network,” Big Data Mining and Analytics, vol. 4, no. 2, pp. 84-93, 2021.
[5] F. Hao and D. S. Park, “CoNavigator: a framework of FCA-based novel coronavirus COVID-19 domain knowledge navigation,” Human-centric Computing and Information Sciences, vol. 11, article no. 6, 2021. https://doi.org/10.22967/HCIS.2021.11.006
[6] N. Yuvaraj, K. Srihari, S. Chandragandhi, R. A. Raja, G. Dhiman, and A. Kaur, “Analysis of protein-ligand interactions of SARS-CoV-2 against selective drug using deep neural networks,” Big Data Mining and Analytics, vol. 4, no. 2, pp. 76-83, 2021.
[7] Y. Liu, Z. Song, X. Xu, W. Rafique, X. Zhang, J. Shen, M. R. Khosravi, and L. Qi, “Bidirectional GRU networks-based next POI category prediction for healthcare,” International Journal of Intelligent Systems, vol. 37, no. 7, pp. 4020-4040, 2021.
[8] L. Qi, Q. He, F. Chen, X. Zhang, W. Dou, and Q. Ni, “Data-driven web APIs recommendation for building web applications,” IEEE Transactions on Big Data, vol. 8, no. 3, pp. 685-698, 2022.
[9] J. Pang, Y. Huang, Z. Xie, J. Li, and Z. Cai, “Collaborative city digital twin for the COVID-19 pandemic: a federated learning solution,” Tsinghua Science and Technology, vol. 26, no. 5, pp. 759-771, 2021.
[10] Y. Xu, L. Qi, W. Dou, and J. Yu, “Privacy-preserving and scalable service recommendation based on SimHash in a distributed cloud environment,” Complexity, vol. 2017, article no. 3437854, 2017. https://doi.org/10.1155/2017/3437854
[11] W. Yang, H. Zhang, and J. Lin, “Simple applications of BERT for ad hoc document retrieval,” 2019 [Online]. Available: https://arxiv.org/abs/1903.10972.
[12] J. Li, A. M. V. V. Sai, X. Cheng, W. Cheng, Z. Tian, and Y. Li, “Sampling-based approximate skyline query in sensor equipped IoT networks,” Tsinghua Science and Technology, vol. 26, no. 2, pp. 219-229, 2021.
[13] D. K. Sharma, R. Pamula, and D. S. Chauhan, “A hybrid evolutionary algorithm based automatic query expansion for enhancing document retrieval system,” Journal of Ambient Intelligence and Humanized Computing, 2019. https://doi.org/10.1007/s12652-019-01247-9
[14] A. Gupta and D. Yadav, “A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing,” Multimedia Tools and Applications, vol. 80, no. 14, pp. 22209-22229, 2021.
[15] S. Zhang, H. Liu, J. He, S. Han, and X. Du, “Deep sequential model for anchor recommendation on live streaming platforms,” Big Data Mining and Analytics, vol. 4, no. 3, pp. 173-182, 2021.
[16] X. Li, X. Yin, and K. Li, “An improved model of document retrieval efficiency based on information theory,” Journal of Physics: Conference Series, vol. 1848, article no. 012094, 2021. https://doi.org/10.1088/1742-6596/1848/1/012094
[17] K. Raveendra, T. Karthikeyan, V. Rajendran, and P. V. N. Reddy, “A novel two-stage optimized model for logo-based document image retrieval based on a soft computing framework,” Soft Computing, vol. 25, no. 2, pp. 963-972, 2021.
[18] O. Khattab, M. Hammoud, and T. Elsayed, “Finding the best of both worlds: faster and more robust top-k document retrieval,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020, pp. 1031-1040.
[19] J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, and X. Cheng, “A deep look into neural ranking models for information retrieval,” Information Processing & Management, vol. 57, no. 6, article no. 102067, 2020. https://doi.org/10.1016/j.ipm.2019.102067
[20] M. Narksenee and K. Sripanidkulchai, “Can we trust privacy policy: privacy policy classification using machine learning,” in Proceedings of 2019 2nd International Conference of Intelligent Robotic and Control Engineering (IRCE), Singapore, 2019, pp. 133-137.
[21] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana, “Certified robustness to adversarial examples with differential privacy,” in Proceedings of 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, 2019, pp. 656-672.
[22] J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, and X. Cheng, “A deep look into neural ranking models for information retrieval,” Information Processing & Management, vol. 57, no. 6, article no. 102067, 2020. https://doi.org/10.1007/s11280-021-00941-z
[23] L. Qi, C. Hu, X. Zhang, M. R. Khosravi, S. Sharma, S. Pang, and T. Wang, “Privacy-aware data fusion and prediction with spatial-temporal context for smart city industrial environment,” IEEE Transactions on Industrial Informatics, vol. 17, no. 6, pp. 4159-4167, 2021.
[24] Y. Khazbak, J. Fan, S. Zhu, and G. Cao, “Preserving personalized location privacy in ride-hailing service,” Tsinghua Science and Technology, vol. 25, no. 6, pp. 743-757, 2020.
[25] Z. Cai and X. Zheng, “A private and efficient mechanism for data uploading in smart cyber-physical systems,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 2, pp. 766-775, 2020.
[26] X. Zheng and Z. Cai, “Privacy-preserved data sharing towards multiple parties in industrial IoTs,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 968-979, 2020.
[27] B. P. Prince and S. P. J. Lovesum, “Privacy enforced access control model for secured data handling in cloud-based pervasive health care system,” SN Computer Science, vol. 1, no. 5, article no. 239, 2020. https://doi.org/10.1007/s42979-020-00246-4
[28] A. Tripathy, Y. Wang, and P. Ishwar, “Privacy-preserving adversarial networks,” in Proceedings of 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, 2019, pp. 495-505.
[29] J. Li, Y. Huang, Y. Wei, S. Lv, Z. Liu, C. Dong, and W. Lou, “Searchable symmetric encryption with forward search privacy,” IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 1, pp. 460-474, 2021.
[30] C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren, “GANobfuscator: mitigating information leakage under GAN via differential privacy,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 9, pp. 2358-2371, 2019.
[31] J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, and X. Cheng, “A deep look into neural ranking models for information retrieval,” Information Processing & Management, vol. 57, no. 6, article no. 102067, 2020. https://doi.org/10.1016/j.ipm.2019.102067
[32] J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, and X. Cheng, “A deep look into neural ranking models for information retrieval,” Information Processing & Management, vol. 57, no. 6, article no. 102067, 2020. https://doi.org/10.1016/j.ipm.2019.102067
[33] J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, and X. Cheng, “A deep look into neural ranking models for information retrieval,” Information Processing & Management, vol. 57, no. 6, article no. 102067, 2020. https://doi.org/10.1016/j.ipm.2019.102067
[34] B. Wu, S. Zhao, C. Chen, H. Xu, L. Wang, X. Zhang, G. Sun, and J. Zhou, “Generalization in generative adversarial networks: a novel perspective from privacy protection,” 2019 [Online]. Available: https://arxiv.org/abs/1908.07882.
[35] M. Joseph, J. Mao, S. Neel, and A. Roth, “The role of interactivity in local differential privacy,” in Proceedings of 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), Baltimore, MD, 2019, pp. 94-105.
[36] Y. J. Park and D. D. Shin, “Contextualizing privacy on health-related use of information technology,” Computers in Human Behavior, vol. 105, article no. 106204, 2020. https://doi.org/10.1016/j.chb.2019.106204
[37] L. Qi, Q. He, F. Chen, W. Dou, S. Wan, X. Zhang, and X. Xu, “Finding all you need: web APIs recommendation in web of things through keywords search,” IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063-1072, 2019.
[38] N. Bhardwaj and P. Sharma, “An advanced uncertainty measure using fuzzy soft sets: application to decision-making problems,” Big Data Mining and Analytics, vol. 4, no. 2, pp. 94-103, 2021.
[39] W. Gong, C. Lv, Y. Duan, Z. Liu, M. R. Khosravi, L. Qi, and W. Dou, “Keywords-driven web APIs group recommendation for automatic app service creation process,” Software: Practice and Experience, vol. 51, no. 11, pp. 2337-2354, 2021.
[40] H. Kou, H. Liu, Y. Duan, W. Gong, Y. Xu, X. Xu, and L. Qi, “Building trust/distrust relationships on signed social service network through privacy-aware link prediction process,” Applied Soft Computing, vol. 100, article no. 106942, 2021. https://doi.org/10.1016/j.asoc.2020.106942
[41] Z. Sun, H. Liu, C. Yan, and R. An, “Natural disasters warning for enterprises through fuzzy keywords search,” Tsinghua Science and Technology, vol. 26, no. 4, pp. 558-564, 2021.
[42] G. Zhou, Z. Xie, Z. Yu, and J. X. Huang, “DFM: a parameter-shared deep fused model for knowledge base question answering,” Information Sciences, vol. 547, pp. 103-118, 2021.
[43] A. Sur and J. R. Birge, “Asymptotic behavior of solutions: an application to stochastic NLP,” Mathematical Programming, vol. 191, pp. 281-306, 2022.
[44] R. Luo, X. Tan, R. Wang, T. Qin, J. Li, S. Zhao, E. Chen, and T. Y. Liu, “Lightspeech: lightweight and fast text to speech with neural architecture search,” in Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021, pp. 5699-5703.
[45] X. Xia, F. Chen, Q. He, J. Grundy, M. Abdelrazek, and H. Jin, “Online collaborative data caching in edge computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 2, pp. 281-294, 2021.
[46] X. Zhou, Y. Li, and W. Liang, “CNN-RNN based intelligent recommendation for online medical pre-diagnosis support,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 18, no. 3, pp. 912-921, 2021.
[47] R. Luo, X. Tan, R. Wang, T. Qin, J. Li, S. Zhao, E. Chen, and T. Y. Liu, “Lightspeech: lightweight and fast text to speech with neural architecture search,” in Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021, pp. 5699-5703.
[48] X. Zhou, W. Liang, S. Shimizu, J. Ma, and Q. Jin, “Siamese neural network based few-shot learning for anomaly detection in industrial cyber-physical systems,” IEEE Transactions on Industrial Informatics, vol. 17, no. 8, pp. 5790-5798, 2021.
[49] X. Xia, F. Chen, Q. He, G. Cui, J. C. Grundy, M. Abdelrazek, X. Xu, and H. Jin, “Data, user and power allocations for caching in multi-access edge computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 5, pp. 1144-1155, 2022.
[50] X. Zhou, X. Xu, W. Liang, Z. Zeng, and Z. Yan, “Deep-learning-enhanced multitarget detection for end–edge–cloud surveillance in smart IoT,” IEEE Internet of Things Journal, vol. 8, no. 16, pp. 12588-12596, 2021.
[51] Q. He, G. Cui, X. Zhang, F. Chen, S. Deng, H. Jin, Y. Li, and Y. Yang, “A game-theoretical approach for user allocation in edge computing environment,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 3, pp. 515-529, 2020.
[52] X. Zhou, X. Yang, J. Ma, and K. I. K. Wang, “Energy-efficient smart routing based on link correlation mining for wireless edge computing in IoT,” IEEE Internet of Things Journal, vol. 9, no. 16, pp. 14988-14997, 2022.
