Trustworthy Dynamic Data Awareness Model for Tracking in CPS
With the development of the Internet of Things (IoT), the number of interconnected devices and sensors in cyber-physical systems (CPS) is increasing; these devices continuously exchange collected data to reveal helpful information about the overall system. In CPS-based monitoring applications, abnormal values (including anomalies and outliers) can skew judgments and lead to severe consequences. The criteria for determining abnormal values may change over time, which makes it impossible to detect them in real time with a pre-trained model or to find them efficiently with traditional statistical methods. In machine learning, abnormal values in a data set are usually treated as data errors or noise and are excluded from analysis to stabilize the results. However, in some cases the identified abnormal values contain essential information, which makes correctly identifying and interpreting anomalies even more critical. This paper proposes a Trustworthy Dynamic Data-Awareness (TDD-Awareness) algorithm that extracts the characteristics of continuous sensor data and accurately identifies abnormal values through the subsequent preprocessing process. From the sensor data, the TDD-Awareness algorithm extracts the number of abnormal values generated, their time of occurrence, and the characteristics and patterns needed to analyze where they occur. The importance of each abnormal value is determined by exploring the relationships between abnormal values, so that those containing necessary information can be separated from the rest.
CPS, Digital Twin, IoT, Preprocessing, Point Anomaly, Contextual Anomaly, Data Awareness, Outlier Detection, Trustworthy
With the advancement of technology, the demand for cyber-physical systems (CPS) has grown significantly over the past few years. CPS is an active area of research in many intelligent domains, such as smart cities, manufacturing, power plants, and traffic control systems. In most cases, CPS must detect and process large volumes of sensor data in real time through a monitoring system [2–7]. However, CPS connected to the complex real world are themselves becoming more complex, so existing monitoring and control methods are limited in reflecting that complexity [8–10].
Most sensor data collected for real-world forecasting is time series data, so it has a continuous structure that changes rapidly over time. Instantaneous changes in the data can produce abnormal values that deviate from the appropriate level, and new patterns can appear that differ from the patterns inherent in the time series [11, 12]. Moreover, data collected from Internet of Things (IoT) devices may contain abnormal values caused by system changes arising from errors or defects in the sensor device. Because CPS-based systems are closely tied to the real world, abnormal values can also result from internal system changes due to momentary or long-term errors and failures in the device that collects the sensor data. Outliers that include various real-world risk factors are collected and stored together with the data as external factors change dynamically in real time. If such unstable data is used in the training and prediction phases without accurate detection and preprocessing of abnormal values, it causes serious errors in maintaining sustainable safety. In that case, synchronization errors between real-world systems and devices make immediate safety management and diagnosis difficult.
Abnormality detection must therefore take patterns into account. Currently, abnormal values are detected according to predefined stability assessment criteria, so there is a limit to identifying outliers with per-sensor criteria that reflect internal and external environmental changes. In addition, abnormal value detection that relies only on the numerical values of the data, and outlier detection that relies only on machine learning methods, have difficulty detecting the hidden characteristics and patterns inside outliers. Noise may also make it difficult to generate meaningful data. For a stable real-world domain and system, proper detection and judgment of outliers is an important step, yet it is relatively overlooked compared to the prediction step based on CPS and digital twin.
This paper proposes a Trustworthy Dynamic Data-Awareness (TDD-Awareness) algorithm that overcomes the limitations of traditional abnormal value detection by using data characteristics and the relationships between data and abnormal values (Fig. 1). Through the data refining process, the TDD-Awareness algorithm examines cleaned data by dividing the analysis into Point anomaly and Contextual anomaly detection. The results of the two processes are then combined to separate out the normal values, which enables anomaly detection for reliable data generation that accounts for: (1) abnormal values containing meaningful information in the data; (2) the relationship between data characteristics and abnormal values; (3) independent abnormal values. It then becomes possible to accurately diagnose problems that may occur in the domain and to solve them by providing feedback.
Fig. 1. Trustworthy Dynamic Data-Awareness.
As IoT sensors develop, CPS are becoming more critical for correctly predicting real-world domains. However, since CPS are closely connected to the real world, the collected data values include risk factors and unpredictable variables. Therefore, correct anomaly detection is necessary to provide stable data to every step of CPS and to support reliable prediction. Various anomaly detection studies have been conducted, ranging from statistical, numerical machine learning methods to methods that consider data characteristics and types of outliers. In [15, 16], supervised learning was used to detect anomalies in IoT sensor data. The authors of  proposed GAAOD (grid-based approximate average outlier detection), based on the k-nearest neighbor (KNN) algorithm, to detect outliers in IoT streaming data. GAAOD learns the distribution of distances between a value and its k-th nearest neighbors. In , the authors proposed GILOF (genetic-based incremental local outlier factor), which can memorize the distances between past and present values. Because GILOF solved the distance-memorizing problem, it showed higher anomaly detection accuracy than earlier research. Anomalies are detected from the optimal distance relative to the learned distances between data values. However, it is challenging to detect anomalies correctly if there are not enough neighbors, and it is hard to consider correlations between various sensors.
Several studies [17–20] perform anomaly detection based on unsupervised learning. In , the authors detected anomalies using density-based spatial clustering of applications with noise (DBSCAN) and a support vector machine (SVM). After DBSCAN detects noisy outliers, the SVM is applied to calculate distribution density values. Kant and Mahajan  suggested particle swarm optimization (PSO)-aided enhanced K-means. This method classified anomaly values using K-means and calculated the standard deviation from the weights and weighted average of the anomaly cluster values. It then performed the Grubbs test with the PSO algorithm on each data point and flagged as anomalies the points that exceeded the Grubbs test value. The studies in [19, 20] used clustering based on the density of values to efficiently detect anomalies in sensor data. However, these methods have difficulty accounting for the attributes of sensor data and the types of anomalies. They also cannot identify correlations, which limits their application to various sensors.
To complement the limitations of machine learning algorithms, the authors of [21–30] suggested anomaly detection methods that consider the characteristics of sensor data and anomaly values and the correlation between sensors. The study in  detected anomalies by calculating residuals from the autoregressive integrated moving average (ARIMA) model and applying hypothesis testing. Zhou et al.  constructed the ARIMA model over a sliding window and detected anomalies from the residual values. The methods in [21, 22] efficiently detect anomalies as time progresses. However, they are hard to apply to various sensors because they do not identify the correlation between sensors. Su et al.  suggested a multi-cluster feature selection (MCFS) anomaly detection method that considers changes in correlation among various IoT sensors. MCFS clustered data based on curve alignment and calculated correlation values according to the window size. It then calculated successive characteristic values within the window from those correlation values. MCFS classified as anomalies the data values with independent characteristics, that is, those with the lowest relevance to normal data. In [24–26], the authors categorized anomaly patterns into three types (Point anomaly, Contextual anomaly, and Collective anomaly) and then described detection methods for particular types of anomalies.
The studies in [27–30] detected anomalies by considering anomaly types. Yu et al.  performed anomaly detection on a CPS data set: error patterns were predicted from the tangent error and then categorized into Point anomaly and Contextual anomaly. Park et al.  suggested AEDTS (anomalous events detection based on temporal dimension and spatial dimension), which detects anomaly values from real-time changes in sensor data relative to spatial and temporal neighbors. By considering spatial neighbors, AEDTS can detect anomalies by correlating anomalies with anomalous events. Munir et al.  studied pattern-based anomaly detection, which considers anomaly characteristics and context using sensor data from an HVAC (heating, ventilation, and air conditioning) system. Pattern-based anomaly detection efficiently identifies the long-term attributes of sensor data by calculating an anomaly importance value. Predictable outliers in data-trends (PODS)  detects anomalies according to patterns in a continuous data flow. PODS separates explainable from non-explainable anomalies and can therefore give feedback on the detected anomalies. The methods in [29, 30] perform anomaly detection based on anomaly characteristics; however, they are limited in considering every anomaly pattern and the correlation between sensors simultaneously.
Since data values collected from various IoT sensors closely connected with the real world contain many kinds of anomalies, anomaly detection must account for complex factors. This paper proposes the TDD-Awareness algorithm, which can detect anomalies by simultaneously considering anomaly-related data characteristics and the correlation between sensors. The TDD-Awareness algorithm detects anomalies based on numerical data values, data characteristics, and correlation between sensors. It then categorizes them, for cause analysis, into anomalies that carry significant meaning and anomalies that do not, which results in higher anomaly detection accuracy.
Fig. 2. TDD-Awareness algorithm.
Fig. 2 shows the architecture of the TDD-Awareness algorithm, which consists of two steps: Raw Data Awareness and Trustworthy Awareness. In the Raw Data Awareness step, cleaned data is generated by processing duplicated, missing, and extreme values. In the Trustworthy Awareness step, anomaly values are detected from the cleaned data. The Trustworthy Awareness step consists of a Point anomaly process and a Contextual anomaly process. The Point anomaly process detects anomalies based only on data values. The Contextual anomaly process has two parts: detecting the time-contextual anomaly (T-anomaly), which considers the time flow of the data, and detecting the spatial-contextual anomaly (S-anomaly), which is based on the correlation between sensors. Anomalies from the Point anomaly and Contextual anomaly processes are combined, and the combined anomalies are defined as meaningful anomalies (M-anomaly). An M-anomaly is an anomaly that simultaneously reflects the numerical data values, the data attributes, and the correlation between sensors. The TDD-Awareness algorithm can therefore overcome the limitation of flagging data that carries significant meaning as an anomaly, and it can detect anomalies flexibly according to the correlation between sensors. The TDD-Awareness algorithm is as follows.
Because data is collected in a real-time, continuous flow, sensor values include missing values caused by instantaneous system faults or by external environmental factors affecting the sensors attached to the domain. Because analyzing data that contains missing values disrupts the generation of stable data, missing values should be treated in a way that considers the data characteristics. There are various treatments, such as deleting the missing values, filling them in with a representative value of the actual data, and filling them in based on the values before and after the gap. However, deleting missing values makes it difficult to reflect the time flow of the data, and filling in with a representative value biases the data and obscures its variability. The TDD-Awareness algorithm therefore fills in missing values from the values before and after them so that the time flow of the data is preserved. It uses linear interpolation because the sensor data is univariate. Linear interpolation is suitable for univariate data, whereas machine-learning-based methods such as MICE and missForest are helpful for multivariate data. Linear interpolation estimates each missing value from the values on either side of it, so the flow and continuity of the time series are efficiently reflected.
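The interpolation step described above can be illustrated with a minimal sketch. It assumes a univariate series stored as a Python list in which missing readings are represented by `None`; the function name `interpolate_linear` and the choice to leave leading or trailing gaps unfilled are illustrative assumptions, not details given in the paper.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Gaps at the very start or end have a neighbour on one side only,
    so this sketch leaves them as None (an assumption).
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known positions and fill the gap between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


readings = [1.0, None, None, 4.0, 5.0, None]
print(interpolate_linear(readings))
```

Each filled value lies on the straight line between the readings before and after the gap, which is why the time flow of the series is preserved.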
Because of sensor malfunctions or sensor network errors, some values are extremely higher or lower than the normal range of the data. These values are defined as extreme values, which lie even outside the standard for outliers. If outlier detection is performed without first processing extreme values, errors occur in setting the outlier criteria. Extreme values therefore need to be removed before the outlier detection process so that correct outlier criteria can be created. The TDD-Awareness algorithm sets the criterion for extreme values according to the interquartile range (IQR). The IQR is defined as Q3 − Q1, where Q1 and Q3 are the first and third quartiles of the data. Data values higher than Q3 + 3(1.5 × IQR) or lower than Q1 − 3(1.5 × IQR) are defined as extreme values by the TDD-Awareness algorithm. Extreme values are deleted and then filled in by interpolation.
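A minimal sketch of the extreme value criterion follows. The fence factor 3 × 1.5 is taken from the text; the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) and the function names are assumptions, since the paper does not say how the quartiles are computed.

```python
import statistics
from typing import List, Optional, Tuple


def extreme_value_fences(values: List[float], factor: float = 3 * 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences: Q1 - factor * IQR and Q3 + factor * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def mask_extremes(values: List[float]) -> List[Optional[float]]:
    """Replace values outside the fences with None so they can be interpolated later."""
    low, high = extreme_value_fences(values)
    return [v if low <= v <= high else None for v in values]


readings = [10, 11, 12, 13, 14, 15, 16, 17, 18, 500]
print(extreme_value_fences(readings))
print(mask_extremes(readings))
```

Masking rather than dropping the extreme readings keeps the series aligned in time, so the gaps can be filled by the same interpolation used for missing values.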
Because sensor data is collected in real time, it includes duplicated values caused by momentary system errors. Duplicated values produce incorrect output when calculations depend on the data length. The TDD-Awareness algorithm therefore deletes duplicated values and retains just one value.
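As a small illustration, duplicate removal might look like the sketch below. It assumes that a duplicate is a record with a timestamp that has already been seen and that the first record is the one kept; the paper does not state either detail, and `drop_duplicates` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def drop_duplicates(records: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the first record for each timestamp, preserving arrival order."""
    seen = set()
    unique = []
    for timestamp, value in records:
        if timestamp in seen:
            continue  # a record with this timestamp was already kept
        seen.add(timestamp)
        unique.append((timestamp, value))
    return unique


raw = [("2019-10-01 00:00", 0.12), ("2019-10-01 00:00", 0.12), ("2019-10-01 00:05", 0.13)]
print(drop_duplicates(raw))
```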
Since sensor data is transmitted and stored in real time, the time interval between collected data values is short. However, sensor data from a building structure is characterized by slow change over a long period; detecting only short-term fluctuations in the data flow makes it difficult to identify the fluctuations relevant to long-term management of the building structure. The TDD-Awareness algorithm readjusts the 5-minute data collection interval to 1-hour intervals and reduces the data to hourly mean values so that the data flow information is preserved. Data reduction is the final process of Raw Data Awareness, after which the cleaned data is generated for the outlier detection process.
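The reduction from 5-minute readings to hourly means can be sketched as follows. The record layout (a list of `(datetime, value)` pairs) and the function name `hourly_means` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean
from typing import List, Tuple


def hourly_means(records: List[Tuple[datetime, float]]) -> List[Tuple[datetime, float]]:
    """Group readings by the hour they fall in and return one mean value per hour."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, fmean(vals)) for hour, vals in sorted(buckets.items())]


# Two hours of synthetic 5-minute readings (24 samples).
start = datetime(2019, 10, 1, 0, 0)
samples = [(start + timedelta(minutes=5 * i), float(i)) for i in range(24)]
for hour, mean_value in hourly_means(samples):
    print(hour, mean_value)
```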
Trustworthy Awareness is the outlier detection process that detects outliers in the output of Raw Data Awareness by considering data and anomaly attributes and the correlation between sensors. The TDD-Awareness algorithm divides outliers into two classes, Point anomaly and Contextual anomaly, and then performs outlier detection. Fig. 3 shows a graph of collected data containing a Point anomaly and a Contextual anomaly. A Point anomaly is a single data point caused by a transient increase or decrease in value. A Contextual anomaly consists of anomaly values that occur continuously and is categorized into two types: T-anomaly, which considers time flow, and S-anomaly, which is based on the correlation between sensors. Point anomalies are detected from the z-score of each data point. T-anomalies are derived from ARIMA model residuals along the data flow, and S-anomalies are deduced from correlation analysis between sensors. The Contextual anomaly is the combination of T-anomaly and S-anomaly, which is then combined with the Point anomaly. In this combination, the values common to Point anomaly and Contextual anomaly are defined as M-anomaly, and the non-common values are defined as non-meaningful anomaly (NM-anomaly). Since an M-anomaly includes explainable data context, it does not need to be removed from the data set. An NM-anomaly, however, is a changed pattern unrelated to the meaning of the data and must be removed to obtain a robust data set. Through Trustworthy Awareness, the TDD-Awareness algorithm can distinguish explainable from non-explainable anomalies after detection and give feedback about the detected anomalies.
Fig. 3. Point anomaly and Contextual anomaly.
Point anomaly detection
Point anomaly detection is the first anomaly detection step of the TDD-Awareness algorithm. The algorithm calculates the z-score of each data value, which represents how far the value lies from the mean of the data in units of standard deviation, and data values whose z-score is lower than −1.96 or higher than 1.96 are defined as Point anomalies.
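The z-score rule can be illustrated with a short sketch. The ±1.96 threshold comes from the text; using the population standard deviation (`pstdev`) and returning list indices are assumptions, as the paper does not specify them.

```python
from statistics import fmean, pstdev
from typing import List


def point_anomalies(values: List[float], threshold: float = 1.96) -> List[int]:
    """Return indices whose z-score is below -threshold or above +threshold."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return []  # a constant series has no point anomalies
    return [i for i, v in enumerate(values) if abs((v - mean) / std) > threshold]


series = [10.0] * 19 + [20.0]
print(point_anomalies(series))
```

Under a normal distribution, |z| > 1.96 corresponds to values outside the central 95% of the data, so this rule flags roughly the most extreme 5% of readings.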
Contextual anomaly detection
After detecting Point anomalies, the second step of the TDD-Awareness algorithm is detecting Contextual anomalies. Because Point anomaly detection focuses only on numerical data values rather than data attributes or sensor correlation, it cannot identify anomalies that carry explainable, significant data meaning. The TDD-Awareness algorithm performs the Contextual anomaly step to complement this limitation. There are two processes for detecting Contextual anomalies: T-anomaly detection and S-anomaly detection. T-anomalies are obtained from the residuals of the ARIMA prediction model, which efficiently reflects time series attributes. The ARIMA model is built as follows:
- 1) Perform a stationarity test with the Dickey-Fuller test. When the p-value is 0.05 or greater, the null hypothesis of a unit root cannot be rejected and the data is treated as non-stationary, so it must be converted into stationary time series data through differencing.
- 2) Next, calculate the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify candidate values of p (the lag at which the PACF converges to 0), d (the number of differences), and q (the lag at which the ACF converges to 0). The ARIMA model uses the p, d, q combination with the lowest AIC value among the candidates suggested by the ACF and PACF.
- 3) Then prediction values are derived from the ARIMA model.
- 4) After predicting values with the ARIMA model, the residual values are calculated.
- 5) Values whose residuals fall outside the confidence interval are defined as T-anomalies.
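The residual-based part of these steps (steps 3 to 5) can be sketched as follows. To stay dependency-free, the sketch swaps the full ARIMA model for a plain AR(1) model fitted by least squares, and it skips the stationarity test and the AIC-based order selection; it is a simplified stand-in, not the paper's model. The 95% interval (mean ± 1.96 standard deviations of the residuals) is also an assumption, since the text does not state the confidence level.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    return my - phi * mx, phi


def t_anomalies(series: List[float], z: float = 1.96) -> List[int]:
    """Flag time indices whose one-step prediction residual lies outside mean +/- z * std."""
    c, phi = fit_ar1(series)
    residuals = [series[t] - (c + phi * series[t - 1]) for t in range(1, len(series))]
    mu, sigma = fmean(residuals), pstdev(residuals)
    if sigma == 0:
        return []
    # residuals[0] belongs to time index 1, hence start=1.
    return [t for t, r in enumerate(residuals, start=1) if abs(r - mu) > z * sigma]


ramp = [float(i) for i in range(50)]
ramp[30] += 25.0  # inject a spike at index 30
print(t_anomalies(ramp))
```

Because each prediction depends on the previous observation, the reading immediately after a spike can also be flagged; a real ARIMA fit from a statistics package would replace `fit_ar1` in practice.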
S-anomalies are detected from the correlation between sensors. The TDD-Awareness algorithm performs Pearson correlation analysis and decides the degree of correlation from the correlation coefficient (α) between sensors. In general, α lies in the range −1 ≤ α ≤ 1, and α in the range −1 ≤ α ≤ −0.3 or 0.3 ≤ α ≤ 1 is judged to indicate a significant correlation. However, spurious regression can occur when non-stationary time series data with seasonality and trend is applied to Pearson correlation analysis, which is based on a linear regression equation. Spurious regression means that two time series appear significantly correlated even though they are in fact independent [31]. The TDD-Awareness algorithm performs the Granger causality test on sensor pairs that show a significant correlation to determine whether the correlation is spurious. With a causal variable x and an outcome variable y, the Granger causality test forms a conditional mean prediction of each variable by adding past information of y or x. When past values of x are added to the lags of y, the test judges whether the estimation error of y is reduced by the x information, that is, whether each lag is better explained on the basis of x. If y can be explained through x, the Granger causality test yields a significant p-value. Sensor pairs with p-value < 0.05 are judged to have a significant correlation rather than a spurious regression. After that, Pearson correlation analysis is performed over time on the non-spurious sensor pairs, and the resulting values are defined as S-anomalies. When the TDD-Awareness algorithm combines T-anomaly and S-anomaly, the common values are defined as Contextual anomalies, which enables anomaly detection based on data attributes and correlation between sensors.
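The Pearson screening part of this step can be sketched as follows. Only the correlation coefficient and the |α| ≥ 0.3 rule from the text are shown; the Granger causality test is not implemented here and would normally come from a statistics package. The function names are illustrative assumptions.

```python
from math import sqrt
from statistics import fmean
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / sqrt(var_x * var_y)


def significantly_correlated(x: List[float], y: List[float], threshold: float = 0.3) -> bool:
    """Apply the rule -1 <= alpha <= -0.3 or 0.3 <= alpha <= 1."""
    return abs(pearson(x, y)) >= threshold


sensor_a = [1.0, 2.0, 3.0, 4.0]
sensor_b = [2.0, 4.0, 6.0, 8.0]
print(pearson(sensor_a, sensor_b), significantly_correlated(sensor_a, sensor_b))
```

Only sensor pairs that pass this screening would go on to the Granger causality check described above.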
Detecting meaningful anomaly and removing non-meaningful anomaly
The TDD-Awareness algorithm derives the final anomaly, M-anomaly, as the values common to Point anomaly and Contextual anomaly. M-anomalies are those numerically detected Point anomalies that also reflect time series attributes and the correlation between various sensors. NM-anomalies are defined as changed-pattern values that are unrelated to the correlation context and the time series characteristics of the data. Because NM-anomalies can endanger the production of a safe data set and distort the results of learning and prediction, they must be removed to obtain trustworthy data. By removing NM-anomalies, the TDD-Awareness algorithm enables balanced learning and prediction in CPS and digital twin, which helps maintain a sustainable and stable real-world domain. In addition, M-anomalies from the TDD-Awareness algorithm make it possible to analyze why the pattern of the collected data changed, supporting optimized feedback for various changeable real-world domains.
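The combination logic can be expressed with set operations, as in the sketch below. It assumes each detector returns the time indices it flagged, and it reads "non-common values" as everything flagged by either process except the M-anomalies; that reading and the function name `classify_anomalies` are assumptions, since the text does not define NM-anomaly more precisely.

```python
from typing import Iterable, Set, Tuple


def classify_anomalies(
    point: Iterable[int], t_anomaly: Iterable[int], s_anomaly: Iterable[int]
) -> Tuple[Set[int], Set[int]]:
    """Return (M-anomaly, NM-anomaly) index sets from the three detector outputs."""
    contextual = set(t_anomaly) & set(s_anomaly)      # common to T- and S-anomaly
    meaningful = set(point) & contextual              # common to Point and Contextual
    non_meaningful = (set(point) | contextual) - meaningful
    return meaningful, non_meaningful


m_anomaly, nm_anomaly = classify_anomalies(
    point=[3, 7, 12], t_anomaly=[7, 12, 20], s_anomaly=[7, 20, 31]
)
print(sorted(m_anomaly), sorted(nm_anomaly))
```

In this reading, only the indices in the NM-anomaly set would be removed from the data set, while the M-anomaly indices are kept for cause analysis.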
The TDD-Awareness algorithm experiment is conducted on sensor data from a building under construction with six basement levels and 10 floors. The sensors consist of 30 crack sensors, 16 inclination sensors, two vibration sensors, and one temperature and humidity sensor, and data from each sensor was collected from October 2019 to November 2020 at 5-minute intervals. Inclination sensor values were collected on the x-axis and the y-axis; the two axes are converted into one value based on polar coordinates so that variability on both axes can be identified simultaneously. There were two vibration sensors; however, because of construction work around the first vibration sensor, the TDD-Awareness algorithm used only the data from the second vibration sensor. The TDD-Awareness algorithm performs the Raw Data Awareness process on the raw data and then performs the Trustworthy Awareness process to detect and remove anomaly values from the data cleaned by Raw Data Awareness.
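The conversion of the two inclination axes into a single value can be sketched with the standard Cartesian-to-polar transform. The paper does not say which polar component is used; this sketch assumes the radial magnitude r = sqrt(x² + y²) is the single value and returns the angle only for completeness.

```python
from math import atan2, degrees, hypot
from typing import Tuple


def to_polar(x: float, y: float) -> Tuple[float, float]:
    """Convert an (x, y) inclination pair to (magnitude, angle in degrees)."""
    return hypot(x, y), degrees(atan2(y, x))


magnitude, angle = to_polar(3.0, 4.0)
print(magnitude, angle)
```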
Raw Data Awareness Result
Because of real-time changes in the sensor network and the external environment, the raw data includes missing, duplicated, and extreme values. Raw Data Awareness removes duplicated values, fills in missing values by linear interpolation, and handles extreme values. It then converts the 5-minute time interval to 1 hour using hourly averages to efficiently identify long-term fluctuation, and the cleaned data is generated. Fig. 4(a) shows the data before and after extreme value processing, and Fig. 4(b) shows the data before and after missing value processing by linear interpolation.
Fig. 4. (a) Before (top) and after (bottom) processing extreme values. (b) Before (top) and after (bottom) processing missing values.
Trustworthy Awareness Result
Table 1 shows the root mean square error (RMSE) between the data after anomaly detection and the cleaned data. "Only Point anomaly" RMSE is the RMSE of the Point anomaly detection result, and "TDD-Awareness algorithm" RMSE is that of the detection result based on both Point anomaly and Contextual anomaly. According to Table 1, the RMSE of the TDD-Awareness algorithm, which considers data and anomaly characteristics together with the correlation between various sensors, is lower than the RMSE of Point anomaly detection alone.
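For reference, the RMSE between two aligned series can be computed as in the sketch below. It assumes both series have the same length and are aligned in time; how the paper aligns the post-detection data with the cleaned data is not stated.

```python
from math import sqrt
from typing import Sequence


def rmse(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Root mean square error between two equal-length series."""
    if len(reference) != len(candidate):
        raise ValueError("series must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(reference, candidate)]
    return sqrt(sum(squared) / len(squared))


print(rmse([0.0, 0.0], [3.0, 4.0]))
```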
Table 1. RMSE of anomaly detection
Fig. 5 shows the anomaly detection results. Fig. 5(a) is based on Point anomaly detection only, and Fig. 5(b) is based on the TDD-Awareness algorithm. In Fig. 5(a), the blue graph inside the blue circle represents anomalies detected by considering only Point anomaly, and in Fig. 5(b), the blue graph inside the red circle shows the final anomaly values from the TDD-Awareness algorithm. Comparing Fig. 5(a) and 5(b), some values that are defined as anomalies by the Point anomaly process are not flagged by the TDD-Awareness algorithm. These values are not treated as anomalies once time flow and correlation between sensors are considered; they are M-anomalies that can explain the meaning of the collected data even though the data pattern changed. In Fig. 5(b), the blue graph inside the red circle represents NM-anomalies, which are unrelated to time flow and correlation between sensors; they distort the data set and need to be removed to obtain trustworthy data. Because the TDD-Awareness algorithm performs anomaly detection that accounts for diverse situations, the experiment shows that it can produce a more trustworthy data set by removing only the anomaly values that would distort real-world results.
Fig. 5. (a) Only Point anomaly detection result graph. (b) TDD-Awareness algorithm anomaly detection result graph.
This section presents the results of long short-term memory (LSTM) prediction, used to evaluate the stability of the data set generated through preprocessing, anomaly detection, and removal of anomaly values. The window size of the LSTM model is 24 for 2-day prediction, and the model stacks two hidden layers with 24 neurons each, matching the window size of 24. The activation function is ReLU, and the dense layer has one unit for single-feature prediction. When compiling the model, the loss is mean square error (MSE) and the optimizer is Adam. In model fitting, the batch size is 128 and the number of epochs is 100. Table 2 shows the RMSE of LSTM prediction. In Table 2, "Only Point anomaly" RMSE is the LSTM prediction result after removing only Point anomaly values for crack2 and inclination8 (incli8), and "TDD-Awareness algorithm" RMSE is the result after removing NM-anomalies with the TDD-Awareness algorithm for crack2 and incli8. Fig. 6(a) shows the LSTM prediction result on the data set obtained by deleting Point anomalies. Fig. 6(b) shows the LSTM prediction result on the data set from the TDD-Awareness algorithm.
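The windowing that feeds such a model can be sketched without any deep learning library. The sketch assumes each training sample is the previous 24 hourly values and the target is the next value; the paper gives the window size but not the exact sample layout, and `make_windows` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int = 24) -> Tuple[List[List[float]], List[float]]:
    """Slice a series into (inputs, targets): each input holds `window` consecutive
    values and its target is the value that immediately follows."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return inputs, targets


hourly = [float(i) for i in range(30)]
x, y = make_windows(hourly, window=24)
print(len(x), x[0][:3], y[0])
```

The resulting inputs would be reshaped to (samples, 24, 1) before being passed to a two-layer LSTM with the settings listed above.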
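Table 2. LSTM prediction for RMSE
| Sensor | Only Point anomaly | TDD-Awareness algorithm |
|---|---|---|
| crack2 | 0.0025 | 0.0015 |
| incli8 | 0.0023 | 0.0009 |
Fig. 6. LSTM RMSE with (a) only Point anomaly detection data set and (b) only TDD-Awareness algorithm detection data set.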
As Table 2 shows, when only Point anomalies were removed, the RMSE of LSTM prediction was 0.0025 for sensor crack2 and 0.0023 for sensor incli8. Using the trustworthy data set from the TDD-Awareness algorithm, the RMSE of LSTM prediction is 0.0015 for crack2 and 0.0009 for incli8. This confirms that the prediction error was lower when predicting with the trustworthy data set from the TDD-Awareness algorithm. The RMSE results indicate that removing anomaly values after simultaneously considering the time context of the sensor data and the spatial context in which the sensors were installed produces a more stable data set than detecting anomalies from data values alone. The LSTM prediction results suggest that performing the learning and prediction steps of CPS and digital twin on the trustworthy data set generated by the TDD-Awareness algorithm yields stable results without real-world distortion. The prediction performance values also confirm that more accurate prediction makes a sustainable feedback and synchronization process possible.
As IoT sensors develop, the generation of stable data by CPS or digital twin (DT) becomes more important for robust monitoring, trustworthy prediction, and feedback. However, CPS and DT are closely connected with the complicated and changeable real world, so anomaly values are collected because of risk factors or internal and external environmental changes in the real world. Therefore, anomaly detection and the generation of trustworthy data are necessary for all processes of CPS or DT to be stable. This paper proposes the TDD-Awareness algorithm, which can detect anomalies by considering sensor data flow, data attributes, and correlation with various sensors. The TDD-Awareness algorithm consists of the Raw Data Awareness and Trustworthy Awareness processes. Raw Data Awareness generates cleaned data after dealing with duplicated, missing, and extreme values. Trustworthy Awareness is the anomaly detection process, conducted in two parts: the Point anomaly process and the Contextual anomaly process. After these two processes, the values they share are derived and defined as M-anomaly. An M-anomaly reflects the numerical anomaly from the Point anomaly process together with the data attributes and sensor correlation from the Contextual anomaly process. The TDD-Awareness algorithm experiment is conducted on crack, inclination, vibration, temperature, and humidity sensors attached to a building structure. The RMSE results from the experiment show that anomaly detection with the TDD-Awareness algorithm has lower RMSE values than the Point anomaly detection process alone. Based on data attributes and correlation with sensors, the TDD-Awareness algorithm can analyze the meaning of anomalies by distinguishing the anomalies that carry significant data meaning. This results in higher anomaly detection accuracy and trustworthy data for CPS and DT constructed in a complicated and changeable real world.
In the future, more accurate prediction-based anomaly detection and correlation analysis-based outlier detection techniques that can overcome the limitations of time series data should be studied.
Conceptualization, SvK, SuK, YY. Methodology, SvK. Formal analysis, SvK, SuK. Investigation, SvK. Resources, SvK, SuK. Data curation, SuK. Writing—original draft preparation, SvK, SuK. Writing—review and editing, SvK. Visualization, SuK. Supervision, YY. Project administration, SvK. Funding acquisition, YY, SvK. All authors have read and agreed to the published version of the manuscript.
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2018R1D1A1B07047112, NRF-2019R1I1A1A01064054).
The authors declare that they have no competing interests.
Name : Svetlana Kim
Affiliation : Research Professor of IT Engineering, Sookmyung Women's University
Biography : She received her BS and MS degrees in multimedia science from Sookmyung Women's University in 2005 and 2007. She received her PhD in Distribution system at Sookmyung Women's University in 2017. Since 2008. Her research interests are in the area of ubiquitous computing, distributed middleware, mobile computing, MPEG-21, cloud computing, ELearning, N-screen standardization, synchronization, digital communication systems, BigData, Fusion sensors, AI, Edge computing, CPS (Cyber-physical system), situation awareness and Hybrid IoT.
Name : YongIk Yoon
Affiliation : Professor of IT Engineering, Sookmyung Women's University
Biography : He received his BS in statistics from Dongguk University in 1983 and his MS degree in computer science from the Korea Advanced Institute of Science and Technology (KAIST) in 1985. From 1985 to 1997, he served as a senior researcher at the Electronics and Telecommunications Research Institute (ETRI) on the following research projects: the development environment of the exchange system, the TDX-10 exchanger development project, the mobile communication development project, and the ATM exchange development project. He received his PhD in multimedia science and distribution systems from KAIST in 1994. Since 1998, he has been a professor at Sookmyung Women’s University. His interests include middleware, smart services, IoT, situation awareness, embedded systems, ubiquitous computing, distributed systems, real-time processing systems, real-time OS/DBMS, and big data.
Svetlana Kim, Subi Kim, and YongIk Yoon, Trustworthy Dynamic Data Awareness Model for Tracking in CPS, Article number: 12:13 (2022)
- Received: 9 September 2021
- Accepted: 16 January 2022
- Published: 30 March 2022