Sensor-Based Human Activity Recognition in Smart Homes Using Depthwise Separable Convolutions
• Daniyal Alghazzawi1,*, Osama Rabie1, Omaima Bamasaq2, Aiiad Albeshri2, and Muhammad Zubair Asghar3

Human-centric Computing and Information Sciences volume 12, Article number: 50 (2022)
https://doi.org/10.22967/HCIS.2022.12.050

Abstract

The recent enhancement of computerized electronic gadgets has led to the acceptance of smart home sensing applications, stimulating a need for related services and products. As a result, the ever-increasing volume of data necessitates the application of advanced deep learning to the automated identification of human activity. Over the years, several deep learning models that learn to categorize human activities have been proposed, and several experts have used convolutional neural networks. To tackle the human activity recognition (HAR) problem in smart homes, we propose a depthwise separable convolution neural network (DS-CNN). Instead of standard 2D convolution layers, the network uses depthwise separable convolution layers. DS-CNN performs particularly well on limited datasets, and its compact network minimizes the number of trainable parameters while improving learning efficiency. We tested our technique on benchmark HAR-based smart home datasets, and the findings reveal that it outperforms the current state of the art. This study shows that using depthwise separable convolutions significantly improves performance (accuracy=92.96, precision=91.6, recall=90, F-score=93) compared to classical CNN and baseline methods.

Keywords

Human Activity Recognition, Smart Homes, Depthwise Separable Convolutions, Sensors

Introduction

Human activity recognition (HAR) has been a vibrant and demanding study topic in recent decades, owing to its application to many active and assisted living (AAL) sectors as well as the growing need for smart homes [1]. As a result of the growing need for HAR in terms of safety and medicine, particularly for the aged and for child care, it has emerged as a notable topic in recent times [2]. The smart home platform controls the integrated lighting, heating, power, and other household elements, and it can also identify the activities of all home inhabitants. In addition, it can use machine learning techniques to infer what people do and what they need, then make decisions and configure the right equipment and services for them.

Research Motivation
The performance of classic pattern-recognition approaches depends on the efficiency and effectiveness of manual feature extraction. Furthermore, these techniques can extract only shallow features. Because of such shortcomings, classic pattern recognition-based behavior identification techniques have limited predictive performance and model generalization. Identifying human activity may be thought of as a significant visual processing problem. Random forests, logistic regression, decision trees, support vector machines (SVM), and other machine learning algorithms have been used to classify human activities in labs with controlled conditions and a small amount of labeled data [37].
Deep learning has advanced rapidly in recent years, attracting a large number of research projects, particularly in computer vision, time-series processing, computational linguistics, objective reasoning, and other data processing domains, and has achieved remarkable results [8]. Unlike classic behavior identification methods, deep learning can minimize feature design effort and can learn higher-level, more essential characteristics with an end-to-end neural network. However, several deep learning models are costly and time-consuming in their evaluation steps, making them unsuitable for HAR in smart home environments due to energy constraints. For example, the ability of convolutional neural networks (CNNs) to extract features from images is well established, and many research approaches use CNNs in place of classic classification methods to better capture image information and enhance classification performance. HAR classification for lightweight (mobile-based) applications is difficult since many deep neural networks' evaluation processes are time-consuming and costly [7]. To address this issue, we present a CNN-based model for HAR that employs a depthwise separable convolution (DSC) technique [9]. DSC was proposed for the first time in [10] and is now widely used in image processing for classification tasks [11, 12]. The DSC is a factorized variant of the conventional convolution: a typical convolution is divided into a depthwise and a 1x1 pointwise convolution. Rather than applying every filter to all of the input channels as in conventional convolution, the depthwise convolution layer applies one filter to one input channel and then employs a 1x1 pointwise convolution to aggregate the depthwise convolution outcomes. DSC reduces the number of trainable parameters and the cost of testing and training.

Problem Statement
This study presents a system for HAR in smart homes based on depthwise separable convolutions and smartphone sensor data. Given a human activity sequence HAR = {har_1, har_2, har_3, ..., har_n} as input, the purpose is to construct a model that recognizes and assigns a label (walking upstairs, walking downstairs, walking, sitting, lying) to the 3D human activity sequence. The task may be described as follows: given a 3D human activity sequence as input, the classification model must determine the series of events performed, using kinematic data from human activities carried out in an intelligent home environment. We used depthwise separable convolution layers instead of standard 2D convolution layers to build the model efficiently with fewer learnable parameters.

Research Questions
This paper uses a deep learning approach, the depthwise separable convolution neural network (DS-CNN), with a softmax activation function to achieve fast and precise recognition results. This study aims to categorize human activities in a smart home scenario. We seek answers to the research questions shown in Table 1.

Research Contributions
The following is a summary of the study's key contributions:

We present an efficient sensor-based framework for HAR in smart homes using a CNN-based deep learning model with a DSC method.

The proposed multi-layer network can be trained quickly with approximately 50% fewer parameters.

We use DSC units to process sensor-based input data sequences, utilizing their compact memory configurations and benefiting from their power advantage.

To mitigate overfitting, we use dropout: during training, some neurons are turned off at random.

The proposed approach is tested on a publicly available dataset, and the results show that it outperforms the current state-of-the-art in terms of recognition.

We evaluate the proposed approach by comparing it to state-of-the-art studies using a publicly available dataset of UCI-HAR sensor data.

The following is how the paper is structured. Following the review of similar research in Section 2, Section 3 explores the theoretical foundation, Section 4 elaborates on the experimental approach, and Section 5 gives the results and discussions. Section 6 provides the conclusions of the work and the study's future scope.

Table 1. Research questions for investigation
RQ1. How does the depthwise separable convolution neural network (DS-CNN) effectively classify human activities in a smart home context?
Motivation: Examine the proposed deep neural network model, known as the DS-CNN, to understand how it might be used to classify human activities in the context of a smart home.

RQ2. In terms of different performance evaluation metrics, what is the proposed approach's efficiency in contrast to the classical CNN model?
Motivation: Evaluate the recommended deep learning model, called DS-CNN, and see how well it can detect human actions in a smart home environment by employing various evaluation metrics like precision, recall, F1-score, and accuracy.

RQ3. What is the efficiency of the proposed strategy in comparison to baseline techniques?
Motivation: In comparison to baseline tests, evaluate the efficacy of the suggested deep learning model, depthwise separable convolution, in classifying human activities in a smart home setting using multiple assessment metrics such as precision, recall, F1-score, and accuracy.

Related Work

Activity detection methods have been investigated a lot in the past. This section will discuss some studies that focus on the smart home environment.

Sensor-based HAR
Before developing techniques capable of detecting activities in real-time, researchers focused on offline procedures that depended on static data sets, in which all data was recorded and then evaluated [12]. Because of how quickly smart homes and other technologies have become popular, much research and development have been done on how to set up a HAR system [13]. Hong and Nugent [14] focused on sensory data segregation to extract each chunk of sequential sensor events associated with a particular action. Toileting, bathing, leaving the house, going to bed, and meal preparation are all detected. They provide three approaches to sensory channel segmentation: a site approach, a design approach, and a dominating-centered design approach. All three techniques performed admirably in terms of separation and activity classification. Nazerfard and Cook [15] introduced an activity-prediction model based on probabilistic Bayesian networks and novel multiple inference approaches for detecting the next activity and the related start time. Chen et al. [16] tried to determine the historical likelihood of an action occurring over a particular period to reduce the margin of error of a classification system. Numerous behavioral models, including frequency map augmentation and stochastic mixing, were evaluated. Tian et al. [17] conducted a time-space feature significance evaluation to assess the value of features for action detection and classification. The feature importance was determined using random forests, naive Bayes, and SVM. Zhang et al. [18] presented an action prediction model for determining activity durations. This method uses a regression tree model to predict actions, superior to linear regression and SVM classifiers. To avoid time-consuming manual labeling, Liu et al. [19] represented videos as mid-level patches, where each patch corresponds to an action-related interaction.
Movement perception is used to identify suitable motion areas to collect mid-level patches more accurately and quickly. According to numerous tests on the MPII Cooking dataset, fine-grained action recognition is better achieved with the suggested strategy. An amplifier was designed using a memory polynomial model presented by Shi et al. [20]. The fingerprint properties of the output signals are derived from the power spectrum characteristics of the signals; it is therefore possible to lower the dimensionality of the high-dimensional characteristics. Finally, a classifier is employed to identify the amplifier. The experiments' findings suggest that the signal's nonlinear properties may be used to identify individual sonar transmitters, which can improve the communication security of underwater acoustic sensor networks (UASNs). Multiplication, division, subtraction, and addition operators are all used by Abualigah et al. [21] to create a novel meta-heuristic approach called the arithmetic optimization algorithm (AOA). Optimization procedures may be performed using AOA's mathematical ideas in a vast scope of problem domains. Compared to other well-known optimizers, experiments show that the AOA is very good at solving hard optimization problems. A population-based optimization approach, named the Aquila optimizer (AO), was introduced by Abualigah et al. [22] and was motivated by the natural behavior of the Aquila when grabbing its prey. The new AO technique outperforms other well-known meta-heuristic algorithms in experiments. Sharma et al. [23] present a novel framework for the software-defined network to provide robust home automation scenarios (SHSec). By embracing the classic software-defined network paradigm, SHSec aims to provide a scalable generalized architecture and the flexibility of open-source service elements for user-friendly home automation.
The suggested model accurately predicts hostile activities with an efficiency of 89.9% and a sensitivity of 91.1%. Park et al. [24] investigated the cognition Internet of Things (CIoT) and presented a CIoT-based smart urban system (CIoT-Net) design that addresses sustainability and adaptability issues. They considered using machine learning, artificial intelligence, and big data to achieve the suggested design. In the end, they discussed potential research difficulties and prospects. When it comes to cyberattacks that might influence the smart home system, Sapalo Sicato et al. [25] have developed a taxonomy to highlight some of the important difficulties with VPNFilter malware that forms part of a large-scale IoT-based botnet malware attack. This paper aims to provide an effective work management system and information about the VPNFilter malware attack. Thanigaivelan et al. [26] offered a concept model for an authentication scheme influenced by human biomechanics. The communication device ecosystem, which includes smartphones, laptops, sensor systems, and desktops, is modeled after the biological neural network. When the new human bio-inspired authentication process is put into use, it will be able to react quickly to major threats without the help of other parts of the system. Van Slyke and Belanger [27] provide a unique viewpoint on how humans and safety objects interplay to enable and restrict cybersecurity. The approach uses Pickering's mangle of practice metaphor to describe human-artifact interaction in data security. There could be a lot of different information security technologies and ways of behaving that could benefit from this point of view.

HAR in Smart Homes using Deep Learning Techniques
As an alternative to such approaches, deep learning algorithms [14] for the action detection problem have been developed, as they are a popular choice for efficiently organizing sequence information. Because of this, earlier research has focused on developing hybrid systems that incorporate both periodic probabilistic models and deep learning techniques. The generic HAR architecture for smartphone wearable sensors proposed in [4] is based on time-series domain-specific long short-term memory (LSTM) networks. The effect of integrating various forms of cellular telemetry data is examined by comparing several benchmark LSTM networks. Furthermore, a multilayer CNN-LSTM composite network is suggested to increase detection accuracy. The findings demonstrate that the designed multilayer CNN-LSTM network works well in activity recognition, with the recognition rate increasing by up to 2.24% compared to earlier state-of-the-art methods. Mehr et al. [3] proposed a wearable, sensor-based model to recognize activities and their transitions. Compared to most current equivalent models using public HAPT data, the proposed approach (CNN+BILSTM) achieves classification accuracy of 95.87% and transition detection accuracy of more than 80%. Jethanandani et al. [13] created the classifier chains approach to address the complex problem of cross-activity detection. Four separate classifiers are employed as base learners in this multi-label classification approach. Experiments demonstrate that the method efficiently addresses the difficulties inherent in such a complicated assignment. Liciotti et al. [1] employed LSTM to classify spatiotemporal patterns collected by smart home sensors. Using the Center for Advanced Studies in Adaptive Systems datasets, the suggested LSTM-based techniques outperform existing deep learning and machine learning algorithms. The system proposed by [28] has four hidden layers using a pre-trained layer-by-layer technique.
In this case, a backpropagation network and conjugate gradient (CG) are used. The findings of the deep learning model were compared to those of the hidden Markov model and naive Bayes. Skocir et al. [12] conducted research on activity detection in a smart home environment. Data is collected from several simple sensors to determine if the door is still open or locked. Two strategies are proposed for detecting movement: one based on a sliding window and another on machine learning. Gumaei et al. [8] at the University of Bristol have created a hybrid deep learning approach that combines statistical recurrent units (SRUs) with gated recurrent units (GRUs). According to the test results, the suggested methods for detecting human activity outperform current state-of-the-art methods.

Depthwise Separable Network Techniques
Using 3D ResNet as a basic model, Zhou et al. [29] proposed a depthwise separable network (DSN) to recognize human activities using the UCF101 and HMDB51 datasets. The results show that adding depthwise convolution to the proposed DSN reduces the baseline model's parameters and increases accuracy. For coronavirus disease 2019 (COVID-19) identification and classification, Le et al. [30] introduced a unique AI-infused DS-CNN with deep SVM. The DS-CNN model produced the best results in binary and multiclass classification, with 98.54% accuracy in binary and 95.06% accuracy in multiclass classification. Thu and Han [9] presented a low-energy consumption solution for HAR that uses sensing devices in the health and monitoring domains. The proposed approach uses depthwise separable convolution rather than regular convolution to reduce computing costs. The test results show that the proposed model outperforms the classic convolution strategy. Anju and Kavitha [31] proposed separable CNNs for improved activity classification using security camera data, investigating the optimization of the proposed separable CNN with various performance evaluation measures. A stochastic gradient descent (SGD)-trained separable CNN model achieved 94% accuracy on test data.

Research Gap
However, several machine and deep learning models are costly and time-consuming in their assessment steps due to energy constraints, rendering them unsuitable for HAR in a smart home context. To solve this problem, we describe a CNN-based model for HAR that uses a depth-separable convolution approach.

Basic Concepts

Classical Convolution
The classical convolution is also known as the "standard convolution" in deep learning. The essential processes of conventional convolution are shown in Fig. 1. A classical convolution layer filters and combines its inputs in a single step: in a basic convolutional layer with c_in input and c_out output channels, each output feature map is the aggregate of all input feature maps convolved with their respective kernels.

Fig. 1. Workflow process of classical convolution.

Depthwise and Pointwise Convolutions
The distinction between classic and depthwise separable convolutions and their multilayered characteristics is illustrated in Fig. 2.

Fig. 2. Depthwise separable convolution.

As shown in Fig. 2, the depthwise (DW) and pointwise (PW) convolutions are combined to produce a "depthwise separable" convolutional frame. The depthwise separable convolutional frame performs a similar function to classical convolution, though significantly quicker. There is no pooling layer between the frames in the presented approach since they are depthwise separable. However, a few depthwise layers use a stride of 2 to decrease the spatial dimensionality; in such a scenario, the subsequent pointwise layer sets the number of output channels. The essential procedures of depthwise convolution and depthwise separable convolution are demonstrated in Fig. 2. Unlike traditional convolution, depthwise convolution produces just one output feature map from a given input feature map convolved with a single convolution kernel [30].
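The factorization described above can be made concrete with a short NumPy sketch (illustrative only, not the paper's implementation): a standard convolution mixes all input channels in one step, while the depthwise separable version filters each channel separately and then mixes channels with a 1x1 pointwise product. When the standard kernel happens to be the outer product of a depthwise and a pointwise kernel, the two paths produce identical outputs.

```python
import numpy as np

def standard_conv(x, kernels):
    """x: (H, W, c_in); kernels: (m, m, c_in, c_out) -> (H-m+1, W-m+1, c_out)."""
    m, _, c_in, c_out = kernels.shape
    H, W, _ = x.shape
    out = np.zeros((H - m + 1, W - m + 1, c_out))
    for i in range(H - m + 1):
        for j in range(W - m + 1):
            patch = x[i:i + m, j:j + m, :]                    # (m, m, c_in)
            out[i, j] = np.tensordot(patch, kernels, axes=3)  # sum over m, m, c_in
    return out

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """dw_kernels: (m, m, c_in) -- one filter per input channel;
    pw_kernels: (c_in, c_out) -- 1x1 convolution mixing channels."""
    m, _, c_in = dw_kernels.shape
    H, W, _ = x.shape
    dw = np.zeros((H - m + 1, W - m + 1, c_in))
    for i in range(H - m + 1):
        for j in range(W - m + 1):
            patch = x[i:i + m, j:j + m, :]
            dw[i, j] = np.sum(patch * dw_kernels, axis=(0, 1))  # per-channel filtering
    return dw @ pw_kernels  # pointwise 1x1 convolution

x = np.random.rand(8, 8, 3)
out = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(3, 16))
print(out.shape)  # (6, 6, 16)
```

The sketch uses "valid" padding and stride 1 for simplicity; framework implementations add padding, strides, and biases on top of the same factorization.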

Methodology

This section describes the methodology of the proposed technique (Fig. 3). The primary goal of our approach is to develop an effective deep learning method for distinguishing human actions in a smart home scenario. As input, the system gets a stream of source data from several sensors, and the output is the activity name or code.

Fig. 3. Block diagram of the proposed system.

HAR Data Collection
For HAR implementations, researchers used a variety of datasets in a smart home context [1]. Due to the high cost, time, and difficulty of obtaining real-world data, publicly available datasets are essential for academic researchers. These are also important for evaluating HAR methods and providing a comparative standard. The HAR dataset from the University of California, Irvine (UCI) is one of the most well-known publicly available datasets in the HAR domain [18]. Thirty volunteers, ranging in age from 19 to 48, wore a smartphone around the waist, and data was collected using the built-in accelerometer and gyroscope. Each individual completed six tasks: standing, sitting, walking, walking downstairs, walking upstairs, and lying down. Sensor data was collected from the combined triaxial values of the smartphone's accelerometer and gyroscope as each participant performed the six pre-programmed activities. At a constant rate of 50 Hz, triaxial linear acceleration and angular velocity data were obtained. The sensor data was communicated to the linked smartphone and recorded in the phone's system storage in CSV (comma-separated values) format with the given label identifier [32]. Table 2 gives an in-depth summary of the UCI-HAR dataset and its elements.
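As a rough illustration of how the raw inertial signals can be assembled for model input, the sketch below stacks the nine signal files of the public UCI-HAR release into a (windows, 128, 9) tensor. The directory and file names follow the dataset's published layout, but `DATA_DIR` is an assumed local extraction path, and this loader is not part of the original study.

```python
from pathlib import Path
import numpy as np

DATA_DIR = Path("UCI HAR Dataset")   # assumed local path to the extracted dataset
SIGNALS = [
    "body_acc_x", "body_acc_y", "body_acc_z",
    "body_gyro_x", "body_gyro_y", "body_gyro_z",
    "total_acc_x", "total_acc_y", "total_acc_z",
]

def load_split(split):
    """Stack the nine signal files into (n_windows, 128, 9); labels are 1-6."""
    sig_dir = DATA_DIR / split / "Inertial Signals"
    X = np.stack(
        [np.loadtxt(sig_dir / f"{name}_{split}.txt") for name in SIGNALS],
        axis=-1,
    )
    y = np.loadtxt(DATA_DIR / split / f"y_{split}.txt", dtype=int)
    return X, y

# X_train, y_train = load_split("train")  # expected shapes: (7352, 128, 9), (7352,)
```

Each row of the raw text files is already one 128-sample window, so no further segmentation is needed when loading this release.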

Table 2. Count of occurrences in the dataset for each activity
Activity code | Activity | Description | Instances
WKU | Walking upstairs | The person ascends the steps | 1,544
WKD | Walking downstairs | The person descends the steps | 1,406
WAK | Walking | The person moves horizontally ahead in a frontal position | 1,722
SIT | Sitting | The person takes a seat in a chair | 1,777
STA | Standing | The person remains still | 1,906
LAY | Laying | The person sleeps or lies down | 1,944

Preprocessing
The preprocessing stage included data cleaning and fusion from the UCI-HAR dataset. The input attributes were derived from the raw data acquired from multiple smart home sensors (e.g., M, D, and T). The data aggregation process was designed to capture all modifications in sensor readings throughout the time interval between the start and end of human activity [1]. To remove undesirable noise fluctuations induced by air resistance, a third-order low-pass Butterworth filter with a cutoff frequency of 20 Hz was utilized for denoising. Because 99% of the energy is contained at frequencies below 15 Hz, this was appropriate for monitoring body acceleration [32]. The dataset contains 10,299 instances divided into two sections: training and testing. The former comprises 7,352 specimens (71.39%), whereas the latter has the 2,947 remaining examples (28.61%). The dataset is unbalanced; in this study, we therefore used the F1-score to determine the efficacy of the DS-CNN model, as accuracy alone is insufficient for a valid assessment [4]. The sensor data was captured in 2.56-second sliding windows with a 50% overlap between frames. This step size and overlap composition were chosen for at least three reasons: (1) the average person walks between 90 and 130 steps per minute [4], or about 1.5 steps per second; (2) each window evaluation requires at least one walking cycle (two steps); and (3) this procedure can benefit individuals with slower speeds, such as the elderly or those with disabilities. According to the research, the minimum speed is a percentage of a regular person's pace [4].
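The windowing scheme above (2.56-second frames at 50 Hz, i.e., 128 samples, with 50% overlap, i.e., a 64-sample step) can be sketched as follows; the function name and synthetic data are illustrative, not the authors' code.

```python
import numpy as np

def sliding_windows(signal, window=128, overlap=0.5):
    """signal: (n_samples, n_channels) -> (n_windows, window, n_channels)."""
    step = int(window * (1 - overlap))           # 64 samples for 50% overlap
    n = (len(signal) - window) // step + 1       # number of full windows
    return np.stack([signal[i * step:i * step + window] for i in range(n)])

sig = np.random.rand(1000, 9)   # ~20 s of synthetic 9-channel sensor data at 50 Hz
frames = sliding_windows(sig)
print(frames.shape)             # (14, 128, 9)
```

With a 64-sample step, the second half of each frame reappears as the first half of the next, which is exactly the 50% overlap described above.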

Applying Depthwise Separable Convolution Network for HAR
This study employs depth-separable convolutions. We propose using depth-separable convolutions to replace the comparatively expensive convolutional layers in visual recognition applications. Convolution with depthwise-separable weights reduces weight values and computational complexity. This section will discuss the approach used to generate the DSC. The explanation starts with a look at the math and the essential parts of the techniques, then moves on to a more in-depth look at how to use sensors to detect human activity in a smart home. We can elaborate on the computations as follows.
The two elements of the DSC are the depthwise and pointwise convolutions. Equation (1) expresses the depthwise convolution, which applies a single filter to each channel of the input feature map.

$\hat{F}_{k,l,n} = \sum_{i,j} \hat{M}_{i,j,n} \cdot N_{k+i-1,\,l+j-1,\,n}$ (1)

where $\hat{M}$ stands for the depthwise convolutional kernels of dimension $m \times m \times c_{in}$, and $c_{in}$ is the number of input channels. The nth filter in $\hat{M}$ is applied to the nth channel of the input N to produce the nth channel of the filtered output feature map $\hat{F}$.
A pointwise convolution computes a linear combination of the depthwise convolution result using a 1x1 convolution. Equation (2) expresses the pointwise convolution.

$G_{k,l,p} = \sum_{n} P_{n,p} \cdot \hat{F}_{k,l,n}$ (2)

where $1 \times 1 \times c_{in} \times c_{out}$ is the size of the 1x1 convolutional kernel P; the total number of channels in the resulting feature map can be changed by changing $c_{out}$. The dense 1x1 convolutional operation, unlike the $m \times m$ (m > 1) convolutional operations, imposes no spatial-proximity requirement; therefore, it does not necessitate rearranging the data in memory, and it can be carried out directly using highly efficient general matrix multiplication routines. Equation (3) expresses the DSC computational cost:

$m \cdot m \cdot c_{in} \cdot f_w \cdot f_H + c_{in} \cdot c_{out} \cdot f_w \cdot f_H$ (3)

Equation (3) gives the combined cost of the depthwise convolution and the 1x1 pointwise convolution. Dividing it by the cost of a standard convolution gives a ratio of $1/c_{out} + 1/m^2$; because $c_{out}$ is usually relatively large, the ratio is roughly comparable to $1/m^2$. Because this work uses 3x3 DWS convolutions, the computational complexity and parameter count of such convolutions are 7-8 times lower than those of standard convolutional layers.
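The cost comparison can be checked numerically. The helper functions below (illustrative names, not from the paper) encode the separable cost of Equation (3) and the standard-convolution cost discussed in Section 5.1.4; for a 3x3 kernel and 64 output channels, the ratio works out to $1/64 + 1/9 \approx 0.127$, i.e., a 7-8x reduction, consistent with the figure quoted above.

```python
def standard_cost(m, c_in, c_out, f_w, f_h):
    """Multiply-accumulates of a standard m x m convolution."""
    return m * m * c_in * c_out * f_w * f_h

def separable_cost(m, c_in, c_out, f_w, f_h):
    depthwise = m * m * c_in * f_w * f_h   # one m x m filter per input channel
    pointwise = c_in * c_out * f_w * f_h   # 1x1 channel mixing
    return depthwise + pointwise

m, c_in, c_out, f_w, f_h = 3, 32, 64, 64, 7   # example layer sizes (assumed)
ratio = separable_cost(m, c_in, c_out, f_w, f_h) / standard_cost(m, c_in, c_out, f_w, f_h)
print(round(ratio, 3))   # equals 1/c_out + 1/m**2
```

Note that the ratio depends only on $m$ and $c_{out}$, not on the spatial size of the feature maps, since $f_w \cdot f_H$ cancels.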
How It Works for Classifying Human Activities
The CNN model, including the depthwise convolution layer configuration, constructed using TensorFlow, is shown in Fig. 4. In the proposed method, a depthwise convolution layer is followed by max-pooling and an additional convolution layer. The design also has a fully connected layer coupled to the softmax layer. The convolution and max-pool layers are one-dimensional (temporal).

Fig. 4. Deep learning CNN model with depthwise convolution.

Fig. 4 shows the proposed design for a DS-CNN. The signals from the tri-axial accelerometer and tri-axial gyroscope are used as device inputs. We used depthwise convolution, which is an excellent way to reduce the run time of deep neural network calculations: it performs spatial convolution on each input channel one at a time and aggregates the results with a 1x1 pointwise convolution unit. The outcome of the convolution layer is fed into a rectified linear unit (ReLU) activation function, followed by a 1D max pool. The first convolutional layer's filter size and depth are set to 60, while the pooling layer's filter size is set at 20 with a stride value of 2. The subsequent convolution layer implements filters of different sizes after receiving input from the max-pooling layer, with a stride of 6 and 10% of the complexity of the max-pooling layer. The outcome is then flattened for the fully connected layer input. In the described architecture, the fully connected layer has 1,000 neurons, with nonlinearity represented by the tanh function. The softmax function is used to generate probabilities for the target class tags, and the SGD optimization technique was used to minimize the negative log-likelihood cost function. The flattening layer turns the matrix representation of each feature map into a vector, and dropout is applied on top of the pooling layer to reduce overfitting.
A max-pooling layer is introduced to the proposed system to summarize the feature maps produced by the convolution layers and decrease computational expense by reducing their volume and size. The last layer is fully connected, followed by a softmax layer to recognize human activities in a smart home setup. The proposed approach makes use of depthwise separable convolutional layers: our network has 0.106 million trainable parameters, compared to 0.214 million parameters for the same network employing conventional convolutional layers. We picked this specific DS-CNN for its flexibility, efficiency in learning, minimal parameter count, and excellent performance on smaller datasets [4]. Algorithm 1 presents the pseudocode of the suggested DS-CNN model for recognizing human activities in a smart home context.
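As a hedged sketch of how such a network might be assembled in TensorFlow, the Keras model below uses `SeparableConv1D` (a depthwise convolution followed by a 1x1 pointwise convolution) in place of standard `Conv1D` layers, with max-pooling, dropout, a tanh dense layer, and a softmax head over the six activities. Layer widths and the 0.5 dropout rate are illustrative assumptions and do not reproduce the paper's exact configuration.

```python
import tensorflow as tf

def build_ds_cnn(window=128, channels=9, n_classes=6):
    """Illustrative DS-CNN for (128, 9) sensor windows; sizes are assumptions."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(window, channels)),
        tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.SeparableConv1D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.SeparableConv1D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Dropout(0.5),       # random neuron drop against overfitting
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="tanh"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_ds_cnn()
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Swapping each `SeparableConv1D` for a plain `Conv1D` of the same width in this sketch noticeably increases the parameter count, mirroring the roughly two-fold reduction reported above.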

Result and Discussion

This section shows and evaluates the data collected from several experiments designed to answer the research questions.

How It Works for Classifying Human Activities
To find an answer to "RQ1. How does the DS-CNN effectively classify human activities in a smart home context?", we will go through the hardware and software that will be utilized to build the suggested system. The parameters for training and performance of the DS-CNN model are thoroughly discussed.

5.1.1 Hardware and software configuration
For the experimental procedures, we used an Intel Core i7 CPU, a 1080 GPU, and Windows 10. Python 3.5 was utilized as the programming language. MATLAB 2019b is used for preprocessing and image encoding, while the PyTorch toolbox in Python 3.5 is used to apply deep learning. In this paper, we downsize the STIF frames created from the dataset to 224x224x3 to employ the pre-trained model. As mentioned in Section 4.1, we validate the proposed system using the given dataset, which is divided into train and test sets. We set the initial learning rate at 0.001, and it drops by a factor of 0.9 every tenth epoch. We set the batch size to 16 and shuffled the data during reading. For optimization, the Adam optimizer is run with a momentum value of 0.999. Training repeats until 100 epochs have passed. Table 3 lists the equipment, software, and parameters utilized to train and test the proposed approach.

Table 3. Setups of equipment, programming, and parameters

Parameter | Value
Hardware | Intel Core i7 CPU, 1080 GPU
Software | Windows 10, MATLAB, Python 3.5
Initial learning rate | 0.0011
Learning rate decay factor | 0.91
Optimizer | Adam

5.1.2 Parameter setting
Table 4 illustrates the architecture of our proposed convolutional neural network for HAR in the smart home. Each convolutional layer is followed by a ReLU activation function. The final component of each input specifies the channel count (the depth). The first layer's input is sensor data for nine signals over 128 time steps. Max-pooling layers with a pooling size of (2x1) are used to help avoid overfitting.
Table 4. Parameter setting for DS-CNN network
 Layer (input shape)                                 Parameter setting
 Convolution layer (132x9x1)                         (str=1, pd=2) Conv(3x3x1)x32
 Depthwise separable convolution layer (132x9x32)    (str=1) DWS_Conv(3x3x1)x32, (1x1x32)x32
 Max pooling (126x7x32)                              Max_pool(2x1)
 Convolution layer (64x8x32)                         (str=1) Conv(3x3x32)x64
 Depthwise separable convolution layer (64x7x64)     (str=1) DWS_Conv(3x3x1)x64, (1x1x64)x64
 Max pooling (64x3x64)                               Max_pool(2x1)
 Convolution layer (32x4x64)                         (str=1) Conv(3x3x64)x128
 Max pooling (32x2x128)                              Max_pool(2x1)
 Flatten (14x2x128)                                  Flattening
 Fully connected (1668)                              Connected_fully(1668x100)
 Activation function (100)                           Softmax (classifier)
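To see why the depthwise separable layers in Table 4 are cheaper than standard convolutions, the weight counts of the two layer types can be compared with a small sketch (the function names are ours, purely illustrative):

```python
def conv_params(kw, kh, c_in, c_out):
    # Standard convolution: one (kw x kh x c_in) kernel per output channel.
    return kw * kh * c_in * c_out

def dws_conv_params(kw, kh, c_in, c_out):
    # Depthwise stage: one (kw x kh) filter per input channel,
    # pointwise stage: a 1x1 convolution mixing c_in channels into c_out.
    return kw * kh * c_in + c_in * c_out

# For a 3x3 layer with 32 input and 32 output channels, as in Table 4:
std = conv_params(3, 3, 32, 32)      # 9216 weights
dws = dws_conv_params(3, 3, 32, 32)  # 288 + 1024 = 1312 weights
```

The same split (per-channel spatial filtering, then 1x1 channel mixing) is what the DWS_Conv rows in Table 4 denote.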

5.1.3 Implementation of HAR recognition in smart homes
Table 5 shows how the processing time of a single inference is distributed across the network layers when implementing HAR recognition in smart homes. The network inference took 220.4 ms to finish, leaving enough time for feature extraction and image handling at three inferences per second.
Table 5. Layer allocation of processing time and classifier evaluation activities
 Layer         Execution time (ms)   Operations (millions)
 Conv1         83.0 (42.4%)          2.08 (22.1%)
 DWS_Conv1     7.5 (3.5%)            0.15 (1.3%)
 Pw_Conv1      12.0 (4.7%)           1.20 (11.1%)
 DWS_Conv2     7.3 (3.2%)            0.14 (1.2%)
 Pw_Conv2      12.0 (4.7%)           1.20 (10.3%)
 Pooling_avg   0.7 (0.3%)            0.02 (0.2%)

5.1.4 The suggested algorithm's complexity
We now determine the computational complexity of the proposed DS-CNN algorithm. First, we ascertain the computational complexity of a typical convolution [21].
The number of weights in a standard convolution is:

$W_{std} = K_w \times K_H \times C_{in} \times C_{out}$ (4)

The kernel size is represented as $K_w \times K_H$.
The computational cost of producing output feature maps of size $f_w$x$f_H$ is:

$Cost_{std} = K_w \times K_H \times C_{in} \times C_{out} \times f_w \times f_H$ (5)

where $K_w$ and $K_H$ are the spatial dimensions (width and height) of the kernels, $C_{in}$ and $C_{out}$ are the numbers of input and output channels, and $f_w$ and $f_H$ are the spatial dimensions of the output feature maps.
Complexity of depthwise convolutions: A depthwise convolution layer has the following computing cost:

$Cost_{dw} = K_w \times K_H \times C_{in} \times f_w \times f_H$ (6)

Depthwise convolution thus reduces both the number of weights and the computational cost by a factor of $C_{out}$ relative to standard convolution.
Compared with standard convolution, the complete depthwise separable convolution (a depthwise convolution followed by a 1x1 pointwise convolution) reduces the computational complexity by a factor of β, derived in Equations (7)-(10). Its cost is:

$Cost_{dws} = K_w \times K_H \times C_{in} \times f_w \times f_H + C_{in} \times C_{out} \times f_w \times f_H$ (7)

The reduction factor β is the ratio of the two costs:

$\beta = \frac{Cost_{dws}}{Cost_{std}}$ (8)

$\beta = \frac{K_w \times K_H \times C_{in} \times f_w \times f_H + C_{in} \times C_{out} \times f_w \times f_H}{K_w \times K_H \times C_{in} \times C_{out} \times f_w \times f_H}$ (9)

$\beta = \frac{1}{C_{out}} + \frac{1}{K_w \times K_H}$ (10)

In this case, the DS-CNN performs 1/β times fewer computations (in our case, about 13 times, as shown in Table 6) than a typical convolutional neural network, which is a significant improvement.
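The cost analysis above can be sketched as follows; the feature-map size used in the check is arbitrary, since β itself does not depend on it:

```python
def cost_standard(kw, kh, c_in, c_out, fw, fh):
    # Equation (5): multiply-accumulates of a standard convolution.
    return kw * kh * c_in * c_out * fw * fh

def cost_dws(kw, kh, c_in, c_out, fw, fh):
    depthwise = kw * kh * c_in * fw * fh   # Equation (6): per-channel filtering
    pointwise = c_in * c_out * fw * fh     # 1x1 convolution mixing channels
    return depthwise + pointwise

def beta(kw, kh, c_out):
    # Reduction ratio: 1/C_out + 1/(Kw*Kh).
    return 1 / c_out + 1 / (kw * kh)

# The ratio of the two costs equals beta regardless of feature-map size.
ratio = cost_dws(3, 3, 32, 64, 28, 28) / cost_standard(3, 3, 32, 64, 28, 28)
```

For 3x3 kernels and 64 output channels, β = 1/64 + 1/9 ≈ 0.127, i.e., roughly an 8x per-layer reduction; the overall 13x figure reported in Table 6 is the network-wide total.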

To answer "RQ2. In terms of different performance evaluation metrics, what is the proposed approach's efficiency compared to the classical CNN model?", we compared the efficiency of the classical CNN model with that of the proposed DS-CNN model for HAR in smart homes on the gathered datasets.

5.2.1 Experimental results
We ran the test using the UCI HAR dataset to evaluate the effectiveness of the new HAR model [12]. The dataset contains raw smartphone signals: 3-axial linear acceleration and 3-axial angular velocity sampled at a constant 50 Hz. Six fundamental actions occur: three static postures (standing, sitting, and lying) and three dynamic activities (walking, walking downstairs, and walking upstairs). The data were segmented into fixed-length sliding windows of 2.56 seconds with 50% overlap (128 readings/window). The frames were then fed into the neural networks as simulated 2D images of size 128x9x1. Averaging 50 trials with each algorithm yields the reported accuracy. The total number of multiplications for one frame of data in the convolution-based layers during the forward pass is presented in Table 6 as the computing cost, together with the total number of learnable model parameters. Table 6 contrasts the standard CNN with the depthwise separable convolutional network to assess the effectiveness of the suggested approach. The confusion matrix of the depthwise separable convolution-based HAR model for one laboratory activity is given in Table 7.
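The fixed-length windowing described above (128 readings per window, 50% overlap) can be sketched as follows; the helper name and the toy signal are illustrative:

```python
def sliding_windows(signal, window=128, overlap=0.5):
    """Split a sample sequence into fixed-length windows with
    fractional overlap: 50 Hz * 2.56 s = 128 readings per window."""
    step = int(window * (1 - overlap))  # 64 samples for 50% overlap
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, step)]

# Consecutive windows share their last/first 64 readings.
wins = sliding_windows(list(range(512)))
```

Each window would then be stacked with the other eight sensor channels to form the 128x9x1 input frames.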

Table 6. Performance comparison between DS-CNN and classical CNN
 Model               Average accuracy (%)   Computational overhead (millions)   Number of parameters
 Classical CNN       93.621                 17.433                              214,362
 DS-CNN (proposed)   92.96                  1.341                               106,844

Table 7. Confusion matrix of a depthwise separable convolution-based HAR model
 Output class           Target class                                                                                   Recognition rate (%)
                        WKU           WKD          WAK          SIT          STA          LAY
 WKU                    519 (16.3)    5 (0.2)      2 (1)        1 (0)        89 (6)       4 (0.2)       91.4 (9.1)
 WKD                    18 (0.5)      399 (13.1)   26 (0.9)     2 (0.1)      1 (0)        0 (0)         92.6 (10.6)
 WAK                    22 (1)        11 (0.4)     589 (19.8)   17 (0.7)     4 (0.3)      8 (0.7)       93.6 (4.7)
 SIT                    34 (2.1)      13 (0.4)     42 (2.2)     652 (20.1)   17 (0.8)     2 (0.1)       91.4 (8.2)
 STA                    8 (0.2)       12 (0.4)     58 (1.5)     92 (3.4)     731 (24.4)   14 (0.45)     90.6 (7.3)
 LAY                    16 (0.9)      7 (0.2)      41 (2.1)     86 (2.9)     23 (1)       811 (26.1)    92.2 (5.1)
 Recognition rate (%)   95.44 (6.3)   94.08 (8.1)  93.31 (2.4)  96.18 (9.3)  95.43 (6.4)  98.41 (5.4)   94.35 (8.9)
Values are presented as number (%).

5.2.2 Comparison of runtime overhead of traditional CNN and the proposed DS-CNN
Table 6 shows that, compared with the regular CNN, the depthwise separable convolution-based CNN reduces accuracy by only about 0.66 percentage points while drastically reducing the number of computations (about 13 times) and cutting the number of trainable parameters roughly in half. The experimental findings and complexity analysis show that the suggested HAR model can be employed with satisfactory accuracy in smart home-based HAR applications.

5.2.3 Cross-validation results for various classifiers
We carried out experimentation on the classification models using 10-fold cross-validation. Table 8 shows, for each classifier, the average and standard deviation of accuracy, macro-averaged precision, macro-averaged recall, and macro-averaged F1-score.

Table 8. Classification techniques get cross-validated
          Accuracy         Precision macro   Recall macro     F1-score macro
          Avg      SD      Avg      SD       Avg      SD      Avg      SD
 CNN      93.621   0.05    89       0.04     90       0.05    88       0.05
 DS-CNN   92.96    0.06    86       0.05     89       0.06    87       0.06
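The 10-fold protocol behind Table 8 can be sketched with a plain index splitter; this is a simplified, unstratified version (the actual experiments may have used library routines such as scikit-learn's KFold):

```python
def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation,
    distributing any remainder across the first folds."""
    idx = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        start += size
        yield train, test

# Every sample appears in exactly one test fold.
folds = list(kfold_indices(25, k=10))
```

The per-fold metrics would then be averaged to produce the Avg and SD columns of Table 8.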

5.2.4 Performance evaluation
We used four assessment indicators and the recognition time to assess the suggested model's performance. Precision and recall are computed with Equations (11) and (12). Equation (11) gives each class's precision, and the results are shown in Table 9. Tc1, Tc2, Tc3, Tc4, Tc5, and Tc6 are the true positives for each class, and P1, P2, P3, P4, P5, and P6 are the corresponding precisions.

$P_i = \frac{T_{ci}}{T_{ci} + F_{pi}}, \quad i = 1, 2, \ldots, 6$ (11)

where $F_{pi}$ denotes the number of false positives for class $i$.

Table 9. Proposed DS-CNN model's precision for HAR
 Human activity in smart home   P1 (WKU)   P2 (WKD)   P3 (WAK)   P4 (SIT)   P5 (STA)   P6 (LAY)
 Precision (%)                  88         100        80         85         90         93

Table 10 lists the recalls (R1, R2, R3, R4, R5, and R6) for each class, computed with Equation (12). Tc1, Tc2, Tc3, Tc4, Tc5, and Tc6 are the true positives for each class, and Fn1, Fn2, Fn3, Fn4, Fn5, and Fn6 are the corresponding false negatives.

Table 10. Proposed DS-CNN model's recall for HAR
 Human activity in smart home   R1 (WKU)   R2 (WKD)   R3 (WAK)   R4 (SIT)   R5 (STA)   R6 (LAY)
 Recall (%)                     95         91         89         94         84         87
Table 11 shows the performance measures of the DS-CNN model for HAR.

$R_i = \frac{T_{ci}}{T_{ci} + F_{ni}}, \quad i = 1, 2, \ldots, 6$ (12)

The per-class F1-score and the macro- and micro-averaged measures reported in Table 11 are computed as follows:

$F1_i = \frac{2 \times P_i \times R_i}{P_i + R_i}$ (13)

$P_{macro} = \frac{1}{6}\sum_{i=1}^{6} P_i$ (14)

$R_{macro} = \frac{1}{6}\sum_{i=1}^{6} R_i$ (15)

$F1_{macro} = \frac{1}{6}\sum_{i=1}^{6} F1_i$ (16)

$P_{micro} = \frac{\sum_{i=1}^{6} T_{ci}}{\sum_{i=1}^{6} (T_{ci} + F_{pi})}$ (17)

$R_{micro} = \frac{\sum_{i=1}^{6} T_{ci}}{\sum_{i=1}^{6} (T_{ci} + F_{ni})}$ (18)

Table 11. Performance measures of DS-CNN model for HAR
 Activity   Precision        Recall           F1-measure
            Macro   Micro    Macro   Micro    Macro   Micro
 WKU        0.86    0.87     0.91    0.84     0.8     0.76
 WKD        0.82    0.82     0.8     0.79     0.82    0.8
 WAK        0.85    0.83     0.81    0.78     0.81    0.81
 SIT        0.84    0.82     0.91    0.79     0.84    0.84
 STA        0.84    0.82     0.9     0.85     0.82    0.82
 LAY        0.87    0.85     0.89    0.88     0.84    0.85
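The macro and micro averaging reported in Table 11 can be illustrated over a toy two-class confusion matrix; the helper names are ours, not from the original implementation:

```python
def per_class_prf(conf):
    """conf[i][j] = count of class-i samples predicted as class j."""
    n = len(conf)
    tp = [conf[i][i] for i in range(n)]
    fp = [sum(conf[r][i] for r in range(n)) - tp[i] for i in range(n)]
    fn = [sum(conf[i]) - tp[i] for i in range(n)]
    prec = [tp[i] / (tp[i] + fp[i]) for i in range(n)]
    rec = [tp[i] / (tp[i] + fn[i]) for i in range(n)]
    return prec, rec

def macro(values):
    # Macro average: unweighted mean over classes.
    return sum(values) / len(values)

def micro_precision(conf):
    # Micro averaging pools all decisions; for a full single-label
    # confusion matrix, micro precision equals micro recall.
    n = len(conf)
    tp = sum(conf[i][i] for i in range(n))
    total = sum(sum(row) for row in conf)
    return tp / total

conf = [[8, 2], [1, 9]]          # toy matrix: rows are true classes
prec, rec = per_class_prf(conf)  # prec = [8/9, 9/11], rec = [0.8, 0.9]
```

The same computation applied to the 6x6 matrix of Table 7 yields the per-class and averaged values in Tables 9-11.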
To address "RQ3. What is the efficiency of the proposed strategy compared to baseline techniques?", we compared the suggested DS-CNN model for HAR in a smart home scenario against baseline techniques on the given dataset. Additionally, we performed a statistical evaluation to validate the effectiveness of the proposed system.

5.3.1 Comparison with the base line methods
We chose a study published in [9], which used the 10-fold cross-validation technique on the same dataset, to compare the current framework with recent research on activity recognition using smart home-based sensor data. For feature extraction, the authors used statistical functions and time-frequency transformations, and they built their framework with shallow classification approaches. Table 12 displays the precision, recall, and F-score comparison. As can be seen, the results of the proposed DS-CNN model are higher than those of the baseline framework. This enhancement stems from the suggested platform's capacity to interpret and retain recognition cues for detecting human activities from multimodal body sensing data.

Table 12. Performance (%) comparison with baseline work
 Human activity in smart home   Thu and Han [9]                 Proposed framework
                                Precision   Recall   F-score    Precision   Recall   F-score
 WKU                            88          91       92         91          95       94
 WKD                            92          91       96         100         91       99
 WAK                            82          84       89         89          89       91
 SIT                            85          91       86         87          94       89
 STA                            86          81       84         90          84       90
 LAY                            91          84       91         93          87       95

5.3.2 Statistical analysis
Consider two models, M1 (the classical CNN) and M2 (the proposed DS-CNN). Let N denote the number of records in the database, e1 the error rate of the standard CNN, and e2 the error rate of the DS-CNN. The key objective is to determine whether the empirical difference between e1 and e2 is statistically significant [29]. It is written in the following form:

$d_t = |e_1 - e_2|$ (19)

The variance of the error-rate difference determines the confidence interval for dt, given by the following formula:

$d_t \pm z_{\alpha/2}\,\hat{\sigma}_{d_t}, \quad \hat{\sigma}_{d_t}^2 = \frac{e_1(1 - e_1)}{N} + \frac{e_2(1 - e_2)}{N}$ (20)

We substituted the accuracy, error rate, and accuracy variance from the performance results of the CNN and DS-CNN classifiers into the above equation. Table 13 shows the results of the assessment.

We used a two-sided test of dt = 0 against dt ≠ 0. Plugging the values into the formula above yields a confidence interval for dt at the 95% confidence level, with a lower bound of 0.115457 and an upper bound of 0.684542. Because the interval does not contain zero, we conclude that the difference is statistically significant at the 95% confidence level.
Table 13. Findings of the investigation
                       DS-CNN (proposed)   Classical CNN
 Accuracy              0.91                0.92
 Error rate            0.09                0.08
 Accuracy difference   0.01
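The interval computation above can be sketched as follows, using the error rates from Table 13 and a hypothetical record count N (the dataset size used in the original computation is not restated here):

```python
import math

def error_diff_interval(e1, e2, n, z=1.96):
    """Confidence interval for the difference between two error rates
    measured on n records; the difference is significant at the chosen
    level when zero lies outside the interval."""
    dt = abs(e1 - e2)
    var = e1 * (1 - e1) / n + e2 * (1 - e2) / n  # variance of dt
    half = z * math.sqrt(var)
    return dt - half, dt + half

# Error rates from Table 13; n = 1000 is an assumed record count.
lo, hi = error_diff_interval(0.09, 0.08, 1000)
```

The interval is centered on dt = 0.01; whether it excludes zero depends on N, which is why the variance term of Equation (20) matters.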

Threats to External Validity
As indicated in Section 4.1, the suggested methodology was evaluated internally to ensure model stability. To confirm the design's robustness, we gathered two additional datasets.
Dataset 2. This HAR dataset consists of recordings of 30 people performing activities of daily living (ADL) while carrying a waist-mounted smartphone with an accelerometer sensor. A noise filter was applied to the collected accelerometer and gyroscope signals to reduce signal noise, and the data were then sampled in 2.56-second sliding windows with 50% overlap (128 measurements per window) [33].
Dataset 3. This dataset covers 19 smartphone users with distinct physical characteristics, and the data are credible enough to be used in a real-world setting [34].
By comparing against the datasets described in this section, we show that the proposed model for classifying human activities is accurate and efficient. Using the available datasets, we build classifiers to evaluate the suggested method: models trained on the primary dataset (Dataset 1) are tested on the two additional datasets (Datasets 2 and 3). Table 14 summarizes the findings. Our proposed DS-CNN model outperformed both the SVM and CNN baselines [28], achieving up to 82% accuracy. These results support the proposed model and its ability to improve classification accuracy.

Table 14. External validation of the proposed method
 Dataset     Model      Precision   Recall   Accuracy
 Dataset 2   SVM        0.77        0.73     0.76
             CNN        0.78        0.75     0.78
             Proposed   0.81        0.83     0.8
 Dataset 3   SVM        0.74        0.8      0.78
             CNN        0.76        0.79     0.77
             Proposed   0.83        0.81     0.82

Conclusions and Future Work

This paper proposes the DS-CNN for sensor-based HAR. Our proposed method comprises three parts: data acquisition, data preprocessing, and DS-CNN model deployment. We compare our results against the original convolutional filters on a benchmark dataset, and we additionally test the proposed method on two other datasets, which demonstrates how well our technique generalizes. Experimental findings show that the proposed system outperforms existing classical convolutions, and it also compares favorably with a similar baseline work. The results of our study (accuracy=92.96, precision=91.6, recall=90, F1-score=93) show that the proposed strategy has the best overall performance across several different tests.
However, the suggested model has some limitations:

Only a single deep learning model (the DS-CNN) was studied; no other deep learning models were compared.

The proposed technique uses embeddings rather than a pre-trained DNN model.

Future research might use a variety of datasets from various HAR domains, other configurations of deep neural networks, and particular pre-trained DNN architectures such as ResNet and other models pre-trained on ImageNet. Moreover, we anticipate that future studies will help to resolve some of the existing complexities in the suggested model for some of the indicated activities. A method capable of effectively tracking an individual's activities, irrespective of how they carry their cellphone or their physical parameters, might then be implemented. Many businesses or individuals might find this helpful for monitoring or predicting the activities of a particular person.

Author’s Contributions

Conceptualization, DA. Methodology, DA, OB. Software, AA, MA. Validation, DA, MZA. Formal analysis, DA, AA. Investigation, OB. Resources, MZA. Data curation, DA, AA, OB. Writing—original draft preparation, MZA. Writing—review and editing, MUA. Visualization, MZA, AA. Supervision, DA. Project administration, OB. Funding acquisition, DA.

Funding

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number IFPRC-106-611-2020 and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name : Daniyal Alghazzawi
Affiliation : Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
Biography : Daniyal Alghazzawi is a Professor of Cybersecurity at the Computing Information Systems Department and the head of the Information Security Research Group at King Abdulaziz University. He graduated with a Ph.D. in computer science from the University of Kansas in 2007. He served in a variety of administrative and leadership roles and was awarded the Leadership Management International (LMI) certificate. In 2010, he was appointed Honorary Lecturer at the University of Essex. Daniyal has organized both domestic and international seminars and conferences. In the disciplines of smart e-learning, cybersecurity, and artificial intelligence, he is the author of multiple scholarly papers and patents. He has also served as a reviewer and editor for a number of local and international conferences, journals, workshops, and contests. Daniyal has worked as a consultant for a number of companies, assisting them in developing information security policies and obtaining certifications such as ABET, ISO 27001, ISO 22301, and others.

Name : Osama Rabie
Affiliation : Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
Biography : Osama Rabie is an assistant professor of cyberterrorism prevention at the Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University (KAU). In addition, he is the supervisor of KAU’s Data Science Club. He is a reviewer for several journals (e.g. Journal of Information Systems Management, International Journal of Information Management, Journal of the Southern Association for Information Systems) and conferences (e.g. Hawaii International Conference on System Sciences). In addition, he is a Cybersecurity Advisory Board Member at Dar Al-Hekma University. Dr. Rabie is also Unit Chief of Information Technology, Deanship of E-Learning and Distance Education, KAU. His research mainly related to cyberterrorism prevention, Markov decision-making, the use of value theory in information systems, and biomedical ontology.

Name : Omaimah Bamasaq
Affiliation : Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
Biography : Omaimah Bamasaq is a Professor of Cybersecurity at the Department of Computer Science, FCIT, KAU, Dean of Community Services and Continuing Education at UJ, and a Visiting Researcher at MIT. She received her Ph.D. in Computer Science (Electronic Information Security) from the University of Manchester, UK, in 2006.

Name : Aiiad Albeshri
Affiliation : Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
Biography : Aiiad Albeshri received the M.S. and Ph.D. degrees in information technology from the Queensland University of Technology, Brisbane, Australia, in 2007 and 2013, respectively. He has been an Assistant Professor with the Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia, since 2013. His current research interests include security and trust in cloud computing and big data.

Name : Muhammad Zubair Asghar
Affiliation : Assistant Professor, ICIT, Gomal University, Pakistan
Biography : Dr. Muhammad Zubair Asghar is an HEC-approved supervisor recognized by the Higher Education Commission (HEC), Pakistan. His Ph.D. research covers recent issues in opinion mining and sentiment analysis, computational linguistics, and natural language processing. He has more than 50 publications in journals of international repute (JCR and ISI indexed) and more than 20 years of university teaching and laboratory experience in social computing, text mining, computational linguistics, and opinion mining and sentiment analysis. Currently, he serves as a reviewer and academic editor for different top-tier journals, such as IEEE Access and PLOS ONE. Furthermore, he has also acted as Special Session Chair (Social Computing) at the BESC 2018 International Conference (Taiwan) and Lead Guest Editor, Special Issue

References

[1] D. Liciotti, M. Bernardini, L. Romeo, and E. Frontoni, “A sequential deep learning application for recognising human activities in smart homes,” Neurocomputing, vol. 396, pp. 501-513, 2020.
[2] H. D. Mehr, H. Polat, and A. Cetin, “Resident activity recognition in smart homes by using artificial neural networks,” in Proceedings of 2016 4th International Istanbul Smart Grid Congress and Fair (ICSG), Istanbul, Turkey, 2016, pp. 1-5.
[3] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, and I. De Munari, “IoT wearable sensor and deep learning: an integrated approach for personalized human activity recognition in a smart home environment,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8553-8562, 2019.
[4] S. Mekruksavanich and A. Jitpattanakul, “LSTM networks using smartphone data for sensor-based human activity recognition in smart homes,” Sensors, vol. 21, no. 5, article no. 1636, 2021. https://doi.org/10.3390/s21051636
[5] H. Ayaz, M. Ahmad, D. Tormey, I. McLoughlin, and S. Unnikrishnan, “A hybrid deep model for brain tumor classification,” in Proceedings of 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis. Singapore: Springer, 2021, pp. 282-291.
[6] M. Ahmad, A. K. Bashir, A. M. Khan, M. Mazzara, S. Distefano, and S. Sarfraz, “Multi sensor-based implicit user identification,” 2017 [Online]. Available: https://arxiv.org/abs/1706.01739.
[7] S. Latif, Z. e Huma, S. S. Jamal, F. Ahmed, J. Ahmad, A. Zahid, et al., “Intrusion detection framework for the internet of things using a dense random neural network,” IEEE Transactions on Industrial Informatics, vol. 18, no. 9, pp. 6435-6444, 2022.
[8] A. Gumaei, M. M. Hassan, A. Alelaiwi, and H. Alsalman, “A hybrid deep learning model for human activity recognition using multimodal body sensing data,” IEEE Access, vol. 7, pp. 99152-99160, 2019.
[9] N. T. H. Thu and D. S. Han, “Depthwise separable convolution for human activity recognition,” in Proceedings of the Korean Institute of Communications and Information Sciences, 2020, pp. 1196-1197.
[10] L. Sifre, “Rigid-motion scattering for image classification author,” Ph.D. dissertation, CMAP, Ecole Polytechnique, Cedex, France, 2014.
[11] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 2015, pp. 448-456.
[12] P. Skocir, P. Krivic, M. Tomeljak, M. Kusek, and G. Jezic, “Activity detection in smart home environment,” Procedia Computer Science, vol. 96, pp. 672-681, 2016.
[13] M. Jethanandani, A. Sharma, T. Perumal, and J. R. Chang, “Multi-label classification based ensemble learning for human activity recognition in smart home,” Internet of Things, vol. 12, article no. 100324, 2020. https://doi.org/10.1016/j.iot.2020.100324
[14] X. Hong and C. D. Nugent, “Segmenting sensor data for activity monitoring in smart environments,” Personal and Ubiquitous Computing, vol. 17, no. 3, pp. 545-559, 2013.
[15] E. Nazerfard and D. J. Cook, “CRAFFT: an activity prediction model based on Bayesian networks,” Journal of Ambient Intelligence and Humanized Computing, vol. 6, no. 2, pp. 193-205, 2015.
[16] J. Chen, X. Huang, H. Jiang, and X. Miao, “Low-cost and device-free human activity recognition based on hierarchical learning model,” Sensors, vol. 21, no. 7, article no. 2359, 2021. https://doi.org/10.3390/s21072359
[17] Y. Tian, X. Wang, L. Chen, and Z. Liu, “Wearable sensor-based human activity recognition via two-layer diversity-enhanced multiclassifier recognition method,” Sensors, vol. 19, no. 9, article no. 2039, 2019. https://doi.org/10.3390/s19092039
[18] Y. Zhang, Y. Zhang, Z. Zhang, J. Bao, and Y. Song, “Human activity recognition based on time series analysis using U-Net,” 2018 [Online]. Available: https://arxiv.org/abs/1809.08113.
[19] F. Liu, L. Zhao, X. Cheng, Q. Dai, X. Shi, and J. Qiao, “Fine-grained action recognition by motion saliency and mid-level patches,” Applied Sciences, vol. 10, no. 8, article no. 2811, 2020. https://doi.org/10.3390/app10082811
[20] F. Shi, Z. Chen, and X. Cheng, “Behavior modeling and individual recognition of sonar transmitter for secure communication in UASNs,” IEEE Access, vol. 8, pp. 2447-2454, 2019.
[21] L. Abualigah, A. Diabat, S. Mirjalili, M. Abd Elaziz, and A. H. Gandomi, “The arithmetic optimization algorithm,” Computer Methods in Applied Mechanics and Engineering, vol. 376, article no. 113609, 2021. https://doi.org/10.1016/j.cma.2020.113609
[22] L. Abualigah, D. Yousri, M. Abd Elaziz, A. A. Ewees, M. A. Al-Qaness, and A. H. Gandomi, “Aquila optimizer: a novel meta-heuristic optimization algorithm,” Computers & Industrial Engineering, vol. 157, article no. 107250, 2021. https://doi.org/10.1016/j.cie.2021.107250
[23] P. K. Sharma, J. H. Park, Y. S. Jeong, and J. H. Park, “SHSec: SDN based secure smart home network architecture for Internet of Things,” Mobile Networks and Applications, vol. 24, no. 3, pp. 913-924, 2019.
[24] J. H. Park, M. M. Salim, J. H. Jo, J. C. S. Sicato, S. Rathore, and J. H. Park, “CIoT-Net: a scalable cognitive IoT based smart city network architecture,” Human-centric Computing and Information Sciences, vol. 9, article no. 29, 2019. https://doi.org/10.1186/s13673-019-0190-9
[25] J. C. Sapalo Sicato, P. K. Sharma, V. Loia, and J. H. Park, “VPNFilter malware analysis on cyber threat in smart home network,” Applied Sciences, vol. 9, no. 13, article no. 2763, 2019. https://doi.org/10.3390/app9132763
[26] N. K. Thanigaivelan, E. Nigussie, S. Virtanen, and J. Isoaho, “Towards human bio-inspired defence mechanism for cyber security,” in Proceedings of 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, 2018, pp. 276-280.
[27] C. Van Slyke and F. Belanger, “Explaining the interactions of humans and artifacts in insider security behaviors: the mangle of practice perspective,” Computers & Security, vol. 99, article no. 102064, 2020. https://doi.org/10.1016/j.cose.2020.102064
[28] H. Fang and C. Hu, “Recognizing human activity in smart home using deep learning algorithm,” in Proceedings of the 33rd Chinese Control Conference, Nanjing, China, 2014, pp. 4716-4720.
[29] S. Zhou, L. Bai, Y. Yang, H. Wang, and K. Fu, “A depthwise separable network for action recognition,” 2019 [Online]. Available: https://doi.org/10.12783/dtcse/cisnrc2019/33352.
[30] D. N. Le, V. S. Parvathy, D. Gupta, A. Khanna, J. J. Rodrigues, and K. Shankar, “IoT enabled depthwise separable convolution neural network with deep support vector machine for COVID-19 diagnosis and classification,” International Journal of Machine Learning and Cybernetics, vol. 12, no. 11, pp. 3235-3248, 2021.
[31] S. S. Anju and K. V. Kavitha, “Separable convolution neural network for abnormal activity detection in surveillance videos,” in Innovative Data Communication Technologies and Application. Singapore: Springer, 2021, pp. 331-346.
[32] S. Balli, E. A. Sagbas, and M. Peker, “Human activity recognition from smart watch sensor data using a hybrid of principal component analysis and random forest algorithm,” Measurement and Control, vol. 52, no. 1-2, pp. 37-45, 2019.
[33] UCI Machine Learning Repository, “Human activity recognition using smartphones data set,” [Online]. Available: https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones.
[34] D. Garcia-Gonzalez, D. Rivero, E. Fernandez-Blanco, and M. R. Luaces, “A public domain dataset for real-life human activity recognition using smartphone sensors,” Sensors, vol. 20, no. 8, article no. 2200, 2020. https://doi.org/10.3390/s20082200
