Real-time Facial Expression Recognition via Dense & Squeeze-and-Excitation Blocks
  • Fan-Hsun Tseng1, Yen-Pin Cheng2, Yu Wang2, and Hung-Yue Suen2,*

Human-centric Computing and Information Sciences volume 12, Article number: 39 (2022)
https://doi.org/10.22967/HCIS.2022.12.039

Abstract

Due to the coronavirus disease 2019 (COVID-19) pandemic, traditional face-to-face courses have been transformed into online and e-learning courses. Although online courses provide flexible teaching and learning in terms of time and place, teachers cannot be fully aware of each student’s individual learning situation and emotional state. Recognizing learning emotions through facial expressions has therefore become a vital research issue in recent years. To achieve affective computing, this paper presents a fast recognition model for learning emotions, the Dense Squeeze-and-Excitation Network (DSENet), which rapidly recognizes students’ learning emotions, while the proposed real-time online feedback system notifies the teacher instantaneously. DSENet is first trained and validated on the open Facial Expression Recognition 2013 (FER2013) dataset. We then collect students’ learning emotions from e-learning classes and apply transfer learning and data augmentation techniques to improve the testing accuracy. The proposed DSENet model and real-time online feedback system aim to realize effective e-learning in any teaching and learning environment, especially during the COVID-19 pandemic.


Keywords

Affective Computing, e-learning, Emotion Recognition, Facial Expression Recognition, Transfer Learning


Introduction

Because of the worldwide coronavirus disease 2019 (COVID-19) pandemic, online courses and e-learning classes [1] have become popular teaching and learning mediums [2]. Fortunately, information technologies nowadays can provide a real-time learning environment through internet connections [3]. E-learning builds on information and communication technologies [4] such as the Internet, computers, multimedia [5], courseware, e-mail, online research and discussion, web-based learning, computerized learning, virtual classrooms, digital collaboration, and VoIP communication [6]. However, compared to the traditional face-to-face teaching environment, it is more difficult in e-learning classes to interact with students and recognize their learning emotions instantaneously. Many researchers and scholars have started to investigate how to recognize learning emotions in e-learning classes [7]. Therefore, this paper proposes a model for recognizing students’ learning emotions and a real-time feedback system for e-learning classes.
Facial expression recognition (FER) [8] offers an efficient communication channel between humans. How to recognize facial expressions precisely and securely is still a top research issue [9], whether in the field of computer vision [10] or artificial intelligence [11]. FER can be applied to a wide range of applications, including health care, employment, human-computer interaction [12], and emotion recognition in e-learning education [13]. Because FER in the wild is challenging, FER was long performed in lab-controlled environments, until a real-world FER dataset, FER2013, was presented in 2013.
Human emotions can be recognized from several kinds of input data, e.g., speaking intonation, physiological signals, and facial expressions. However, emotion recognition through FER is even more difficult because people sometimes present different facial expressions for the same emotion. Moreover, an emotion is hard to recognize when dealing with people’s microexpressions. Therefore, some researchers have applied neural network (NN) architectures to FER, such as ResNet [14] and DenseNet [15]. Compared with NNs, metaheuristic algorithms require less training time and fewer parameters, such as the Aquila optimizer [16], the reptile search algorithm [17], the arithmetic optimization algorithm [18], the genetic algorithm (GA) [19], and the continuous GA [20]. However, it is still difficult to achieve real-time FER with these methods, and so a lightweight FER model is of vital importance.
In this paper, we propose the Dense Squeeze-and-Excitation Network (DSENet) for emotion recognition in an e-learning environment. DSENet is composed of dense blocks and squeeze-and-excitation blocks, and aims to achieve fast facial expression recognition using less training data and lower computational cost. To realize a real-time online feedback system, the DSENet model is implemented on an NVIDIA Jetson Nano. When the system recognizes a student’s negative emotion, it asks the student to confirm it and then notifies the teacher if the student confirms his/her negative emotion. As a result, the teacher can offer remedial teaching immediately. The major contributions of this work are listed as follows:
1) With the FER2013 dataset, the emotion recognition accuracy of the proposed DSENet model and the compared ResNet-34 model is 65.03% and 58.07%, respectively. The DSENet model yields a 7% higher accuracy than that of the ResNet-34 model.
2) We collected learning emotions from 10 student collaborators, including three females and seven males. With the collected dataset, the emotion recognition accuracy of the proposed DSENet model and the compared ResNet-34 model is 64.19% and 58.08%, respectively. The DSENet model yields a 6% higher accuracy than that of the ResNet-34 model.
3) The DSENet model applies the weights trained on FER2013 to implement transfer learning. The proposed DSENet model with transfer learning achieves a 71.18% recognition accuracy, which is about 7% higher than without transfer learning.

The rest of the paper is organized as follows. Section 2 discusses the background and related works on facial expression recognition and learning emotion in e-learning. Section 3 introduces the proposed DSENet model and real-time feedback system. Section 4 describes the experiment settings and discusses the experimental results. The conclusion and future work are presented in Section 5.


Background and Related Works

Facial Expression Recognition
In a traditional face-to-face teaching environment, teachers recognize students’ learning emotions based on their teaching experience. However, it is hard to directly engage and interact with students in an e-learning course. Therefore, researchers have started to investigate how to monitor students’ engagement through FER [21]. FER is popular but still challenging when applied to unseen images [22] or e-learning environments. In [23], the authors mentioned that there are many algorithms for illumination normalization, such as isotropic diffusion-based normalization, discrete cosine transform-based normalization, difference of Gaussian, and homomorphic filtering-based normalization. In [24], head pose normalization is used to normalize the input facial image into a frontal image in the data preprocessing stage. The most popular method of frontalizing unconstrained facial images was proposed in [25]. However, these image processing issues still need to be solved. Image data augmentation is an efficient way to compensate for insufficient training data [26]; a minimal sketch is given below. A convolutional neural network is suitable for face recognition and image classification [27], but its training time and data requirements are unsuitable for a real-time e-learning environment. In this paper, a lightweight model, DSENet, is proposed to recognize learning emotions efficiently.
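As a concrete illustration of the augmentation step, the following is a minimal Keras sketch assuming 48 × 48 grayscale face crops like those used later in this paper; the transform ranges and array names are illustrative assumptions, since the authors do not specify their exact augmentation policy.

```python
# Minimal image augmentation sketch (assumed policy, not the authors' exact one).
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,       # small random rotations
    width_shift_range=0.1,   # horizontal translation
    height_shift_range=0.1,  # vertical translation
    horizontal_flip=True,    # a mirrored face keeps the same emotion label
    zoom_range=0.1,
)

# x_train / y_train are hypothetical placeholders for real face crops and labels.
x_train = np.random.rand(32, 48, 48, 1).astype("float32")
y_train = np.eye(7)[np.random.randint(0, 7, 32)]

# flow() yields endlessly varied batches that can be passed to model.fit(...).
batch_iter = augmenter.flow(x_train, y_train, batch_size=8)
images, labels = next(batch_iter)
print(images.shape, labels.shape)  # (8, 48, 48, 1) (8, 7)
```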

Learning Emotion
Basic emotions are inherent to humans and do not need to be learned. In [28], six kinds of basic emotions are given, i.e., enjoyment, sadness, disgust, fear, surprise, and anger. In general, people show a similar emotion when facing the same situation. Beyond the basic emotions, complex emotions [29] need to be learned by humans, and in similar situations, different emotions can be evoked among different cultures or crowds [30]. Learning emotion is a separate emotion category in the teaching and learning field [31]. In [32], six learning emotions are categorized, i.e., frustration, confusion, boredom, flow, gratification, and surprise. A skillful teacher is experienced in recognizing students’ learning emotions in traditional face-to-face classes; however, this is hard to achieve in e-learning classes. In [33], the awareness of learning emotions in e-learning environments is studied; however, the learning emotions are self-reported by students rather than recognized by an automated system. In this paper, students’ learning emotions are recognized instantaneously from facial expressions by the proposed automated system.

Facial Expression Recognition in e-Learning
Although e-learning is more flexible than traditional face-to-face classes, it relatively lacks human interaction between teachers and students [34]. Facial expression is a powerful nonverbal communication channel that carries abundant emotional information. In recent years, researchers have considered that a good e-learning environment needs to acknowledge the status of both teachers and students and have the ability to handle and recognize various learning situations [35]. Therefore, integrating a facial emotion recognition model into the e-learning environment can effectively compensate for the above-mentioned shortcomings of e-learning compared to face-to-face teaching. Using an emotion recognition model in e-learning to identify and observe the emotional states of students allows teachers to effectively grasp each student’s individual learning state and immediately make teaching adjustments and remedies as needed. In recent studies, psychologists, educators, and neuroscientists have demonstrated a high correlation between learning emotions and cognitive activity [36]. Adding emotion recognition to the e-learning environment creates a learning environment similar to traditional teaching and makes up for the shortcomings of e-learning to achieve better teaching results. However, most of the existing literature on FER in e-learning has neglected the immediacy of recognition. In this paper, learning emotions are used as the classification targets of the facial recognition model, and students’ learning emotions are reported to teachers in real time as a key learning effectiveness indicator in e-learning classes.


Proposed Framework

System Architecture
The system is divided into five layers, as shown in Fig. 1. The user layer includes students and teachers. The device layer has four components. The Jetson Nano is a resource-limited hardware device equipped with a graphics processing unit (GPU); it executes the proposed DSENet model to recognize students’ learning emotions. Students’ facial expressions are captured by a Logitech webcam and displayed on an LCD touch screen. Moreover, the system interface is shown on the LCD touch screen, where students can input data and send responses to questions, which are relayed to the teacher’s cellphone. In the application layer, an e-learning environment is implemented with two engines, i.e., the facial detection and emotion authentication engines. The network layer enables communication between the teacher and students through the Internet. Finally, the data layer collects students’ learning emotions and feedback messages.

Fig. 1. System architecture.


The system starts from the user layer. At this stage, the login interface is displayed on the LCD touch screen, allowing a student to enter his/her login information. After login, the system opens the Logitech C270 web camera to continuously capture the student’s face. The live stream is then transmitted to the facial detection engine in the application layer. All engines in this layer run on the NVIDIA Jetson Nano development board. The facial detection engine executes face detection, facial feature extraction, and emotion recognition in sequence. If the system detects a negative emotion, it returns to the LCD touch screen and displays the emotion authentication interface, allowing the student to confirm the accuracy of the recognized emotion. In the emotion authentication engine, the student provides emotion feedback. If he/she disagrees with the negative emotion, the system restarts the camera and facial detection engine for the next round. If the student confirms the negative emotion, the system stores the negative emotion record and lets him/her provide problem feedback. The feedback and emotion estimation are then transmitted through the network layer to the database in the data layer, where the student’s facial emotion and problem feedback are recorded together. Finally, the student’s facial emotion and problem feedback are also sent to the teacher’s mobile phone via Wi-Fi.
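The capture-and-detect portion of this flow can be sketched as follows. The paper does not name the detector used by its facial detection engine, so the OpenCV Haar cascade and the `dsenet` model handle below are illustrative assumptions, not the authors’ implementation.

```python
# Hedged sketch of the capture -> face detection -> emotion recognition loop.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)  # e.g., the Logitech C270 on the Jetson Nano

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Crop and resize to the 48x48 input the recognition model expects.
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        face = face.astype("float32")[np.newaxis, ..., np.newaxis] / 255.0
        # emotion = dsenet.predict(face)  # hypothetical trained model handle
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```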

Proposed Emotion Recognition Model
Existing recognition methods require considerable training time, and so they are not suitable for real-time interaction between a teacher and students in an e-learning class. This study proposes a facial emotion recognition model called DSENet, which is shown in Fig. 2; its architecture is shown in Fig. 2(a). The DSENet model draws on three classic architectures: DenseNet, MobileNet, and SE-Inception-ResNet-v1. The structure contains three dense blocks, as shown in Fig. 2(b), two transition layers, as shown in Fig. 2(c), and two squeeze-and-excitation (SE) blocks. Compared with large-scale architectures such as DenseNet and ResNet, the proposed DSENet model uses significantly fewer parameters, making it more suitable for resource-limited hardware thanks to its lower computational complexity.

Fig. 2. Proposed recognition model: (a) DSENet model, (b) Dense block, and (c) transition layer.
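To make the composition concrete, the following is a minimal Keras sketch of the three building blocks named above; the filter counts, growth rate, block depths, and SE reduction ratio are illustrative assumptions rather than the exact configuration shown in Fig. 2.

```python
# Sketch of dense block, transition layer, and SE block (assumed hyperparameters).
from tensorflow.keras import layers, Model, Input

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])   # dense connectivity: reuse all features
    return x

def transition_layer(x, compression=0.5):
    filters = int(x.shape[-1] * compression)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(filters, 1)(x)       # 1x1 conv shrinks channel count
    return layers.AveragePooling2D(2)(x)   # halve spatial resolution

def se_block(x, ratio=16):
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)            # squeeze
    s = layers.Dense(c // ratio, activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)      # excitation
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([x, s])                  # channel-wise reweighting

# Three dense blocks, two transitions, two SE blocks, as described in the text.
inputs = Input((48, 48, 1))
x = layers.Conv2D(32, 3, padding="same")(inputs)
x = dense_block(x); x = se_block(x); x = transition_layer(x)
x = dense_block(x); x = se_block(x); x = transition_layer(x)
x = dense_block(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(7, activation="softmax")(x)   # seven FER2013 classes
model = Model(inputs, outputs)
```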


Proposed Real-Time Online Feedback System
The process of this system is shown in Fig. 3.

Fig. 3. System flowchart.


After a student presses the start button (the interface is shown in Fig. 4(a)), the system starts with the login screen shown in Fig. 4(b). After the student logs in, the system begins to perform expression recognition, and it will immediately send a message to the teacher’s mobile phone to report that the student is not in good spirits when it detects any fatigue or tiredness. If a negative emotion is detected and maintained for more than 3 seconds, the system returns the emotion recognition result to the student, who, at this stage, confirms the accuracy of the decision; the interface is shown in Fig. 4(c). If he/she thinks the emotion recognition result is incorrect, the system returns to facial recognition for the next round. If the student confirms the emotion recognition is correct, the system asks whether help is needed; the interface is shown in Fig. 4(d). On this page, if his/her problem is not listed in the default options, he/she can select the “others” option and leave a message on the next page, shown in Fig. 4(e). If the problem corresponds to one of the default options, it can be submitted directly after selecting the proper option. Then, the system asks whether the student wants to inform the teacher immediately, as shown in Fig. 4(f). If the student selects “no,” the teacher will be informed of the situation after class; the screen is shown in Fig. 4(g). If the student selects “yes,” a message is sent to the teacher, as shown in Fig. 4(h). Finally, the system stores the results of the student’s emotion recognition and problem feedback in the database, and then returns to facial recognition to start the next round of recognition.

Fig. 4. User interface: (a) start, (b) login, (c) emotion recognition, (d) problem confirmation, (e) leave message, (f) real-time feedback, (g) feedback confirmation, and (h) notify teacher.
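The three-second rule and the confirm-then-notify logic described above can be sketched as a small state machine. The negative label set and the dialog/notification helpers below are hypothetical stubs standing in for the LCD interface, the database, and the WhatsApp message; they are not the authors’ code.

```python
# Sketch of the feedback decision loop (assumed labels and stubbed I/O).
import time

NEGATIVE = {"frustrated", "confused", "bored"}   # assumed negative labels

def ask_student_to_confirm(emotion):  # stub for the LCD confirmation dialog
    return True

def student_wants_help():             # stub for the help/problem dialog
    return True

def store_record(emotion):            # stub for the database write
    print("stored:", emotion)

def notify_teacher(emotion):          # stub for the teacher notification
    print("teacher notified:", emotion)

negative_since = None

def on_prediction(emotion):
    """Called once per recognized frame with the predicted label."""
    global negative_since
    if emotion not in NEGATIVE:
        negative_since = None                        # streak broken, reset timer
        return
    if negative_since is None:
        negative_since = time.monotonic()            # start of negative streak
    elif time.monotonic() - negative_since >= 3.0:   # held for 3 seconds
        negative_since = None
        if ask_student_to_confirm(emotion):
            store_record(emotion)
            if student_wants_help():
                notify_teacher(emotion)
```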



Experimental Results

Experiment Setup
The experiments are conducted with an NVIDIA Jetson Nano, a Logitech C270 camera, and a 7-inch LCD touch screen. The Ubuntu 18.04 operating system is installed on the NVIDIA Jetson Nano development board, and facial emotion recognition is implemented with Keras, a Python deep learning API. The Firebase API is used for recording students’ feedback to the online feedback system. Teachers are able to use any mobile phone with WhatsApp installed to receive students’ emotion feedback. The details of the hardware and software used in this work are listed in Tables 1 and 2, respectively.

Table 1. Hardware configuration

Hardware Items
Development board NVIDIA Jetson Nano
GPU NVIDIA Maxwell architecture with 128 NVIDIA CUDA cores
Central processing unit Quad-core ARM Cortex-A57 MPCore processor
Random access memory 4 GB 64-bit LPDDR4, 1600 MHz 25.6 GB/s
Storage microSD 32GB
Camera Logitech C270
Monitor 7-inch LCD touch screen
Communication Wi-Fi on mobile phone


Table 2. Software version
Software Context
Operating system Ubuntu 18.04
Application Python 3.6.9
Deep learning API Keras 2.0.0
Database Firebase API
Feedback application WhatsApp
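The paper does not show how feedback records are written to Firebase. As one possibility, the following hedged sketch uses the Firebase Realtime Database REST API; the project URL and record schema are hypothetical, and the authors may instead use a Firebase client SDK.

```python
# Hedged sketch: append a feedback record via Firebase's Realtime Database
# REST API (POST to <db-url>/<path>.json creates an auto-keyed child).
import requests

DB_URL = "https://example-elearning.firebaseio.com"  # hypothetical project URL

def log_feedback(student_id: str, emotion: str, message: str) -> None:
    record = {"student": student_id, "emotion": emotion, "message": message}
    resp = requests.post(f"{DB_URL}/feedback.json", json=record, timeout=5)
    resp.raise_for_status()  # surface HTTP errors instead of failing silently

log_feedback("s01", "confused", "Lost at slide 12")
```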


Model Comparison and Evaluation Metric
In this paper, the proposed DSENet is compared to the 34-layer residual network, ResNet-34 [14]. Its main architecture is composed of residual blocks, and multiple residual blocks form the residual network. The basic concept of ResNet is that once a shallow network has reached saturated accuracy, the residual mapping satisfies F(x)=0, meaning that no new characteristics can be learned. ResNet then adds an identity mapping, defined as F(x)+x, which increases the depth of the neural network while maintaining the error rate, since F(x)+x=x when F(x)=0. This means that a deeper neural network architecture should not increase the error on the training set.
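A minimal Keras sketch of this identity-mapping idea follows; it assumes the input already has the target number of channels (ResNet-34 uses a projection shortcut when it does not) and illustrates the residual principle rather than the full ResNet-34.

```python
# Basic two-conv residual block: the block learns F(x) and adds x back.
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                       # assumes matching channels
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                    # F(x) + x: identity mapping
    return layers.ReLU()(y)
```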
The proposed DSENet is compared to the ResNet-34 model in terms of recognition accuracy. Experiments are conducted on two datasets: FER2013 and a dataset collected by ourselves. In addition, confusion matrices are used to visualize the per-category recognition results of the proposed DSENet and the ResNet-34 model on both datasets.
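Such per-category results can be computed, for example, with scikit-learn; the sketch below uses toy arrays in place of the real predictions.

```python
# Confusion matrix and accuracy from true vs. predicted class indices.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = np.array([0, 1, 2, 2, 1, 0])       # ground-truth class indices (toy data)
y_pred = np.array([0, 1, 1, 2, 1, 2])       # model predictions (toy data)
print(confusion_matrix(y_true, y_pred))     # rows: true class, cols: predicted
print(accuracy_score(y_true, y_pred))       # overall recognition accuracy
```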

Training by Open Dataset for Facial Expression Recognition
To avoid privacy concerns when training the DSENet model, we first use the open Facial Expression Recognition 2013 (FER2013) dataset for training. In the FER2013 dataset, each image is 48 × 48 pixels, and the dataset contains a total of seven labels: angry, disgust, fear, happy, sad, surprised, and neutral. Note that the training result obtained on the FER2013 dataset is further used as the source domain of transfer learning [37] in later experiments. The number of pictures from the FER2013 dataset used in this experiment is shown in Table 3.

Table 3. FER2013 dataset
Index Emotion Number of pictures
0 Angry 4,593
1 Disgust 547
2 Fear 5,121
3 Happy 8,989
4 Sad 6,077
5 Surprised 4,002
6 Neutral 6,198
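FER2013 is commonly distributed as a single CSV file in which each row carries an emotion index, a space-separated string of 48 × 48 pixel values, and a Usage split (Training / PublicTest / PrivateTest). A minimal loading sketch under that assumption follows; the file name is illustrative.

```python
# Sketch: load FER2013 from its common CSV distribution (assumed layout).
import numpy as np
import pandas as pd

df = pd.read_csv("fer2013.csv")              # columns: emotion, pixels, Usage

def to_image(pixel_str: str) -> np.ndarray:
    """Parse a space-separated pixel string into a 48x48x1 array."""
    return np.fromiter(map(int, pixel_str.split()), dtype=np.uint8,
                       count=48 * 48).reshape(48, 48, 1)

x = np.stack([to_image(p) for p in df["pixels"]]).astype("float32") / 255.0
y = df["emotion"].to_numpy()                 # 0=Angry ... 6=Neutral, as in Table 3
is_train = (df["Usage"] == "Training").to_numpy()
x_train, y_train = x[is_train], y[is_train]
x_test, y_test = x[~is_train], y[~is_train]
```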

In this experiment, the DSENet and ResNet-34 recognition models were trained on the FER2013 dataset, with 28,709 training images and 3,589 testing images. The hyperparameter settings of the experiment are shown in Table 4. The models were trained for a total of 100 epochs using the Nadam optimizer, with a learning rate of 0.002 and a batch size of 8. If the validation accuracy does not improve within 5 epochs, the learning rate is multiplied by 0.5. The accuracies of DSENet and ResNet-34 on the FER2013 dataset are shown in Fig. 5.

Table 4. Parameter settings
Parameter Value
Epochs 100
Batch size 8
Optimizer Nadam
Learning rate 0.002
Learning rate reduction ×0.5 every 5 epochs without improvement
Beta_1 0.9
Beta_2 0.999
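Expressed with the modern tf.keras API, the configuration of Table 4 might look as follows. Here `model` and the data arrays are assumed to exist (e.g., from the sketches above), and the use of integer labels with a sparse loss is an assumption; the paper's own Keras version may differ slightly in callback argument names.

```python
# Sketch of the Table 4 training configuration in Keras.
from tensorflow.keras.optimizers import Nadam
from tensorflow.keras.callbacks import ReduceLROnPlateau

optimizer = Nadam(learning_rate=0.002, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Halve the learning rate when validation accuracy stalls for 5 epochs.
reduce_lr = ReduceLROnPlateau(monitor="val_accuracy", factor=0.5,
                              patience=5, verbose=1)

model.fit(x_train, y_train, epochs=100, batch_size=8,
          validation_data=(x_test, y_test), callbacks=[reduce_lr])
```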



Fig. 5. Accuracy of DSENet and ResNet-34 in FER2013 dataset.


The experimental results show that the recognition accuracy of the DSENet model was 65.03%, while that of the ResNet-34 model was 58.07%; the proposed DSENet model is approximately 7% higher in recognition accuracy than the ResNet-34 model. In addition, according to the confusion matrices in Fig. 6(a) and 6(b), the DSENet model has significantly higher recognition accuracies for the disgust, fear, and sad emotions than the ResNet-34 model. Furthermore, ResNet-34 is more likely to misjudge the disgust emotion as the angry emotion. It can be seen that on the FER2013 dataset, the DSENet model designed in this study identifies facial emotions more effectively and correctly than the ResNet-34 model.

Fig. 6. Confusion matrix of (a) DSENet and (b) ResNet-34 in the FER2013 dataset.


Training by Dataset Collected during e-Learning Course
In [38], the authors pointed out that the expressions of learning emotions and basic emotions are not the same. In addition, since there is no open dataset for learning emotions, we collected a learning emotion dataset ourselves. All students agreed to the collection of their learning emotions before starting the e-learning course. Ten students, three female and seven male, were invited for the collection of learning emotions. They were asked to watch an e-learning course for about 10 minutes while a Logitech C270 webcam recorded their facial expressions. After the pictures of the students’ facial expressions were collected, the students were asked to identify and label their own emotions. Since learning emotions are difficult to define by category from FER alone, this paper used the “operational definitions” described in [32] to categorize the collected pictures, which are listed in Table 5. Finally, 1,166 pictures were collected and categorized into six learning emotions. The number of pictures for each label is shown in Table 6.

Table 5. Operational definitions for learning emotion recognition
Learning emotion Operational definition
Frustrated Facial expression: Frowning with a closed mouth.
Description: Feeling unsure, discouraged, irritated, stressed, and anxious in the learning process.
Confused Facial expression: Frowning with the mouth slightly open or closed.
Description: Feeling overwhelmed or confused in the learning process.
Bored Facial expression: Upper eyelids drooping, appearing listless or distracted, or an unnatural mouth.
Description: Feeling bored, tired, and distracted in the learning process.
Delightful Facial expression: Smiling or laughing happily.
Description: Feeling satisfied with the learning process.
Surprised Facial expression: Eyes wide open, or mouth open with eyebrows raised.
Description: Feeling surprised by something unexpected in the learning process.
Flow Facial expression: Eyes focused on the monitor.
Description: None.


Table 6. Number of pictures for each label
Learning emotion Number of pictures
Frustrated 67
Confused 229
Bored 270
Delightful 74
Surprised 39
Flow 487


In this experiment, the DSENet and ResNet-34 recognition models were trained on the self-collected learning emotion dataset, with 937 training images and 229 testing images. The hyperparameters used in this experiment are the same as in Table 4: a total of 100 epochs using the Nadam optimizer, with a learning rate of 0.002 and a batch size of 8, where the learning rate is multiplied by 0.5 if the validation accuracy does not improve within 5 epochs. The validation accuracy of DSENet and ResNet-34 on the learning emotion dataset is shown in Fig. 7, where the blue line represents the testing accuracy of DSENet and the orange line that of ResNet-34. Experimental results show that the recognition accuracy of the DSENet model is 64.19%, while that of the ResNet-34 model is 58.08%. The DSENet model proposed in this study achieves 6% higher accuracy in emotion recognition than the ResNet-34 model. In addition, DSENet converges in fewer epochs.


Fig. 7. Accuracy of DSENet and ResNet-34 in the e-learning course dataset.


Based on the student learning emotions collected from the e-learning curriculum, the confusion matrices of the DSENet and ResNet-34 recognition models are shown in Fig. 8(a) and 8(b), respectively. The DSENet model proposed in this study has higher recognition accuracies for the four emotions of confused, flow, frustrated, and surprised than the ResNet-34 model. In addition, this experiment shows that both models are likely to identify frustrated as flow, and have a higher probability of identifying surprised as the flow, gratified, and confused emotions. This result may be caused by a lack of obvious expressions in the dataset pictures and an insufficient number of surprised emotion pictures.

Fig. 8. Confusion matrix of (a) DSENet and (b) ResNet-34 in the e-learning course dataset.


Recognition Accuracy with or without Transfer Learning
In order to realize a fast recognition model with an insufficient training dataset, transfer learning is applied in this experiment. Transfer learning enables a model to store knowledge obtained while solving one problem and then apply it to another related task. The basic concept of transfer learning [37] is to transfer a trained model and its parameters to another model, so that the transferred model can use less training data to achieve the same performance; there is thus no need to start training from scratch, which saves time and effort. Transfer learning divides the problem into a source domain and a target domain. The source domain is the existing knowledge or domain that has been learned, and the target domain is the domain to be learned and trained. Transfer learning assists the target domain in completing its training through the training results transferred from the source domain. Because there is no open dataset for learning emotions, and the learning emotion images collected by ourselves are not sufficient to achieve a high recognition accuracy, the training result gained from the FER2013 dataset is transferred in this experiment. In other words, the source domain is the FER2013 dataset, while the target domain is the learning emotion dataset collected by ourselves.
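The weight-transfer step might look as follows in Keras. Here `build_dsenet` and the weight file name are hypothetical, and replacing the seven-class FER2013 head with a six-class learning emotion head is one reasonable reading of the procedure rather than the authors’ exact recipe.

```python
# Hedged sketch: reuse FER2013-trained DSENet weights on the learning emotion task.
from tensorflow.keras import layers, Model

base = build_dsenet(num_classes=7)            # hypothetical model builder
base.load_weights("dsenet_fer2013.h5")        # source-domain knowledge (assumed file)

features = base.layers[-2].output             # drop the 7-class FER2013 softmax
outputs = layers.Dense(6, activation="softmax")(features)  # six learning emotions
transfer_model = Model(base.input, outputs)

transfer_model.compile(optimizer="nadam",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
# transfer_model.fit(x_le_train, y_le_train, ...)  # fine-tune on the target domain
```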
This experiment studies the effect of transfer learning on the DSENet model’s recognition of the learning emotion dataset, with 937 training images and 229 testing images. The hyperparameters used in this experiment are the same as in Table 4: a total of 100 epochs using the Nadam optimizer, with a learning rate of 0.002 and a batch size of 8, where the learning rate is multiplied by 0.5 if the validation accuracy does not improve within 5 epochs. The experimental results are shown in Fig. 9. The best accuracy of the DSENet model with transfer learning is 71.18%, and its accuracy at the 100th epoch is 68.12%. The DSENet model without transfer learning has a best accuracy of 63.76% and an accuracy of 58.52% at the 100th epoch. Therefore, on the learning emotion dataset, the accuracy with transfer learning is about 7%–10% higher than without it. In addition, with transfer learning the recognition accuracy of the DSENet model peaks and gradually converges after about 30 epochs, whereas without transfer learning, more training rounds are required and the recognition accuracy still has not converged by the end of training.



Fig. 9. Accuracy of the DSENet model with or without transfer learning.




Fig. 10. Confusion matrix of the DSENet model (a) with and (b) without transfer learning.


For the DSENet model on the learning emotion dataset, the confusion matrices with and without transfer learning are shown in Fig. 10(a) and 10(b), respectively. This experiment found that with transfer learning, the recognition accuracy of the bored and flow emotions is much improved; however, compared to the model without transfer learning, the model with transfer learning is more likely to identify surprised emotions as confused and confused emotions as flow. It is noteworthy that, whether transfer learning is used or not, frustrated emotions are likely to be recognized as flow. This is because the frustrated and flow learning emotions do not produce obvious differences in the students’ facial expressions.


Conclusion and Future Work

In this paper, a fast FER model and a real-time online feedback system are proposed to achieve affective computing. The proposed system promptly provides five major functions: facial expression recognition, attention, feedback, analysis, and recording. Since there is no open-source learning emotion dataset, we collected learning emotions from students in e-learning classes. Whether on the open-source FER2013 dataset or the learning emotion dataset collected by ourselves, the proposed DSENet model yields higher emotion recognition accuracy than the ResNet-34 model. The proposed system is able to recognize students’ emotions, forward students’ feedback to the teacher, record students’ problems in the database, and convert the stored data into visualizations for analysis. In future work, we intend to enrich the learning emotion dataset to improve recognition accuracy. Moreover, the proposed system will be developed into a mobile phone application, so that teachers and students can use it more conveniently.


Author’s Contributions

Conceptualization, FHT. Funding acquisition, FHT. Investigation and methodology, FHT, YPC, YW. Supervision, FHT. Writing of the original draft, YPC, YW. Writing of the review and editing, FHT, HYS. Software, YPC, YW. Validation, FHT, HYS. Data curation, YPC, YW. Visualization, YPC, YW. All the authors have proofread the final version.


Funding

This work is partly supported by the Young Scholar Fellowship Program under the auspices of the Ministry of Science & Technology (MOST) in Taiwan (Grant No. MOST109-2636-E-003-001), with partial funding from MOST in Taiwan (Grant No. MOST109-2511-H-003-046, MOST110-2222-E-006-011).


Competing Interests

The authors declare that they have no competing interests.


Author Biography

Author
Name: Fan-Hsun Tseng
Affiliation: National Cheng Kung University
Biography: Fan-Hsun Tseng received the Ph.D. degree in Computer Science and Information Engineering from the National Central University, Taoyuan, Taiwan, in 2016. In 2021, he joined the faculty of the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, where he is currently an Assistant Professor. He has served as Associate Editor-in-Chief of Journal of Computers, Associate Editor of IEEE Access, Human-centric Computing and Information Sciences, Journal of Internet Technology, and IET Networks. His research interests include Mobile Networks, Edge Computing, Artificial Intelligence, Machine Learning, and Evolutionary Computing. He is a senior member of the IEEE.

Author
Name: Yen-Pin Cheng
Affiliation: National Taiwan Normal University
Biography: Yen-Pin Cheng received the B.S. degree in Department of Technology Application and Human Resource Development from the National Taiwan Normal University, Taipei, Taiwan, in 2021. He is currently pursuing his M.S. degree in the Institute of Information Systems and Applications at the National Tsing Hua University, Hsinchu, Taiwan. His research interests include Artificial Intelligence and Computer Vision.

Author
Name: Yu Wang
Affiliation: National Taiwan Normal University
Biography: Yu Wang is currently an undergraduate student in the Department of Technology Application and Human Resource Development, National Taiwan Normal University, Taipei, Taiwan. Her research interests include Artificial Intelligence and Facial Expression Recognition.

Author
Name: Hung-Yue Suen
Affiliation: National Taiwan Normal University
Biography: Hung-Yue Suen received the Ph.D. degree in management information systems. He is currently an Assistant Professor with the Technology Application and Human Resource Development Department, National Taiwan Normal University. His main research interests include social computing, human–computer interaction, data analytics, and artificial intelligence in human resources development and management.


References

[1] A. M. Maatuk, E. K. Elberkawi, S. Aljawarneh, H. Rashaideh, and H. Alharbi, “The COVID-19 pandemic and E-learning: challenges and opportunities from the perspective of students and instructors,” Journal of Computing in Higher Education, vol. 34, no. 1, pp. 21-38, 2022.
[2] L. Gerard, K. Wiley, A. H. Debarger, S. Bichler, A. Bradford, and M. C. Linn, “Self-directed science learning during COVID-19 and beyond,” Journal of Science Education and Technology, vol. 31, no. 2, pp. 258-271, 2022.
[3] C. Evans and J. P. Fan, “Lifelong learning through the virtual university,” Campus-Wide Information Systems, vol. 19, no. 4, pp. 127-134, 2002.
[4] P. Honey, “E‐learning: a performance appraisal and some suggestions for improvement,” The Learning Organization, vol. 8, no. 5, pp. 200-203, 2001.
[5] A. Khalil, N. Minallah, I. Ahmed, K. Ullah, J. Frnda, and N. Jan, “Robust mobile video transmission using DSTS-SP via three-stage iterative joint source-channel decoding,” Human-centric Computing and Information Sciences, vol. 11, article no. 42, 2021. https://doi.org/10.22967/HCIS.2021.11.042
[6] A. El Azzaoui, M. Y. Choi, C. H. Lee, and J. H. Park, “Scalable lightweight blockchain-based authentication mechanism for secure VoIP communication,” Human-centric Computing and Information Sciences, vol. 12, article no. 8, 2022. https://doi.org/10.22967/HCIS.2022.12.008
[7] K. Bahreini, R. Nadolski, and W. Westera, “Towards multimodal emotion recognition in e-learning environments,” Interactive Learning Environments, vol. 24, no. 3, pp. 590-605, 2016.
[8] Y. I. Tian, T. Kanade, and J. F. Cohn, “Recognizing action units for facial expression analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, 2001.
[9] A. Boughida, M. N. Kouahla, and Y. Lafifi, “A novel approach for facial expression recognition based on Gabor filters and genetic algorithm,” Evolving Systems, vol. 13, no. 2, pp. 331-345, 2022.
[10] Z. Chen, L. Wu, H. He, Z. Jiao, and L. Wu, “Vision-based skeleton motion phase to evaluate working behavior: case study of ladder climbing safety,” Human-centric Computing and Information Sciences, vol. 12, article no. 1, 2022. https://doi.org/10.22967/HCIS.2022.12.001
[11] L. Zhao, Y. Zhang, and Y. Cui, “A multi-scale U-shaped attention network-based GAN method for single image dehazing,” Human-centric Computing and Information Sciences, vol. 11, article no. 38, 2021. https://doi.org/10.22967/HCIS.2021.11.038
[12] S. Li and W. Deng, “Deep facial expression recognition: a survey,” IEEE Transactions on Affective Computing, 2020. https://doi.org/10.1109/TAFFC.2020.2981446
[13] T. Daouas and H. Lejmi, “Emotions recognition in an intelligent elearning environment,” Interactive Learning Environments, vol. 26, no. 8, pp. 991-1009, 2018.
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 770-778.
[15] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 2017, pp. 2261-2269.
[16] L. Abualigah, D. Yousri, M. Abd Elaziz, A. A. Ewees, M. A. Al-Qaness, and A. H. Gandomi, “Aquila optimizer: a novel meta-heuristic optimization algorithm,” Computers & Industrial Engineering, vol. 157, article no. 107250, 2021. https://doi.org/10.1016/j.cie.2021.107250
[17] L. Abualigah, M. Abd Elaziz, P. Sumari, Z. W. Geem, and A. H. Gandomi, “Reptile Search Algorithm (RSA): a nature-inspired meta-heuristic optimizer,” Expert Systems with Applications, vol. 191, article no. 116158, 2022. https://doi.org/10.1016/j.eswa.2021.116158
[18] L. Abualigah, A. Diabat, S. Mirjalili, M. Abd Elaziz, and A. H. Gandomi, “The arithmetic optimization algorithm,” Computer Methods in Applied Mechanics and Engineering, vol. 376, article no. 113609, 2021. https://doi.org/10.1016/j.cma.2020.113609
[19] O. A. Arqub and Z. Abo-Hammour, “Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm,” Information Sciences, vol. 279, pp. 396-415, 2014.
[20] Z. Abo-Hammour, O. A. Arqub, O. Alsmadi, S. Momani, and A. Alsaedi, “An optimization algorithm for solving systems of singular boundary value problems,” Applied Mathematics & Information Sciences, vol. 8, no. 6, pp. 2809-2821, 2014.
[21] C. Pabba and P. Kumar, “An intelligent system for monitoring students' engagement in large classroom teaching through facial expression recognition,” Expert Systems, vol. 39, no. 1, article no. e12839, 2022. https://doi.org/10.1111/exsy.12839
[22] A. Mollahosseini, D. Chan, and M. H. Mahoor, “Going deeper in facial expression recognition using deep neural networks,” in Proceedings of 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, 2016, pp. 1-10.
[23] M. Shin, M. Kim, and D. S. Kwon, “CNN structure analysis for facial expression recognition,” in Proceedings of 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), New York, NY, 2016, pp. 724-729.
[24] O. Rudovic, I. Patras, and M. Pantic, “Facial expression invariant head pose normalization using gaussian process regression,” in Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, 2010, pp. 28-33.
[25] T. Hassner, S. Harel, E. Paz, and R. Enbar, “Effective face frontalization in unconstrained images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 2015, pp. 4295-4304.
[26] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, article no. 60, 2019. https://doi.org/10.1186/s40537-019-0197-0
[27] K. Wangchuk, P. Riyamongkol, and R. Waranusast, “Real-time Bhutanese sign language digits recognition system using convolutional neural network,” ICT Express, vol. 7, no. 2, pp. 215-220, 2021.
[28] P. Ekman, “Basic emotions,” in Handbook of Cognition and Emotion. Chichester, UK: John Wiley & Sons, 1999, pp. 45-60.
[29] R. L. Solomon, “The opponent-process theory of acquired motivation: the costs of pleasure and the benefits of pain,” American Psychologist, vol. 35, no. 8, pp. 691-712, 1980.
[30] C. E. Izard, Emotions in Personality and Psychopathology. New York, NY: Plenum Press, 2013.
[31] J. Su and W. Yang, “Artificial intelligence in early childhood education: a scoping review,” Computers and Education: Artificial Intelligence, vol. 3, article no. 100049, 2022. https://doi.org/10.1016/j.caeai.2022.100049
[32] P. R. Chung, “Applying facial action units and feature selection methods to develop the learning emotion image database and recognition model,” Master’s thesis, National Chung Hsing University, Taichung, Taiwan, 2018 [Online]. Available: https://hdl.handle.net/11296/tj8335.
[33] M. Feidakis, T. Daradoumis, S. Caballé, and J. Conesa, “Embedding emotion awareness into e-learning environments,” International Journal of Emerging Technologies in Learning (iJET), vol. 9, no. 7, pp. 39-46, 2014.
[34] U. Ayvaz, H. Guruler, and M. O. Devrim, “Use of facial emotion recognition in e-learning systems,” Information Technologies and Learning Tools, vol. 60, no. 4, pp. 95-104, 2017.
[35] C. H. Wu, “New technology for developing facial expression recognition in e-learning,” in Proceedings of 2016 Portland International Conference on Management of Engineering and Technology (PICMET), Honolulu, HI, 2016, pp. 1719-1722.
[36] F. Tian, P. Gao, L. Li, W. Zhang, H. Liang, Y. Qian, and R. Zhao, “Recognizing and regulating e-learners’ emotions based on interactive Chinese texts in e-learning systems,” Knowledge-Based Systems, vol. 55, pp. 148-164, 2014.
[37] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data engineering, vol. 22, no. 10, pp. 1345-1359, 2009.
[38] J. C. Hung, K. C. Lin, and N. X. Lai, “Recognizing learning emotion based on convolutional neural networks and transfer learning,” Applied Soft Computing, vol. 84, article no. 105724, 2019. https://doi.org/10.1016/j.asoc.2019.105724

About this article
Cite this article

Fan-Hsun Tseng, Yen-Pin Cheng, Yu Wang, and Hung-Yue Suen, “Real-time Facial Expression Recognition via Dense & Squeeze-and-Excitation Blocks,” Human-centric Computing and Information Sciences, vol. 12, article no. 39, 2022. https://doi.org/10.22967/HCIS.2022.12.039

  • Received: 8 March 2022
  • Accepted: 22 April 2022
  • Published: 30 August 2022