Exploring the Role of Deep Learning in Industrial Applications: A Case Study on Coastal Crane Casting Recognition
Muazzam Maqsood1,*, Irfan Mehmood2, Rupak Kharel3, Khan Muhammad4,*, Jaecheul Lee5,*, Waleed S. Alnumay6

Human-centric Computing and Information Sciences volume 11, Article number: 20 (2021)
https://doi.org/10.22967/HCIS.2021.11.020

Abstract

Deep learning-based visual analytics play an important role in the automation of industrial processes, performing exceptionally well compared to traditional machine learning approaches. However, deep learning methods face several challenges when employed in industrial applications, such as noisy data, a lack of labeled data, computational complexity, and adversarial attacks. In this context, this paper presents a case study on deep learning-assisted coastal crane automation. The case study presents real-time container corner casting recognition for efficient loading and unloading of containers. The proposed crane casting recognition consists of a lightweight dehazing method for pre-processing noisy videos to remove haze, fog, and smoke, and end-to-end corner casting recognition by applying a recurrent neural network with long short-term memory (LSTM) units. The proposed method runs in real time and has been verified in the field with an average accuracy of 96%. This paper also explores the challenges faced by artificial intelligence and recommends possible future directions in the relevant domain.

Keywords

Deep Learning, Machine Learning, Industrial Process Automation, Dehaze, CNNs, Real-Time Systems, LSTM

Introduction

Recent scientific developments have contributed to the manufacturing of large-scale mechanical equipment and machinery. Such equipment provides a foundation for high-speed, accurate, and automated processes, increasing the productivity of human efforts [1]. The recent rapid advance of technology has also led to the automation of more complex processes with even better results [2, 3]. The extraordinary performance of these industries is leading the business community to invest heavily in this side of the business, especially in the mechanical industry. The core reason for this is the reliability and performance of the latest machinery and technology. Such reliability and trust in the mechanical industry have resulted in immense growth in both public and private sector businesses recently. These business ventures can lead to more job opportunities and serve as a core pillar of the economy of any country [4]. The introduction of the latest machinery and technologies not only increases the productivity of many sectors but also contributes to the economy by creating more jobs. This is why economies are usually heavily dependent on the mechanical machinery and equipment industries. For this specific reason, failure to maintain this industrial sector often has drastic economic implications [5].
The failure of or damage to mechanical equipment may have consequences ranging from the severe to the drastic to even the unimaginable. Malfunctions of mechanical equipment can affect the whole industrial process, which in turn hampers productivity. The consequences can be direct or indirect for the economy and for human beings. Such damage and failures can also have an adverse impact on the mechanical devices themselves, which are usually very costly. Second, failures of or damage to large machinery often have an effect on human lives. They can also have a definite negative impact on the economy if left unremedied for a long time [6]. Sometimes such damage or failures are not severe in themselves, yet they nonetheless lead to disastrous scenarios. As such, there is a strong need to develop mechanisms that maintain the smooth functioning of industrial equipment and control any unavoidable severe consequences. This raises the importance of an intelligent, real-time fault diagnostic system that can monitor equipment and identify potential failures in advance. The idea was first introduced by countries with advanced technology in the 1970s [1]. Fault diagnostic systems continuously monitor the current status of machinery and predict potential failures. According to one survey, for a 400-MW steam turbine generator, a good failure diagnostic system can save around $17 million in maintenance costs per year if it can reduce the planned and forced shutdown rates to 1% [1]. Thus, efficient, artificial intelligence-driven diagnostic systems can reduce maintenance costs as well as economic and human losses. Generally, there is no single authoritative definition of a mechanical fault. The widely used definition of a fault is “an event that makes a machine work abnormally or affects the performance of the machine.” Many potential faults can be identified using high-definition surveillance cameras [7, 8].
In the last few decades, deep learning [9] and computer vision have become widely popular in diverse fields [10–16]. Accordingly, there has been an increase in the use of computer vision-based techniques [17] for fault detection and localization. These computer vision techniques perform exceptionally well in surveillance-based systems [14], especially after the development of the latest deep learning techniques, such as convolutional neural networks (CNNs), semantic segmentation and, more recently, reinforcement learning. These deep learning techniques deliver exceptional performance on computer vision problems [18–21]. Recent deep learning techniques produce very good results, yet they also have some limitations when applied to industrial problems like fault detection and localization. The data received through surveillance cameras are usually of poor quality. On the other hand, processing data in real time becomes a serious problem if very high-definition cameras are used to capture surveillance data [21, 22]. Extreme weather conditions and haze are other issues that degrade surveillance data. These conditions make it practically impossible for a fault diagnostic system to intelligently identify and localize a fault from video or image data captured by cameras. Models produce poor results on low-quality or hazy data, and such a system will not be able to accurately identify and localize faults in noisy input. Therefore, it is vital to design an efficient, real-time dehazing-based preprocessing technique, lightweight deep learning models for object detection or segmentation in industrial process automation, and an intelligent fault diagnostic system [4, 23]. This study consists of a detailed analysis of the role of current deep learning technologies in industrial applications, with an emphasis on highlighting the challenges and opportunities.
Furthermore, it introduces an efficient real-time deep learning-assisted coastal crane automation system that can enhance the quality of captured video streams using a dehazing method and process them for the accurate localization of container castings. The proposed CNN-driven dehaze architecture automatically maps the relationship between image features (captured from surveillance videos) and haze, and then generates an accurate estimation of the medium transmission map. This scheme ultimately processes images in real time so as to produce clean images that can be further used for the detection and segmentation of container corners. The rest of this paper is organized as follows: Section 2 presents a brief history of artificial intelligence (AI) in the context of industrial automation; Section 3 presents the main challenges and recommendations related to the latest AI techniques used for industrial applications; Section 4 presents a case study on industrial process automation systems; and Section 5 presents the conclusion and directions for future research in this domain.

Artificial Intelligence and Deep Learning

AI is inspired by the human cognitive ability to solve complex industrial problems that are difficult to solve otherwise. The essence of AI techniques lies in solving complex engineering problems by capturing the subtle relationship between input and output data examples, even when it is difficult to understand the underlying relations. AI models are therefore data-driven, and thus need data to learn to make future predictions and decisions. AI has evolved over a period of sixty years, yielding remarkable performances in recent years [24]. The artificial neural network (ANN) originated in the 1940s, when the MP model [25] and the Hebb rule [26] were used to analyze how neurons in the human brain work. In the early 1960s, different chess games were developed, and some other small problems were solved using machine learning.
In 1956, a mathematical approach called the perceptron, inspired by the human nervous system, was developed with linear optimization [27]. This work was followed by the adaptive linear unit [28], which was used in practical applications like communication and weather prediction. The major limitation of early AI techniques was their inability to handle problems that involved non-linear data [29]. The second expansion of AI began in 1980, when the Hopfield network circuit [30] was designed. The non-linear data problem was solved using the back-propagation (BP) algorithm in 1974 [31]. The Hopfield network and the Boltzmann machine (BM) employed a stochastic approach to the non-linear data problem [32]. In 1997, kernel functions were proposed for the support vector machine (SVM), which performed exceptionally well on classification and regression problems. However, the human intervention required to fine-tune features and to reduce dimensionality for such traditional models remained a problem, so the performance of these models remained dependent on human-engineered features. Deep learning has benefited not only from the accumulation of conventional machine learning algorithms but also from statistical learning. An overview of the emergence of these models is given in Fig. 1. Deep learning uses data representation rather than hand-picked engineered features, converting raw and unstructured input data into abstract representations to learn the features. In 1986, the restricted Boltzmann machine (RBM) [25] was designed using the probability distribution of the Boltzmann machine, in which hidden layers were used as features to characterize the input data. This was followed by the design of the auto-encoder, where a layer-by-layer greedy algorithm was used to reduce the loss function [33]. In 1995, a directed topology connection-based neural network was designed for feature learning from sequential data [34].
In 1997, a new model called long short-term memory (LSTM) was designed to handle gradient descent problems and complex time sequence problems [35, 36]. In 1998, a convolutional neural network model was designed for two-dimensional data, such as images, where feature learning was achieved using pooling and convolutional layers [37]. As the structure of deep learning models became increasingly complex and deep, model training and feature optimization became more time-consuming, which often led to overfitting and local optimization problems. This problem was solved in 2006 with the design of the deep belief network, which used bidirectional connections in the top layers instead of stacking RBMs directly, leading to a drop in computational complexity [38]. The deep autoencoder was also proposed to handle highly non-linear data, while other variants of the autoencoder were designed to handle sparsity, dimensionality reduction, and ambiguous input data [39]. The deep CNN was also proposed to handle images in a better way by using deep layers, yielding a remarkable improvement in results [40]. Nowadays, many new deep learning models with improved performance are being proposed [41].

Fig. 1. The evolutionary phases of artificial intelligence, machine learning, and deep learning.

Both deep learning and traditional machine learning algorithms are data-driven and can model complex underlying relationships between input and output. Deep learning differs from traditional machine learning in its deep hierarchy, feature representation and learning, model training, and construction. Deep learning models rely on feature learning and model training in a single step, using kernels or end-to-end optimization. The deep structure with many hidden layers essentially consists of multi-level non-linear operations. It converts the original representation of features into a more abstract representation to capture the complex underlying structure.
In general, deep learning is an end-to-end learning model that requires minimal human interaction, whereas traditional machine learning learns features and performs classification in separate steps. The features are extracted after converting raw data into different domains, such as the frequency, time, or statistical domain. The next step usually consists of selecting the most discriminative features in the feature selection stage. Therefore, performance depends not only on the machine learning algorithm but also on the features used for classification. Generally, feature extraction and selection are time-consuming tasks and are heavily dependent on domain knowledge. Deep learning models are therefore more flexible and adaptive. The deep hierarchy of these models also makes it convenient to model complex non-linear relationships in data. As the sheer volume of data increases greatly day by day, the ability of a model to handle data in bulk is also important. Deep learning models perform best when provided with a huge amount of data, and they perform better still as the data keeps increasing. Traditional models, by contrast, show no further improvement in performance after a certain amount of data. The differences between traditional machine learning and deep learning are illustrated in Table 1. Reinforcement learning is the latest machine learning technique to have gained popularity due to its performance. A reward-based mechanism is used for training, whereby a reward is given to the agent when it makes the right decision. In this way, the agent learns to make decisions based on its previous learning and ultimately solves difficult scenarios.

Table 1. Difference between traditional machine learning models and deep learning [9]
Category | Traditional/conventional machine learning models | Deep learning
Feature learning | Domain knowledge required for feature selection; features are explicitly engineered; shallow structure | Feature learning is done by transforming data into an abstract representation; end-to-end deep hierarchical non-linear multilayer structure
Model training | The model is trained in multiple steps; handcrafted features are used to train the model | The model is trained jointly

Computational intelligence is a principal area of smart manufacturing that includes performance enhancement, cost control, optimization of processes, shortened product development cycles, and improved efficiency, as shown in Fig. 2. Machine learning has been scrutinized and explored at various stages of the manufacturing lifecycle. The applications of AI have seen huge growth in the fields of smart manufacturing, surveillance systems, financial decision making, health care, self-driving cars, and automation. As a step forward in AI, deep learning performs exceptionally well in numerous applications related to speech recognition, image recognition, natural language processing, multimodal image-text, and games. Deep learning permits the automatic processing of data into extremely non-linear feature abstractions through a stream composed of many layers, rather than handcrafting the best feature representation of data with domain knowledge. Deep learning provides advanced analytics tools for processing and analyzing big manufacturing data. Streams of layers for non-linear processing are used to learn representations of data at various abstraction stages. The hidden patterns beneath each layer are then recognized and predicted through end-to-end optimization. Deep learning also has the potential to enhance data-driven manufacturing applications, specifically in the big data era [42]. One of the main areas of focus for the application of deep learning is computer vision. The performance of deep learning models has been exceptionally strong when applied to computer vision systems.
Many areas in computer vision, such as medical imaging, image retrieval, video processing, and surveillance, have benefited from the advances made in deep learning. Video surveillance has become an important area where industrial solutions can also be developed. However, a large amount of labeled data, a lightweight model, and an efficient preprocessing technique are required to produce a good system for industry in general. This study aims to make AI-driven computer vision systems more robust when dealing with hazy input images.

Fig. 2. Forecast of AI investment opportunities in different industrial sectors [31].

Challenges and Recommendations

This section presents a discussion of the main challenges faced by deep learning with regard to industrial applications and possible future directions.

Data Availability and Labeling

The main difference between traditional machine learning classification models and deep learning resides in their performance on big data. The overall performance of traditional machine learning models improves as the volume of data increases up to a certain level. However, beyond that limit, additional training data has no impact on performance or accuracy; their performance remains almost the same despite being provided with more data. Conversely, an increase in the volume of training data continues to enhance the performance of deep learning: the more the training data is increased, the better the performance of the deep learning model. Therefore, a huge amount of data is required to design highly efficient systems, but the availability of data remains a challenge. Big data has provided a strong foundation for improving the performance of deep learning-based models, but it raises a serious issue concerning data labeling. Deep learning models usually require labeled data and, to that end, all data needs to be labeled by expert annotators from the respective domains.
The importance of data labeling has arisen in parallel with the increasing demand for big data and the improving performance of more sophisticated machine learning algorithms. The manual annotation of big data is a difficult task due to the amount of time, cost, and effort involved. This problem can be mitigated, however, by using automatic labeling processes based on pseudo-labels. A solution can be designed to label datasets, especially for industrial applications. Another possible way to handle the labeling process is the use of semi-supervised techniques. These semi-supervised techniques start from a small amount of labeled data and then group the remaining data according to their similarity to the labeled examples. In this way, the time and cost of labeling can be reduced effectively.

Data Quality

Datasets collected for different purposes generally contain noise, especially in the field of computer vision. The data in surveillance-based systems are usually obtained from a camera. Generally, high-definition cameras are used to capture videos that preserve the fine details of a scene. These high-quality camera devices minimize the effect of general noise, but certain scenarios still require preprocessing procedures to refine the data. In industrial application scenarios, there may be extreme conditions such as rain, light reflection, fog, snow, or day-night transitions that need to be handled before processing. Similarly, an industrial fault diagnostic system may face a scenario in which the captured image is hazy. For the abovementioned scenarios, it is essential to develop techniques that can handle such unavoidable conditions so as to improve the performance of the industrial applications of deep learning.

Computational Complexity of Deep Models

As discussed earlier, a huge amount of data is usually required to design a highly efficient model.
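Returning to the semi-supervised labeling idea above, grouping unlabeled samples by their similarity to labeled ones can be sketched as nearest-labeled-neighbor pseudo-labeling. The two-dimensional feature vectors, the labels, and the plain 1-NN rule below are illustrative assumptions, not the method of any specific system:

```python
# Minimal pseudo-labeling sketch: assign each unlabeled sample the label
# of its nearest labeled neighbor (toy 2-D features, hypothetical data).

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def pseudo_label(labeled, unlabeled):
    """labeled: list of (feature_vector, label) pairs.
    unlabeled: list of feature vectors.
    Returns a list of (feature_vector, pseudo_label) pairs."""
    out = []
    for u in unlabeled:
        nearest = min(labeled, key=lambda pair: euclidean(pair[0], u))
        out.append((u, nearest[1]))
    return out

labeled = [((0.0, 0.0), "no_fault"), ((5.0, 5.0), "fault")]
unlabeled = [(0.5, 0.2), (4.8, 5.1)]
result = pseudo_label(labeled, unlabeled)
print(result)  # → [((0.5, 0.2), 'no_fault'), ((4.8, 5.1), 'fault')]
```

In practice the seed labels would come from expert annotators and the features from a learned representation; the sketch only shows how a small labeled set can propagate labels to the rest of the data.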
Traditional machine learning models are not scalable and lack the ability to handle such large amounts of data, whereas deep learning-based models require a good amount of data to produce better results. Real-time data processing is subtly different from data availability from the perspective of computational complexity: data availability concerns the provision of new and abundant data, whereas real-time processing is the ability to handle data in real time or near-real time. Various applications of machine learning, such as surveillance, business intelligence, and fraud detection, require real-time processing of data. Live streaming can be used to provide live data; however, its efficient handling remains an open research issue due to the high quality of the images. At least 30 frames per second are required to provide a good solution, especially in the case of video surveillance. Furthermore, more than one camera may be used to achieve high performance in video surveillance. Live data makes real-time processing difficult, yet delayed results raise questions about the effectiveness of these systems. Most machine learning models are not lightweight, especially those which rely heavily on large datasets. As discussed earlier, deep learning-based models perform exceptionally well with huge training datasets (big data). These models therefore require a lot of time for training and for optimizing toward the best solution. Furthermore, efforts to make these systems lightweight often affect performance. A large volume of input data, especially for computer vision-based systems like surveillance systems, needs a high-end computational machine to process it. Such machines usually require GPUs to process images or video streams efficiently. It is therefore difficult to run these models on the simple or handheld machines currently used in smart solutions.
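The 30 frames-per-second requirement translates directly into a per-frame processing budget, which shrinks further if several camera feeds share one pipeline. The sketch below makes this arithmetic explicit; treating the feeds as evenly sharing a single pipeline is a simplifying assumption:

```python
# Per-frame time budget implied by a real-time frame-rate target.

def frame_budget_ms(fps, streams=1):
    """Milliseconds available to process one frame when `streams`
    camera feeds share a single processing pipeline (a simplifying
    assumption; real schedulers are more complex)."""
    return 1000.0 / (fps * streams)

print(round(frame_budget_ms(30), 2))             # one stream at 30 fps → 33.33
print(round(frame_budget_ms(30, streams=4), 2))  # four merged feeds   → 8.33
```

The shrinking budget is one reason lightweight models and merged data streams matter for surveillance-style workloads.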
Transfer Learning

Transfer learning is a technique that targets the improvement of learning for a specific domain, known as the “target domain,” using training data acquired from other similar domains. It is usually used to improve model learning when the available data is limited, by drawing on data from similar domains. It can use data from individual source domains or combine these domains using adaptive machine learning techniques. This technique can handle problems like noisy data, limited data, and class imbalance. It can also prove helpful for industrial solutions, where datasets from one industry or domain can be used in another to improve performance. As transfer learning learns from a source domain and transfers knowledge to a different (target) domain, it can handle data heterogeneity challenges. The ability to transfer instances between a source and a target domain makes it possible to handle noisy data and data uncertainty. For these reasons, transfer learning can provide a better solution for handling limited and diverse datasets.

Adversarial Attacks

The recent advances in deep learning have been very useful in different domains, and especially in computer vision. The CNN was specifically designed for computer vision tasks to avoid handcrafted feature extraction and learning. However, these deep learning models have recently proved vulnerable to adversarial attacks, in which the input image is manipulated in such a way that the attacker’s desired output class is produced. The changes made to images can be so subtle that they cannot be detected even by the human eye. Thus, adversarial attacks have badly exposed a weakness of deep learning models, which can be severe for computer vision-based systems, especially security and surveillance systems. Recently, researchers have been able to design algorithms that can fool deep learning models by making minimal changes to the input image.
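The effect of such minimal perturbations can be illustrated with a toy linear scorer. The weights, inputs, and step size below are hypothetical, and the sign-based step is a simplified, FGSM-style perturbation rather than any attack used against real systems; for a linear score the gradient with respect to the input is just the weight vector, so stepping against its sign flips the decision:

```python
# Toy FGSM-style perturbation on a linear "classifier" score w·x.
# Hypothetical weights and inputs; illustrates the idea only.

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def score(w, x):
    # Linear decision score; positive means "class A".
    return sum(wi * xi for wi, xi in zip(w, x))

def fgsm_perturb(w, x, eps):
    """Shift each feature by eps against the gradient sign.
    For a linear score the input gradient is exactly w."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.8, -0.5, 0.3]
x = [1.0, 0.5, 0.5]            # score(w, x) = 0.70, classified positive
adv = fgsm_perturb(w, x, eps=0.6)
print(score(w, x) > 0, score(w, adv) > 0)  # → True False
```

A small, structured shift in the input is enough to flip the predicted class even though every feature changed by at most eps; attacks on deep networks exploit the same gradient information, only estimated numerically.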
Researchers are now designing different adversarial attacks and their corresponding defenses. As such, there is a strong need to design more efficient methods that can help deep learning models overcome this vulnerability.

Real-Time Coastal Container Casting Recognition: A Case Study

In industry, data quality is often compromised by the nature of the environment, e.g., the presence of steam, smoke, fog, or dust, resulting in hazy visual data. This kind of hazy data reduces the visibility of regions of interest, directly impacting the efficiency of AI-driven automation systems. In this case study, a comparison is performed between raw and pre-processed data in detecting coastal crane container corners for automatic loading and unloading tasks [43], as shown in Fig. 3. Dehazing is performed on the input data stream as a pre-processing step to remove the effect of fog, mist, dust, or smoke, enhancing the visibility of objects in captured scenes. Since industrial processes require real-time processing, a lightweight pre-processing method is vital; anything computationally complex could hinder the real-time operation of coastal crane casting recognition.

Fig. 3. Overall view of the container corner casting recognition framework.

The proposed dehaze model formulates a hazy image as follows:

$I_{Haze}(x) = I_{Dehaze}(x)\,T(x) + A\,(1 - T(x))$ (1)

where $I_{Haze}$, $I_{Dehaze}$, $A$, and $T$ represent the hazy image, dehazed image, atmospheric light, and medium transmission map, respectively. Commonly, the atmospheric light and the transmission map are estimated individually from a hazy image. However, this separate estimation of the A and T values makes the overall dehazing method computationally heavy, ultimately hindering its deployment in industrial systems.
In this context, a CNN-driven dehazing approach is employed to compute $I_{Dehaze}(x)$, the atmospheric light, and the medium transmission jointly in a faster way, as given below.

$I_{Dehaze}(x) = \frac{1}{T(x)}\,I_{Haze}(x) - \frac{A}{T(x)} + A$ (2)

$I_{Dehaze}(x) = K(x)\,I_{Haze}(x) - K(x) + b$ (3)

Here, b is a bias term with the default constant value 1. A CNN architecture with three concatenation layers is used to compute the dehazed output [35, 44]. Equation (4) represents the overall framework used to extract features for the estimation of the atmospheric light and the medium transmission map.

$K(x) = \dfrac{\frac{1}{T(x)}\,(I_{Haze}(x) - A) + (A - b)}{I_{Haze}(x) - 1}$ (4)

Filters of various sizes were used to efficiently reconstruct the dehazed image, minimizing color distortions. Finally, the image is reconstructed using Equation (5).

$I_{Dehaze}(x) = K(x)\,I_{Haze}(x) - K(x) + b$ (5)
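The scattering model of Equation (1) can be checked numerically: synthesizing hazy pixel intensities and then inverting them with the same atmospheric light and transmission recovers the clean values exactly. The scalar per-pixel values below are hypothetical toy data:

```python
# Numerical check of the atmospheric scattering model:
#   I_haze(x) = I_dehaze(x) * T(x) + A * (1 - T(x))
# Inverting with the same A and T recovers the clean intensity
# (toy per-pixel values in [0, 1]; A and T are assumed known here,
# whereas the proposed CNN has to estimate them).

def add_haze(j, t, a):
    return j * t + a * (1.0 - t)

def remove_haze(i, t, a):
    return (i - a * (1.0 - t)) / t

A = 0.9                     # atmospheric light (assumed)
clean = [0.2, 0.5, 0.8]     # clean pixel intensities
trans = [0.6, 0.7, 0.5]     # per-pixel medium transmission

hazy = [add_haze(j, t, A) for j, t in zip(clean, trans)]
recovered = [remove_haze(i, t, A) for i, t in zip(hazy, trans)]
print([round(v, 6) for v in recovered])  # matches `clean` up to rounding
```

The practical difficulty, which the CNN-driven formulation above addresses, is that A and T are unknown at inference time and must be estimated from the hazy image itself.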

This lightweight dehazing method can be utilized as a preprocessing step in any visual analytics task. In this case study, it was used to pre-process the visual data streams captured by the crane automation system. After pre-processing the input data streams, the dehazed images were passed to the corner recognition module [44]. Crane corner recognition is built on deep learning technology. The proposed deep model encodes visual data into a 15×20 grid of patches described by 1024-dimensional GoogLeNet features. This is complemented with 300 LSTM decoder units, which facilitate accurate regression of the bounding box locations/regions. It was observed that this corner recognition method works in real time and has the capacity to process 25–30 frames per second to detect corner boundaries. However, its performance varies with the quality of the input data: visual data with poor visibility significantly degrade the accuracy of corner detection, whereas employing dehazing as a pre-processing module allows the same corner detection method to achieve a higher degree of accuracy.
In this case study, a dataset of twenty videos was used to train the dehaze model and the corner cast recognition model. This dataset was collected from a real environment. The training videos are diverse in nature, consisting of visual data captured under rain, snow, fog, smoke, sunset, night, and dawn conditions. Instead of processing four data streams, one for each of the four corners of the crane, the streams were merged into one, thus making the proposed system real-time. On average, the computational complexity of the dehaze and corner recognition deep models allows 25–30 frames per second, where each frame is of dimension 1024×1014. A comparative analysis of corner cast recognition with and without haze is shown in Fig. 4.
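Merging the four corner feeds into one stream can be sketched as tiling four equally sized frames into a single composite, so one inference pass covers all four crane corners. The tiny nested-list frames and the 2×2 layout below are illustrative assumptions; the paper does not specify the exact merging layout:

```python
# Tile four equally sized frames (nested lists of pixel values) into one
# 2x2 composite frame so a single inference pass covers all four corners.

def merge_2x2(tl, tr, bl, br):
    """Return a composite frame with tl/tr on top and bl/br below."""
    top = [row_l + row_r for row_l, row_r in zip(tl, tr)]
    bottom = [row_l + row_r for row_l, row_r in zip(bl, br)]
    return top + bottom

f1 = [[1, 1], [1, 1]]
f2 = [[2, 2], [2, 2]]
f3 = [[3, 3], [3, 3]]
f4 = [[4, 4], [4, 4]]
merged = merge_2x2(f1, f2, f3, f4)
print(merged)  # → [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The trade-off is that each corner occupies a quarter of the composite resolution, so the merged frame must remain large enough for the detector to resolve the castings.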
Fig. 4. Comparative analysis of container corner casting recognition on hazy and dehazed visual data streams: (a) sample input hazy images, (b) reconstructed de-hazed images, (c) red bounding box locating corner recognition on hazy and (d) dehazed images.

It has been observed that the segmentation of the proposed corner detection method improves significantly when dehazing is applied as a preprocessing step. Real-time operation and accuracy are the two vital parameters for any computer-aided automation system. Traditional machine learning techniques are lightweight but vulnerable to noisy data; thus, these methods attracted little interest or appreciation in industry. The recent advancements in deep learning technology offer a degree of accuracy suitable to industry requirements. Although these deep models are robust, their great computational complexity is a big hurdle for their adoption in industry. For the purposes of this study, a robust and lightweight model was developed so as to be deployable in a real environment. Fig. 5 illustrates the difference in accuracy of the proposed method when applied to raw and pre-processed data.
Fig. 5. Comparative analysis of container corner casting recognition in terms of the F-measure on hazy and dehazed visual data on a dataset of 20 videos.

The comparative analysis (Table 2) shows the superiority of the proposed method. Compared in terms of precision, recall, and F-measure, the proposed method proved to be better than semantic segmentation [45], normalized cut segmentation [46], and saliency-driven normalized cut segmentation [47]. The programming environment consists of Python with the PyTorch framework on an Intel Core i5-6600 processor, 8 GB of memory, and an NVIDIA GeForce GTX 1060 GPU. It has been observed that the overall time complexity of the dehaze and corner recognition modules combined stays within 25 frames per second, making the system feasible for real-time video analytics tasks. The average dehazing time on 800 synthetic hazy images [30] and 20 videos from the coastal crane surpasses the existing state-of-the-art dehazing methods. Since the other methods have slower MATLAB implementations, it is fair to compare DehazeNet (Pycaffe version), AOD-Net, and the proposed method. The results illustrate the promising efficiency of the proposed method, which is twelve times faster than DehazeNet and twice as fast as AOD-Net per image.

Table 2. Comparative analysis with state-of-the-art methods
Category | Normalized cut segmentation [46] | Saliency-driven normalized cut segmentation [47] | Semantic segmentation [45] | Proposed method
Precision | 0.47 | 0.48 | 0.77 | 0.94
Recall | 0.51 | 0.53 | 0.80 | 0.95
F-measure | 0.48 | 0.50 | 0.78 | 0.94
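The F-measure row of Table 2 is the harmonic mean of the reported precision and recall. Recomputing it from the published two-decimal values nearly reproduces the row; small differences can arise because the published precision and recall are themselves rounded:

```python
# F-measure (F1) as the harmonic mean of precision and recall,
# applied to the precision/recall pairs reported in Table 2.

def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

for name, p, r in [
    ("Normalized cut segmentation", 0.47, 0.51),
    ("Saliency-driven normalized cut", 0.48, 0.53),
    ("Semantic segmentation", 0.77, 0.80),
    ("Proposed method", 0.94, 0.95),
]:
    print(name, round(f_measure(p, r), 2))
```

The harmonic mean penalizes imbalance between precision and recall, which is why it is the standard single-number summary for detection quality here.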

Conclusion and Future Work

Industrial advances have seen immense growth in recent decades. Technological advances have contributed to this paradigm shift toward the Fourth Industrial Revolution. This study focuses on factors that contribute to the performance of industrial processes for the smooth functioning of industries, ultimately improving a country’s economy. The role of artificial intelligence is discussed generally, while that of deep learning is discussed specifically. Deep learning has changed the dynamics of industries by providing architectures that can self-learn from available big data and perform extraordinarily well. These efficient systems are now being widely used in industrial automation and fault diagnostic systems. However, deep learning faces several problems, such as the limited availability of annotated data, the lack of big data, the computational complexity of deep models, and adversarial attacks. As such, this paper discusses some of the major challenges faced by deep learning and provides some recommendations.
The paper presents a case study of a crane automation system built to handle dehazing and corner casting recognition. The presented algorithm is a real-time preprocessing technique that can handle noisy data under extreme weather conditions such as heavy rain or snow, nighttime conditions, and, more specifically, hazy images. The proposed corner casting recognition uses an RNN with an LSTM architecture. Expressive image features from GoogLeNet were used to produce intermediate image representations that were further tuned for our system. These intermediate representations were then decoded to predict the corner castings. The LSTM acted as a controller, passing information between decoding steps and ultimately controlling the output. The proposed case study system used back-propagation to allow the joint tuning of all components.
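As a minimal sketch of the recurrence such a decoder relies on, the standard LSTM update can be restated in plain Python. All weights and inputs below are random and purely illustrative; this is not the paper's trained model, and the stand-in feature vectors substitute for the GoogLeNet representations described above.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, W, b):
    """One LSTM step on the concatenated vector [h_{t-1}, x_t].

    W is a (4H x (H+X)) weight matrix and b a length-4H bias vector,
    laid out as [input, forget, output, candidate] gate blocks.
    """
    H = len(h)
    z = h + x  # concatenate previous hidden state and current input
    pre = [sum(W[r][k] * z[k] for k in range(len(z))) + b[r]
           for r in range(4 * H)]
    i = [sigmoid(v) for v in pre[0:H]]           # input gate
    f = [sigmoid(v) for v in pre[H:2 * H]]       # forget gate
    o = [sigmoid(v) for v in pre[2 * H:3 * H]]   # output gate
    g = [math.tanh(v) for v in pre[3 * H:]]      # candidate cell update
    c_new = [f[j] * c[j] + i[j] * g[j] for j in range(H)]
    h_new = [o[j] * math.tanh(c_new[j]) for j in range(H)]
    return h_new, c_new

random.seed(0)
H, X, T = 4, 8, 6  # hidden size, feature size, sequence length
W = [[random.uniform(-0.1, 0.1) for _ in range(H + X)] for _ in range(4 * H)]
b = [0.0] * (4 * H)
h, c = [0.0] * H, [0.0] * H
# Feed a sequence of stand-in per-frame feature vectors (in the paper,
# these would come from GoogLeNet).
for _ in range(T):
    x = [random.uniform(-1.0, 1.0) for _ in range(X)]
    h, c = lstm_step(x, h, c, W, b)
print("final hidden state:", [round(v, 3) for v in h])
```

The forget gate `f` is what lets the cell state carry information across decoding steps, which is the "controller" role the LSTM plays in the recognition pipeline.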
Deep learning has become a major priority of computer vision researchers and developers working in the field of image recognition, going far beyond tagging pictures in social media to such uses as autonomous driving, quality control in manufacturing, and medical imaging. However, data scientists must overcome numerous challenges before deep learning can be widely adopted by industry. The first basic challenge is finding and processing the massive datasets needed for training. While this is not a problem for consumer applications, where large amounts of data are easily available, plentiful training data is rarely available in most industrial applications. Another challenge is the large number of interconnected neurons that capture subtle nuances and variations in data; this also means it is hard to identify the hyperparameters whose values must be fixed before training. There is also the danger of overfitting, especially when the number of parameters greatly exceeds the number of independent observations. Moreover, deep learning networks require a lot of time for training, making it very hard to quickly retrain models on the edge using newly available information. Currently, it is difficult to understand how deep learning networks arrive at their insights. While this may not matter much in applications such as tagging photos on social media, understanding the decision-making process becomes very important in mission-critical applications like predictive maintenance or clinical decision-making. Finally, deep learning networks are highly susceptible to the butterfly effect, in that small variations in input data can lead to drastically different results, making them inherently unstable. This instability also opens up new attack surfaces for hackers: deep learning is highly vulnerable to adversarial attacks and often finds it difficult to differentiate between original and manipulated fake images.
Therefore, there is a strong need to handle this issue in order to make the use of deep learning safe and secure.

Acknowledgments

Not applicable.

Funding

This work was supported by the Researchers Supporting Project (No. RSP-2020/250), King Saud University, Riyadh, Saudi Arabia.
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) Grant funded by the Korean Government (MSIT) (No. 2019-0-00136, Development of AI-Convergence Technologies for Smart City Industry Productivity Innovation).

Authors’ Contributions

Conceptualization, MM. Visualization, KM, WA. Writing of the original draft, MM. Writing of the review and editing, IM, JL. All the authors have proofread the final version.

Competing Interests

The authors declare no competing interests.

Author Biography

Name : Muazzam Maqsood
Affiliation : COMSATS University Islamabad, Attock Campus, Pakistan
Biography : Dr. Muazzam Maqsood is serving as an Assistant Professor at COMSATS University Islamabad, Attock Campus, Pakistan. He holds a Ph.D. in software engineering with a keen interest in artificial intelligence and deep learning-based systems. He has more than 9 years of research and teaching experience. He has published more than 40 papers in top-ranked journals and conferences. His main research focus is to use the latest machine learning and deep learning algorithms to develop automated solutions, especially in the fields of pattern recognition and data analytics.

Name : Irfan Mehmood
Affiliation : Faculty of Engineering & Informatics, School of Media, Design and Technology, University of Bradford, UK
Biography : Irfan Mehmood is an Assistant Professor of Applied Artificial Intelligence at the University of Bradford, UK. His sustained contribution to various research and industry collaborative projects gives him an extra edge in meeting the current challenges faced in the field of multimedia analytics. Specifically, he has made significant contributions in the areas of video summarization, medical image analysis, visual surveillance, information mining, deep learning in industrial applications, and data encryption.

Name : Rupak Kharel
Affiliation : Department of Computing and Mathematics, Manchester Metropolitan University, UK
Biography : Rupak Kharel is a Reader at MMU and received his Ph.D. degree in secure communication systems from Northumbria University, U.K., in 2011. His research interests include various use cases and challenges of IoT and cyber-physical systems, including the Internet of Vehicles (IoV), cyber security, physical layer security, and next-generation networks. He is a Senior Member of the IEEE, a Member of the IET, and a Fellow of the HEA, U.K. He is the Principal Investigator of multiple government- and industry-funded research projects, including the £6 million Greater Manchester Cyber Foundry project.

Name : Khan Muhammad
Affiliation : Department of Software, Sejong University, Seoul, Republic of Korea
Biography : Khan Muhammad received his Ph.D. degree in Digital Contents from Sejong University, South Korea. He is currently an Assistant Professor at the Department of Software and Director of the Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab). His research interests include intelligent video surveillance (fire/smoke scene analysis, transportation systems, and disaster management), medical image analysis (brain MRI, diagnostic hysteroscopy, and wireless capsule endoscopy), information security (steganography, encryption, watermarking, and image hashing), video summarization, multimedia data analysis, computer vision, IoT/IoMT, and smart cities. He has registered 7 patents and has contributed 150+ papers to peer-reviewed journals and conference proceedings.

Name : Jaecheul Lee
Affiliation : Department of Information &Communication Engineering, Sungkyul University, Anyang, South Korea
Biography : Jaecheul Lee is a Professor, CEO of I&T Telecom Inc., and an R&D Manager for LG Electronics, active in circuit theory and AI. His research interests are computer vision, applied artificial intelligence, and embedded programming.

Name : Waleed S.Alnumay
Affiliation : Department of Computer Science, King Saud University, Riyadh, Saudi Arabia
Biography : Waleed S. Alnumay received his bachelor's degree in computer science from King Saud University and his master's degree in computer science from the University of Atlanta, USA, in 1996. He completed his Ph.D. in computer science at Oklahoma University, USA, in 2004. Dr. Alnumay is currently working as an Associate Professor of Mobile Networking at King Saud University. He has published research papers in reputed international conferences and journals. His research interest is computer networks and distributed computing, including but not limited to mobile ad hoc and sensor networks, information-centric networking, and software-defined networking.

References

[1] X. Xu, D. Cao, Y. Zhou, and J. Gao, “Application of neural network algorithm in fault diagnosis of mechanical intelligence,” Mechanical Systems and Signal Processing, vol. 141, article no. 106625, 2020. https://doi.org/10.1016/j.ymssp.2020.106625.
[2] Z. Zheng, T. Wang, J. Wen, S. Mumtaz, A. K. Bashir, and S. H. Chauhdary, “Differentially private high-dimensional data publication in internet of things,” IEEE Internet of Things Journal, vol. 7, no. 4, pp. 2640-2650, 2020.
[3] H. Liao, Z. Zhou, X. Zhao, L. Zhang, S. Mumtaz, A. Jolfaei, et al., “Learning-based context-aware resource allocation for edge-computing-empowered industrial IoT,” IEEE Internet of Things Journal, vol. 7, no. 5, pp. 4260-4277, 2020.
[4] S. Schmidt, P. S. Heyns, and K. C. Gryllias, “A pre-processing methodology to enhance novel information for rotating machine diagnostics,” Mechanical Systems and Signal Processing, vol. 124, pp. 541-561, 2019.
[5] J. Jang, C. Ha, B. Chu, and J. Park, “Development of fault diagnosis technology based on spectrum analysis of acceleration signal for paper cup forming machine,” Journal of the Korean Society of Manufacturing Process Engineers, vol. 15, no. 6, pp. 1-8, 2016.
[6] S. Deng, L. Tang, X. Su, and J. Che, “Fault diagnosis technology of plunger pump based on EMMD-teager,” International Journal of Performability Engineering, vol. 15, no. 7, pp. 1912-1919, 2019.
[7] J. Antoni, J. Griffaton, H. Andre, L. D. Avendano-Valencia, F. Bonnardot, O. Cardona-Morales, et al., “Feedback on the surveillance 8 challenge: vibration-based diagnosis of a Safran aircraft engine,” Mechanical Systems and Signal Processing, vol. 97, pp. 112-144, 2017.
[8] J. Antoni and R. B. Randall, “The spectral kurtosis: application to the vibratory surveillance and diagnostics of rotating machines,” Mechanical Systems and Signal Processing, vol. 20, no. 2, pp. 308-331, 2006.
[9] A. Alghamdi, M. Hammad, H. Ugail, A. Abdel-Raheem, K. Muhammad, H. S. Khalifa, and A. A. Abd El-Latif, “Detection of myocardial infarction based on novel deep transfer learning methods for urban healthcare in smart cities,” Multimedia Tools and Applications, 2020. https://doi.org/10.1007/s11042-020-08769-x.
[10] Y. Pu, A. Szmigiel, and D. B. Apel, “Purities prediction in a manufacturing froth flotation plant: the deep learning techniques,” Neural Computing & Applications, vol. 32, no. 17, pp. 13639-13649, 2020.
[11] Y. Liu, Z. Bao, Z. Zhang, D. Tang, and F. Xiong, “Information cascades prediction with attention neural network,” Human-centric Computing and Information Sciences, vol. 10, article no. 13, 2020. https://doi.org/10.1186/s13673-020-00218-w.
[12] D. Cao, Z. Chen, and L. Gao, “An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks,” Human-centric Computing and Information Sciences, vol. 10, article no. 14, 2020. https://doi.org/10.1186/s13673-020-00219-9.
[13] D. Popa, F. Pop, C. Serbanescu, and A. Castiglione, “Deep learning model for home automation and energy reduction in a smart home environment platform,” Neural Computing and Applications, vol. 31, no. 5, pp. 1317-1337, 2019.
[14] P. Legg, J. Smith, and A. Downing, “Visual analytics for collaborative human-machine confidence in human-centric active learning tasks,” Human-centric Computing and Information Sciences, vol. 9, article no. 5, 2019. https://doi.org/10.1186/s13673-019-0167-8.
[15] R. Iqbal, T. Maniak, F. Doctor, and C. Karyotis, “Fault detection and isolation in industrial processes using deep learning approaches,” IEEE Transactions on Industrial Informatics, vol. 15, no. 5, pp. 3077-3084, 2019.
[16] W. Dai, H. Nishi, V. Vyatkin, V. Huang, Y. Shi, and X. Guan, “Industrial edge computing: enabling embedded intelligence,” IEEE Industrial Electronics Magazine, vol. 13, no. 4, pp. 48-56, 2019.
[17] C. Peeters, P. Guillaume, and J. Helsen, “A comparison of cepstral editing methods as signal pre-processing techniques for vibration-based bearing fault detection,” Mechanical Systems and Signal Processing, vol. 91, pp. 354-381, 2017.
[18] X. Yuan, L. Li, Y. A. W. Shardt, Y. Wang, and C. Yang, “Deep learning with spatiotemporal attention-based LSTM for industrial soft sensor model development,” IEEE Transactions on Industrial Electronics, vol. 68, no. 5, pp. 4404-4414, 2020.
[19] T. Kim, I. Y. Jung, and Y. C. Hu, “Automatic, location-privacy preserving dashcam video sharing using blockchain and deep learning,” Human-centric Computing and Information Sciences, vol. 10, article no. 36, 2020. https://doi.org/10.1186/s13673-020-00244-8.
[20] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, “Deep learning for 3D point clouds: a survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020. https://doi.org/10.1109/TPAMI.2020.3005434.
[21] J. H. Park, M. M. Salim, J. H. Jo, J. C. S. Sicato, S. Rathore, and J. H. Park, “CIoT-Net: a scalable cognitive IoT based smart city network architecture,” Human-centric Computing and Information Sciences, vol. 9, article no. 29, 2019. https://doi.org/10.1186/s13673-019-0190-9.
[22] J. Vanus, P. Kucera, R. Martinek, and J. Koziorek, “Development and testing of a visualization application software, implemented with wireless control system in smart home care,” Human-centric Computing and Information Sciences, vol. 4, article no. 18, 2014. https://doi.org/10.1186/s13673-014-0019-5.
[23] D. Shah, “AI, Machine learning, & deep learning explained in 5 minutes: the difference between the three and how each of them works.” 2018 [Online]. Available: https://becominghuman.ai/ai-machine-learning-deep-learning-explained-in-5-minutes-b88b6ee65846.
[24] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943.
[25] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol 323, no. 6088, pp. 533-536, 1986.
[26] F. Rosenblatt, “Perceptron simulation experiments,” Proceedings of the IRE, vol. 48, no. 3, pp. 301-309, 1960.
[27] B. Widrow and M. E. Hoff, “Adaptive switching circuits,” in IRE WESCON Convention Record, Part 4. New York, NY: IRE, 1960, pp. 96-104.
[28] M. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 2017.
[29] D. W. Tank and J. J. Hopfield, “Neural computation by concentrating information in time,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 7, pp. 1896-1900, 1987.
[30] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, 1990.
[31] H. J. Sussmann, “Learning algorithms for Boltzmann machines,” in Proceedings of the 27th IEEE Conference on Decision and Control, Austin, TX, 1988, pp. 786-791.
[32] D. E. Rumelhart and J. L. McClelland, “Information processing in dynamical systems: foundations of harmony theory,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations. Cambridge, MA: MIT Press, 1986. pp. 194-281.
[33] S. El Hihi and Y. Bengio, “Hierarchical recurrent neural networks for long-term dependencies,” Advances in Neural Information Processing Systems, vol. 8, pp. 493-499, 1995.
[34] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[35] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “AOD-Net: All-in-One Dehazing Network,” in Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 4780-4788.
[36] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[37] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[38] L. Deng, M. L. Seltzer, D. Yu, A. Acero, A. R. Mohamed, and G. Hinton, “Binary coding of speech spectrograms using a deep auto-encoder,” in Proceedings of Eleventh Annual Conference of the International Speech Communication Association, Makuhari, Japan, 2010, pp. 1692-1695.
[39] R. Salakhutdinov and G. Hinton, “Deep boltzmann machines,” in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, FL, 2009, pp. 448-455.
[40] R. Teti, K. Jemielniak, G. O’Donnell, and D. Dornfeld, “Advanced monitoring of machining operations,” CIRP Annals, vol. 59, no. 2, pp. 717-739, 2010.
[41] K. Panetta, “Artificial intelligence, machine learning, and smart things promise an intelligent future,” 2016 [Online]. Available: http://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/.
[42] J. Loucks, T. Davenport, and D. Schatsky, “State of AI in the enterprise,” 2018 [Online]. Available: https://www2.deloitte.com/content/dam/Deloitte/co/Documents/about-deloitte/DI_State-of-AI-in-the-enterprise-2nd-ed.pdf.
[43] J. Lee, “Deep learning-assisted real-time container corner casting recognition,” International Journal of Distributed Sensor Networks, vol. 15, no. 1, 2019. https://doi.org/10.1177/1550147718824462.
[44] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778.
[45] A. Tao, K. Sapra, and B. Catanzaro, “Hierarchical multi-scale attention for semantic segmentation,” 2020 [Online]. Available: https://arxiv.org/abs/2005.10821.
[46] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
[47] I. Mehmood, N. Ejaz, M. Sajjad, and S. W. Baik, “Prioritization of brain MRI volumes using medical image perception model and tumor region segmentation,” Computers in Biology and Medicine, vol. 43, no. 10, pp. 1471-1483, 2013.
