Human-centric Computing and Information Sciences volume 12, Article number: 06 (2022)
https://doi.org/10.22967/HCIS.2022.12.006
Speech encryption has recently attracted many researchers because of the various applications of speech communication, such as e-learning, e-banking, military communication, teleconferencing, and other fields. In this work, a new modification of the RSA (Rivest–Shamir–Adleman) algorithm is proposed to enhance the performance of conventional RSA when it is applied in audio cryptosystems. This paper is concerned with speech encryption and decryption based on the well-known RSA algorithm and some of its variants, including our own suggestion. The performance of both the original RSA algorithm and its variants is investigated and tested by estimating several parameters that indicate the quality of audio cryptography. The parameters estimated in the experimental tests are the mean square error (MSE) between the original signal and the decrypted signal, the linear predictive code measure (LPC), the cepstral distance measure (CD), the segmental signal-to-noise ratio (SSNR), and the execution time. Based on the estimated parameters, a performance comparison between the investigated algorithms is presented. The obtained results show that the RSA algorithm and its variants are effective in securing audio communications, and that our proposed modification reduces the processing time by approximately 39%–53% compared to the original RSA algorithm; hence, it is efficient in real-time applications.
Speech Encryption/Decryption, RSA Algorithm Variants, Linear Predictive Code Measure (LPC), Cepstral Distance Measure (CD), Segmental Signal-to-Noise Ratio (SSNR)
Data reliability, secrecy, accessibility, and confidentiality are the main issues in communication security. Cryptography is widely used in communication systems to secure and protect data. It plays an effective role in various applications such as e-mail, e-commerce, the transfer of financial information, pay-TV, and so forth. When cryptography is applied, the meaning of the message is hidden: the plaintext is converted to ciphertext in the encryption phase, and the reverse process is carried out in the decryption phase; hence, the insecure physical channel can be regarded as a secure logical channel [1]. Cryptography is classified into two main types, symmetric and asymmetric. Symmetric cryptography uses a single key for both encryption and decryption, whereas asymmetric cryptography uses two keys: a public key for encryption and a private key for decryption. Asymmetric techniques are more secure than symmetric techniques, but they require longer processing time. RSA (Rivest–Shamir–Adleman) is one of the most widely used asymmetric techniques and finds applications in many fields such as email encryption, SSL/TLS certificates, cryptocurrencies, and many others [2–4]. RSA is popular because of its reliability and ease of implementation. A number of modifications of RSA have recently been proposed to enhance its performance [5–7].
Audio messages, as an essential form of data, are now exchanged over many different communication channels; hence, there is a pressing need for audio encryption. In audio encryption, the audio message is transformed into an unintelligible form. Several research papers address audio encryption on the basis of well-known encryption algorithms such as scrambling, elliptic curve cryptography (ECC), chaotic encryption, and RSA [3, 8–12]. Various encryption techniques that combine more than one algorithm have been proposed and applied to audio messages in recent years. The researchers in [13] used a hyperchaotic system and the modified Henon map to encrypt a speech signal compressed by the fast Walsh Hadamard transform. Another group [14] encrypted audio data in two stages: the first is block ciphering based on DNA encoding and the logistic map, and the second is based on channel shuffling to enhance security. The authors in [15] used DNA coding and chaotic systems to encrypt audio messages; the novelty of this algorithm is the use of the hash value of the message to control the initialization of the chaotic system. The researchers in [16] combined the discrete wavelet transform (DWT) with biometric features extracted from human hand geometry to carry out speech encryption. In [17], the ECC and 3DES algorithms are used together to achieve audio encryption during transmission through a mobile network. A new audio encryption technique based on combining chaotic systems and the fast Fourier transform (FFT) was introduced in [18]; this technique uses two chaotic systems, the logistic map and the 3D Lorenz chaotic system, to encrypt a speech message that is initially scrambled using the FFT. The authors in [19] combined the discrete cosine transform (DCT) with a scrambling algorithm to construct the speech encryption module of a speech retrieval process in a cloud environment.
In [20], the security of audio transmission is enhanced by combining four different encryption techniques: cipher feedback encryption, dynamic DNA coding, chaotic maps, and self-adaptive scrambling encryption. Most of the work cited above focuses on the degree of security of the proposed algorithms, but another aspect should be taken into account, especially in real-time applications: the encryption and decryption times. From this point of view, we think that simple algorithms will be preferred for audio encryption. Audio encryption based on the RSA algorithm has been presented by some researchers using various implementation techniques.
The aim of this work is to enhance audio encryption by investigating the application of the RSA algorithm and its variants. For this purpose, we survey the RSA algorithm and its recent modifications, introduce a new modification of RSA as one contribution of this work, and then implement audio cryptosystems based on RSA and its variants, including our own. The performance of audio encryption is investigated by determining several audio quality metrics: the linear predictive code measure (LPC), the mean square error (MSE) between the original signal and the decrypted signal, the cepstral distance measure (CD), and the segmental signal-to-noise ratio (SSNR) [21]. We also evaluate the encryption and decryption times, since they are decisive in real-time applications.
The rest of this paper is organized as follows. Section 2 presents the related work based on asymmetric cryptography. Section 3 illustrates RSA algorithm and its variants including our proposed modification. The audio transmission with a cryptosystem is presented in Section 4. Section 5 presents the audio quality metrics that are used to investigate the performance of the algorithms. Results and discussion are presented in Section 6, a comparison with some current systems is introduced in Section 7, and finally, Section 8 gives the main conclusions.
The introduction gave a general overview of speech encryption based on various techniques. This section focuses on efforts based on classical techniques. A considerable number of proposals have been made on the application of asymmetric key cryptography in multimedia transmission. The authors in [22] modified the El-Gamal cryptosystem so that it can be applied to gray and color images; both the encryption and decryption scenarios worked well. A combination of El-Gamal and scan methods was introduced in [23] to encrypt images. In [24], the El-Gamal algorithm was utilized to enhance the security of speech transmission over open and shared networks. The authors in [25] protected the transmission of speech by applying a cryptosystem based on the Diffie-Hellman algorithm. Because of the simplicity of RSA, its ease of implementation, its low computational complexity, and the difficulty of breaking it, various efforts have been made to develop voice cryptosystems based on the RSA algorithm.
The technique presented in [26] is based on saving different speech words from different speakers in a wave file, extracting the data from the wave file and saving it in a text file as integer data, and then performing the encryption and decryption processes on the integer data. In [27], a new encryption technique based on symmetric cryptography was suggested and applied to audio encryption; its results were compared with those obtained for audio encryption based on RSA, and the suggested method produced a decrypted signal of higher quality. The performance of audio encryption based on RSA in terms of audio quality metrics was investigated in [28]; the results obtained in that work confirm the validity of RSA for secure audio transmission as well as the high quality of the recovered message. In addition to using RSA for audio encryption, some researchers have investigated its application to video encryption. Most of them suggested multi-layer techniques for video encryption to increase security. Video encryption based on RSA and ECC was presented in [29]. The work in [30] also utilized two layers for video encryption: the first layer is based on RSA, whereas the second is based on a pseudo-noise sequence. To further improve the security of video encryption, the authors in [31] utilized a hybrid algorithm that consists of three layers: the first is based on the RSA algorithm, the second on the DES algorithm, and the third on a combination of both. In this work, we propose a new modification of RSA, in addition to applying RSA and its variants to enhance speech security. The experimental investigations measure the quality metrics of both the encrypted and the recovered speech signals, and examine the improvement in RSA speed when the different variants are applied to speech encryption.
To evaluate the effectiveness of our proposed technique, it is compared with both classical techniques and the most recently developed techniques based on chaotic cryptography [18, 20, 32–34]. In [18], the authors present a new speech encryption technique in which a 3D Lorenz-logistic map is introduced and used to generate three random number sequences, which are used to permute the initial speech signal and the real and imaginary parts of its FFT. The author in [20] used three encryption techniques (DNA, self-adaptive scrambling, and cipher feedback encryption) in addition to chaotic maps to secure audio transmission. The authors in [32] proposed synchronized chaotic systems at both the transmitter and the receiver to achieve speech encryption in multi-user communication. In [33], the DWT is combined with a chaotic map in audio encryption to enhance storage and transfer efficiency. The authors in [34] encrypted the speech signal using both cryptographic protocols and chaotic maps. In that work, different types of one-dimensional maps are used, the parameters are protected using the Blowfish algorithm, and a hashing algorithm is used to authenticate the Blowfish key and the shared data, which in turn increases the security of the system.
This section introduces an overview on RSA and some of its modified versions.
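As a point of reference for the three phases discussed in this section (key generation, encryption, decryption), the following is a minimal sketch of textbook RSA, written so that the same code also accepts more than two primes. It is an illustration under stated assumptions, not the exact construction of the cited variants [5–7] or of the proposed modification: the function names, the toy primes, and the public exponents are chosen here for the example only, and no padding is applied.

```python
from math import gcd, prod


def rsa_keygen(primes, e):
    """Build a toy RSA key pair from a list of distinct primes.

    Two primes give textbook RSA; more primes give a generic
    multi-prime RSA (an illustration, not a specific cited variant).
    """
    if len(set(primes)) != len(primes):
        raise ValueError("primes must be distinct")
    n = prod(primes)                      # public modulus
    phi = prod(p - 1 for p in primes)     # Euler totient of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi)                   # private exponent (Python 3.8+)
    return (e, n), (d, n)


def rsa_encrypt(message, public_key):
    """Encrypt an integer 0 <= message < n as message^e mod n."""
    e, n = public_key
    if not 0 <= message < n:
        raise ValueError("message must lie in [0, n)")
    return pow(message, e, n)


def rsa_decrypt(cipher, private_key):
    """Decrypt an integer ciphertext as cipher^d mod n."""
    d, n = private_key
    return pow(cipher, d, n)


if __name__ == "__main__":
    # Toy parameters for illustration only; real keys use very large primes.
    two_prime_keys = rsa_keygen([61, 53], e=17)
    four_prime_keys = rsa_keygen([11, 13, 17, 19], e=7)
    for public_key, private_key in (two_prime_keys, four_prime_keys):
        cipher = rsa_encrypt(65, public_key)
        print(public_key, cipher, rsa_decrypt(cipher, private_key))
```

With a fixed modulus size, using more primes makes each prime smaller, which is the usual motivation for multi-prime variants when decryption speed matters.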
Audio transmission with a cryptosystem, shown in Fig. 6, can be summarized as follows:
Audio data collection from audio signal.
Encryption of the collected data at transmitter using an encryption technique.
Transmission of the encrypted data through the communication channel.
Decryption of the received data at the receiver to recover the original message.
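The four steps above can be sketched as a sample-by-sample pipeline. This is only a hedged illustration: the offset mapping of signed 16-bit samples to non-negative integers, the toy primes 257 and 263, the exponent 17, and the lossless `transmit` stand-in are all assumptions made for the example, and sample-wise RSA without padding is not suitable for production use.

```python
OFFSET = 32768            # assumption: shift signed 16-bit samples into 0..65535

# Toy key (assumption): the modulus only needs to exceed 65535 here.
P, Q = 257, 263
N = P * Q                 # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                    # public exponent, coprime to PHI
D = pow(E, -1, PHI)       # private exponent (Python 3.8+)


def encrypt_samples(samples, e=E, n=N):
    """Step 2: encrypt each collected sample at the transmitter."""
    return [pow(s + OFFSET, e, n) for s in samples]


def transmit(cipher):
    """Step 3: stand-in for the channel, assumed ideal and lossless."""
    return list(cipher)


def decrypt_samples(cipher, d=D, n=N):
    """Step 4: recover the original samples at the receiver."""
    return [pow(c, d, n) - OFFSET for c in cipher]


if __name__ == "__main__":
    collected = [0, -32768, 32767, 1234, -1]   # step 1: audio data collection
    received = transmit(encrypt_samples(collected))
    print(decrypt_samples(received) == collected)
```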
The performance of an audio cryptosystem can be evaluated by measuring the quality of the processed speech. There are two categories of speech quality metrics: subjective and objective [21, 31]. Subjective metrics depend on the listener's impression of the intelligibility of the speech, whereas objective metrics depend on the original speech and the processed speech and can be estimated using mathematical expressions. Widely used objective metrics include LPC, also known as the log likelihood ratio (LLR), SSNR, CD, the MSE between the original signal and the processed signal, and the correlation between the original signal and the processed signal [21, 35, 36].
$LLR = \log\left(\frac{a_xR_ya_x^T}{a_yR_ya_y^T}\right)$ (1)
$SSNR=\frac{10}{M}\displaystyle\sum_{m=0}^{M-1}\log_{10}\frac{\displaystyle\sum_{i=Nm}^{Nm+N-1}x^2(i)}{\displaystyle\sum_{i=Nm}^{Nm+N-1}(x(i)-y(i))^2}$ (2)
where x(i) is the original speech, y(i) is the processed speech, N represents the frame length and M is the total number of frames [21, 36].
$CD=10\log_{10}\left[2\displaystyle\sum_{n=1}^{p}\left\{C_x(n)-C_y(n)\right\}^2\right]^{1/2}$ (3)
where Cx and Cy are the cepstral vectors of the original speech and the processed speech, respectively [21].
$r_{xy}=\frac{C_v(x,y)}{\sqrt{D(x)}\sqrt{D(y)}}$ (4)
where Cv(x, y) is the covariance between the original and processed signals, and D(x) and D(y) are the variances of x and y, respectively [21].
$MSE=\frac{1}{N_s}\displaystyle\sum_{m=1}^{N_s}(x(m)-y(m))^2$ (5)
where Ns is the number of samples.
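Eqs. (2), (4), and (5) translate directly into code. The sketch below uses plain Python lists and is only an illustration: the function names are chosen here, incomplete trailing frames are dropped, and the small constant `eps` is an assumption added to avoid division by zero when a frame is reconstructed exactly (it is not part of Eq. (2)).

```python
import math


def mse(x, y):
    """Eq. (5): mean square error between original x and processed y."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """Eq. (4): covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


def ssnr(x, y, frame_len, eps=1e-12):
    """Eq. (2): segmental SNR in dB over M frames of length N = frame_len.

    eps is an assumption (not in Eq. (2)) that keeps the ratio finite
    when the noise energy of a frame is zero.
    """
    frames = len(x) // frame_len          # M, trailing partial frame dropped
    total = 0.0
    for m in range(frames):
        idx = range(m * frame_len, (m + 1) * frame_len)
        signal = sum(x[i] ** 2 for i in idx)
        noise = sum((x[i] - y[i]) ** 2 for i in idx)
        total += math.log10((signal + eps) / (noise + eps))
    return 10.0 * total / frames


if __name__ == "__main__":
    original = [1.0, -1.0, 1.0, -1.0]
    processed = [0.9, -0.9, 0.9, -0.9]
    print(mse(original, processed))
    print(correlation(original, processed))
    print(ssnr(original, processed, frame_len=2))
```

LLR and CD additionally require LPC and cepstral analysis of each frame, so they are left out of this sketch.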
Our goal in this section is to investigate the quality of the encrypted speech and the decrypted speech when the RSA algorithm variants are applied to the original speech. For this purpose, we used different types of speech signals: a single word, “zero”, spoken by different speakers, and two different long sentences. The performance of the RSA variants and of our proposed modification was tested through an experimental MATLAB implementation on a laptop with an Intel Core i3 processor, 4 GB of RAM, and a 64-bit operating system.
The simulation steps can be summarized as follows. First, different audio files are obtained by recording different words and sentences, spoken by different speakers, and saving them in WAV format with a sampling rate of 8 kHz and a sample length of 16 bits. Second, the MATLAB code is built to implement the encryption and decryption processes. For each algorithm, the process starts with entering the required prime numbers, followed by the code that implements the mathematical equations of the three phases described in Section 3: key generation, encryption, and decryption. The quality metrics of both the encrypted and the decrypted signals indicate the security level and performance of the algorithm.
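The experiments in this work were implemented in MATLAB; the following Python sketch only illustrates the surrounding plumbing of such a simulation, namely reading 16-bit mono WAV samples and timing a processing phase. The helper names (`write_wav_samples`, `read_wav_samples`, `timed`) are hypothetical and chosen for this example; only standard-library modules are used.

```python
import io
import struct
import time
import wave


def write_wav_samples(target, samples, rate=8000):
    """Save signed 16-bit mono samples as WAV (8 kHz by default)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)               # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav_samples(source):
    """Return (sampling rate, list of signed 16-bit samples)."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    return rate, list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    buffer = io.BytesIO()                 # in-memory file instead of a disk path
    write_wav_samples(buffer, [0, 1000, -1000, 32767, -32768])
    buffer.seek(0)
    (rate, samples), elapsed = timed(read_wav_samples, buffer)
    print(rate, samples, elapsed)
```

The same `timed` wrapper can be placed around an encryption or decryption routine to obtain per-phase times of the kind reported in the timing tables.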
We started our investigation by estimating the quality metrics described in Section 5 between the original speech and the decrypted speech, as well as the processing time, for one audio word recorded by different persons, as illustrated in Tables 1–3.
Table 1. Quality metrics of decryption phase (between original speech and decrypted speech) for speaker 1
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.42E-11 | 46.3226 | -21.4082 | -1.01E-07 | 1 |
New public two primes [5] | 3.42E-11 | 46.3226 | -21.4082 | -1.01E-07 | 1 |
Four primes [6] | 3.42E-11 | 46.3226 | -21.4082 | -1.01E-07 | 1 |
Five primes [7] | 3.42E-11 | 46.3226 | -21.4082 | -1.01E-07 | 1 |
Proposed approach | 3.42E-11 | 46.3226 | -21.4082 | -1.01E-07 | 1 |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.41E-04 | 35.4115 | 0.1533 | -0.3152 | 0.9955 |
New public two primes [5] | 3.41E-04 | 35.4115 | 0.1504 | -0.3146 | 0.9955 |
Four primes [6] | 4.49E-04 | 35.4115 | 0.6251 | -0.3878 | 0.9904 |
Five primes [7] | 7.11E-04 | 35.4115 | 1.7046 | -0.6367 | 0.9834 |
Proposed approach | 7.07E-04 | 35.4115 | 2.0515 | -0.7401 | 0.9834 |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 |
New public two primes [5] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 |
Four primes [6] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 |
Five primes [7] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 |
Proposed approach | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 |
Algorithm | Speaker 1 | Speaker 2 | Speaker 3 | |||
---|---|---|---|---|---|---|
SSNR | MSE | SSNR | MSE | SSNR | MSE | |
Original RSA [2] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 | |
New public two primes [5] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 | |
Four primes [6] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 | |
Five primes [7] | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 | |
Proposed approach | 3.41E-11 | 46.9749 | -19.9846 | -2.81E-08 | 1 |
Algorithm | Encryption time (s) | Decryption time (s) | Total processing time (s) | Saving time of the proposed approach |
---|---|---|---|---|
Original RSA [2] | 0.6012 | 0.7215 | 1.3227 | 53% |
New public two primes [5] | 0.7557 | 0.9447 | 1.7004 | 63% |
Four primes [6] | 0.4528 | 0.4528 | 0.9056 | 31% |
Five primes [7] | 0.3671 | 0.3388 | 0.7059 | 12% |
Proposed approach | 0.2972 | 0.3242 | 0.6214 | - |
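The last column of the table above is consistent with a relative saving computed from the total processing times, i.e. (T_reference − T_proposed) / T_reference. This formula is inferred here from the listed values and is not stated explicitly in the text; the short sketch below reproduces the column under that assumption.

```python
def time_saving(t_reference, t_proposed):
    """Relative saving in percent of the proposed approach (assumed formula)."""
    return 100.0 * (t_reference - t_proposed) / t_reference


# Total processing times in seconds, copied from the table above.
totals = {
    "Original RSA [2]": 1.3227,
    "New public two primes [5]": 1.7004,
    "Four primes [6]": 0.9056,
    "Five primes [7]": 0.7059,
}
proposed_total = 0.6214

if __name__ == "__main__":
    for name, total in totals.items():
        print(name, round(time_saving(total, proposed_total)))
```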
Algorithm | Encryption time (s) | Decryption time (s) | Total processing time (s) | Saving time of the proposed approach |
---|---|---|---|---|
Original RSA [2] | 0.5636 | 0.6576 | 1.2212 | 48% |
New public two primes [5] | 0.5709 | 0.6661 | 1.237 | 48% |
Four primes [6] | 0.4168 | 0.3789 | 0.7957 | 20% |
Five primes [7] | 0.398 | 0.332 | 0.729 | 12% |
Proposed approach | 0.3074 | 0.333 | 0.6404 | - |
Algorithm | Encryption time (s) | Decryption time (s) | Total processing time (s) | Saving time of the proposed approach |
---|---|---|---|---|
Original RSA [2] | 0.5573 | 0.7431 | 1.3003 | 50% |
New public two primes [5] | 0.8266 | 1.0333 | 1.8596 | 62% |
Four primes [6] | 0.3889 | 0.4278 | 0.8159 | 14% |
Five primes [7] | 0.4362 | 0.3635 | 0.7997 | 7% |
Proposed approach | 0.3395 | 0.3638 | 0.7033 | - |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
New public two primes [5] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Four primes [6] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Five primes [7] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Proposed approach | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
New public two primes [5] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Four primes [6] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Five primes [7] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Proposed approach | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
New public two primes [5] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Four primes [6] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Five primes [7] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Proposed approach | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
New public two primes [5] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Four primes [6] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Five primes [7] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Proposed approach | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
In this section, the proposed technique is compared with some current work to evaluate the system performance in terms of MSE, SSNR, and rxy. For this comparison, different audio signals with different lengths are applied to the proposed cryptosystem; the quality metrics of the encrypted signals are listed in Table 12. Fig. 12 presents the histogram of one of the sentences used, sentence 1. The comparison between the results of the proposed approach and those of other work found in the literature is displayed in Table 13.
Table 12. Quality metrics of the processed signals via the proposed technique
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
New public two primes [5] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Four primes [6] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Five primes [7] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Proposed approach | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Algorithm | MSE | SSNR | CD | LLR | rxy |
---|---|---|---|---|---|
Original RSA [2] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
New public two primes [5] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Four primes [6] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Five primes [7] | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
Proposed approach | 3.00E-11 | 35.4115 | -6.4819 | -4.93E-04 | 1 |
The main aim of audio encryption is to protect audio systems from illegal access, disruption, or modification. This work investigates the performance of audio cryptosystems based on the RSA algorithm and its variants. The work started by reviewing the modifications introduced to enhance the security of the original RSA algorithm, and a new modification is suggested as one of the contributions of this work. The performance is investigated by measuring audio quality metrics such as SSNR, LLR, MSE, and CD for the encrypted and decrypted signals. The obtained results confirm the effectiveness of RSA and its variants in audio cryptosystems. In brief, the application of RSA variants enhances security and reduces the processing time; hence, they are efficient in real-time applications, and our proposed modification is strongly competitive in this area. In the future, our proposed cryptosystem will be enhanced by combining this technique with other encryption techniques such as DNA coding and chaotic systems. Compression techniques can also be applied before encryption to reduce the running time and make our cryptosystem more suitable for real-time applications.
Conceptualization, SES, EA. Funding acquisition, EA. Investigation and methodology, SES, EA. Project administration, SES, EA. Resources, SES, EA. Supervision, SES. Writing of the original draft, SES, EA. Writing of the review and editing, SES, EA. Software, SES, EA. Validation, SES. Formal analysis, SES, EA. Data curation, SES, EA. Visualization, SES, EA. All the authors have proofread the final version.
This research was funded by Qassim University.
The researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.
The authors declare that they have no competing interests.
Name : Eman Abouelkheir
Affiliation : Department of Computer Science, College of Science and Arts, Qassim University, Alrass, 51452, Saudi Arabia
Department of Electrical Engineering, College of Engineering Kafrelsheikh, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt
Biography : She was born in Saudi Arabia in 1986. She received the B.Sc. from the Faculty of Engineering, Kafrelsheikh University, in 2008. She received the M.Sc. and Ph.D. from the Faculty of Engineering, Alexandria University. She is currently an Assistant Professor in the Department of Computer Science, Faculty of Sciences and Arts, Qassim University. She is also a Lecturer in the Department of Electrical Engineering, Faculty of Engineering, Kafrelsheikh University.
Name : Shamia El-sherbiny
Affiliation : Department of Electrical Engineering, College of Engineering Kafrelsheikh, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt
Biography : She was born in Kafrelsheikh in 1978. She received the B.Sc. and M.Sc. from the Faculty of Engineering, Tanta University. She received the Ph.D. from the Faculty of Engineering, Menoufia University, in 2014. She is currently a Lecturer in the Department of Electrical Engineering, Faculty of Engineering, Kafrelsheikh University.
Eman Abouelkheir and Shamia El-Sherbiny, Enhancement of Speech Encryption/Decryption Process Using RSA Algorithm Variants, Article number: 12:06 (2022)