Music Plagiarism Detection Based on Siamese CNN
• Kyuwon Park, Seungyeon Baek, Jueun Jeon, and Young-Sik Jeong*

Human-centric Computing and Information Sciences volume 12, Article number: 38 (2022)
https://doi.org/10.22967/HCIS.2022.12.038

Abstract

As music plagiarism has increased, various studies have been conducted on plagiarism detection. Conventional text-based plagiarism detection techniques identify plagiarism by comparing the similarity of musical information such as rhythms and notes. However, detecting plagiarized music that has subtle differences from the original music is still challenging. We propose a music plagiarism detection scheme (MPD-S) based on a Siamese convolutional neural network (CNN), which determines the presence or absence of plagiarism even with small changes in melody using Musical Instrument Digital Interface (MIDI) data. MPD-S converts vectorized MIDI data into grayscale images and then trains a CNN-based Siamese network model to measure the similarity between the original music and plagiarized music. MPD-S detects not only transposition and note plagiarism for a single vocal melody, but also fine melody plagiarism such as swapping and shift. MPD-S achieved a plagiarism detection accuracy of 98.7% for MIDI data, which is approximately 22.67% higher than that of the conventional plagiarism detection model.

Keywords

Music Plagiarism Detection, Melody Similarity, Convolutional Neural Network, Symbolic Domain, Siamese Network

Introduction

As the media industry embraces artificial intelligence techniques, a large amount of music is being created without an independent verification process, leading to an increase in music plagiarism [1–5]. Furthermore, it is difficult to objectively determine the presence or absence of plagiarism even with the intervention of experts because there are no clear criteria for plagiarism [6, 7]. Therefore, studies are needed to detect plagiarism automatically.
Various studies have attempted to detect plagiarized music using melody similarity [8–11]. Conventional plagiarism detection methods fall into audio-based and text-based methods according to the data format. The audio-based method detects plagiarism by generating a spectrogram from audio data through signal processing such as the short-time Fourier transform (STFT) and mel-frequency cepstral coefficients (MFCCs), and then calculating the similarity [12, 13]. However, when noise is present in the signal, information in the spectrogram is distorted or lost. On the other hand, the text-based technique detects plagiarism by measuring similarity over symbolic data such as Musical Instrument Digital Interface (MIDI) files, which encode music information rather than signals [14–16]. However, text-based techniques have low detection accuracy because they compare melodies represented as text strings on a one-to-one basis and react sensitively to even small changes.
We propose the music plagiarism detection scheme (MPD-S) using a Siamese convolutional neural network (CNN) for the accurate detection of plagiarized music with subtle melodic changes. The MPD-S subdivides rhythm and note vectors generated from MIDI into 8-bar units and converts them into grayscale images. Then it calculates the similarity to the original music by entering two images generated from different pieces of music into the Siamese CNN. The more similar the two pieces of music, the closer the similarity is to 1, and plagiarized music is detected through similarity ranking. Given that collecting a large amount of plagiarized music is difficult, plagiarized music is generated by changing some melodies within the original music using four plagiarized-music generation algorithms.
The proposed MPD-S detects not only music in which entire bars are copied, but also music that has partially plagiarized only certain melodies via swapping and shifting. In addition, MPD-S calculates similarity between unlearned music, and can be used in other fields such as music recommendation.
The rest of this paper is organized as follows. Section 2 introduces previous studies on plagiarized-music detection techniques based on melody similarity. Section 3 describes the proposed MPD-S scheme. Section 4 explains the implementation of the MPD-S. Section 5 evaluates the plagiarism detection performance of MPD-S. Finally, Section 6 concludes the paper.

Related Work

Various studies on the detection of music plagiarism based on melody similarity have been conducted. In this section, audio-based detection techniques and text-based detection techniques are described as representative studies.

Audio-based Approach
Borkar et al. [12] suggested a technique to detect plagiarism using audio fingerprinting. They extracted peaks from audio spectrograms and then generated fingerprints. This technique detected plagiarism through segment matching against the fingerprints of the original music. However, as the noise level increases, the similarity between pieces of music decreases, resulting in low accuracy. Sie et al. [13] proposed automatic music melody plagiarism detection (AMMPD), which detects plagiarism based on audio signals. AMMPD identified portions having similar melodies by comparing two pieces of music using a path-finding approach and detected the original music that was used to generate the plagiarized music. However, when pitch extraction is performed, noise is also extracted, reducing the accuracy of plagiarism detection.

Text-based Approach
De Prisco et al. [14] suggested a technique to detect plagiarized music using fuzzy vector similarity. After converting rhythms and pitches into vectors, similar melodies were identified using the Jaccard coefficient. If the similarity between melodies is higher than a threshold, the pair is flagged as plagiarism. However, it is difficult to detect music that has plagiarized only a few bars rather than all of them. Müllensiefen and Pendzich [15] proposed a plagiarism detection technique based on multiple similarity algorithms. After extracting statistics about melodies from a large-scale popular music database, the technique detected plagiarism using similarity algorithms such as Ukkonen, Sum common, edit distance, TF-IDF, and Tversky. However, it is difficult to detect plagiarism in music whose overall structure has been changed, such as by transposition.
He et al. [16] suggested MESMF, a technique that detects plagiarism over a melodic feature sequence using maximum weight matching and edit distance. After generating melody sequences featuring pitch, duration, and downbeat from two pieces of music and subdividing them, MESMF calculated similarity using the edit distance. It then detected plagiarism by identifying segment pairs with high similarity through maximum weight matching. However, since MESMF detects plagiarism by assigning a duration weight, its accuracy is lowered.
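To make the conventional text-based approach concrete, the following sketch (illustrative, not the cited authors' code) shows the edit-distance similarity that methods such as MESMF build on: two melodies are treated as symbol sequences, and similarity is one minus the normalized Levenshtein distance.

```python
# Illustrative sketch of edit-distance melody similarity, the core of
# conventional text-based detectors. Melodies are symbol sequences
# (here, MIDI note numbers); similarity = 1 - normalized edit distance.

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance (rolling row)."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                        # deletion
                        dp[j - 1] + 1,                    # insertion
                        prev + (a[i - 1] != b[j - 1]))    # substitution
            prev = cur
    return dp[-1]

def melody_similarity(a, b):
    """Similarity in [0, 1]; 1 means identical note sequences."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

original    = [60, 62, 64, 65, 67]   # hypothetical MIDI note numbers
plagiarized = [60, 62, 64, 66, 67]   # a single altered note
print(melody_similarity(original, plagiarized))  # 0.8
```

A single changed note already lowers the score noticeably, which illustrates why one-to-one string comparison "reacts sensitively to even small changes."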
The MPD-S proposed in this study converts rhythm and note vectors into images, and extracts similarity between plagiarized music and original music through the constructed Siamese CNN model. Thus, MPD-S detects not only transposition and note plagiarism, but also plagiarized music having minor changes in melody with high accuracy.
Table 1 presents a comparison between conventional plagiarized-music detection techniques and the MPD-S. The features compared are the domain of the used music data, features extracted from music information, and the algorithm used to detect plagiarism.

Table 1. Comparison between previous studies and MPD-S

Study                          | Domain | Feature                       | Algorithm
Borkar et al. [12]             | Audio  | Spectrogram                   | Comparison of fingerprinting
Sie et al. [13]                | Audio  | Pitch                         | Path finding approach
De Prisco et al. [14]          | Text   | Rhythm, pitch                 | Fuzzy vector of Jaccard coefficient
Müllensiefen and Pendzich [15] | Text   | Melody                        | Edit distance, Ukkonen, Sum common, TF-IDF, Tversky
He et al. [16]                 | Text   | Rhythm, pitch, downbeat       | Edit distance, maximum weight matching
MPD-S                          | Text   | Rhythm, pitch, time signature | Siamese CNN

Scheme of MPD-S

Architecture
MPD-S detects plagiarism by measuring the similarity between plagiarized music and original music. The overall scheme of MPD-S is illustrated in Fig. 1.

Fig. 1. Scheme of music plagiarism detection based on Siamese CNN.

The MPD-S is composed of a vectorization step, which extracts features from MIDI and converts them into vectors, an imaging step, which converts feature vectors into images, and an image similarity estimation step, which calculates the similarity between images.

Vectorization
Vectorization is a preprocessing step for using the MIDI as the input of the Siamese CNN model. It involves feature extraction for extracting major music information, time signature scaling for standardizing different time signatures, and vector decomposition for decomposing MIDI in specific period units.

3.1.1 Feature extraction
In the feature extraction step, pitch, offset, and time signature, which are key pieces of music information, are extracted from MIDI as features [17]. The pitch, which denotes the frequency of a sound, is used to generate note vectors from the octave number and note name. The offset is used to generate rhythm vectors that are unaffected by tempo changes, as stated in Equation (1), where $O_{i+1}$ denotes the (i+1)th offset, $O_i$ denotes the ith offset, and $R_i$ denotes the ith rhythm vector.

$R_i = O_{i+1} - O_i$ (1)

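A minimal sketch of this rhythm-vector computation (the offsets below are hypothetical example data): each rhythm value is the gap between two consecutive note offsets, so the vector is independent of absolute position in the piece.

```python
# Rhythm vector per Equation (1): R_i = O_{i+1} - O_i,
# i.e., the interval between consecutive note offsets (in beats).

def rhythm_vector(offsets):
    """Differences between consecutive offsets."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]

# Offsets (in quarter-note beats) of five notes; hypothetical example.
offsets = [0.0, 1.0, 1.5, 2.5, 4.0]
print(rhythm_vector(offsets))  # [1.0, 0.5, 1.0, 1.5]
```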
3.1.2 Time signature scaling
The time signature scaling step scales every piece to 4/4, the common time signature, so that similarity can be measured between pieces with different time signatures. In Equation (2), x denotes the note value per beat, y denotes the number of beats, and $R_i$ denotes the rhythm vector of the ith bar.

$R'_i = R_i \times \frac{x}{y}$ (2)

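A sketch of this scaling under the assumption that each rhythm value in a bar with time signature y/x is multiplied by x/y, which stretches every bar to the length of a 4/4 bar (e.g., a 3/4 bar of three quarter-note beats becomes four beats long). The exact scaling rule is not spelled out in the text; this is one consistent reading of Equation (2).

```python
# Time-signature scaling sketch (assumption: multiply each rhythm value
# in a y/x bar by x/y so the bar spans the same length as a 4/4 bar).

def scale_to_common_time(rhythm_bar, beats, note_value):
    """Scale one bar's rhythm vector from a beats/note_value meter to 4/4."""
    factor = note_value / beats  # x / y
    return [r * factor for r in rhythm_bar]

# A 3/4 bar: three quarter-note beats (hypothetical example).
bar_34 = [1.0, 1.0, 1.0]
scaled = scale_to_common_time(bar_34, beats=3, note_value=4)
print(round(sum(scaled), 6))  # 4.0 -> the bar now spans a full 4/4 bar
```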
3.1.3 Vector decomposition
The vector decomposition step decomposes rhythm and note vectors in period units to detect music that has partially plagiarized only certain melodies. Here, one period is composed of eight bars.
Imaging
In the imaging step, rhythm and note vectors are converted to grayscale images laid out in the digital audio workstation (piano-roll) style to express the changes of rhythm and note vectors over time in a two-dimensional space. The width and height of the generated image correspond to the rhythm vector and note vector, respectively.

Image Similarity Estimation
Finally, in the image similarity estimation step, the presence or absence of plagiarism is determined by calculating similarity between original music and plagiarized music using a Siamese CNN model composed of a convolution network and fully connected network. The Siamese CNN is a deep learning model that calculates the distance between major features extracted from each image [18]. The more similar the features are, the smaller the distance becomes [19].
The structure of the Siamese CNN model is shown in Fig. 2. It takes two images as input for similarity comparison. The two CNN branches share weights so that the features of both images are mapped into the same space. When measuring the distance between the extracted features, the similarity is calculated using the L1 distance in order to be less affected by outliers. The more similar the two images are, the closer the output value is to 1 [20].

Fig. 2. Siamese CNN model architecture.
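The structure described above can be sketched in PyTorch as follows. This is a minimal illustration of a Siamese CNN with a shared encoder, L1 distance between embeddings, and a sigmoid output; the layer sizes here are illustrative and do not reproduce the exact architecture in Table 2.

```python
import torch
import torch.nn as nn

class SiameseCNN(nn.Module):
    """Minimal Siamese CNN sketch: one weight-shared encoder, L1 distance
    between the two embeddings, and a sigmoid so similar pairs score near 1."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # shared weights for both inputs
            nn.Conv2d(1, 16, kernel_size=3, stride=2), nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.BatchNorm2d(32), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)    # maps |f1 - f2| to a similarity score

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.head(torch.abs(f1 - f2)))  # L1 distance

model = SiameseCNN()
# Two 128x320 grayscale "piano-roll" images (batch of 1), random for demo.
a, b = torch.rand(1, 1, 128, 320), torch.rand(1, 1, 128, 320)
sim = model(a, b)
print(sim.shape)  # torch.Size([1, 1]); value lies in (0, 1)
```

Section 4 states the training configuration: Adam with a learning rate of 3e−4 and weight decay of 6e−5, i.e., `torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=6e-5)`.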

MPD-S Implementation

To accurately detect plagiarized music, we set up the MPD-S model in an environment composed of an Intel Core i9-10800K and a GeForce RTX 3090. We used 3,299 MIDIs from the Lakh MIDI dataset, excluding those without vocal tracks or with polyphonic melodies [21–23].

4.1 Vectorization

In the vectorization step, we generated rhythm and note vectors after extracting the pitch, offset, and time signature from the MIDIs of the monophonic vocal melody type. As the rhythm vectors are affected by changes of the time signature, we performed scaling so that all pieces have the same time signature. Finally, vectors were decomposed into eight-bar units to detect music that plagiarized only certain melodies.

4.1.1 Feature extraction
In the feature extraction step, the pitch, offset, and time signature were extracted from vocal melody tracks using music21, a Python-based MIDI analysis tool. Note vectors that have an integer value between 0 and 127 were generated using the octave number and note name of the extracted pitches. The rest note was set to −1. Furthermore, the rhythm vectors were generated by calculating the interval between two consecutive offsets.

4.1.2 Time signature scaling
When we checked the distribution of extracted time signatures, 4/4 had the highest number, with a total of 152,796, followed by 3/4 with 8,033 and 2/4 with 5,542. In contrast, 5/8 and 7/4 had the lowest numbers of 61 and 13, respectively. Therefore, we performed scaling based on the most frequent time signature, i.e., 4/4.

4.1.3 Vector decomposition
Vector decomposition generated a total of 37,920 data by sliding the rhythm and note vectors by four bars using a window size of eight bars to analyze melodies in detail. The total number of data used in this study was 17,965, excluding the cases where rest notes account for more than 80% of all notes or the same rhythm and note vectors exist in one piece of music.
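The decomposition described above (an 8-bar window slid with a 4-bar stride) can be sketched as follows; the per-bar data here are hypothetical placeholders.

```python
# Vector-decomposition sketch: slide an 8-bar window with a 4-bar stride
# over per-bar vectors, producing overlapping 8-bar segments.

def decompose(bars, window=8, stride=4):
    """Return overlapping segments of `window` bars taken every `stride` bars."""
    return [bars[i:i + window]
            for i in range(0, len(bars) - window + 1, stride)]

# 16 bars of (hypothetical) per-bar data, labeled 0..15.
bars = list(range(16))
segments = decompose(bars)
print(len(segments))  # 3 segments: bars 0-7, 4-11, 8-15
print(segments[1])    # [4, 5, 6, 7, 8, 9, 10, 11]
```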

4.2 Imaging
The rhythm values in each decomposed period sum to 32 beats (eight 4/4 bars of four beats each). The width of the images was set to 320 to represent rhythm values down to the first decimal place, and the height was set to 128 to cover the integer range of note vectors. Furthermore, the grayscale images use pixel values of 0 and 255 to represent the positions of notes as the rhythm changes.
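The imaging step can be sketched as below. The exact pixel layout is an assumption: here a note's row is its MIDI number and its column span covers its duration at 0.1-beat resolution (32 beats × 10 columns per beat = 320), with rests (−1) left black.

```python
# Imaging sketch (layout is an assumption): 128 (pitch) x 320 (time)
# grayscale image; a note lights row = MIDI number across its duration
# at 0.1-beat resolution, and rests (note == -1) stay black.

def to_image(notes, rhythms, width=320, height=128, cols_per_beat=10):
    image = [[0] * width for _ in range(height)]  # black background
    col = 0
    for note, dur in zip(notes, rhythms):
        span = int(round(dur * cols_per_beat))
        if note >= 0:  # skip rests (encoded as -1)
            for c in range(col, min(col + span, width)):
                image[note][c] = 255  # white pixel marks the sounding note
        col += span
    return image

# Hypothetical fragment: C4 for 1 beat, a half-beat rest, E4 for 2 beats.
img = to_image(notes=[60, -1, 64], rhythms=[1.0, 0.5, 2.0])
print(len(img), len(img[0]))           # 128 320
print(sum(v == 255 for v in img[60]))  # 10 columns lit for the 1-beat note
```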

4.3 Image Similarity Estimation
The architecture of the Siamese CNN model for detecting plagiarism by calculating the similarity between two pieces of music is presented in Table 2. Here, we set the learning rate and weight decay to 3e−4 and 6e−5, respectively, and used the Adam optimizer to generate optimized parameters for plagiarism detection.

Table 2. Siamese CNN model’s architecture

Layer (type)  | Input shape   | Output shape  | Number of parameters
Conv2d-1      | (1, 128, 320) | (16, 63, 159) | 144
BatchNorm2d-1 | (16, 63, 159) | (16, 63, 159) | 32
Conv2d-2      | (16, 63, 159) | (64, 31, 79)  | 9,216
BatchNorm2d-2 | (64, 31, 79)  | (64, 31, 79)  | 128
Conv2d-3      | (64, 31, 79)  | (128, 15, 39) | 73,728
BatchNorm2d-3 | (128, 15, 39) | (128, 15, 39) | 256
Conv2d-4      | (128, 15, 39) | (128, 13, 37) | 147,456
BatchNorm2d-4 | (128, 13, 37) | (128, 13, 37) | 256
Conv2d-5      | (128, 13, 37) | (256, 11, 35) | 294,912
BatchNorm2d-5 | (256, 11, 35) | (256, 11, 35) | 512
Linear-1      | (256, 11, 35) | 4,096         | 403,701,760
BatchNorm1d-1 | 4,096         | 4,096         | 8,192
Linear-2      | 4,096         | 512           | 2,097,152
BatchNorm1d-2 | 512           | 512           | 1,024
Linear-3      | 512           | 1             | 512

Performance Evaluation

The performance of the proposed MPD-S was evaluated to verify whether it accurately detects plagiarized music. For calculating accuracy, we used similarity ranking: plagiarism is determined by the highest similarity score obtained by matching the original music and plagiarized music on a one-to-one basis [23].
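The similarity-ranking evaluation can be sketched as follows: each plagiarized piece is scored against every original, and detection counts as correct when the true original receives the highest similarity.

```python
# Similarity-ranking accuracy sketch: a plagiarized piece is detected
# correctly when its true original is ranked first by similarity score.

def ranking_accuracy(sim_matrix, true_original):
    """sim_matrix[p][o]: similarity of plagiarized piece p to original o;
    true_original[p]: index of the original that p was derived from."""
    correct = 0
    for p, scores in enumerate(sim_matrix):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += (predicted == true_original[p])
    return correct / len(sim_matrix)

# Hypothetical 3x3 similarity scores from the model.
sims = [[0.9, 0.2, 0.1],
        [0.3, 0.4, 0.8],    # true original is index 1 -> ranked wrongly
        [0.1, 0.2, 0.95]]
print(ranking_accuracy(sims, true_original=[0, 1, 2]))  # 0.6666666666666666
```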
Fig. 3 shows the training and validation loss when the plagiarized-music detection performance of the MPD-S using SimpleCNN [24], ResNet-50 [25], ResNeXt-50 [26], and EfficientNet [27] was evaluated. Overall, the training loss measured by four CNN models decreased gradually as the epoch increased. In contrast, when we validated the MPD-S, all the CNN models except SimpleCNN showed unstable loss values due to the occurrence of overfitting.

Fig. 3. (a) Training loss and (b) validation loss according to various CNN networks.

Table 3 presents the accuracy of plagiarized-music detection according to the CNN network. Compared to other CNN networks with complex structures, the SimpleCNN composed of five convolution layers achieved the highest performance of 98.7%. We confirmed that SimpleCNN is appropriate for extracting features from grayscale images having low resolutions and determining the presence or absence of plagiarism.

Table 3. Plagiarism detection accuracy of MPD-S depending on various CNN models
CNN model         | Accuracy (%)
SimpleCNN [24]    | 98.7
ResNet-50 [25]    | 95.65
ResNeXt-50 [26]   | 93.95
EfficientNet [27] | 31.94

Table 4 presents the comparison of the detection performance between conventional text-based similarity comparison algorithms and MPD-S. Among the conventional algorithms, the edit distance achieved the highest plagiarism detection accuracy of 76.03%. By contrast, MPD-S achieved an accuracy of 98.7%, i.e., 22.67% higher, because it also accurately detected plagiarized music that had only changed melodies partially.

Table 4. Comparison of performance between conventional text-based similarity comparison algorithms and MPD-S
Plagiarism detection | Data type | Accuracy (%)
Sum common           | Vector    | 39.91
Ukkonen              | Vector    | 46.07
Tversky-equal        | Vector    | 46.58
Edit distance        | Vector    | 76.03
MPD-S                | Image     | 98.7

We used five types of datasets to further verify the performance of MPD-S. POP909 [28] contains vocal melodies of pop songs, and Musedata [29] is composed of classical orchestral and symphonic pieces. MTC-LC-1.0, MTC-ANN-2.01, and MTC-FS-1.0 are Dutch folk-song datasets from the Meertens Tune Collections, composed of various melodies [30]. For all datasets, we only used MIDI data consisting of a single vocal melody and instrument.
Table 5 presents the plagiarism detection accuracy of MPD-S according to dataset type. MPD-S detected plagiarism across various instruments in addition to vocals with a high accuracy of more than 94%.

Table 5. Plagiarism detection accuracy of MPD-S according to dataset type

Dataset           | Number of music data | Number of decomposition data | Accuracy (%)
POP909 [28]       | 909                  | 1,285                        | 94.79
Musedata [29]     | 439                  | 615                          | 99.68
MTC-LC-1.0 [30]   | 4,830                | 3,690                        | 99.21
MTC-ANN-2.01 [30] | 360                  | 200                          | 99
MTC-FS-1.0 [30]   | 4,120                | 2,526                        | 99.05

Conclusion

We proposed the MPD-S to accurately detect the presence or absence of plagiarism in music that is widely distributed rapidly in the media market. The MPD-S detected plagiarized music by calculating the similarity between two pieces of music through a Siamese CNN after converting MIDI into grayscale images. To construct the MPD-S, a large number of plagiarized music pieces were generated from original music using four plagiarized-music generation methods. Our MPD-S successfully detected the melodies of plagiarized music that were changed only slightly with an accuracy of 98.7%. This accuracy was 22.67% higher than that of the conventional text-based similarity comparison algorithms. The proposed approach detected plagiarized music with a high accuracy for single vocal melodies. The MPD-S can be used in music recommendation and search fields by identifying music with high similarity.
However, it is difficult for MPD-S to detect plagiarism in music composed of multiple tracks or polyphonic vocal melodies. In addition, MPD-S does not consider the structure of music, so it cannot detect plagiarized music rendered in a different style. Moreover, plagiarism concerns mainly arise in popular genres such as jazz and hip-hop, which use multiple tracks simultaneously, and most such music exists in the audio data format.
Therefore, we plan to improve the performance of MPD-S so that it can detect not only plagiarism across various styles by considering the structure of the music, but also plagiarism of polyphonic vocal melodies. In addition, we will study the detection of plagiarized music that considers all tracks simultaneously by extracting features from the audio domain, as well as the generation of creative songs.

Author’s Contributions

Conceptualization, KP. Investigation and methodology, KP, SB. Project administration, SB, JJ, YSJ. Supervision, YSJ. Writing of the original draft, KP, SB. Writing of the review and editing, JJ, YSJ. Software, SB. Validation, SB, JJ. All the authors have proofread the final version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1088383).

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name : Kyuwon Park
Affiliation : Dept. of Multimedia Engineering, Dongguk University, Seoul, Korea
Biography : Kyuwon Park received her B.A. degree in music composition from Chugye University for the Arts in Seoul, Korea in 2012 and M.A. degree in musical technology from Korea National University of Arts in Seoul, Korea in 2015. She is a Ph.D. student of the department of multimedia engineering at Dongguk University, Korea. Her current research interests include deep learning based music score reconstruction and music plagiarism detection.

Name : Seungyeon Baek
Affiliation : Dept. of Multimedia Engineering, Dongguk University, Seoul, Korea
Biography : Seungyeon Baek received his B.S. degree in multimedia engineering from Dongguk University in Seoul, Korea in 2021. He is a M.S. student of the department of multimedia engineering at Dongguk University, Korea. His current research interests include information security for IoT devices and malware detection using deep learning.

Name : Jueun Jeon
Affiliation : Dept. of Multimedia Engineering, Dongguk University, Seoul, Korea
Biography : Jueun Jeon received her B.S. and M.S. degrees in multimedia engineering from Dongguk University in Seoul, Korea in 2018 and 2020. She is a Ph.D. student of the department of multimedia engineering at Dongguk University, Korea. Her current research interests include information security for cloud computing and the Internet of Things (IoT).

Name : Young-Sik Jeong
Affiliation : Dept. of Multimedia Engineering, Dongguk University, Seoul, Korea
Biography : Young-Sik Jeong received his B.S. degree in mathematics and his M.S. and Ph.D. degrees in computer science and engineering from Korea University in Seoul, Korea in 1987, 1989, and 1993, respectively. He was a Professor in the Department of Computer Engineering at Wonkwang University, Korea from 1993 to 2012. He worked and conducted research at Michigan State University and Wayne State University as a Visiting Professor in 1997 and 2004. He currently works in the Department of Multimedia Engineering at Dongguk University, Korea. His research interests include multimedia cloud computing, information security for cloud computing, and the Internet of Things.

References

[1] J. P. Briot and F. Pachet, “Deep learning for music generation: challenges and directions,” Neural Computing and Applications, vol. 32, no. 4, pp. 981-993, 2020.
[2] A. Mohanarathinam, S. Kamalraj, G. K. D. Prasanna Venkatesan, R. V. Ravi, and C. S. Manikandababu, “Digital watermarking techniques for image security: a review,” Journal of Ambient Intelligence and Humanized Computing, vol. 11, no. 8, pp. 3221-3229, 2020.
[3] M. Miric and L. B. Jeppesen, “Does piracy lead to product abandonment or stimulate new product development? Evidence from mobile platform‐based developer firms,” Strategic Management Journal, vol. 41, no. 12, pp. 2155-2184, 2020.
[4] R. Brovig-Hanssen and E. Jones, “Remix’s retreat? Content moderation, copyright law and mashup music,” New Media & Society, 2021. https://doi.org/10.1177/14614448211026059
[5] N. Wang, H. Xu, F. Xu, and L. Cheng, “The algorithmic composition for music copyright protection under deep learning and blockchain,” Applied Soft Computing, vol. 112, article no. 107763, 2021. https://doi.org/10.1016/j.asoc.2021.107763
[6] J. P. Fishman, “Originality's other path,” California Law Review, vol. 109, no. 3, pp. 861-916, 2021.
[7] E. Ranger-Murdock, “‘Blurred Lines’ to ‘Stairway to Heaven’: applicability of selection and arrangement infringement actions in musical compositions,” UCLA Law Review, vol. 67, no. 4, pp. 1066-1105, 2020.
[8] M. Sheikh Fathollahi and F. Razzazi, “Music similarity measurement and recommendation system using convolutional neural networks,” International Journal of Multimedia Information Retrieval, vol. 10, no. 1, pp. 43-53, 2021.
[9] B. Jia, J. Lv, X. Peng, Y. Chen, and S. Yang, “Hierarchical regulated iterative network for joint task of music detection and music relative loudness estimation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1-13, 2020.
[10] R. De Prisco, D. Malandrino, G. Zaccagnino, and R. Zaccagnino, “A computational intelligence text-based detection system of music plagiarism,” in Proceedings of 2017 4th International Conference on Systems and Informatics (ICSAI), Hangzhou, China, 2017, pp. 519-524.
[11] V. Velardo, M. Vallati, and S. Jan, “Symbolic melodic similarity: State of the art and future challenges,” Computer Music Journal, vol. 40, no. 2, pp. 70-83, 2016.
[12] N. Borkar, S. Patre, R. S. Khalsa, R. Kawale, and P. Chakurkar, “Music plagiarism detection using audio fingerprinting and segment matching,” in Proceedings of 2021 Smart Technologies, Communication and Robotics (STCR), Sathyamangalam, India, 2021, pp. 1-4.
[13] M. S. Sie, C. C. Chiang, H. C. Yang, and Y. L. Liu, “A novel method of plagiarism detection for music melodies,” International Journal of Artificial Intelligence and Applications, vol. 8, no. 5, pp. 15-32, 2017.
[14] R. De Prisco, D. Malandrino, G. Zaccagnino, and R. Zaccagnino, “Fuzzy vectorial-based similarity detection of music plagiarism,” in Proceedings of 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 2017, pp. 1-6.
[15] D. Müllensiefen and M. Pendzich, “Court decisions on music plagiarism and the predictive value of similarity algorithms,” Musicae Scientiae, vol. 13, no. 1(suppl), pp. 257-295, 2009.
[16] T. He, W. Liu, C. Gong, J. Yan, and N. Zhang, “Music plagiarism detection via bipartite graph matching,” 2021 [Online]. Available: https://arxiv.org/abs/2107.09889.
[17] J. Calvo-Zaragoza, J. Hajic, and A. Pacha, “Understanding optical music recognition,” ACM Computing Surveys, vol. 53, no. 4, article no. 77, 2021. https://doi.org/10.1145/3397499
[18] J. Bromley, I. Guyon, Y. LeCun, E. Sackinger, and R. Shah, “Signature verification using a "Siamese" time delay neural network,” Advances in Neural Information Processing Systems, vol. 6, pp. 737-744, 1993.
[19] X. Zhou, W. Liang, S. Shimizu, J. Ma, and Q. Jin, “Siamese neural network based few-shot learning for anomaly detection in industrial cyber-physical systems,” IEEE Transactions on Industrial Informatics, vol. 17, no. 8, pp. 5790-5798, 2021.
[20] J. Kim, J. Park, M. Shin, J. Lee, and N. Moon, “The method for generating recommended candidates through prediction of multi-criteria ratings using CNN-BiLSTM,” Journal of Information Processing Systems, vol. 17, no. 4, pp. 707-720, 2021.
[21] C. Raffel, Learning-based Methods for Comparing Sequences, with Applications to Audio-to-Midi Alignment and Matching. New York, NY: Columbia University, 2016.
[22] D. Zeng, M. E. He, Z. Zhou, and C. Tang, “An interactive genetic algorithm with an alternation ranking method and its application to product customization,” Human-centric Computing and Information Sciences, vol. 11, article no. 15, 2021. https://doi.org/10.22967/HCIS.2021.11.015
[23] D. Zeng, M. E. He, Z. Zhou, and C. Tang, “An interactive genetic algorithm with an alternation ranking method and its application to product customization,” Human-centric Computing and Information Sciences, vol. 11, article no. 15, 2021. https://doi.org/10.22967/HCIS.2021.11.015
[24] Y. Zeng, R. Zhang, L. Yang, and S. Song, “Cross-domain text sentiment classification method based on the CNN-BiLSTM-TE model,” Journal of Information Processing Systems, vol. 17, no. 4, pp. 818-833, 2021.
[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 770-778.
[26] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 2017, pp. 5987-5995.
[27] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, 2019, pp. 6105-6114.
[28] Z. Wang, K. Chen, J. Jiang, Y. Zhang, M. Xu, S. Dai, X. Gu, and G. Xia, “POP909: a pop-song dataset for music arrangement generation,” 2020 [Online]. Available: https://arxiv.org/abs/2008.07142.
[29] N. Boulanger-Lewandowski, Y. Bengio, and R. Vincent, “Modeling temporal dependencies in high-dimensional sequences: application to polyphonic music generation and transcription,” in Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, Scotland, 2012.
[30] P. van Kranenburg, B. Janssen, and A. Volk, The Meertens Tune Collections: The Annotated Corpus (MTC-ANN) Versions 1.1 and 2.0.1. Amsterdam, Netherlands: Meertens Instituut, 2016.
