File Recovery Method in NTFS-Based Damaged RAID System
  • Jong-Hyun Choi and Sangjin Lee*

Human-centric Computing and Information Sciences volume 12, Article number: 40 (2022)
https://doi.org/10.22967/HCIS.2022.12.040

Abstract

Driven by the recent demand for mass storage, redundant arrays of independent disks (RAID) are used in network-attached storage (NAS), direct-attached storage (DAS), servers, and workstations, in addition to laptops and PCs. RAID combines multiple disks into volumes and stores data alternately across the member disks in stripe-sized units. These characteristics raise several research issues in digital forensics. One of them is the damaged RAID system: a case in which the RAID configuration information is known but some member disks are lost. Because member disks are missing, the volume reassembled from a damaged RAID system contains a striped file system and striped files. Striped file systems and files are distinctive forms in which data is fragmented within the volume, so meaningful data must be located among the fragments. To our knowledge, this form has not been addressed by previous research and is not supported by existing digital forensics tools. In this paper, targeting NTFS, the most widely used file system, we propose and verify a method for recovering files from a damaged RAID system that combines RAID reconstruction, file system analysis, striped file system analysis, file carving, and striped file analysis.


Keywords

Damaged RAID, Striped File System, Striped File, File Recovery


Introduction

Advances in information and communications technology (ICT) have increased storage capacities, and large-capacity storage devices have become more common. Storage manufacturers have downsized network-attached storage (NAS), direct-attached storage (DAS), servers, and workstations and released them as small office/home office (SOHO) products and home appliances. With their ample storage space, these products can hold far more files than other devices, and they have therefore become primary targets for analysis in digital forensic investigations. A redundant array of independent disks (RAID) combines multiple disks to create volumes [1]. Volumes can be created with hardware or software RAID, and most such RAID formats are public [2, 3]. RAID stores its configuration information in a metadata section. This information consists of the RAID level, number of disks, disk order, stripe size, and stripe map [1, 2]. The RAID level is one of RAID 0, 1, 5, 6, and so on. The number of disks is the count of member disks that compose the RAID. The disk order is the order of the member disks in the RAID system. The stripe size is the minimum storage unit of the RAID system; data is written to each member disk in turn, one stripe at a time. Common stripe sizes are 4, 8, 16, 32, 64, 128, 256, and 512 kB. The schematic diagram of RAID 0 is illustrated in Fig. 1.

Fig. 1. RAID level 0.


The RAID in Fig. 1 has three disks in the order Disk #1, #2, and #3. As shown in Fig. 1, data is written alternately to Disks #1, #2, and #3 in uniform stripe-sized units. The stripe map is used only in RAID 5 or 6 and describes the parity block pattern. RAID 5 stripe maps are categorized into forward parity, backward parity, forward dynamic parity, and backward dynamic parity. Fig. 2 shows the stripe map types according to the block pattern for RAID level 5. Even for the same RAID level, the block order and parity block order differ according to the stripe map.
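The alternating layout of RAID 0 can be expressed as simple integer arithmetic. The following is a minimal sketch, not taken from the paper: the function name `raid0_locate` is hypothetical, disks are indexed from 0 (so Disk #1 is index 0), and the sketch assumes the data area starts at offset 0 on every member disk, ignoring any RAID metadata region.

```python
def raid0_locate(volume_offset: int, n_disks: int, stripe_size: int) -> tuple:
    """Map a byte offset in a RAID 0 volume to (disk index, offset on that disk).

    Assumption: the striped data area begins at offset 0 of each member disk.
    """
    # Which stripe the byte falls in, and the position inside that stripe.
    stripe_no, within = divmod(volume_offset, stripe_size)
    # Stripes are dealt to the disks in round-robin order.
    row, disk_index = divmod(stripe_no, n_disks)
    return disk_index, row * stripe_size + within


if __name__ == "__main__":
    KIB = 1024
    # Three disks, 64 KiB stripes: the second stripe lives on the second disk.
    print(raid0_locate(64 * KIB, 3, 64 * KIB))
```

With three disks, the fourth stripe wraps around to the first disk again, one stripe further into that disk.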

Fig. 2. Stripe map of RAID level 5.


A RAID system creates a volume by combining member disks according to the RAID configuration information, and the volume contains a file system. The file system stores directories and files. Since the RAID system alternately stores data on the disks in units of a specific size, the file system and the files in it are physically distributed across the disks. Because of this, a research issue arises when a member disk becomes defective or is lost. A RAID system in which a member disk is damaged or lost is called a damaged RAID system. In a damaged RAID system, all data, including volumes, file systems, and files, is damaged, and as far as we know, no research has been done on damaged RAID systems. Thus, we studied file recovery in a damaged RAID environment by combining RAID reconstruction, file system analysis, fragmented file system analysis, file carving, and corrupted file analysis techniques from existing digital forensics fields. The contributions of this paper are as follows:

Analysis of member disk structure of a damaged RAID system.

Design and propose file recovery in a damaged RAID system.

Provide damaged RAID system dataset using NTFS.

Compare and summarize the proposed system to show that it is superior to the traditional system.

The rest of this paper is organized as follows. Section 2 addresses related work. Section 3 deals with the internal structure of the damaged RAID system, explains that the volume, file system, and files become striped when a member disk is lost, and describes the file recovery procedure for a damaged RAID system. Section 4 presents the dataset and experimental results. Finally, Section 5 presents the conclusions.


Related Work

This section reviews existing research related to damaged RAID systems, together with its open issues and challenges. We then present the key considerations for the proposed system, which address the challenges left by existing studies.

2.1 Seminal Contribution
As far as we know, there is no research on damaged RAID systems, so we examined whether prior work on RAID system analysis and file recovery considered the damaged RAID system environment.

2.1.1 RAID system analysis
The RAID system requires the configuration information and member disks to form an undamaged volume, but one of them could be missing. The RAID compositions, according to the RAID configuration information and member disks, are classified as follows:

Type A: Both RAID configuration information and member disks are present.

Type B: Missing RAID configuration information with member disks present.

Type C: RAID configuration information is present with partially lost member disks.

In the first composition, where both the RAID configuration information and the member disks are present, the storage structure specific to each RAID system can be analyzed, and the parsed RAID configuration information can be combined with the member disks to reconstruct the RAID. Moulton [4] first studied the RAID reconstruction method and introduced RAID types such as RAID 0, 1, and 5; that work describes only basic RAID types. Choi et al. [5] performed a structural analysis of the Linux-based hybrid RAID and extracted the RAID configuration information stored in the member disks to study the RAID reconstruction method. Kim et al. [6] performed a structural analysis of Windows Storage Spaces and suggested a method to rebuild RAID and a digital forensic investigation methodology. Hilgert et al. [7, 8] analyzed BTRFS, which is a pooled storage file system, and the RAID supported by BTRFS.
In the second composition, where all member disks are present without the RAID configuration information, the configuration information must be estimated. Zoubek et al. [9] presented a heuristic method that estimates the RAID configuration information from block-entropy values. Additionally, ACELab [10] presented a method of estimating the RAID configuration information using a file histogram.
The third composition is the damaged RAID system, where the RAID configuration information is present but some member disks are missing. This is a complicated composition in which the file system and files are all damaged, and no study has examined it yet. A damaged RAID system results from occasional hardware-related issues in RAID systems, such as broken disks.
Table 1 summarizes the related studies by RAID system type. Types A and B have been studied, but Type C, the damaged RAID system, has not. Therefore, research on damaged RAID systems from a digital forensics perspective is necessary.

Table 1. Related works by RAID system type

| RAID system type | RAID configuration information | Member disks | Related works |
|---|---|---|---|
| A | Existence | Existence | Moulton [4], Choi et al. [5], Kim et al. [6], Hilgert et al. [7, 8] |
| B | Missing | Existence | Zoubek et al. [9], ACELab [10] |
| C | Existence | Partially lost | - |

2.1.2 File recovery
File system analysis is one of the oldest challenges in digital forensics. The file system manages and stores files. The metadata section of a file system stores each file's name, extension, allocation area, and time information. Files are stored in a specific part of the file system, either contiguously or in fragments. In general, a file can be recovered with the information in the metadata section of the file system, but if the metadata section is damaged, the file is recovered by carving it based on its characteristics. Since each file system has a different structure and mechanism, much research has been done to raise the recovery rate of fragmented files, even slightly [11–17]. This matters because it can recover potential evidence for a digital forensic investigation. The NTFS file system, which is the analysis target of this study, stores the file system metadata in the master file table (MFT). The MFT is a collection of MFT entries. Each MFT entry records the metadata of a file, from which the file name, extension, size, and allocated area can be identified, while deletion can be determined through a flag value. An MFT entry begins with the signature “FILE”, which makes it possible to identify it as an MFT entry.
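The signature and flag checks described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name `parse_mft_header` is hypothetical, and the field offsets (first-attribute offset at byte 20, flags at byte 22) and flag values (0x0001 for in use, 0x0002 for directory) are assumptions taken from the commonly documented NTFS MFT entry layout, not from this paper.

```python
import struct

# Flag bits in the MFT entry header (assumed from common NTFS documentation).
IN_USE = 0x0001
IS_DIRECTORY = 0x0002


def parse_mft_header(entry: bytes) -> dict:
    """Read the signature, first-attribute offset, and flags of one MFT entry."""
    if len(entry) < 24 or entry[:4] != b"FILE":
        raise ValueError("not an MFT entry: missing 'FILE' signature")
    # Two little-endian 16-bit fields at byte offsets 20 and 22.
    first_attr_offset, flags = struct.unpack_from("<HH", entry, 20)
    return {
        "first_attr_offset": first_attr_offset,
        "in_use": bool(flags & IN_USE),        # False suggests a deleted file
        "is_directory": bool(flags & IS_DIRECTORY),
    }


if __name__ == "__main__":
    # Synthetic 1024-byte entry: signature, 16 filler bytes, offset 56, flags 0.
    deleted = b"FILE" + bytes(16) + struct.pack("<HH", 56, 0) + bytes(1000)
    print(parse_mft_header(deleted))
```

An entry whose in-use bit is cleared is a candidate for deleted-file recovery, provided its clusters have not been reused.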
In addition, a recovered file may itself be corrupted. A typical case is a file whose beginning or end has been overwritten by another file. Damaged file recovery is another long-standing challenge in digital forensics: extracting data from a corrupted file requires analyzing the file's specific format, so the analysis and implementation must be repeated for each file format.
For this reason, many researchers have focused on a single file format or a single family of formats. Existing studies have examined how to extract data from damaged files such as documents [18], multimedia files [19–23], databases [24], and Docker files [25]. However, these studies address the recovery of damaged files in a typical system and do not consider the damaged RAID system environment. A damaged RAID system is more complex than a typical system and remains a difficult research issue.


2.2 Key Considerations
The primary considerations of the proposed system are as follows:

Integrity: A file recovery system should recover a file from a damaged system and extract meaningful data from the file. There should be no unauthorized data access, modification, or deletion in this case.

Reliability: The file recovery system must ensure that data is not tampered with or contains unintended errors during data processing.

Correctness: The software must behave exactly as the algorithm does.

Scalability: Various RAID systems and file systems exist. The existing algorithm should remain applicable when a new RAID or file system is released, rather than requiring a new algorithm each time.


Proposed System

3.1 Experimental Environment
The built-in RAID system of Windows 10, the most widely used operating system globally, served as the experimental environment. Only RAID 0 and RAID 5 volumes in Windows were examined, with NTFS as the file system. In addition, each file was assumed to be larger than the stripe size; otherwise, its data could not be recovered when the corresponding block was lost.

3.2 Damaged RAID System
A damaged RAID system is one characterized by non-recoverable member disk loss: one or more lost member disks for RAID 0, and two or more for RAID 5. When only one member disk is lost in RAID 5, the disk can be rebuilt through parity XOR calculation, so the single-disk-loss case was excluded for RAID 5.
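The reason a single lost RAID 5 disk is recoverable while two are not can be illustrated with the XOR relation itself. The following is a minimal sketch under simplifying assumptions: it works on one stripe row held in memory, and the helper names `xor_blocks` and `recover_missing_block` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing_block(surviving_blocks):
    """Rebuild the one missing block of a RAID 5 stripe row.

    The parity block is the XOR of all data blocks in the row, so XOR-ing
    every surviving block (data and parity) yields the missing block.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    d0, d1 = b"\x01\x02", b"\x10\x20"
    parity = xor_blocks([d0, d1])
    # Lose d1: it can be rebuilt from d0 and the parity block.
    print(recover_missing_block([d0, parity]) == d1)
```

When two blocks of the same row are missing, the single parity equation has two unknowns, which is why the paper treats RAID 5 as damaged only from two lost disks onward.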
Because the RAID configuration information is assumed to be known, the number of disks in the RAID configuration and the positions of the lost member disks can be identified. Thus, if each lost member disk is replaced with 0x00 and the RAID is reconstructed together with the remaining member disks, the volume of the damaged RAID system can be obtained.
The damaged RAID system is illustrated in Figs. 3 and 4. Fig. 3 shows RAID 0 with member disk #2 lost. When member disk #2 is replaced with 0x00 and the RAID is reconstructed with member disks #1 and #3, the unobtainable data area can be identified, as shown on the right side of Fig. 3. Fig. 4 shows RAID 5 with member disks #3 and #4 lost. RAID 5 has less usable storage space than RAID 0 because it holds parity blocks. When the lost member disks are replaced with 0x00 and the RAID is reconstructed in the same way, the areas lost with member disks #3 and #4 can be identified, as shown on the right side of Fig. 4.

Fig. 3. Damaged RAID system (RAID 0).


Fig. 4. Damaged RAID system (RAID 5).


The RAID system alternately stores data in stripe-sized units, so a damaged RAID system has striped volumes. The file system and files are stored in the striped volume area, so they are also striped. This is the distinctive property of a damaged RAID system. In summary, a damaged RAID system has striped volumes because of its alternating storage, and only damaged file systems and damaged files can be extracted from these volumes.

Fig. 5. Normal RAID system internals.


This section compares a RAID system with a normal NTFS file system against one with a damaged NTFS file system. Fig. 5 illustrates the internal structure of a normal RAID system. The file system in this RAID system is intact, so the file name, extension, size, and allocated area of each file can be parsed from the MFT of the NTFS file system. Furthermore, deleted files can be identified through the MFT entry flag. Based on this information, normal files can be extracted and deleted files can be recovered. After that, further files can be recovered from the unallocated area through file carving based on file signatures. Fig. 5 shows that files #1 and #2 were extracted based on MFT information, and file carving was performed on the unallocated area.
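As an illustration of how the allocated area of a file is read from an MFT entry, the sketch below decodes an NTFS run list (the data runs of a non-resident attribute). The paper does not describe this encoding; the layout used here (a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the signed, relative offset field, with 0x00 as terminator) is an assumption based on commonly documented NTFS behavior, and the function name `decode_data_runs` is hypothetical.

```python
from typing import List, Optional, Tuple


def decode_data_runs(runlist: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode an NTFS run list into (start cluster, cluster count) pairs.

    A start cluster of None marks a sparse run with no clusters on disk.
    """
    runs = []
    pos = 0
    prev_lcn = 0  # run offsets are relative to the previous run's start
    while pos < len(runlist) and runlist[pos] != 0x00:
        header = runlist[pos]
        len_size = header & 0x0F   # bytes used by the length field
        off_size = header >> 4     # bytes used by the offset field
        pos += 1
        if pos + len_size + off_size > len(runlist):
            break  # truncated run list, e.g. cut off by a lost stripe
        length = int.from_bytes(runlist[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))
        else:
            delta = int.from_bytes(runlist[pos:pos + off_size], "little", signed=True)
            prev_lcn += delta
            runs.append((prev_lcn, length))
        pos += off_size
    return runs


if __name__ == "__main__":
    # Two runs: 0x18 clusters at cluster 0x5634, then 0x10 clusters 16 clusters earlier.
    print(decode_data_runs(bytes.fromhex("2118345611 10F0 00".replace(" ", ""))))
```

Each decoded pair identifies a contiguous cluster range; multiplying by the cluster size gives the byte range to read from the volume.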
A damaged RAID system has damaged volumes that contain a damaged file system, as shown in Fig. 6. Moreover, everything stored in the file system (MFT, files #1 and #2, etc.) and every area (e.g., the unallocated area) becomes striped. In contrast to the normal RAID system (Fig. 5), all files in a damaged RAID system (Fig. 6) are damaged, so intact information is unobtainable. In Fig. 6, the allocated area of file #2 could not be found because the MFT was damaged, so only file #1 could be extracted, in striped form. The unallocated area is the area outside the allocated areas parsed from the MFT. In a striped file system, the unallocated area is also striped, so file carving cannot be performed normally and only partial file recovery is possible.

Fig. 6. Damaged RAID system internals.


3.3 File Recovery Procedure
The file recovery method for a damaged NTFS-based RAID system can be divided into four phases, as depicted in Fig. 7, with each phase described as follows.

3.3.1 Phase 1: Preparation
The preparation phase reconstructs the RAID system. Since the RAID configuration (RAID level, number of disks, disk order, stripe size, and stripe map) is known by assumption, the RAID can be reconstructed from this information. First, the lost disks are identified from the number of disks and the disk order. Each lost disk is then replaced with 0x00, and the RAID is reconstructed from the remaining member disks and the RAID configuration information. The output of Phase 1 is the damaged volume.
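For RAID 0, Phase 1 can be sketched as interleaving the member disks stripe by stripe and writing zeros wherever a disk is missing. This is a simplified illustration, not the authors' tool: the function name `reconstruct_raid0` is hypothetical, lost disks are represented by `None`, whole disks are held in memory, and the data area is assumed to start at offset 0 of each member disk. A real implementation would stream from disk images and honor the metadata offsets of the specific RAID format.

```python
from typing import Optional, Sequence


def reconstruct_raid0(disks: Sequence[Optional[bytes]], stripe_size: int) -> bytes:
    """Rebuild a RAID 0 volume, filling stripes of lost disks (None) with 0x00.

    `disks` must be given in RAID disk order.
    """
    present = [d for d in disks if d is not None]
    if not present:
        raise ValueError("at least one member disk is needed")
    rows = min(len(d) for d in present) // stripe_size
    zero_stripe = bytes(stripe_size)
    volume = bytearray()
    for row in range(rows):
        start = row * stripe_size
        for disk in disks:
            # Lost disk: its stripe becomes a zero-filled gap in the volume.
            volume += zero_stripe if disk is None else disk[start:start + stripe_size]
    return bytes(volume)


if __name__ == "__main__":
    # Three disks with 2-byte stripes; Disk #2 is lost.
    print(reconstruct_raid0([b"AABB", None, b"CCDD"], stripe_size=2))
```

The zero-filled gaps keep every surviving stripe at its original volume offset, which is what allows cluster addresses parsed from the MFT in later phases to remain valid.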

3.3.2 Phase 2: Filesystem metadata-based file recovery
Phase 2 recovers files based on file system metadata. Because the MFT is striped, its entries are not contiguous, so they must be collected through carving. The entire damaged file system is searched for the “FILE” signature to collect the MFT entries. The file information (file name, extension, size, allocated area, etc.) is then parsed from each MFT entry. Since this information includes the file allocation information, the clusters in which each file is stored can be identified accurately. All files described by the collected MFT entries are recovered in this way, and all recovered files are in striped form.
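The signature search in Phase 2 can be sketched as a scan over the reconstructed volume. This is a hedged illustration, not the authors' implementation: the function name `carve_mft_entries` is hypothetical, and the 1024-byte entry size and 512-byte scan alignment are assumptions based on common NTFS defaults that the paper does not state.

```python
def carve_mft_entries(volume: bytes, entry_size: int = 1024, step: int = 512):
    """Yield (offset, entry bytes) for each aligned 'FILE' signature in the volume.

    Assumptions: MFT entries are `entry_size` bytes long and start on a
    `step`-byte boundary. Zero-filled (lost) stripes simply produce no hits.
    """
    for offset in range(0, len(volume) - 3, step):
        if volume[offset:offset + 4] == b"FILE":
            yield offset, volume[offset:offset + entry_size]


if __name__ == "__main__":
    # Synthetic volume: a zero-filled gap followed by one MFT-like entry.
    volume = bytes(1024) + b"FILE" + bytes(1020)
    for offset, entry in carve_mft_entries(volume):
        print(offset, len(entry))
```

Scanning the whole volume, rather than following the MFT's own run list, is what makes the search tolerant of an MFT whose beginning sits on a lost disk.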

Fig. 7. Procedure of file recovery on NTFS-based damaged RAID system.


3.3.3 Phase 3: File carving
Phase 3 is file carving. The unallocated area in a damaged RAID system differs from that of a general system: all data in the file system must be treated as unallocated except for the clusters already accessed in Phase 2. File carving based on file signatures is then performed on this area to recover files, and all recovered files are in striped form.
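The carving step can be sketched as a header-signature scan that skips clusters already claimed in Phase 2. This is an illustrative sketch only: the function name `carve_headers`, the 4096-byte cluster size, and the small signature table are assumptions, and the sketch locates file headers without attempting to determine where each file ends.

```python
# Well-known header signatures for a few of the formats in the dataset.
# DOCX, XLSX, and PPTX are ZIP containers, so they share the ZIP signature.
SIGNATURES = {
    "PDF": b"%PDF",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "JPG": b"\xff\xd8\xff",
    "ZIP": b"PK\x03\x04",
}


def carve_headers(volume: bytes, recovered_clusters: set, cluster_size: int = 4096):
    """Return (cluster number, format) for each file header found at a cluster start.

    Clusters in `recovered_clusters` (already extracted via MFT entries in
    Phase 2) are skipped; everything else is treated as unallocated.
    """
    hits = []
    for cluster in range(len(volume) // cluster_size):
        if cluster in recovered_clusters:
            continue
        start = cluster * cluster_size
        for name, magic in SIGNATURES.items():
            if volume.startswith(magic, start):
                hits.append((cluster, name))
                break
    return hits


if __name__ == "__main__":
    size = 16  # tiny cluster size for the demonstration
    volume = b"%PDF".ljust(size, b"\x00") + b"PK\x03\x04".ljust(size, b"\x00")
    print(carve_headers(volume, recovered_clusters={1}, cluster_size=size))
```

Checking only cluster starts keeps the scan fast and relies on files beginning on cluster boundaries, which holds for non-resident NTFS file data.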

3.3.4 Phase 4: Striped file recovery
Phases 2 and 3 extract striped files from the damaged RAID system; Phase 4 identifies and extracts meaningful data from those striped files. General-purpose tools do not support damaged RAID systems or the striped file form, so they cannot extract this data. Instead, extracting meaningful data from a striped file requires a tool developed specifically for each file format.
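A format-specific tool first needs to know which parts of a striped file still hold real data. The sketch below computes those ranges for a contiguous file on RAID 0. It is a simplified illustration under stated assumptions: the function name `surviving_ranges` is hypothetical, disks are indexed from 0, `file_offset` is the file's byte offset within the RAID data area, and fragmented files or RAID 5 parity rotation are not handled.

```python
def surviving_ranges(file_offset, file_size, n_disks, stripe_size, lost_disks):
    """Return (start, end) byte ranges within a file that lie on surviving disks.

    Ranges are relative to the start of the file, end-exclusive, and merged
    when adjacent. RAID 0 only; the file is assumed to be contiguous.
    """
    ranges = []
    pos = file_offset
    end = file_offset + file_size
    while pos < end:
        stripe_no = pos // stripe_size
        stripe_end = min((stripe_no + 1) * stripe_size, end)
        if stripe_no % n_disks not in lost_disks:
            start_in_file = pos - file_offset
            end_in_file = stripe_end - file_offset
            if ranges and ranges[-1][1] == start_in_file:
                # Extend the previous range instead of starting a new one.
                ranges[-1] = (ranges[-1][0], end_in_file)
            else:
                ranges.append((start_in_file, end_in_file))
        pos = stripe_end
    return ranges


if __name__ == "__main__":
    # Three disks, 4-byte stripes, Disk #2 (index 1) lost, 12-byte file at offset 2.
    print(surviving_ranges(2, 12, 3, 4, {1}))
```

A file smaller than one stripe that falls entirely on a lost disk yields no ranges at all, which matches the paper's assumption that files must be larger than the stripe size to be recoverable.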


Performance Evaluation

The recovery rate for a damaged RAID system depends on the number of member disks, the positions of the damaged member disks, and the RAID level. For this reason, the dataset contains a variety of damaged RAID systems. For example, a three-disk RAID 0 has six damage cases: three in which one member disk is damaged and three in which two are damaged. A four-disk RAID 0 has 14 damage cases: four with one damaged member disk, six with two, and four with three. RAID 5 consists of at least three disks and can be rebuilt through parity block computation if only one member disk is lost. Accordingly, at least two member disks must be lost for RAID 5 to be damaged. Thus, there are three damage cases for a three-disk RAID 5 and 10 damage cases for a four-disk RAID 5.
As described above, a variety of damaged RAID systems were created to form a dataset that reflects the RAID system characteristics, and the performance of the proposed algorithm was then compared with that of commercial tools. Section 4.1 describes the dataset environment and the comparison tools and method. Section 4.2 compares the results.
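The case counts above follow from choosing which member disks are lost while at least one disk survives. The following sketch enumerates them using the image label format described in Section 4.1; the function name `damaged_cases` is hypothetical, and the rule "at least one lost disk for RAID 0, at least two for RAID 5, never all disks" is inferred from the counts given in the text.

```python
from itertools import combinations


def damaged_cases(level: int, n_disks: int) -> list:
    """List image labels such as 'R0-4-234' for every damage case.

    RAID 0 is damaged from one lost disk, RAID 5 from two; at least one
    member disk must survive.
    """
    min_lost = 1 if level == 0 else 2
    labels = []
    for k in range(min_lost, n_disks):
        for lost in combinations(range(1, n_disks + 1), k):
            labels.append(f"R{level}-{n_disks}-{''.join(map(str, lost))}")
    return labels


if __name__ == "__main__":
    for level, disk_counts in ((0, (2, 3, 4)), (5, (3, 4, 5))):
        for n in disk_counts:
            print(f"RAID {level}, {n} disks: {len(damaged_cases(level, n))} cases")
```

Summed over RAID 0 with 2 to 4 disks and RAID 5 with 3 to 5 disks, the enumeration gives 22 and 38 cases, matching the 60 images in the dataset.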

4.1 Dataset
The dataset [26] uses NTFS on Windows 10, the most widely used operating system in the world. RAID 0 and RAID 5 were used as the RAID levels. To compare the recovery rate by number of member disks, RAID 0 was configured with 2–4 disks and RAID 5 with 3–5 disks. In addition, 11 file formats were used to compare recovery rates by file type. The detailed specifications of the dataset are as follows.

OS: Windows 10 21H1 (OS Build 19043.1586)

Filesystem: NTFS

RAID level: RAID 0, RAID 5

Number of images: 60 (22 for RAID 0, 38 for RAID 5)

Number of files: 10 each of 11 file formats (DOC, XLS, PPT, DOCX, XLSX, PPTX, PDF, PNG, JPG, ZIP, and MP4)

The comparison tools were R-Studio (v8.16 build 180499) [27], MiniTool (v10.1) [28], and Recover My Files (v6.3.2.2552) [29]. The results are compared by the number of files each tool recovered. The experimental images are labeled “RAID level-number of RAID disks-lost member disks.” For instance, “R0-4-234” denotes a RAID level 0 system with four disks in which Disks #2, #3, and #4 were lost.
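The recovery rates reported in the result tables appear to be the number of recovered files divided by the 110 files in each image (10 files for each of 11 formats). The paper does not state this formula explicitly; the sketch below is inferred from the reported values (for example, 60 recovered files correspond to 54.5%), and the names `TOTAL_FILES` and `recovery_rate` are hypothetical.

```python
TOTAL_FILES = 10 * 11  # 10 files for each of the 11 file formats


def recovery_rate(recovered: int, total: int = TOTAL_FILES) -> float:
    """Recovery rate in percent, rounded to one decimal place."""
    return round(100 * recovered / total, 1)


if __name__ == "__main__":
    for files in (60, 50, 22):
        print(files, recovery_rate(files))
```

Because every image holds the same 110 files, the file counts and the percentages in the tables carry the same information.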

4.2 Results
The RAID 0 experimental results are summarized in Table 2, and RAID 5 experiment results in Table 3. Our proposed method generally shows better results than commercial tools.

Table 2. Performance results in RAID 0 (number of recovered files and recovery rate in %)

| RAID image | Proposed method: files | Proposed method: rate (%) | R-Studio: files | R-Studio: rate (%) | MiniTool: files | MiniTool: rate (%) | Recover My Files: files | Recover My Files: rate (%) |
|---|---|---|---|---|---|---|---|---|
| R0-2-1 | 60 | 54.5 | 22 | 20 | 25 | 22.7 | 19 | 17.3 |
| R0-2-2 | 50 | 45.5 | 50 | 45.5 | 50 | 45.5 | 50 | 45.5 |
| R0-3-1 | 71 | 64.5 | 40 | 36.4 | 45 | 40.9 | 37 | 33.6 |
| R0-3-2 | 70 | 63.6 | 34 | 30.9 | 37 | 33.6 | 22 | 20 |
| R0-3-3 | 79 | 71.8 | 79 | 71.8 | 79 | 71.8 | 79 | 71.8 |
| R0-3-12 | 31 | 28.2 | 10 | 9.1 | 8 | 7.3 | 8 | 7.3 |
| R0-3-13 | 40 | 36.4 | 15 | 13.6 | 15 | 13.6 | 11 | 10 |
| R0-3-23 | 39 | 35.5 | 13 | 11.8 | 18 | 16.4 | 13 | 11.8 |
| R0-4-1 | 82 | 74.5 | 18 | 16.4 | 20 | 18.2 | 26 | 23.6 |
| R0-4-2 | 69 | 62.7 | 40 | 36.4 | 52 | 47.3 | 30 | 27.3 |
| R0-4-3 | 86 | 78.2 | 45 | 40.9 | 50 | 45.5 | 31 | 28.2 |
| R0-4-4 | 93 | 84.5 | 47 | 42.7 | 55 | 50 | 35 | 31.8 |
| R0-4-12 | 41 | 37.3 | 17 | 15.5 | 20 | 18.2 | 15 | 13.6 |
| R0-4-13 | 58 | 52.7 | 25 | 22.7 | 27 | 24.5 | 18 | 16.4 |
| R0-4-14 | 65 | 59.1 | 23 | 20.9 | 27 | 24.5 | 20 | 18.2 |
| R0-4-23 | 45 | 40.9 | 33 | 30 | 33 | 30 | 31 | 28.2 |
| R0-4-24 | 52 | 47.3 | 26 | 23.6 | 26 | 23.6 | 26 | 23.6 |
| R0-4-34 | 69 | 62.7 | 69 | 62.7 | 69 | 62.7 | 69 | 62.7 |
| R0-4-123 | 17 | 15.5 | 5 | 4.5 | 7 | 6.4 | 3 | 2.7 |
| R0-4-124 | 24 | 21.8 | 3 | 2.7 | 5 | 4.5 | 3 | 2.7 |
| R0-4-134 | 41 | 37.3 | 13 | 11.8 | 15 | 13.6 | 8 | 7.3 |
| R0-4-234 | 28 | 25.5 | 28 | 25.5 | 28 | 25.5 | 28 | 25.5 |

In a damaged RAID system, the MFT, which holds the file system metadata, is itself fragmented. Commercial tools do not take this into account, so they show a low recovery rate when the first disk is lost (e.g., R0-2-1, R0-3-1, R0-4-1, and R0-4-12). The method proposed in this study, however, achieves a higher recovery rate than the commercial tools even when the first disk is lost.
The files recovered in the experiment are striped files. Extracting data from a striped file requires a separate recovery tool for each file type. However, existing corrupted file recovery tools were not applicable because they do not take striped files into account. A damaged file recovery tool designed around the characteristics of striped files, which differ from those of conventionally damaged files, is therefore needed and warrants further research.


Conclusion

In this study, a file recovery system for a damaged RAID system that uses the NTFS file system is proposed. A RAID system alternately stores stripes on its member disks. Because of this, the RAID system is damaged if a member disk is lost. A RAID system contains file systems and files. Generally, file metadata is stored at the front of the file system, and the files themselves are stored behind it. In a damaged RAID system, the metadata section and the files are fragmented and striped. Thus, it was impossible to recover data from a damaged RAID system with previous methods.
We have proposed a file recovery procedure for damaged RAID systems that links RAID reconstruction, file system analysis, fragmented file system analysis, and file fragmentation. The file recovery procedure is as follows: (1) RAID reconstruction through RAID metadata information, (2) file system metadata carving and damaged file extraction, (3) damaged unallocated area extraction and file carving, and (4) meaningful data extraction from the extracted damaged files. We also built a dataset of 60 damaged RAID system images that reflects the RAID characteristics, and showed that the proposed method achieved a better file recovery rate than commercial tools on this dataset. The results of this study can be useful in digital forensics and file recovery work where files must be recovered from a damaged system. In future work, RAID environments using file systems other than NTFS will be analyzed, and a customized recovery tool for striped files will be developed.

Table 3. Performance results in RAID 5 (number of recovered files and recovery rate in %)

| RAID image | Proposed method: files | Proposed method: rate (%) | R-Studio: files | R-Studio: rate (%) | MiniTool: files | MiniTool: rate (%) | Recover My Files: files | Recover My Files: rate (%) |
|---|---|---|---|---|---|---|---|---|
| R5-3-12 | 31 | 28.2 | 10 | 9.1 | 8 | 7.3 | 8 | 7.3 |
| R5-3-13 | 40 | 36.4 | 15 | 13.6 | 15 | 13.6 | 11 | 10 |
| R5-3-23 | 39 | 35.5 | 13 | 11.8 | 18 | 16.4 | 13 | 11.8 |
| R5-4-12 | 41 | 37.3 | 17 | 15.5 | 20 | 18.2 | 15 | 13.6 |
| R5-4-13 | 58 | 52.7 | 25 | 22.7 | 27 | 24.5 | 18 | 16.4 |
| R5-4-14 | 65 | 59.1 | 23 | 20.9 | 27 | 24.5 | 20 | 18.2 |
| R5-4-23 | 45 | 40.9 | 33 | 30 | 33 | 30 | 31 | 28.2 |
| R5-4-24 | 52 | 47.3 | 26 | 23.6 | 26 | 23.6 | 26 | 23.6 |
| R5-4-34 | 69 | 62.7 | 69 | 62.7 | 69 | 62.7 | 69 | 62.7 |
| R5-4-123 | 17 | 15.5 | 5 | 4.5 | 7 | 6.4 | 3 | 2.7 |
| R5-4-124 | 24 | 21.8 | 3 | 2.7 | 5 | 4.5 | 3 | 2.7 |
| R5-4-134 | 41 | 37.3 | 13 | 11.8 | 15 | 13.6 | 8 | 7.3 |
| R5-4-234 | 28 | 25.5 | 28 | 25.5 | 28 | 25.5 | 28 | 25.5 |
| R5-5-12 | 62 | 56.4 | 27 | 24.5 | 29 | 26.4 | 20 | 18.2 |
| R5-5-13 | 66 | 60 | 30 | 27.3 | 31 | 28.2 | 22 | 20 |
| R5-5-14 | 64 | 58.2 | 25 | 22.7 | 30 | 27.3 | 21 | 19.1 |
| R5-5-15 | 60 | 54.5 | 22 | 20 | 28 | 25.5 | 20 | 18.2 |
| R5-5-23 | 70 | 63.6 | 26 | 23.6 | 33 | 30 | 25 | 22.7 |
| R5-5-24 | 68 | 61.8 | 29 | 26.4 | 31 | 28.2 | 28 | 25.5 |
| R5-5-25 | 64 | 58.2 | 26 | 23.6 | 30 | 27.3 | 21 | 19.1 |
| R5-5-34 | 72 | 65.5 | 30 | 27.3 | 35 | 31.8 | 23 | 20.9 |
| R5-5-35 | 68 | 61.8 | 29 | 26.4 | 28 | 25.5 | 25 | 22.7 |
| R5-5-45 | 66 | 60 | 26 | 23.6 | 34 | 30.9 | 24 | 21.8 |
| R5-5-123 | 44 | 40 | 18 | 16.4 | 16 | 14.5 | 16 | 14.5 |
| R5-5-124 | 42 | 38.2 | 15 | 13.6 | 16 | 14.5 | 13 | 11.8 |
| R5-5-125 | 38 | 34.5 | 12 | 10.9 | 15 | 13.6 | 13 | 11.8 |
| R5-5-134 | 46 | 41.8 | 20 | 18.2 | 22 | 20 | 16 | 14.5 |
| R5-5-135 | 42 | 38.2 | 19 | 17.3 | 20 | 18.2 | 13 | 11.8 |
| R5-5-145 | 40 | 36.4 | 16 | 14.5 | 18 | 16.4 | 12 | 10.9 |
| R5-5-234 | 50 | 45.5 | 20 | 18.2 | 24 | 21.8 | 16 | 14.5 |
| R5-5-235 | 46 | 41.8 | 20 | 18.2 | 21 | 19.1 | 15 | 13.6 |
| R5-5-245 | 44 | 40 | 20 | 18.2 | 20 | 18.2 | 15 | 13.6 |
| R5-5-345 | 48 | 43.6 | 16 | 14.5 | 25 | 22.7 | 15 | 13.6 |
| R5-5-1234 | 24 | 21.8 | 24 | 21.8 | 24 | 21.8 | 24 | 21.8 |
| R5-5-1235 | 20 | 18.2 | 20 | 18.2 | 20 | 18.2 | 20 | 18.2 |
| R5-5-1245 | 18 | 16.4 | 18 | 16.4 | 18 | 16.4 | 18 | 16.4 |
| R5-5-1345 | 22 | 20 | 22 | 20 | 22 | 20 | 22 | 20 |
| R5-5-2345 | 26 | 23.6 | 26 | 23.6 | 26 | 23.6 | 26 | 23.6 |


Author’s Contributions

Conceptualization, Investigation and methodology, Software, Formal analysis, JHC. Project administration, Supervision, SL. Validation, Writing of the original draft, Writing of the review and editing, Data curation, JHC, SL. All the authors have proofread the final version.


Funding

The authors declare that they have no competing interests.


Competing Interests

The authors declare that they have no competing interests.


Author Biography

Author
Name: Jong-Hyun, Choi
Affiliation: School of Cybersecurity, Korea University, Seoul, South Korea
Biography: He received the B.S. degree in Computer Science from KyungHee University. Moreover, he received the M.S. degree in 2014 and the Ph.D. in 2022 from Korea University's School of Cybersecurity. His research interests include server forensics, database forensics, and incident response.

Author
Name: Sangjin, Lee
Affiliation: School of Cybersecurity, Korea University, Seoul, South Korea
Biography: He received the Ph.D. degree from the Department of Mathematics, Korea University, in 1994. From 1989 to 1999, he was with the Electronics and Telecommunications Research Institute, Korea, as a Senior Researcher. He has been with the Digital Forensic Research Center, Korea University, since 2008. He is currently the President of the Division of Information Security, Korea University. He has authored or coauthored over 130 papers in various archival journals and conference proceedings and over 200 articles in domestic journals. His research interests include digital forensics, data processing, forensic framework, and incident response.


References

[1] D. A. Patterson, G. Gibson, and R. H. Katz, “A case for redundant arrays of inexpensive disks (RAID),” in Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, IL, 1988, pp. 109-116.
[2] SNIA, “Common RAID disk data format specification (version 2.0 revision 19),” 2009 [Online]. Available: https://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf.
[3] Linux RAID superblock formats [Online]. Available: https://raid.wiki.kernel.org/index.php/RAID_superblock_formats.
[4] S. A. Moulton, “RAID recovery: recover your PORN by Sight and Sound,” 2009 [Online]. Available: https://defcon.org/images/defcon-17/dc-17-presentations/defcon-17-scott_moulton-raid_recovery.pdf.
[5] J. H. Choi, J. Park, and S. Lee, “Reassembling Linux‐based hybrid RAID,” Journal of Forensic Sciences, vol. 65, no. 3, pp. 966-973, 2020.
[6] J. Kim, S. Lee, and D. Jeong, “Digital forensic investigation methodology for storage space: based on the NIST digital forensic process,” Journal of Forensic Sciences, vol. 67, no. 3, pp. 989-1001, 2022.
[7] J. N. Hilgert, M. Lambertz, and D. Plohmann, “Extending the Sleuth Kit and its underlying model for pooled storage file system forensic analysis,” Digital Investigation, vol. 22, pp. S76-S85, 2017.
[8] J. N. Hilgert, M. Lambertz, and S. Yang, “Forensic analysis of multiple device BTRFS configurations using the Sleuth Kit,” Digital Investigation, vol. 26, pp. S21-S29, 2018.
[9] C. Zoubek, S. Seufert, and A. Dewald, “Generic RAID reassembly using block-level entropy,” Digital Investigation, vol. 16, pp. S44-S54, 2016.
[10] ACELab Team, “Making complex issues simple: a unique method to extract data from RAID with lost configuration,” 2019 [Online]. Available: https://blog.acelab.eu.com/making-complex-issues-simple-a-unique-method-to-extract-data-from-raid-with-lost-configuration.html.
[11] C. J. Veenman, “Statistical disk cluster classification for file carving,” in Proceedings of the 3rd International Symposium on Information Assurance and Security, Manchester, UK, 2007, pp. 393-398.
[12] S. L. Garfinkel, “Carving contiguous and fragmented files with fast object validation,” Digital Investigation, vol. 4, pp. 2-12, 2007.
[13] W. A. Bhat and M. A. Wani, “Forensic analysis of B-tree file system (BTRFS),” Digital Investigation, vol. 27, pp. 57-70, 2018.
[14] R. Nordvik, K. Porter, F. Toolan, S. Axelsson, and K. Franke, “Generic metadata time carving,” Forensic Science International: Digital Investigation, vol. 33, article no. 301005, 2020. https://doi.org/10.1016/j.fsidi.2020.301005
[15] M. Karresand, G. O. Dyrkolbotn, and S. Axelsson, “An empirical study of the NTFS cluster allocation behavior over time,” Forensic Science International: Digital Investigation, vol. 33, article no. 301008, 2020. https://doi.org/10.1016/j.fsidi.2020.301008
[16] V. van der Meer, H. Jonker, and J. van den Bos, “A contemporary investigation of NTFS file fragmentation,” Forensic Science International: Digital Investigation, vol. 38, article no. 301125, 2021. https://doi.org/10.1016/j.fsidi.2021.301125
[17] K. Porter, R. Nordvik, F. Toolan, and S. Axelsson, “Timestamp prefix carving for filesystem metadata extraction,” Forensic Science International: Digital Investigation, vol. 38, article no. 301266, 2021. https://doi.org/10.1016/j.fsidi.2021.301266
[18] A. L. Garrido and A. Peiro, “Recovering damaged documents to improve information retrieval processes,” Journal of Integrated OMICS, vol. 8, no. 3, pp. 53-55, 2018.
[19] W. Y. Lee, K. H. Kim, H. Yang, and Y. W. Ko, “Automatic reconstruction of deleted AVI video files composed of scattered and corrupted fragments,” Multimedia Tools and Applications, vol. 79, no. 37, pp. 28355-28367, 2020.
[20] N. I. Park, J. W. Lee, S. H. Lim, J. S. Byun, G. H. Na, O. Y. Jeon, and J. H. Lee, “Energy-based linear PCM audio recovery method of impaired MP4 file stored in dashboard camera memory,” Forensic Science International: Digital Investigation, vol. 39, article no. 301274, 2021. https://doi.org/10.1016/j.fsidi.2021.301274
[21] E. Altinisik and H. T. Sencar, “Automatic generation of H.264 parameter sets to recover video file fragments,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 4857-4868, 2021.
[22] B. Yoo, J. Park, S. Lim, J. Bang, and S. Lee, “A study on multimedia file carving method,” Multimedia Tools and Applications, vol. 61, no. 1, pp. 243-261, 2012.
[23] H. S. Heo, B. M. So, I. H. Yang, S. H. Yoon, and H. J. Yu, “Automated recovery of damaged audio files using deep neural networks,” Digital Investigation, vol. 30, pp. 117-126, 2019.
[24] J. Wagner, A. Rasin, and J. Grier, “Database forensic analysis through internal structure carving,” Digital Investigation, vol. 14, pp. S106-S115, 2015.
[25] S. Ge, M. Xu, T. Qiao, and N. Zheng, “A novel file carving algorithm for docker container logs recorded by JSON-file logging driver,” Forensic Science International: Digital Investigation, vol. 39, article no. 301272, 2021. https://doi.org/10.1016/j.fsidi.2021.301272
[26] Damaged RAID Dataset [Online]. Available: https://koreaoffice-my.sharepoint.com/:f:/g/personal/antares_korea_edu/EgI292MSkC1AjdsTzaJcSvUBkfirRma6nj8tszFUXSN85w?e=0PyMcr
[27] R-Studio: Disk Recovery Software and Hard Drive Recovery Tool [Online]. Available: https://www.r-studio.com/?GGLAW640.
[28] MiniTool: Data Recovery Software [Online]. Available: https://www.minitool.com/data-recovery-software/free-for-windows.html.
[29] Recover My Files: Advanced Data Recovery Software [Online]. Available: https://getdata.com/recovermyfiles/recovery-checklist.php.

About this article
Jong-Hyun Choi and Sangjin Lee*, File Recovery Method in NTFS-Based Damaged RAID System, Article number: 12:40 (2022)

  • Received: 31 October 2021
  • Accepted: 25 April 2022
  • Published: 30 August 2022