The RAID Rebuild Process
Rebuilding a RAID is a complex, software-driven process that is built into many RAID systems. It is used to recreate data on a RAID array when a disk in the array fails. RAID systems offer a degree of protection from hard drive failure by reconstructing the failed disk's data onto a spare or replacement drive, using the redundant data (mirror copies or parity) held on the remaining disks. A physical drive failure, such as a head failure or a broken circuit board, should not be confused with a logical failure. Logical failures are usually caused by some sort of data or filesystem corruption. They will not be corrected by installing another disk and rebuilding the array, because the problem has nothing to do with the disks; it relates to the structure of the data on them.
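To make the idea of recreating data more concrete, here is a minimal Python sketch. It assumes a RAID 5 style layout in which each stripe stores a parity block that is the XOR of its data blocks; mirrored levels such as RAID 1 simply copy from the surviving mirror instead. The block contents and the xor_blocks helper are made up for illustration and are not taken from any vendor's firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"RAID", b"5 re", b"buil"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1].
survivors = [data[0], data[2], parity]

# XOR of everything that survived yields the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same calculation only works if the surviving blocks are themselves intact, which is why a rebuild cannot repair a logical failure such as filesystem corruption.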
Using continuous, built-in hard drive monitoring software, a RAID array can often predict a hard drive failure and will flag the drive to the sysadmin for replacement. If the array is configured with a hot spare, a sector-level copy is carried out from the old drive to the hot spare, effectively duplicating the drive that is about to fail. The hot spare then becomes the active drive, replacing the one flagged as about to become faulty.
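The predictive replacement described above can be sketched roughly as follows. This is a toy model only: the Drive class, the predicted_to_fail flag and the replace_with_hot_spare function are hypothetical names, and a real controller works on physical sectors rather than Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    sectors: list = field(default_factory=list)
    predicted_to_fail: bool = False  # assumed to be set by the monitoring software


def replace_with_hot_spare(array, hot_spare):
    """Copy the first flagged drive to the hot spare and swap the spare in.

    Returns the drive that was retired, or None if no drive was flagged.
    """
    for slot, drive in enumerate(array):
        if drive.predicted_to_fail:
            hot_spare.sectors = list(drive.sectors)  # sector-level copy
            array[slot] = hot_spare                  # spare becomes the active drive
            return drive
    return None


array = [
    Drive("disk0", [b"a", b"b"]),
    Drive("disk1", [b"c", b"d"], predicted_to_fail=True),
]
spare = Drive("spare")
retired = replace_with_hot_spare(array, spare)
print(retired.name, "retired;", array[1].name, "now active")
```

Because the flagged drive is still readable at this point, the copy does not need to reconstruct anything from the other disks.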
There are occasions when the built-in hard drive monitoring process will not detect a drive failure until after the event has happened. This is quite normal, as hard drives can often break without any warning. In these circumstances, the RAID will start to rebuild using the hot spare as a replacement for the hard drive that has failed. This is often an automatic process that may take a considerable time, depending on the size of the RAID array and the number of disks that make up the RAID set. If no hot spare is present, the RAID will continue to function, but performance will be much slower because the RAID has to recreate the broken disk's data on the fly, piecing it together from the data on the remaining disks. It’s possible on many RAID systems to prioritise the rebuild process, but there is a trade-off in system performance. The higher the priority given to the rebuild, the quicker the rebuild will complete, but the slower the RAID will perform other tasks in the meantime. If your RAID serves many users, it’s often advisable to make the rebuild a low-priority task so that its users can continue to work whilst the rebuild completes.
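As a rough illustration of the priority trade-off, the sketch below estimates rebuild time under a deliberately simple assumption: the rebuild gets a fixed share of the drive's sustained throughput and the rest is left for users. The function name, the 4 TB capacity and the 100 MB/s rate are hypothetical figures chosen for the example; real rebuild times also depend on the RAID level, the controller and the workload.

```python
def rebuild_hours(drive_tb, max_rate_mb_s, rebuild_share):
    """Estimate the hours needed to rebuild one drive.

    drive_tb:      capacity of the drive being rebuilt, in decimal terabytes
    max_rate_mb_s: sustained rate if the rebuild had the disks to itself
    rebuild_share: fraction (0 to 1] of that rate given to the rebuild
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be greater than 0 and at most 1")
    drive_mb = drive_tb * 1_000_000  # decimal TB to MB
    seconds = drive_mb / (max_rate_mb_s * rebuild_share)
    return seconds / 3600


# Compare a low, medium and full rebuild priority for the same drive.
for share in (0.25, 0.5, 1.0):
    print(f"{share:.0%} priority: {rebuild_hours(4, 100, share):.1f} hours")
```

Under these assumptions the rebuild takes about 11 hours at full priority and about 44 hours at a 25% share, which shows why a low-priority rebuild keeps users working but leaves the array without full redundancy for longer.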
Most hardware RAID platforms have built-in rebuilding software, including Dell PowerEdge, PowerVault and PERC, HP ProLiant, IBM eServer and BladeCenter. I recommend you always configure your systems to use hot spares, as they provide an extra layer of data protection and the server will bring them into use automatically if any hard drive problems are detected. Data Clinic RAID data recovery services can be found here.