What are the different RAID levels for Linux / UNIX and Windows Server? Configure RAID and LVM in Linux.


 RAID and LVM


A Redundant Array of Independent Drives (or Disks), also known as Redundant Array of Inexpensive Drives (or Disks) (RAID) is an term for data storage schemes that divide and/or replicate data among multiple hard drives. RAID can be designed to provide increased data reliability or increased I/O performance, though one goal may compromise the other.

It is normally used to spread data among several physical hard drives with enough redundancy that should any drive fail the data will still be intact. Once created a RAID array appears to be one device which can be used pretty much like a regular partition. There are several kinds of RAID but I will only refer to the two most common here.

The first is RAID-1 which is also known as mirroring. With RAID-1 it's basically done with two essentially identical drives, each with a complete set of data. The second, the one I will mostly refer to in this guide is RAID-5 which is set up using three or more drives with the data spread in a way that any one drive failing will not result in data loss. The Red Hat website has a great overview of the RAID Levels.

There is one limitation with Linux Software RAID that a /boot partition can only reside on a RAID-1 array.
Linux supports both several hardware RAID devices but also software RAID which allows you to use any IDE or SCSI drives as the physical devices. In all cases I'll refer to software RAID.

There are total 10 types of RAID levels:
  • RAID level 0
  • RAID level RAID level 1
  • RAID level 2
  • RAID level 3
  • RAID level 4
  • RAID level 5
  • RAID level 6
  • RAID level 10
  • RAID level 50
  • RAID level 0+1

Commonly used RAID levels for UNIX / Linux and Windows server

Following are commonly used RAID levels :


 
RAID level Minimum hard disks Suggested application Notes
RAID 0 - Striped Set without parity 2 Hard disks 1. Video Production and Editing
2. Image Editing
3. Any application requiring high bandwidth
Provides improved performance and additional storage but no fault tolerance from disk errors or disk failure. Any disk failure destroys the array, which becomes more likely with more disks in the array.
RAID 1 - Mirrored Set (2 disks minimum) without parity. 2 Hard disks 1. Office application
2. Financial application
3. Payroll application etc
Provides fault tolerance from disk errors and single disk failure. Increased read performance occurs when using a multi-threaded operating system that supports split seeks, very small performance reduction when writing. Array continues to operate so long as at least one drive is functioning
RAID 5 3 Hard disks 1. File and Application servers
2. Internet Web, E-mail servers
3. Intranet servers
Highest Read data transaction rate, Medium Write data transaction rate, Overall good (aggregate) transfer rate. drive failure requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive
RAID 10 (nested RAID 1+0) 4 Hard disks 1. Database server (such as Oracle / MySQL / MS-SQL) which requiring high performance and fault tolerance Provides fault tolerance and improved performance but increases complexity.

Managing RAID and LVM with Linux
LVM stands for Logical Volume Manager and is a way of grouping drives and/or partition in a way where instead of dealing with hard and fast physical partitions the data is managed in a virtual basis where the virtual partitions can be resized. The Red Hat website has a great overview of the Logical Volume Manager.
There is one limitation that a LVM cannot be used for the /boot.
There are now two version of LVM for Linux:
  • LVM 2 - The latest and greatest version of LVM for Linux.
    LVM 2 is almost completely backward compatible with volumes created with LVM 1. The exception to this is snapshots (You must remove snapshot volumes before upgrading to LVM 2)
    LVM 2 uses the device mapper kernel driver. Device mapper support is in the 2.6 kernel tree and there are patches available for current 2.4 kernels.
  • LVM 1 - The version that is in the 2.4 series kernel,
    LVM 1 is a mature product that has been considered stable for a couple of years. The kernel driver for LVM 1 is included in the 2.4 series kernels, but this does not mean that your 2.4.x kernel is up to date with the latest version of LVM. Look at the README for the latest information about which kernels have the current code in them



Initial set of a RAID-5 array

I recommend you experiment with setting up and managing RAID and LVM systems before using it on an important filesystem. One way I was able to do it was to take old hard drive and create a bunch of partitions on it (8 or so should be enough) and try combining them into RAID arrays. In my testing I created two RAID-5 arrays each with 3 partitions. You can then manually fail and hot remove the partitions from the array and then add them back to see how the recovery process works. You'll get a warning about the partitions sharing a physical disc but you can ignore that since it's only for experimentation. In my case I have two systems with RAID arrays, one with two 73G SCSI drives running RAID-1 (mirroring) and my other test system is configured with three 120G IDE drives running RAID-5. In most cases I will refer to my RAID-5 configuration as that will be more typical.
I have an extra IDE controller in my system to allow me to support the use of more than 4 IDE devices which caused a very odd drive assignment. The order doesn't seem to bother the Linux kernel so it doesn't bother me. My basic configuration is as follows:
hda 120G drive
hdb 120G drive
hde 60G boot drive not on RAID array
hdf 120G drive
hdg CD-ROM drive
The first step is to create the physical partitions on each drive that will be part of the RAID array. In my case I want to use each 120G drive in the array in it's entirety. All the drives are partitioned identically so for example, this is how hda is partitioned:
Disk /dev/hda: 120.0 GB, 120034123776 bytes
16 heads, 63 sectors/track, 232581 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1      232581   117220792+  fd  Linux raid autodetect
So now with all three drives with a partitioned with id fd Linux raid autodetect you can go ahead and combine the partitions into a RAID array:
# /sbin/mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 \
	/dev/hdb1 /dev/hda1 /dev/hdf1
Wow, that was easy. That created a special device /dev/md0 which can be used instead of a physical partition. You can check on the status of that RAID array with the mdadm command:
# /sbin/mdadm --detail /dev/md0
        Version : 00.90.01
  Creation Time : Wed May 11 20:00:18 2005
     Raid Level : raid5
     Array Size : 234436352 (223.58 GiB 240.06 GB)
    Device Size : 117218176 (111.79 GiB 120.03 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun 10 04:13:11 2005
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 36161bdd:a9018a79:60e0757a:e27bb7ca
         Events : 0.10670

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       3       65        1      active sync   /dev/hdb1
       2      33       65        2      active sync   /dev/hdf1
The important lines to see are the State line which should say clean otherwise there might be a problem. At the bottom you should make sure that the State column always says active sync which says each device is actively in the array. You could potentially have a spare device that's on-hand should any drive should fail. If you have a spare you'll see it listed as such here. One thing you'll see above if you're paying attention is the fact that the size of the array is 240G but I have three 120G drives as part of the array. That's because the extra space is used as extra parity data that is needed to survive the failure of one of the drives.


Initial set of LVM on top of RAID

Now that we have /dev/md0 device you can create a Logical Volume on top of it. Why would you want to do that? If I were to build an ext3 filesystem on top of the RAID device and someday wanted to increase it's capacity I wouldn't be able to do that without backing up the data, building a new RAID array and restoring my data. Using LVM allows me to expand (or contract) the size of the filesystem without disturbing the existing data. Anyway, here are the steps to then add this RAID array to the LVM system. The first command pvcreate will "initialize a disk or partition for use by LVM". The second command vgcreate will then create the Volume Group, in my case I called it lvm-raid:
# pvcreate /dev/md0
# vgcreate lvm-raid /dev/md0
The default value for the physical extent size can be too low for a large RAID array. In those cases you'll need to specify the -s option with a larger than default physical extent size. The default is only 4MB as of the version in Fedora Core 5. The maximum number of physical extents is approximately 65k so take your maximum volume size and divide it by 65k then round it to the next nice round number. For example, to successfully create a 550G RAID let's figure that's approximately 550,000 megabytes and divide by 65,000 which gives you roughly 8.46. Round it up to the next nice round number and use 16M (for 16 megabytes) as the physical extent size and you'll be fine:
# vgcreate -s 16M 
Ok, you've created a blank receptacle but now you have to tell how many Physical Extents from the physical device (/dev/md0 in this case) will be allocated to this Volume Group. In my case I wanted all the data from /dev/md0 to be allocated to this Volume Group. If later I wanted to add additional space I would create a new RAID array and add that physical device to this Volume Group. To find out how many PEs are available to me use the vgdisplay command to find out how many are available and now I can create a Logical Volume using all (or some) of the space in the Volume Group. In my case I call the Logical Volume lvm0.
# vgdisplay lvm-raid
	.
	.
   Free  PE / Size       57235 / 223.57 GB
# lvcreate -l 57235 lvm-raid -n lvm0
In the end you will have a device you can use very much like a plain 'ol partition called /dev/lvm-raid/lvm0. You can now check on the status of the Logical Volume with the lvdisplay command. The device can then be used to to create a filesystem on.
# lvdisplay /dev/lvm-raid/lvm0 
  --- Logical volume ---
  LV Name                /dev/lvm-raid/lvm0
  VG Name                lvm-raid
  LV UUID                FFX673-dGlX-tsEL-6UXl-1hLs-6b3Y-rkO9O2
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                223.57 GB
  Current LE             57235
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2
# mkfs.ext3 /dev/lvm-raid/lvm0
	.
	.
# mount /dev/lvm-raid/lvm0 /mnt
# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvm0
                       224G   93M  224G   1% /mnt


Handling a Drive Failure

As everything eventually does break (some sooner than others) a drive in the array will fail. It is a very good idea to run smartd on all drives in your array (and probably ALL drives period) to be notified of a failure or a pending failure as soon as possible. You can also manually fail a partition, meaning to take it out of the RAID array, with the following command:
# /sbin/mdadm /dev/md0 -f /dev/hdb1
mdadm: set /dev/hdb1 faulty in /dev/md0
Once the system has determined a drive has failed or is otherwise missing (you can shut down and pull out a drive and reboot to similate a drive failure or use the command to manually fail a drive above it will show something like this in mdadm:
# /sbin/mdadm --detail /dev/md0
     Update Time : Wed Jun 15 11:30:59 2005
           State : clean, degraded
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 1
   Spare Devices : 0
	.
	.
     Number   Major   Minor   RaidDevice State
        0       3        1        0      active sync   /dev/hda1
        1       0        0        -      removed
        2      33       65        2      active sync   /dev/hdf1
You'll notice in this case I had /dev/hdb fail. I replaced it with a new drive with the same capacity and was able to add it back to the array. The first step is to partition the new drive just like when first creating the array. Then you can simply add the partition back to the array and watch the status as the data is rebuilt onto the newly replace drive.
# /sbin/mdadm /dev/md0 -a /dev/hdb1
# /sbin/mdadm --detail /dev/md0
     Update Time : Wed Jun 15 12:11:23 2005
           State : clean, degraded, recovering
  Active Devices : 2
 Working Devices : 3
  Failed Devices : 0
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 64K

  Rebuild Status : 2% complete
	.
	.
During the rebuild process the system performance may be somewhat impacted but the data should remain in-tact.


Expanding an Array/Filesytem


I'm told it's now possible to expand the size of a RAID array much as you could on a commercial array such as the NetApp. The link below describes the procedure. I have yet to try it but it looks promising:
Growing a RAID5 array - http://scotgate.org/?p=107



Common Glitches

None yet, I've found the software RAID system to be remarkably stable.


Other Useful Resources

I've tried to not just copy other people's tips so I've included a list of other people's tips and tricks I've found to be useful. There should be little or no overlap.
Encrypting /home and swap over RAID with dm-crypt - Do you have important company files on your PC at home, that you can neither afford to lose, nor let fall into the wrong hands? This page explains how to set up encrypted RAID1 ext3 filesystems with dm-crypt, along with an encrypted RAID0 swap, on RedHat / Fedora Core 5, using the twofish encryption algorithm and dm-crypt's new ESSIV mode.

1 comment:

  1. I've been using this blog since the first entry and I think we can't ignore what you did, specially with this entry and the way like you picked the sources because the Active Devices aren't easy to find and you did a good job.

    ReplyDelete

Thank You for your Comments, We will read and response you soon...