How To Set Up Software RAID1 On A Running System (Incl. GRUB Configuration) (CentOS 5.3) - Page 4

On this page

  1. 9 Testing
  2. 10 Links

9 Testing

Now let's simulate a hard drive failure. It doesn't matter if you select /dev/sda or /dev/sdb here. In this example I assume that /dev/sdb has failed.

To simulate the hard drive failure, you can either shut down the system and remove /dev/sdb, or you can (soft-)remove it like this:

mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm --manage /dev/md2 --fail /dev/sdb3

mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm --manage /dev/md2 --remove /dev/sdb3
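
If you want to double-check that the partitions really show up as failed and removed before you power off, you can (optionally) inspect the arrays, for example:

mdadm --detail /dev/md0
cat /proc/mdstat

/dev/sdb1 should no longer be listed as an active member of /dev/md0 (and likewise for the other arrays).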

Shut down the system:

shutdown -h now

Then put in a new /dev/sdb drive (if you simulate a failure of /dev/sda, you should now put /dev/sdb in /dev/sda's place and connect the new HDD as /dev/sdb!) and boot the system. It should still start without problems.
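
If you want to verify that the kernel has detected the replacement drive before you continue, you can, for example, check the kernel ring buffer:

dmesg | grep -i sdb

(This is just an optional sanity check; the fdisk -l output below will show the new, empty drive as well.)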

Now run

cat /proc/mdstat

and you should see that we have a degraded array:

[root@server1 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
      200704 blocks [2/1] [U_]

md1 : active raid1 sda2[0]
      522048 blocks [2/1] [U_]

md2 : active raid1 sda3[0]
      9759360 blocks [2/1] [U_]

unused devices: <none>
[root@server1 ~]#

The output of

fdisk -l

should look as follows:

[root@server1 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   fd  Linux raid autodetect
/dev/sda2              26          90      522112+  fd  Linux raid autodetect
/dev/sda3              91        1305     9759487+  fd  Linux raid autodetect

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md2: 9993 MB, 9993584640 bytes
2 heads, 4 sectors/track, 2439840 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 534 MB, 534577152 bytes
2 heads, 4 sectors/track, 130512 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md0: 205 MB, 205520896 bytes
2 heads, 4 sectors/track, 50176 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table
[root@server1 ~]#

Now we copy the partition table of /dev/sda to /dev/sdb:

sfdisk -d /dev/sda | sfdisk /dev/sdb

(If you get an error, you can try the --force option:

sfdisk -d /dev/sda | sfdisk --force /dev/sdb

)

[root@server1 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1305 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    401624     401562  fd  Linux raid autodetect
/dev/sdb2        401625   1445849    1044225  fd  Linux raid autodetect
/dev/sdb3       1445850  20964824   19518975  fd  Linux raid autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
[root@server1 ~]#
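
To double-check that the partition table was copied correctly, you can list the new drive's partitions, e.g.:

fdisk -l /dev/sdb

It should now show /dev/sdb1, /dev/sdb2, and /dev/sdb3 with the same sizes and the Linux raid autodetect (fd) partition type as on /dev/sda.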

Afterwards we remove any remains of a previous RAID array from /dev/sdb...

mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb3

... and add /dev/sdb to the RAID array:

mdadm -a /dev/md0 /dev/sdb1
mdadm -a /dev/md1 /dev/sdb2
mdadm -a /dev/md2 /dev/sdb3

Now take a look at

cat /proc/mdstat

[root@server1 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      200704 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      522048 blocks [2/2] [UU]

md2 : active raid1 sdb3[2] sda3[0]
      9759360 blocks [2/1] [U_]
      [=======>.............]  recovery = 39.4% (3846400/9759360) finish=1.7min speed=55890K/sec

unused devices: <none>
[root@server1 ~]#
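
If you don't want to re-run cat /proc/mdstat by hand, you can also follow the rebuild continuously, for example with:

watch cat /proc/mdstat

(Press CTRL+C to leave watch.)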

Wait until the synchronization has finished:

[root@server1 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      200704 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      522048 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      9759360 blocks [2/2] [UU]

unused devices: <none>
[root@server1 ~]#

Then run

grub

and install the bootloader on both HDDs:

root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
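
If you prefer to script this step instead of typing the commands into the interactive grub shell, a here-document in grub's batch mode should work as well (a sketch; adjust the (hdX,Y) device names if your disk layout differs):

grub --batch <<EOF
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
EOF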

That's it. You've just replaced a failed hard drive in your RAID1 array.

 


Comments

By: Pawel

I strongly recommend disabling SELinux:

vi /etc/sysconfig/selinux

and disable it like this:

SELINUX=disabled
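
(The setting in /etc/sysconfig/selinux takes effect at the next reboot; if you want to check the currently active mode or switch to permissive right away, you can, for example, run:

getenforce
setenforce 0

)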

The reason is that SELinux causes problems after you copy your files to the second partition and reboot the whole system. In my case the HAL daemon failed to start, plus a few others:

Determining IP information for eth0...Can't create /var/run/dhclient-eth0.pid: Permission denied
/etc/sysconfig/network-scripts/network-functions: line 437: /usr/bin/logger: Permission denied
done.
[ OK ]
Starting auditd: [FAILED]

...........

Starting NFS statd: type=1400 audit(1259687187.716:475): avc: denied { search } for pid=2325 comm="rpc.statd" name="/" dev=md7 ino=2 scontext=system_u:system_r:rpcd_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
statd: Could not chdir: Permission denied
[FAILED]

Starting system message bus: type=1400 audit(1259687188.859:477): avc: denied { search } for pid=2381 comm="dbus-daemon" name="/" dev=md7 ino=2 scontext=system_u:system_r:system_dbusd_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
type=1400 audit(1259687188.905:478): avc: denied { search } for pid=2381 comm="dbus-daemon" name="/" dev=md7 ino=2 scontext=system_u:system_r:system_dbusd_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
type=1400 audit(1259687188.943:479): avc: denied { search } for pid=2381 comm="dbus-daemon" name="/" dev=md7 ino=2 scontext=system_u:system_r:system_dbusd_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Failed to start message bus: Failed to bind socket "/var/run/dbus/system_bus_socket": Permission denied
[FAILED]

Starting Avahi daemon... [FAILED]

Starting HAL daemon: [FAILED]

-----------------------------------------------------------------------

Disabling it worked like a charm. :)

By: Anonymous

It's a common misconception that swap should not be on a RAID1 array, when in fact swap should always be included in the RAID arrays. If a disk fails on a live system, your processes may freeze or crash when attempting to pull data from swap off the failed disk. By including swap in the RAID arrays, you avoid this problem. Also, you gain up to a 2x improvement in read speed from swap as it can be read from both drives, with a very small (insignificant) write speed tradeoff.
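
(As a quick check, you can see which device actually backs your swap, for example with:

swapon -s

With the partitioning used in this tutorial, swap should show up on one of the /dev/mdX devices rather than on a raw /dev/sdaX partition.)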

By: Anonymous

Very interesting article, especially the missing part here:

mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/sdb1

 I just wonder why you change the mtab; maybe you could explain this a little bit.

 Best regards, xcomm

 P.S.: One small question is why you are building a RAID1 for swap. ;-)

By: Al

Thank you very much, very useful article.

By: Bob Jameson

Excellent! The best article on adding RAID1 to a running system for CentOS users. I could not have done it without your step-by-step procedure.

 

By: dksimple

 g8

Thank you

dksimple