Re: How To Set Up Software RAID1 On A Running System (Debian Etch)

Discussion in 'HOWTO-Related Questions' started by ClarkVent, Mar 16, 2008.

  1. ClarkVent

    ClarkVent New Member

    Hi,

    First of all, I would like to thank falko for his very comprehensive HowTo, "How To Set Up Software RAID1 On A Running System". It was exactly what I needed. :)

    That said, I do have a question I hope he (or anyone else for that matter) can answer.

    Everything works fine: when I boot the system, my two 160 GB HDDs are running nicely in RAID1. At boot-up, I get this GRUB boot menu:

    Code:
    Debian GNU/Linux, kernel 2.6.18-6-686 (RAID, hd1)
    Debian GNU/Linux, kernel 2.6.18-6-686 (RAID, hd0)

    I have it set up so that, if there is no user input within 5 seconds, it boots my system. All is well, or so it seems.

    When I manually choose the first option (boot from hd1) I get this error message:

    Code:
      Booting 'Debian GNU/Linux, kernel 2.6.18-6-686 (hd1)'
    
    root  (hd1,0)
    Filesystem type is ext2fs, partition type 0x83
    kernel  /boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro
    
    Error 15: File not found
    
    Press any key to continue...

    When I manually choose the second option (boot from hd0), it boots fine.

    This is my menu.lst. To save space, I omitted the unimportant bits (mostly comments). The values I thought might be important are /dev/md0 in the kopt line and hd1 in the groot line:

    Code:
    # menu.lst
    default         0
    fallback        1
    
    timeout         5
    
    color cyan/blue white/blue
    
    # kopt=root=/dev/md0 ro
    # groot=(hd1,0)
    # alternative=true
    # lockalternative=false
    # defoptions=
    # lockold=false
    # xenhopt=
    # xenkopt=console=tty0
    # altoptions=(single-user mode) single
    # howmany=all
    # memtest86=true
    # updatedefaultentry=false
    
    ## ## End Default Options ##
    title           Debian GNU/Linux, kernel 2.6.18-6-686 (RAID, hd1)
    root            (hd1,0)
    kernel          /boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro
    initrd          /boot/initrd.img-2.6.18-6-686
    savedefault
    
    title           Debian GNU/Linux, kernel 2.6.18-6-686 (RAID, hd0)
    root            (hd0,0)
    kernel          /boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro
    initrd          /boot/initrd.img-2.6.18-6-686
    savedefault
    
    Examining /proc/mdstat shows the RAID is running fine:

    Code:
    ~# cat /proc/mdstat
    Personalities : [raid1]
    md5 : active raid1 sda9[0] sdb9[1]
          144882048 blocks [2/2] [UU]
    
    md4 : active raid1 sda8[0] sdb8[1]
          1003904 blocks [2/2] [UU]
    
    md3 : active raid1 sda7[0] sdb7[1]
          2008000 blocks [2/2] [UU]
    
    md2 : active raid1 sda6[0] sdb6[1]
          10008384 blocks [2/2] [UU]
    
    md1 : active raid1 sda5[0] sdb5[1]
          979840 blocks [2/2] [UU]
    
    md0 : active raid1 sda1[0] sdb1[1]
          1951744 blocks [2/2] [UU]
    
    unused devices: <none>
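Reading the [UU] fields by eye works for six arrays, but a small check can do it instead. This is a minimal sketch, assuming the /proc/mdstat layout shown above: an array line starting with mdN, followed by a line ending in a status field such as [UU], where an underscore marks a missing member. The function name check_mdstat is hypothetical.

```sh
#!/bin/sh
# check_mdstat [FILE]: print each md array's member-status field and
# exit 1 if any array is missing a member. Reads /proc/mdstat unless
# FILE is given (useful for testing against a saved copy).
check_mdstat() {
  awk '
    # An array line looks like "md0 : active raid1 sda1[0] sdb1[1]".
    /^md[0-9]+ :/ { dev = $1 }

    # The following line ends in a status field such as [UU];
    # an underscore in it marks a missing or failed member.
    / blocks / && dev != "" && $NF ~ /^\[[U_]+\]$/ {
      if ($NF ~ /_/) { print dev ": degraded " $NF; bad = 1 }
      else           { print dev ": ok " $NF }
      dev = ""
    }

    END { exit (bad ? 1 : 0) }
  ' "${1:-/proc/mdstat}"
}
```

On the system above, check_mdstat would print one "ok [UU]" line per array, md5 down to md0, and exit 0.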
    
    I have no idea why it throws "Error 15" when I try to boot from hd1. hd0 and hd1 should mirror each other, so a file found on one should also be found on the other, right?

    So does anybody know what's going on? Thanks in advance!
     
    Last edited: Mar 16, 2008
  2. falko

    falko Super Moderator

    Please make sure that GRUB is properly installed on hd1.
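With GRUB legacy (the GRUB version Debian Etch uses), installing the boot loader on both halves of the mirror is normally done from the grub shell with the "root" and "setup" commands. A minimal sketch, assuming /boot lives on the first partition of each RAID disk (as the menu.lst above suggests) and that /boot/grub/device.map maps (hd0) and (hd1) to the two RAID members. Note that under Linux the grub shell takes its (hdN) numbering from device.map, not from the BIOS, so if another drive is listed as (hd1) there, "setup (hd1)" writes to that drive instead.

```sh
# Run as root. Check first which Linux disk each GRUB drive name maps to;
# (hd0) and (hd1) should be the two RAID members, e.g. /dev/sda and /dev/sdb.
cat /boot/grub/device.map

# Write GRUB to the MBR of both mirror halves. /boot is assumed to be on
# the first partition of each disk, as in the menu.lst above.
grub --batch <<'EOF'
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
EOF
```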
     
  3. ClarkVent

    ClarkVent New Member

    I found what's causing the error, but not how to solve it.

    The error is caused by a third hard drive, used for periodic backups, which I added after creating the RAID. Apparently, GRUB detects this third drive as hd1.

    The two RAID drives are connected to SATA ports 1 and 2 on my computer, while the backup drive was connected to SATA port 4. When I moved the backup drive to SATA port 3, the problem disappeared, but only temporarily.

    With the backup drive on port 3, I tested the RAID by disconnecting the first RAID drive, and the system booted fine. I then reconnected the first drive and disconnected the second one. This time the system would not boot: I only got a black screen and no GRUB boot menu. I have no idea why, but I suspect the system was trying to boot from the backup drive (I had run "root (hd1,0)" and "setup (hd1)" while the system still detected the backup drive as hd1). Anyway, when I hooked up all drives again, the original problem was back: GRUB detected the backup drive as hd1. I had to move the backup drive from SATA port 3 back to SATA port 4 to get the system working again.

    In short, the order in which GRUB detects the drives is completely unpredictable, and that is what causes the problem: sometimes it detects the backup drive as hd1, sometimes one of the RAID drives.

    So, how can I correct this behavior? How do I make sure GRUB assigns the same numbers (hd0, hd1, hd2) to the drives every time, regardless of whether a drive is present or not?
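One way to see how the drives are numbered on a particular boot, rather than inferring it from cable positions, is GRUB legacy's own command line: press "c" at the boot menu and use the "find" command, which lists every partition that contains the named file. The kernel path below is the one from the menu.lst quoted in the first post; the output depends on the hardware, so none is shown. If the kernel turns up on (hd0,0) and (hd2,0) but not on (hd1,0), the backup drive has taken hd1 on that boot.

```text
grub> find /boot/vmlinuz-2.6.18-6-686
grub> find /boot/grub/stage1
```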
     
  4. falko

    falko Super Moderator

    Have you tried using ports 1 and 3 for your RAID drives and one of the remaining ports for the backup drive?
     
  5. ClarkVent

    ClarkVent New Member

    Yes, I did, but it didn't make (much of) a difference. It did change the order of the drives at first, but after I disconnected and reconnected one of the drives, the order was wrong again.
     
    Last edited: Mar 18, 2008
  6. ClarkVent

    ClarkVent New Member

    Just a quick (extra) question. In your HowTo, you've set hd1 as your default boot drive and hd0 as the fallback drive. Why?

    In my situation, with an extra third drive, if one of the RAID drives fails (really fails, so that the BIOS doesn't even see it anymore), the third drive will become hd1, and a reboot will fail because GRUB will find hd1 but not the kernel image on it.

    If you set hd0 as the default boot drive and one of the RAID drives fails, the remaining drive automatically becomes hd0 and a reboot will succeed.

    Or am I missing something here?
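For reference, the reordering proposed here amounts to swapping the two stanzas in the menu.lst quoted in the first post while leaving "default 0" and "fallback 1" unchanged, so that the hd0 entry is tried first and the hd1 entry becomes the fallback. This is a sketch based on that file, not a tested configuration:

```text
default         0
fallback        1

title           Debian GNU/Linux, kernel 2.6.18-6-686 (RAID, hd0)
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro
initrd          /boot/initrd.img-2.6.18-6-686
savedefault

title           Debian GNU/Linux, kernel 2.6.18-6-686 (RAID, hd1)
root            (hd1,0)
kernel          /boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro
initrd          /boot/initrd.img-2.6.18-6-686
savedefault
```

If these stanzas sit inside Debian's automatically generated kernel list, update-grub may rewrite them on the next kernel upgrade based on the "# groot=(hd1,0)" setting, so that line would presumably need to become "# groot=(hd0,0)" as well; the comments in your own menu.lst are the place to check.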
     
  7. falko

    falko Super Moderator

    No special reason. You can change the order.
     
  8. ClarkVent

    ClarkVent New Member

    Wouldn't you agree that using hd0 as default and hd1 as fallback would be better?
     
  9. falko

    falko Super Moderator

    The order doesn't matter as the system must be able to boot from both.
     
  10. ClarkVent

    ClarkVent New Member

    In order to be able to boot from both drives in as many situations as possible, one should really try to boot from hd0 first.

    Wouldn't you agree that it would be better to try to boot from hd0 first? All other things being equal, booting from hd0 first copes with a hardware failure of one of the RAID drives when there are more drives attached than just the RAID drives, while booting from hd1 first does not.

    Many, if not most, systems have more drives attached than just the two RAID drives (card readers come to mind).

    In every other respect it does not matter whether you try to boot from hd0 or hd1 first. But since the most common hard drive failure is a hardware failure (I know, since I had 4 drives fail on me in the past month, all of them hardware failures), and many systems have more drives attached than just the RAID drives, it makes sense to make hd0 the default drive and hd1 the fallback drive.
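The argument can be checked with a toy model. It is a sketch under one simplifying assumption, stated in the comments: the BIOS hands out hd numbers consecutively to the drives it can see, in a fixed order. (Post 3 shows that real BIOS ordering can be less predictable than that.) The model also ignores the "fallback" line, which in the quoted menu.lst would still try the other entry after the first one fails; it only shows which entry works on the first try. The drive labels and the function name entry_result are hypothetical, and "Error 15" is used as in the first post, although the exact GRUB error would depend on what is on the backup drive.

```sh
#!/bin/sh
# Toy model. ASSUMPTION: the BIOS numbers the drives it can see as
# hd0, hd1, ... in the order listed, so a dead drive drops out and every
# drive behind it moves up one place. raid_a and raid_b (the two mirror
# halves) and backup are hypothetical labels.

# entry_result HDN DRIVE...: what a menu entry with "root (HDN,0)" would hit.
entry_result() {
  want=$1
  shift
  n=0
  for drive in "$@"; do
    if [ "hd$n" = "$want" ]; then
      case $drive in
        raid_*) echo "ok" ;;        # a mirror half: the kernel is there
        *)      echo "Error 15" ;;  # the backup drive: no kernel to load
      esac
      return 0
    fi
    n=$((n + 1))
  done
  echo "missing"                    # fewer drives than HDN needs
}

# raid_a has died, so raid_b becomes hd0 and backup becomes hd1.
entry_result hd0 raid_b backup     # prints: ok
entry_result hd1 raid_b backup     # prints: Error 15
# raid_b has died instead: same outcome.
entry_result hd0 raid_a backup     # prints: ok
entry_result hd1 raid_a backup     # prints: Error 15
```

In this model the hd0 entry survives the loss of either mirror half on the first try, while the hd1 entry only works when both halves are present.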
     
