HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials (http://www.howtoforge.com/forums/index.php)
-   Installation/Configuration (http://www.howtoforge.com/forums/forumdisplay.php?f=4)
-   -   md RAID reconstruction hints & tips? (http://www.howtoforge.com/forums/showthread.php?t=2454)

ryoken 9th February 2006 02:39

md RAID reconstruction hints & tips?
 
This may sound like quite a trivial question, but could someone please confirm whether I need to unmount the hard drives before performing RAID reconstruction with the mdadm tool? Should I boot into single-user mode first? Or should I boot from a live CD, such as Knoppix? Which is the accepted or safest method, and how would I accomplish it?

Here's the deal: I've been experimenting with Linux RAID (md) and wanted to improve the availability of my server, so that when everyone's asleep I can replace a faulty drive with a new one.

I have /dev/sda and /dev/sdb mirrored (RAID1). Both sda and sdb have a "/" [md0] and "swap" [md1] partition. Also, both sda & sdb's MBR have GRUB installed. Now, when sdb gets unplugged (after powering off, of course!), the system still boots into Linux :) Reconnecting sdb and disconnecting sda gives the same result - flawless booting!
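
For reference, here's roughly the sort of thing I ran in the grub shell to get GRUB onto both MBRs. Treat it as a sketch; it assumes /boot lives on the first partition of each disk, with sda mapped to hd0 and sdb to hd1:

Code:

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)
grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)
grub> quit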

All is good so far. Now, let's plug sda & sdb back in and boot into Linux. Running "cat /proc/mdstat" and "mdadm --detail /dev/md0" reveals a degraded RAID array (with sdb flagged as faulty/foreign). So using mdadm again, we can perform a "hot insert" of sdb. After a few moments, we can confirm (through /proc/mdstat and mdadm) that the rebuild completed. OK, on to rebooting the system. The first reboot after reconstruction seems flawless. So we shut down the system again, and unplug sda again.
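
For the record, the hot insert went roughly like this. Again, a sketch; the partition names assume the layout described above:

Code:

# pull the stale partition from the array (mark it failed first if the
# kernel hasn't done so already), then re-add it to trigger a rebuild
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
mdadm /dev/md0 --add /dev/sdb1

# watch the rebuild progress
watch cat /proc/mdstat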

Bad news this time around. Powering on the system, we immediately notice GRUB's failed attempt to load the Linux kernel, citing CRC errors. And I thought this RAID1 mirror was perfect - even after reconstruction. So what went wrong? Here's GRUB's error message, FYI:

Booting 'Debian GNU/Linux, kernel 2.6.8-2-386'

[..snip..]

Uncompressing Linux...

crc error

-- System halted


Am I correct in suspecting that mounted drives can't be mirrored (especially the boot blocks)? Is it possible that mounted drives have "locked" files or open "handles" which cannot be mirrored? Or am I way off the mark? Let me know. Looking forward to hearing your responses!!

edit: Whilst installing Debian 3.1 "Sarge", should you wait for the RAID to finish syncing during the partitioning phase (by switching to the second console with Alt+F2 and running "cat /proc/mdstat"), or is it OK to go right ahead and format the arrays and install the base system while they sync in the background?
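
For anyone following along, this is roughly what /proc/mdstat looks like mid-resync. The output below is only illustrative; the sizes and speeds are made up:

Code:

Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      9767424 blocks [2/2] [UU]
      [==>..................]  resync = 12.5% (1220928/9767424) finish=11.8min speed=12034K/sec
md1 : active raid1 sdb2[1] sda2[0]
      489856 blocks [2/2] [UU]

unused devices: <none>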

ryoken 9th February 2006 08:49

After further testing, it seems I can no longer boot from md0 :mad:

After md0 was resynced with sda & sdb plugged in together, I rebooted and started getting errors after the kernel loaded, such as missing files and segmentation faults.

So I unplugged sdb again, and guess what?? It booted fine! So does this prove that RAID arrays cannot be reconstructed while one of the drives is mounted? Or is there something more sinister going on here? :eek:

edit: Yes, I did wait until md0 completed resyncing before I hit reboot...
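
For completeness, this is how I checked that the resync had finished before rebooting. The output here is abridged and illustrative:

Code:

# mdadm --detail /dev/md0
/dev/md0:
     Raid Level : raid1
   Raid Devices : 2

[..snip..]

          State : clean

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1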

falko 9th February 2006 10:45

I found this about your CRC error: http://aplawrence.com/SME/raidfailure.html

ryoken 9th February 2006 11:33

Quote:

Originally Posted by falko
I found this about your CRC error: http://aplawrence.com/SME/raidfailure.html

Hi falko. Thanks for the link. All my hard drives are new, so I don't think it's a hardware failure.

mphayesuk 9th February 2006 11:53

Just a quick question, as I am thinking of doing this.... I take it that the right way of doing RAID is through Linux as you have done: create duplicate partitions on both drives and then RAID them together using software. It's just that I thought if your motherboard has onboard RAID, can't you install everything to the first hard drive and then let the onboard RAID mirror it to the other hard drive, or does it not work like that? Any advice you can give would be helpful.

In your situation it sounds like you have two RAID partitions working... swap and / ... When you repaired the RAID by syncing, did you sync both arrays (swap and /) or just one of them? Just thinking out loud that if you did not do both, then perhaps that caused your problem..... I don't know much about this yet; I am only trying it today.....

If you have any pointers please share....

Thanks

ryoken 9th February 2006 12:50

Quote:

Originally Posted by mphayesuk
Just a quick question, as I am thinking of doing this.... I take it that the right way of doing RAID is through Linux as you have done: create duplicate partitions on both drives and then RAID them together using software. It's just that I thought if your motherboard has onboard RAID, can't you install everything to the first hard drive and then let the onboard RAID mirror it to the other hard drive, or does it not work like that? Any advice you can give would be helpful.

In your situation it sounds like you have two RAID partitions working... swap and / ... When you repaired the RAID by syncing, did you sync both arrays (swap and /) or just one of them? Just thinking out loud that if you did not do both, then perhaps that caused your problem..... I don't know much about this yet; I am only trying it today.....

If you have any pointers please share....

Thanks

Hi mphayesuk! Yes, I've set mine up using the md software RAID provided by Linux. Even though I do have a motherboard with onboard RAID (aka fakeraid; see http://linuxmafia.com/faq/Hardware/sata.html), the Debian Sarge 2.6 stock kernel does not support dmraid natively (or at least not during installation). Therefore you will always see two hard drives, despite having set them up otherwise in the onboard RAID's BIOS. FYI, I'm using a Silicon Image 3112 onboard RAID controller.
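
If your system does have the dmraid tool available, you can at least check whether the BIOS fakeraid metadata is visible on the disks. A sketch, assuming dmraid is installed:

Code:

# list any fakeraid metadata dmraid can see on the disks
dmraid -r

# activate the sets it finds (creates /dev/mapper entries)
dmraid -ay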

During the Debian install, I have done exactly what you've described. /dev/sda1 and /dev/sdb1 combine to become md0. /dev/sda2 and /dev/sdb2 combine to become md1. In turn, md0 becomes "/" and md1 becomes "swap".
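
Outside the installer, the same layout could be created by hand with something like the following. Only a sketch: it assumes blank partitions already typed as "Linux raid autodetect":

Code:

# mirror the root partitions into md0 and the swap partitions into md1
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

# put the filesystems on the arrays, not on the raw partitions
mke2fs -j /dev/md0
mkswap /dev/md1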

Good question about md1 (swap). From memory, it seems as though md1 was automagically synced after plugging in both sda & sdb. Maybe that's because it's designated as swap. Maybe it's a bug. Hopefully someone can shed some light on this. I say "from memory" because since then I've reformatted the drives and only have a "/" - no swap.

This second install means that I now have only md0. Unfortunately, the segmentation faults still occur after simulating a hardware fault (as explained in the posts above) - I suspect sdb in md0 is corrupted. Given this, I think it's safe to say swap hasn't caused the problem here.

Hope this extra info helps!

mphayesuk 9th February 2006 14:03

Thanks... in my case perhaps SuSE 10 works differently....

What I did was leave the motherboard RAID alone... no setup at all (i.e. no onboard RAID activated). Then I started the install and set it up as both our posts said, duplicate partitions etc.... but after the initial install the machine reboots as it should, and then I get a "kernel panic can't sync" message and that's when it dies.... At this point I am stuck and don't know what else to try..... any suggestions?

mphayesuk 10th February 2006 10:31

I think I have a solution for RAID on SuSE, anyway.... you buy an Adaptec RAID card which has SuSE as a supported operating system.. well, SuSE 8 and 8.1, but I would hope that SuSE 10 has the same drivers.... but for 30 it's worth a go.

ryoken 11th February 2006 03:05

So was your onboard RAID BIOS activated or disabled? Or did you mean that you did not set up the drive mapping for RAID within your onboard RAID BIOS?

I've never touched SuSE before, but I hear good things about it. SuSE is owned by Novell, right? There should be plenty of support on their website (or supporting sites). Which kernel does SuSE 10 use? 2.6.??, or the 2.4 series? You could try to set up RAID1 in SuSE with only one SATA drive (see the sketch below) and see what happens!!! Do you still get kernel sync errors?
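
If it helps, mdadm can build a RAID1 array with a single drive by using the keyword "missing" for the absent member. A sketch; adjust the partition names to your setup:

Code:

# create md0 in degraded mode with only sda1; the second slot stays empty
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing

# later, add the second drive's partition to complete the mirror
mdadm /dev/md0 --add /dev/sdb1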

I haven't touched Adaptec RAID cards, but I'm assuming this cheaper version you are referring to may be using third-party chipsets (e.g. Silicon Image), and not their in-house Adaptec ASICs.

edit: BTW, if SuSE still doesn't work, you can try Debian and see if that works for you!! I know, I know, I don't mean to make anyone change sides ;-)

mphayesuk 12th February 2006 14:59

Tried onboard RAID disabled and enabled, and both with SuSE RAID on and off (if you know what I mean, all combinations have been tried), and I either get no boot or I get sync errors.... I think the kernel is 2.4.... I suppose a last resort might be to recompile with 2.6.... but I am not that good on Linux and don't think I could do it.

Not sure what will be on the new RAID card, but their website did say SuSE is supported, so fingers crossed.

Yeah, I got Debian downloaded before the weekend, so I might give that a bash and see what I can do with it.

Thanks for your posts

