HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials

HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials (http://www.howtoforge.com/forums/index.php)
-   HOWTO-Related Questions (http://www.howtoforge.com/forums/forumdisplay.php?f=2)
-   -   Perfect XEN Setup on Debian Lenny - Live Migration Issues (http://www.howtoforge.com/forums/showthread.php?t=33171)

AlexKent 2nd April 2009 10:52

Perfect XEN Setup on Debian Lenny - Live Migration Issues
 
Hi,

As someone who previously struggled to set a virtual server's IP address on KVM, I loved the tutorial on Xen, which is working very well on Debian Lenny.

One thing though: I've set up two servers using the guide and am now having issues with live migration.

So the command is:

xm migrate --live domU.com dom0-2.com

dom0-1 seems to successfully complete the transfer of the domU to dom0-2.

However, on dom0-2 the domain stays in the p (paused) state for about 5 minutes. When it finally goes to b or r and you connect to the console, you just see a stream of kernel messages about tasks being blocked for a certain number of seconds.

(I've attached a small selection of the kernel output.)

Does anyone know if this is 'just me'? Or has anyone else had these issues with Xen on Debian Lenny not working out of the box for live migration?
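For anyone comparing setups: before blaming the transfer itself, it's worth checking that the receiving dom0 has xend's relocation server enabled, otherwise the migrate command is refused outright. A minimal sketch of the relevant lines in /etc/xen/xend-config.sxp (the 192.168.0 network in the hosts-allow pattern is just an illustrative assumption):

```
# /etc/xen/xend-config.sxp on the destination dom0
(xend-relocation-server yes)
(xend-relocation-port 8002)
# Restrict which hosts may push domains here (example pattern only)
(xend-relocation-hosts-allow '^localhost$ ^192\\.168\\.0\\.[0-9]+$')
```

xend needs a restart (/etc/init.d/xend restart) after changing these settings.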

Thanks,

Alex

falko 3rd April 2009 12:31

Are the dom0-1 and dom0-2 setups identical? Do they have the same architecture (e.g. x86_64 or i386), do they use the same kernel versions, etc.?

AlexKent 3rd April 2009 12:51

Confirmation of identical setups
 
Hi Falko,

uname -a on both dom0s gives:

Linux xen02.example.com 2.6.26-1-xen-amd64 #1 SMP Fri Mar 13 21:39:38 UTC 2009 x86_64 GNU/Linux

Linux xen03.example.com 2.6.26-1-xen-amd64 #1 SMP Fri Mar 13 21:39:38 UTC 2009 x86_64 GNU/Linux

They are both set up the same way using your tutorial and are otherwise running flawlessly.

One difference from your first tutorial is that they use LVM for managing their disks, as in your follow-on tutorial.

I've just tried again, this time without the --live option, and the same problem occurs.

I can connect to a console, but after typing in a username and pressing Enter, the system blocks for a period of time and then returns:

---

dom1.example.com login: alex
[ 168.431234] vbd vbd-2049: 16 Device in use; refusing to close

---

After a little while longer of nothing happening (a couple of minutes), I get:

[ 232.123828] INFO: task rsyslogd:1157 blocked for more than 120 seconds.
[ 232.123837] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 232.123843] rsyslogd D ffff88008162d000 0 1157 1
[ 232.123850] ffff880001617c58 0000000000000286 ffff88000e130120 ffffffff8037e2cb
[ 232.123862] ffff88000fd904c0 ffff88000fd42340 ffff88000fd90740 000000008020e810
[ 232.123872] 0000000000000000 ffff88000e159aa8 0000000000000dfe ffff88008162d000
[ 232.123879] Call Trace:
[ 232.123890] [<ffffffff8037e2cb>] gnttab_free_grant_references+0x19/0x87
[ 232.123898] [<ffffffff8020e911>] xen_clocksource_read+0xd/0x9c
[ 232.123905] [<ffffffff80265ed3>] sync_page_killable+0x0/0x30
[ 232.123912] [<ffffffff80243f2e>] getnstimeofday+0x39/0x98
[ 232.123920] [<ffffffff80265ed3>] sync_page_killable+0x0/0x30
[ 232.123926] [<ffffffff80434a5f>] io_schedule+0x5c/0x9e
[ 232.123932] [<ffffffff802646fd>] sync_page+0x3c/0x41
[ 232.123937] [<ffffffff80265edc>] sync_page_killable+0x9/0x30
[ 232.123942] [<ffffffff80434bd2>] __wait_on_bit_lock+0x36/0x66
[ 232.123948] [<ffffffff8026464a>] __lock_page_killable+0x5e/0x64
[ 232.123956] [<ffffffff8023f6c7>] wake_bit_function+0x0/0x23
[ 232.123962] [<ffffffff802661fd>] generic_file_aio_read+0x2fa/0x4b2
[ 232.123972] [<ffffffff8028a23b>] do_sync_read+0xc9/0x10c
[ 232.123978] [<ffffffff8023f699>] autoremove_wake_function+0x0/0x2e
[ 232.123985] [<ffffffff8028aa2c>] vfs_read+0xaa/0x152
[ 232.123990] [<ffffffff8028ae0d>] sys_read+0x45/0x6e
[ 232.123996] [<ffffffff8020b528>] system_call+0x68/0x6d
[ 232.124001] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[ 232.124005]
[ 232.124009] INFO: task cron:2109 blocked for more than 120 seconds.
[ 232.124015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 232.124021] cron D ffff88008162d000 0 2109 1197
[ 232.124029] ffff88000f2499e8 0000000000000282 ffff88000e1300b0 ffffffff8037e2cb
[ 232.124039] ffff88000fd436c0 ffff88000fce2140 ffff88000fd43940 000000008020e810
[ 232.124048] ffff88000f802980 ffff88000fd4d1a8 0000000000000dfe ffff88008162d000
[ 232.124055] Call Trace:
[ 232.124061] [<ffffffff8037e2cb>] gnttab_free_grant_references+0x19/0x87
[ 232.124068] [<ffffffff8020e911>] xen_clocksource_read+0xd/0x9c
[ 232.124074] [<ffffffff80243f2e>] getnstimeofday+0x39/0x98
[ 232.124080] [<ffffffff802ac72a>] sync_buffer+0x0/0x3f
[ 232.124085] [<ffffffff80434a5f>] io_schedule+0x5c/0x9e
[ 232.124090] [<ffffffff802ac765>] sync_buffer+0x3b/0x3f
[ 232.124095] [<ffffffff80434cba>] __wait_on_bit+0x40/0x6e
[ 232.124102] [<ffffffff802ac72a>] sync_buffer+0x0/0x3f
[ 232.124107] [<ffffffff80434d54>] out_of_line_wait_on_bit+0x6c/0x78
[ 232.124114] [<ffffffff8023f6c7>] wake_bit_function+0x0/0x23
[ 232.124128] [<ffffffffa00612e3>] :ext3:ext3_find_entry+0x3f3/0x5d5
[ 232.124135] [<ffffffff8026c44c>] mark_page_accessed+0x18/0x2b
[ 232.124141] [<ffffffff802abf64>] __getblk+0x1d/0x222
[ 232.124153] [<ffffffffa005cb1b>] :ext3:__ext3_get_inode_loc+0xf9/0x2ac
[ 232.124167] [<ffffffffa0062c72>] :ext3:ext3_lookup+0x31/0xc9
[ 232.124173] [<ffffffff8029b826>] d_alloc+0x15b/0x1c7
[ 232.124179] [<ffffffff802919c5>] do_lookup+0xd7/0x1c1
[ 232.124186] [<ffffffff80293cdb>] __link_path_walk+0x96f/0xdfa
[ 232.124193] [<ffffffff80294028>] __link_path_walk+0xcbc/0xdfa
[ 232.124198] [<ffffffff8026908d>] get_page_from_freelist+0xde/0x518
[ 232.124204] [<ffffffff802941ac>] path_walk+0x46/0x8b
[ 232.124210] [<ffffffff802944d8>] do_path_lookup+0x158/0x1ce
[ 232.124215] [<ffffffff8029501b>] __path_lookup_intent_open+0x56/0x97
[ 232.124221] [<ffffffff80295151>] do_filp_open+0x9c/0x7c4
[ 232.124228] [<ffffffff8028893f>] get_unused_fd_flags+0x74/0x13f
[ 232.124234] [<ffffffff80288a50>] do_sys_open+0x46/0xc3
[ 232.124239] [<ffffffff8020b528>] system_call+0x68/0x6d
[ 232.124244] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[ 232.124248]
[ 251.318307] INFO: task getty:1268 blocked for more than 120 seconds.
[ 251.318314] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 251.318320] getty D ffffffff80449f00 0 1268 1
[ 251.318327] ffff88000e707bc8 0000000000000286 ffff88000e130dd0 ffffffff8037e2cb
[ 251.318338] ffff88000fd7b7c0 ffffffff804fe460 ffff88000fd7ba40 000000008020e810
[ 251.318348] 0000000000000000 ffff88000e1597a8 0000000000000f88 ffff88008162d000
[ 251.318355] Call Trace:
[ 251.318362] [<ffffffff8037e2cb>] gnttab_free_grant_references+0x19/0x87
[ 251.318369] [<ffffffff8020e911>] xen_clocksource_read+0xd/0x9c
[ 251.318375] [<ffffffff80265ed3>] sync_page_killable+0x0/0x30
[ 251.318380] [<ffffffff80243f2e>] getnstimeofday+0x39/0x98
[ 251.318387] [<ffffffff80265ed3>] sync_page_killable+0x0/0x30
[ 251.318395] [<ffffffff80434a5f>] io_schedule+0x5c/0x9e
[ 251.318400] [<ffffffff802646fd>] sync_page+0x3c/0x41
[ 251.318406] [<ffffffff80265edc>] sync_page_killable+0x9/0x30
[ 251.318411] [<ffffffff80434bd2>] __wait_on_bit_lock+0x36/0x66
[ 251.318417] [<ffffffff8026464a>] __lock_page_killable+0x5e/0x64
[ 251.318423] [<ffffffff8023f6c7>] wake_bit_function+0x0/0x23
[ 251.318429] [<ffffffff802661fd>] generic_file_aio_read+0x2fa/0x4b2
[ 251.318436] [<ffffffff8028a23b>] do_sync_read+0xc9/0x10c
[ 251.318444] [<ffffffff8023f699>] autoremove_wake_function+0x0/0x2e
[ 251.318451] [<ffffffff802789b8>] __vma_link+0x42/0x4b
[ 251.318457] [<ffffffff8028aa2c>] vfs_read+0xaa/0x152
[ 251.318463] [<ffffffff8028e2a9>] kernel_read+0x38/0x4c
[ 251.318468] [<ffffffff8028f6b8>] do_execve+0xf1/0x215
[ 251.318474] [<ffffffff80209425>] sys_execve+0x35/0x4c
[ 251.318479] [<ffffffff8020b970>] stub_execve+0x40/0x70
[ 251.318485]
[ 253.498342] INFO: task kjournald:587 blocked for more than 120 seconds.
[ 253.498349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 253.498355] kjournald D ffffffff80449f00 0 587 2
[ 253.498362] ffff8800016c7d30 0000000000000246 ffff88000e159b78 0000000000000d5a
[ 253.498370] ffff88000fc3a080 ffffffff804fe460 ffff88000fc3a300 000000008020e810
[ 253.498383] ffff8800020709a0 ffff88000e159b68 0000000000000fa4 ffff88008162d000
[ 253.498390] Call Trace:
[ 253.498397] [<ffffffff8020e911>] xen_clocksource_read+0xd/0x9c
[ 253.498403] [<ffffffff80243f2e>] getnstimeofday+0x39/0x98
[ 253.498409] [<ffffffff802ac72a>] sync_buffer+0x0/0x3f
[ 253.498414] [<ffffffff80434a5f>] io_schedule+0x5c/0x9e
[ 253.498421] [<ffffffff802ac765>] sync_buffer+0x3b/0x3f
[ 253.498426] [<ffffffff80434cba>] __wait_on_bit+0x40/0x6e
[ 253.498433] [<ffffffff802ac72a>] sync_buffer+0x0/0x3f
[ 253.498438] [<ffffffff80434d54>] out_of_line_wait_on_bit+0x6c/0x78
[ 253.498444] [<ffffffff8023f6c7>] wake_bit_function+0x0/0x23
[ 253.498455] [<ffffffffa004a196>] :jbd:__journal_file_buffer+0xb5/0x154
[ 253.498465] [<ffffffffa004c426>] :jbd:journal_commit_transaction+0x527/0xe5d
[ 253.498471] [<ffffffff80235c38>] lock_timer_base+0x26/0x4b
[ 253.498477] [<ffffffff80235cae>] try_to_del_timer_sync+0x51/0x5a
[ 253.498488] [<ffffffffa004fba5>] :jbd:kjournald+0xd5/0x25a
[ 253.498495] [<ffffffff8023f699>] autoremove_wake_function+0x0/0x2e
[ 253.498504] [<ffffffffa004fad0>] :jbd:kjournald+0x0/0x25a
[ 253.498510] [<ffffffff8023f56b>] kthread+0x47/0x74
[ 253.498516] [<ffffffff8022839f>] schedule_tail+0x27/0x5c
[ 253.498521] [<ffffffff8020be28>] child_rip+0xa/0x12
[ 253.498527] [<ffffffff8023f524>] kthread+0x0/0x74
[ 253.498532] [<ffffffff8020be1e>] child_rip+0x0/0x12

-----
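One side note on the messages themselves: the log quotes a sysctl that silences the hung-task warnings. It only hides the symptom, though; the blocked I/O stays blocked:

```
# Quoted from the kernel output above; this disables the warning
# message only, it does not unblock the stuck tasks.
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
```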

Any ideas would be appreciated and thanks for your help with the tutorial.

Regards,

Alex

falko 4th April 2009 13:11

I'm not sure why this is happening. I think I'll have to try it myself.

AlexKent 5th April 2009 11:34

Thanks, Falko. I'd love to know if it can be done.

Regards,

Alex

AlexKent 22nd April 2009 18:17

Any updates?
 
Hi Falko,

Just wondering if you got round to testing whether live migration works out of the box?

Thanks,

Alex

falko 23rd April 2009 23:47

Yes, I've tried it and written a tutorial about it (will publish it in a few days). I've used shared storage (iSCSI) for the virtual machines. As far as I know, Xen live migration doesn't work without shared storage.
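For context on what "shared storage" means here: both dom0s attach the same iSCSI LUN (often as an LVM physical volume), so the domU's disk path is identical on either host and no disk data needs to be copied during migration. A hypothetical domU config fragment, with the volume group and device names as illustrative assumptions only:

```
# /etc/xen/domU.com.cfg -- both dom0s see the same iSCSI-backed LV,
# so the same phy: path stays valid before and after migration.
name    = 'domU.com'
memory  = 256
kernel  = '/boot/vmlinuz-2.6.26-1-xen-amd64'
ramdisk = '/boot/initrd.img-2.6.26-1-xen-amd64'
disk    = ['phy:/dev/sharedvg/domU.com-disk,xvda,w']
root    = '/dev/xvda ro'
vif     = ['bridge=eth0']
```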

AlexKent 24th April 2009 18:46

Thanks falko
 
Hi Falko,

Thanks for confirming that the live migration issue isn't just me.

I'm just getting my head round LVM at the moment. I was thinking live migration would be an excellent solution for getting domUs off a failing server, so it's a shame it's not supported.

I don't quite get why, though: the data seems to copy and the configuration gets loaded, yet something isn't quite right and the server streams errors.

If I went back to disk images, do you know if live migration works? I guess that's where I diverged from your tutorial, to my peril.

I look forward to reading your article on iSCSI - sounds fascinating.

Best wishes,

Alex

falko 25th April 2009 11:57

Quote:

Originally Posted by AlexKent (Post 184137)
If I went back to disk images, do you know if live migration works? I guess that's where I diverged from your tutorial, to my peril.

I don't think this has anything to do with disk images vs. LVM images.

