OpenNebula - Russian-speaking community

2020 April 18

Alex Tkachenko
You can't migrate a VM while it has snapshots.

Alex Tkachenko
Delete the snapshots if you don't need them.
2020 April 19

Nick Potemkin
OpenNebula 5.10, LINSTOR, DRBD.
I'm migrating a VM (#100) with two disks (data and swap) from a diskless node to a node where those disks actually exist:

Sun Apr 19 00:11:24 2020 [Z0][VM][I]: New LCM state is MIGRATE
Sun Apr 19 00:11:30 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Sun Apr 19 00:11:31 2020 [Z0][VMM][I]: ExitCode: 0
[... the part of the log covering the network switchover was cut ...]
Sun Apr 19 00:12:00 2020 [Z0][VMM][I]: Successfully execute network driver operation: post.
Sun Apr 19 00:12:03 2020 [Z0][VMM][I]: Command execution failed (exit code: 1): /var/lib/one/remotes/tm/linstor_un/postmigrate csshv-93 csshv-90 /var/lib/one//datastores/129/100 100 129
Sun Apr 19 00:12:03 2020 [Z0][VMM][I]: /var/lib/one/remotes/tm/linstor_un/../../datastore/xpath.rb:66:in `block in <main>': undefined method `elements' for nil:NilClass (NoMethodError)
Sun Apr 19 00:12:03 2020 [Z0][VMM][I]: from /var/lib/one/remotes/tm/linstor_un/../../datastore/xpath.rb:60:in `each'
Sun Apr 19 00:12:03 2020 [Z0][VMM][I]: from /var/lib/one/remotes/tm/linstor_un/../../datastore/xpath.rb:60:in `<main>'
Sun Apr 19 00:12:03 2020 [Z0][VMM][E]: Command "linstor resource delete csshv-93 one-vm-100-disk-0 --async" failed: (Node: 'csshv-93') Shutdown of the DRBD resource 'one-vm-100-disk-0 failed Error reports: [ 5E8A3C41-2EF7A-000017 ]
Sun Apr 19 00:12:03 2020 [Z0][VMM][I]: Failed to execute transfer manager driver operation: tm_postmigrate.
Sun Apr 19 00:12:03 2020 [Z0][VM][I]: New LCM state is RUNNING

Nick Potemkin
Before the migration from csshv-93 (diskless) to csshv-90:
# linstor r list |grep 100-disk-
| one-vm-100-disk-0 | csshv-90 | 7035 | Unused | Ok    | UpToDate |
| one-vm-100-disk-0 | csshv-91 | 7035 | Unused | Ok    | UpToDate |
| one-vm-100-disk-0 | csshv-92 | 7035 | Unused | Ok    | UpToDate |
| one-vm-100-disk-0 | csshv-93 | 7035 | InUse  | Ok    | Diskless |
| one-vm-100-disk-1 | csshv-90 | 7036 | InUse  | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-91 | 7036 | Unused | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-92 | 7036 | Unused | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-93 | 7036 | Unused | Ok    | Diskless |

Nick Potemkin
After the migration:
# linstor r list |grep 100-disk-
| one-vm-100-disk-0 | csshv-90 | 7035 | InUse  | Ok    | UpToDate |
| one-vm-100-disk-0 | csshv-91 | 7035 | Unused | Ok    | UpToDate |
| one-vm-100-disk-0 | csshv-92 | 7035 | Unused | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-90 | 7036 | InUse  | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-91 | 7036 | Unused | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-92 | 7036 | Unused | Ok    | UpToDate |
| one-vm-100-disk-1 | csshv-93 | 7036 | Unused | Ok    | Diskless |
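To pin down exactly which replicas changed between the two listings, one could save each `linstor r list` output to a file and compare the rows. This is a minimal POSIX shell sketch, not part of OpenNebula or LINSTOR: the function name `diff_replicas` and the file names are hypothetical, and it assumes the column order of the rows above (resource, node, port, usage, connections, state) and resource names starting with `one-vm-`.

```sh
#!/bin/sh
# Hypothetical helper (not a LINSTOR command): compare two saved outputs of
# `linstor r list` and print replicas that disappeared ("-") or appeared or
# changed ("+"). A row is "resource node usage state"; port and conns are ignored.
diff_replicas() {
  awk -F'|' '
    # Table rows start with "|", so field 1 is empty and the columns are
    # 2=resource, 3=node, 4=port, 5=usage, 6=conns, 7=state.
    NF >= 8 && $2 ~ /one-vm-/ {
      for (i = 2; i <= 7; i++) gsub(/^ +| +$/, "", $i)   # trim padding
      row = $2 " " $3 " " $5 " " $7
      if (FILENAME == ARGV[1]) before[row] = 1
      else after[row] = 1
    }
    END {
      for (r in before) if (!(r in after)) print "- " r
      for (r in after) if (!(r in before)) print "+ " r
    }' "$1" "$2" | LC_ALL=C sort -k2
}
```

For the two listings above, `diff_replicas before.txt after.txt` should report the disk-0 replica on csshv-93 as gone and the disk-0 replica on csshv-90 switching from Unused to InUse; disk-1 on csshv-93 does not appear at all, because that row did not change.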

Nick Potemkin
The migration was live; the VM moved over fine, and the disks are now in use on csshv-90, the node the VM moved to.
What bothers me:
1. the errors in the log
2. disk-1 also stayed on the diskless node (replication level is 3)
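For the second concern, one way to spot such leftovers is to filter the `linstor r list` rows for replicas that are both Diskless and Unused. A minimal sketch, assuming the same column order as the listings above; `diskless_unused` is a hypothetical helper name. A Diskless/Unused replica is not automatically garbage (LINSTOR can keep diskless replicas on purpose), so the output is a list of candidates to check, not something to delete blindly.

```sh
#!/bin/sh
# Hypothetical helper: read `linstor r list` table rows on stdin and print
# "resource node" for every replica whose usage is Unused and whose state is
# Diskless, i.e. a diskless attachment that nothing currently has open.
diskless_unused() {
  awk -F'|' '
    NF >= 8 && $2 ~ /one-vm-/ {
      for (i = 2; i <= 7; i++) gsub(/^ +| +$/, "", $i)   # trim padding
      if ($5 == "Unused" && $7 == "Diskless") print $2, $3
    }'
}
```

On the after-migration listing, `linstor r list | diskless_unused` should print `one-vm-100-disk-1 csshv-93`. If that replica is confirmed to be stale, it could be removed with the same kind of command the postmigrate driver ran, e.g. `linstor resource delete csshv-93 one-vm-100-disk-1`.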

Nick Potemkin
The error report on csshv-93 (the node the disk was migrating away from):
Reported error:
===============

Description:
   Operations on resource 'one-vm-100-disk-0' were aborted
Cause:
   The external command for stopping the DRBD resource failed
Correction:
   - Check whether the required software is installed
   - Check whether the application's search path includes the location
     of the external software
   - Check whether the application has execute permission for the external command

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'deleteDrbd', Source file 'DrbdLayer.java', Line #493

Error message:                      Shutdown of the DRBD resource 'one-vm-100-disk-0 failed

Error context:
   An error occurred while processing resource 'Node: 'csshv-93', Rsc: 'one-vm-100-disk-0''

Additional information:
   The full command line executed was:
   drbdsetup down one-vm-100-disk-0

   The external command sent the following output data:


   The external command sent the following error information:
   one-vm-100-disk-0: State change failed: (-10) State change was refused by peer node
   additional info from kernel:
   Declined by peer csshv-90 (id: 0), see the kernel log there


Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #536

Error message:                      The external command 'drbdsetup' exited with error code 11

Nick Potemkin
So csshv-90 took over control before the down could go through?

Nick Potemkin
And apparently that's also why the down for disk-1 didn't happen?

kvaps
Nick Potemkin
The error report on csshv-93 (the node the disk was migrating away from)
Thanks for the detailed bug report.

kvaps
Nick Potemkin
opennebula 5.10, linstor, drbd
postmigrate tried to delete the disk from the previous node, but for some reason DRBD was locked and couldn't do drbdadm down.

kvaps
Nick Potemkin
The error report on csshv-93 (the node the disk was migrating away from)
Declined by peer csshv-90 (id: 0), see the kernel log there

kvaps
Apparently csshv-90 didn't manage to sync in time, or I dunno. Look at the kernel log, maybe you'll figure out why.
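Following that advice, the kernel log on each node can be narrowed down to a single DRBD resource. A minimal sketch with hypothetical helper names `drbd_events` and `drbd_failures`; it assumes the `drbd <resource>` message prefix seen in the kernel logs in this thread and resource names without regex metacharacters. On a node it would be fed from `dmesg -T`, which prints human-readable timestamps like the ones quoted in this thread.

```sh
#!/bin/sh
# Hypothetical helpers: filter kernel log lines (on stdin) for one DRBD resource.
# DRBD prefixes its messages with "drbd <resource>", followed by a space
# (peer name), "/" (volume) or ":".
drbd_events() {
  grep -E "drbd $1([ /:]|\$)"
}

# Only the lines that point at a failed, refused or aborted state change.
drbd_failures() {
  drbd_events "$1" | grep -E 'failed|Aborting|Declined|refused'
}
```

Usage on csshv-90 and csshv-93: `dmesg -T | drbd_failures one-vm-100-disk-0`. On the csshv-90 log quoted in this thread it would single out the "State change failed: Intentional diskless peer may not attach a disk" and "Aborting remote state change" lines.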

Nick Potemkin
kvaps
Apparently csshv-90 didn't manage to sync in time, or I dunno. Look at the kernel log, maybe you'll figure out why.
From csshv-90:
[Sun Apr 19 00:11:15 2020] drbd one-vm-100-disk-0: Preparing cluster-wide state change 2595806047 (0->-1 3/1)
[Sun Apr 19 00:11:15 2020] drbd one-vm-100-disk-0: State change 2595806047: primary_nodes=9, weak_nodes=FFFFFFFFFFFFFFF0
[Sun Apr 19 00:11:15 2020] drbd one-vm-100-disk-0: Committing cluster-wide state change 2595806047 (0ms)
[Sun Apr 19 00:11:15 2020] drbd one-vm-100-disk-0: role( Secondary -> Primary )
[Sun Apr 19 00:11:38 2020] drbd one-vm-100-disk-0 csshv-93: peer( Primary -> Secondary )
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Preparing remote state change 1779374693
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0: State change failed: Intentional diskless peer may not attach a disk
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Aborting remote state change 1779374693
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0: Preparing cluster-wide state change 2545462789 (0->3 123376/49168)
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0: State change 2545462789: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0: Committing cluster-wide state change 2545462789 (0ms)
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0/0 drbd1035 csshv-93: pdsk( Diskless -> DUnknown ) repl( Established -> Off )
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: ack_receiver terminated
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Terminating ack_recv thread
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-92: Preparing remote state change 3762558587
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-92: Committing remote state change 3762558587 (primary_nodes=1)
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-91: Preparing remote state change 1492002167
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-91: Committing remote state change 1492002167 (primary_nodes=1)
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Restarting sender thread
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Connection closed
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: conn( Disconnecting -> StandAlone )
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Terminating receiver thread
[Sun Apr 19 00:11:44 2020] drbd one-vm-100-disk-0 csshv-93: Terminating sender thread

Nick Potemkin
From csshv-93:

[Sun Apr 19 00:11:47 2020] drbd one-vm-100-disk-0 csshv-92: Connection closed
[Sun Apr 19 00:11:47 2020] drbd one-vm-100-disk-0 csshv-92: conn( TearDown -> Unconnected )
[Sun Apr 19 00:11:47 2020] drbd one-vm-100-disk-0 csshv-92: Restarting receiver thread
[Sun Apr 19 00:11:47 2020] drbd one-vm-100-disk-0 csshv-92: conn( Unconnected -> Connecting )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: conn( Connecting -> Disconnecting )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: Aborting remote state change 0 commit not possible
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: Restarting sender thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: Connection closed
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: conn( Disconnecting -> StandAlone )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: Terminating receiver thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-90: Terminating sender thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: conn( Connecting -> Disconnecting )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: Aborting remote state change 0 commit not possible
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: Restarting sender thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: Connection closed
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: conn( Disconnecting -> StandAlone )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: Terminating receiver thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-91: Terminating sender thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: conn( Connecting -> Disconnecting )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: Aborting remote state change 0 commit not possible
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: Restarting sender thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: Connection closed
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: conn( Disconnecting -> StandAlone )
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: Terminating receiver thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0 csshv-92: Terminating sender thread
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0/0 drbd1035: drbd_bm_resize called with capacity == 0
[Sun Apr 19 00:12:05 2020] drbd one-vm-100-disk-0: Terminating worker thread

Nick Potemkin
And migration with save_migrate goes through with no problems and no errors... it only happens with live migration.

kvaps
Nick Potemkin
And migration with save_migrate goes through with no problems and no errors... it only happens with live migration.
Does it happen every single time?

Nick Potemkin
Yep, every time )

kvaps
Hmm, as an experiment, try adding sleep 5 at the beginning of the postmigrate script.
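A minimal sketch of that experiment: insert `sleep 5` right after the shebang of the driver script (the path is the one from the log above), keeping a backup so the change is easy to revert. `add_startup_delay` is a hypothetical helper, and it assumes line 1 of the script is a shebang. Depending on where OpenNebula runs this driver, the modified remotes may also need to be pushed to the hosts (for example with `onehost sync --force`); treat that part as an assumption to verify.

```sh
#!/bin/sh
# Hypothetical helper: add "sleep SECONDS" as the second line of SCRIPT
# (right after the shebang), keeping SCRIPT.orig as a backup.
# Running it twice does not add a second sleep.
add_startup_delay() {
  script=$1
  secs=${2:-5}
  # Already patched: line 2 is exactly "sleep N".
  if [ "$(sed -n '2p' "$script")" = "sleep $secs" ]; then
    return 0
  fi
  cp -p "$script" "$script.orig" || return 1
  # Rewrite in place: shebang, then the delay, then the rest of the original.
  # Redirecting into the existing file keeps its permissions (execute bit).
  {
    sed -n '1p' "$script.orig"
    echo "sleep $secs"
    sed '1d' "$script.orig"
  } > "$script"
}
```

Usage: `add_startup_delay /var/lib/one/remotes/tm/linstor_un/postmigrate 5`. To revert after the experiment, move `postmigrate.orig` back over `postmigrate`.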

kvaps
I'll take another look on my end too.