[ops][nova][neutron] Proper way to migrate instances between nodes with different ML2 agents types
On Fri, 2019-11-08 at 10:53 +0100, Antoine Millet wrote:
> Hi here,
> I'm trying to find a solution to migrate instances between hypervisors
> of an openstack cluster with nodes running different ML2 agents (OVS
> and bridges, I'm actually migrating the whole cluster to the latter).
> The cluster is running Rocky. I enabled both mechanisms in the neutron-
> server configuration and some nodes are running the neutron-
> openvswitch-agent and some other the neutron-linuxbridge-agent. My
> network nodes (running the l3 agent) are currently running the neutron-
> openvswitch-agent. I also noticed that when nova-compute is starting
> up, VIF plugins for OVS and Bridges are loaded ("INFO os_vif [-] Loaded
> VIF plugins: ovs, linux_bridge").
> When I start a live migration for an instance running on a hypervisor
> using the OVS agent to a hypervisor using the bridge agent, it fails
> because the destination hypervisor tries to execute 'ovs-*' commands to
> bind the VM to its network. I also tried cold migration and just
> restarting a hypervisor with the bridge agent instead of the OVS one,
> but it fails similarly when the instances start up.
> After some research, I discovered that the mechanism used to bind an
> instance port to a network is stored in the port binding configuration
> in the database and that the code that executes the 'ovs-*' commands is
> actually located in the os_vif library used by nova-compute.
> So, I tried to remove the OVS plugin from the os_vif library. Ubuntu
> ships both plugins in the same package, so I just deleted the plugin
> directory in /usr/lib/python2.7/dist-packages (don't judge me
> please, it's for science ;-)). And... it worked as expected (port
> bindings are converted to bridge mechanism), at least for the cold
> migration (hot migration is cancelled without any error message, I need
> to investigate more).
So while that is an inventive approach, os-vif is not actually involved in the
port binding process; it handles port plugging later.
I did some testing around this use case back in 2018 and found a number of gaps
that need to be addressed to support live migration between Linux bridge and OVS, or vice versa.
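For reference, the binding that neutron stored (and that nova reads when plugging) is visible on the port itself; something like the following, with a hypothetical port ID:

```shell
# inspect the stored binding for a port (port ID is hypothetical)
openstack port show bf69476a-1111-2222-3333-444455556666 \
    -c binding_vif_type -c binding_vif_details -c binding_host_id
```

(not testable here, it needs a live cloud to run against.)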
First, the bridge name is not set in binding:vif_details by ml2/linuxbridge,
so if we try to go from OVS to Linux bridge, nova generates the wrong XML and tries to add the port
to a Linux bridge called br-int:
Updating guest XML with vif config: <interface type="bridge">
Aug 14 12:15:27 devstack1 nova-compute: <mac address="fa:16:3e:a9:cf:09"/>
Aug 14 12:15:27 devstack1 nova-compute: <model type="virtio"/>
Aug 14 12:15:27 devstack1 nova-compute: <source bridge="br-int"/>
Aug 14 12:15:27 devstack1 nova-compute: <mtu size="1450"/>
Aug 14 12:15:27 devstack1 nova-compute: <target dev="tapbf69476a-25"/>
Aug 14 12:15:27 devstack1 nova-compute: </interface>
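To make that failure mode concrete, here is a minimal sketch (this is not nova's or os-vif's actual code; the function and bridge names are illustrative) of how a missing bridge name in binding:vif_details ends up falling back to the OVS integration bridge:

```python
# Sketch only: when binding:vif_details carries no bridge name, the OVS
# default "br-int" leaks into the guest XML instead of the per-network
# Linux bridge name (linuxbridge names them "brq" + a network-ID prefix).
import xml.etree.ElementTree as ET

def interface_xml(mac, tap, bridge=None):
    # falling back to the OVS integration bridge when the binding
    # details carry no bridge name -- this is the buggy case
    iface = ET.Element("interface", type="bridge")
    ET.SubElement(iface, "mac", address=mac)
    ET.SubElement(iface, "model", type="virtio")
    ET.SubElement(iface, "source", bridge=bridge or "br-int")
    ET.SubElement(iface, "target", dev=tap)
    return ET.tostring(iface, encoding="unicode")

# missing bridge name -> wrong XML, tap lands on br-int
wrong = interface_xml("fa:16:3e:a9:cf:09", "tapbf69476a-25")
# with a Linux bridge name present (hypothetical brq name), XML is correct
right = interface_xml("fa:16:3e:a9:cf:09", "tapbf69476a-25",
                      bridge="brqdeadbeef-de")
```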
Using mixed Linux bridge and OVS hosts also has other problems if you are using VXLAN or GRE,
because neutron does not form the mesh tunnel overlay between different ML2 drivers.
The Linux bridge plugin also uses a different UDP port for historical reasons (VXLAN was merged into the Linux
kernel before the IANA port number was assigned).
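For the port mismatch specifically, the Linux bridge agent can be pointed at the IANA VXLAN port; a sketch of the relevant section, assuming the stock config path (note this only aligns the ports, it does not make neutron form the tunnel mesh across drivers):

```ini
# /etc/neutron/plugins/ml2/linuxbridge_agent.ini
[vxlan]
enable_vxlan = true
# the kernel's historical default is 8472; OVS uses the IANA port 4789
udp_dstport = 4789
local_ip = 10.0.0.11  # hypothetical tunnel endpoint address
```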
So in effect there is no supported way to do this with a live migration in Rocky, but there are ways to force it to work.
The simplest way to do this is to cold migrate followed by a hard reboot, but you need to install both the OVS and Linux bridge
tools on each host while only having one agent running.
You can also live migrate twice to the same host and hard reboot:
the first migration will fail; the second should succeed but will leave the VM tap device connected to the wrong
bridge, and the hard reboot fixes it.
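The cold-migrate path, spelled out with the standard client (server ID hypothetical; on Rocky-era clients the confirm step is `openstack server resize --confirm`):

```shell
SERVER=9a6d3f0e-1111-2222-3333-444455556666   # hypothetical server ID
# cold migrate to a host running the Linux bridge agent
openstack server migrate "$SERVER"
# wait for VERIFY_RESIZE, then confirm
openstack server resize confirm "$SERVER"
# hard reboot so the tap device is re-plugged by the new agent
openstack server reboot --hard "$SERVER"
```

(not testable here, it needs a live cloud to run against.)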
> Thank you for any help!