[nova] MessageUndeliverable from nova-conductor and MessagingTimeout from nova-compute


Start by deleting the bindings and recreating them.
If that is not enough, try deleting the queues and restarting nova-compute.
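
For the first step, here is a minimal Python sketch, not a tested procedure.
It assumes the binding sits on the "nova" exchange with a routing key equal
to the queue name (as in the list_bindings output quoted below), that the
binding has no arguments so its properties_key equals the routing key, and
that rabbitmqadmin is on the PATH with working default connection settings.
The helper names rebind_commands and rebind are made up for illustration.

```python
import subprocess


def rebind_commands(queue, exchange="nova", routing_key=None):
    """Build the rabbitmqadmin calls that drop and re-add one queue binding.

    Assumption: the routing key defaults to the queue name, and the binding
    has no arguments, so properties_key is the same string as the routing key.
    """
    key = routing_key or queue
    target = [
        f"source={exchange}",
        "destination_type=queue",
        f"destination={queue}",
    ]
    return [
        ["rabbitmqadmin", "delete", "binding", *target, f"properties_key={key}"],
        ["rabbitmqadmin", "declare", "binding", *target, f"routing_key={key}"],
    ]


def rebind(queue, dry_run=True, **kwargs):
    """Print the commands (default) or run them against the broker."""
    for cmd in rebind_commands(queue, **kwargs):
        if dry_run:
            print(" ".join(cmd))
        else:
            # The delete may fail if the binding is already gone, so only
            # the declare step is treated as fatal.
            subprocess.run(cmd, check=(cmd[1] == "declare"))
```

Calling rebind("compute.compute-1") only prints the two commands, so they can
be reviewed first; pass dry_run=False to execute them. Host, credentials and
vhost options are left out and would need to be added for a real broker.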


Cheers

-- 
Arnaud Morin

On 15.11.20 - 23:24, Tony Liu wrote:
> This seems to be the bug tracking this issue:
> https://bugs.launchpad.net/nova/+bug/1854992
> In my case, a batch of VM creations was initiated by Terraform.
> I assume it's all about rabbitmq. Is there a clear way to fix it once
> it happens: delete and recreate the bindings, delete and recreate the
> queues, or just restart nova-compute?
> 
> Thanks!
> Tony
> > -----Original Message-----
> > From: Tony Liu <tonyliu0592 at hotmail.com>
> > Sent: Sunday, November 15, 2020 2:55 PM
> > To: Arnaud Morin <arnaud.morin at gmail.com>
> > Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> > Subject: [nova] MessageUndeliverable from nova-conductor and
> > MessagingTimeout from nova-compute
> > 
> > It's not about the connection, but about message delivery.
> > I changed the title.
> > I am running rabbitmq 3.8.4 and nova 21.0.0.
> > 
> > When I check rabbitmq queues for compute, I see 3 messages in
> > compute.compute-1 and 0 messages in other compute node queues.
> > In the nova-conductor logs of all 3 instances, I see
> > "oslo_messaging.exceptions.MessageUndeliverable".
> > In the nova-compute log on compute-1, I see
> > "Timed out waiting for nova-conductor."
> > Actually, 4 out of 5 compute nodes show this warning.
> > One compute node has the exception
> > "oslo_messaging.exceptions.MessagingTimeout".
> > 
> > "rabbitmqctl list_bindings | grep compute-1" shows this.
> > =================================
> >         exchange        compute.compute-1       queue   compute.compute-1       []
> > nova    exchange        compute.compute-1       queue   compute.compute-1       []
> > =================================
> > Is this a known issue? How does it happen?
> > What causes it, and is there any way to prevent it?
> > 
> > 
> > Thanks!
> > Tony
> > > -----Original Message-----
> > > From: Arnaud Morin <arnaud.morin at gmail.com>
> > > Sent: Saturday, November 14, 2020 2:10 AM
> > > To: Tony Liu <tonyliu0592 at hotmail.com>
> > > Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> > > Subject: Re: [nova-compute] not reconnect to rabbitmq?
> > >
> > > Hello,
> > >
> > > What we noticed in our case is that nova compute is actually
> > > reconnecting, but cannot communicate with the conductor because the
> > > queue binding is either absent or not working anymore.
> > >
> > > So, first, which version of nova are you running?
> > > Which version of rabbitmq? (Some bugs related to shadow bindings
> > > were fixed after 3.7.x; I don't remember x.)
> > >
> > > Can you check whether you have any queue related to your compute?
> > > Something like this:
> > > rabbitmqctl list_queues | grep mycompute
> > >
> > > Also check the bindings, preferably using the management interface or
> > > rabbitmqadmin:
> > > rabbitmqadmin list bindings | grep mycompute
> > >
> > > What usually fixed our issue in the past was to delete and recreate the
> > > binding (easy to do from the management interface).
> > >
> > > Cheers,
> > >
> > > --
> > > Arnaud Morin
> > >
> > > On 14.11.20 - 00:34, Tony Liu wrote:
> > > > Hi,
> > > >
> > > > I have a deployment with Ussuri on CentOS 8.
> > > > I noticed that when the connection from nova-compute to rabbitmq
> > > > is broken, nova-compute doesn't reconnect.
> > > > I checked nova-conductor, which seems to keep trying to reconnect
> > > > to rabbitmq when the connection is broken, but nova-compute doesn't
> > > > seem to do the same. I've seen it a few times: after I fix rabbitmq
> > > > and bring it back, nova-conductor reconnects, but nova-compute
> > > > doesn't, and I have to restart it manually. Has anyone else had a
> > > > similar experience?
> > > > Is there anything I am missing?
> > > >
> > > >
> > > > Thanks!
> > > > Tony
> > > >
> > > >
>