[nova] MessageUndeliverable from nova-conductor and MessagingTimeout from nova-compute


This seems to be the bug tracking this issue:
https://bugs.launchpad.net/nova/+bug/1854992
In my case, a batch of VMs was being created by Terraform.
I assume it's all about rabbitmq. Is there a clear way to fix it once
it happens: delete and recreate the bindings, delete and recreate the
queues, or just restart nova-compute?
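
To spot the "binding is absent" case across many compute nodes, the
check could be scripted. Below is a minimal sketch, not a tested tool.
It assumes the default tab-separated column order of
"rabbitmqctl list_bindings" (source_name, source_kind, destination_name,
destination_kind, routing_key, arguments) and the exchange name "nova"
seen in the output quoted below. The function names are made up for
illustration. It only detects a missing binding; it cannot tell whether
a binding that is listed actually works.

```python
import subprocess

# Default column order of `rabbitmqctl list_bindings` (assumed here).
FIELDS = ("source_name", "source_kind", "destination_name",
          "destination_kind", "routing_key", "arguments")


def parse_bindings(text):
    """Parse tab-separated list_bindings output into a list of dicts."""
    bindings = []
    for line in text.splitlines():
        # Do not strip the line: the default exchange has an empty name,
        # so its rows start with a tab.
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            continue  # banner such as "Listing bindings for vhost ..."
        if parts[1] != "exchange":
            continue  # column header row
        bindings.append(dict(zip(FIELDS, parts)))
    return bindings


def queues_missing_binding(bindings, exchange="nova"):
    """Return queues that have no binding from the given exchange."""
    queues = set()
    bound = set()
    for b in bindings:
        if b["destination_kind"] != "queue":
            continue
        queues.add(b["destination_name"])
        if b["source_name"] == exchange:
            bound.add(b["destination_name"])
    return sorted(queues - bound)


def fetch_bindings():
    """Run rabbitmqctl on the broker host and return its raw output."""
    result = subprocess.run(["rabbitmqctl", "list_bindings"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On a rabbitmq node, queues_missing_binding(parse_bindings(fetch_bindings()))
would list every queue that the "nova" exchange is not bound to. That
includes queues that are never expected to be bound there, so the result
needs filtering by name, for example on the "compute." prefix.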

Thanks!
Tony
> -----Original Message-----
> From: Tony Liu <tonyliu0592 at hotmail.com>
> Sent: Sunday, November 15, 2020 2:55 PM
> To: Arnaud Morin <arnaud.morin at gmail.com>
> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> Subject: [nova] MessageUndeliverable from nova-conductor and
> MessagingTimeout from nova-compute
> 
> It's not about the connection, but about message delivery.
> I changed the title.
> I am running rabbitmq 3.8.4 and nova 21.0.0.
> 
> When I check the rabbitmq queues for compute, I see 3 messages in
> compute.compute-1 and 0 messages in the other compute node queues.
> In the nova-conductor logs of all 3 instances, I see
> "oslo_messaging.exceptions.MessageUndeliverable".
> In the nova-compute log on compute-1, I see
> "Timed out waiting for nova-conductor."
> Actually, 4 out of 5 compute nodes have this warning.
> One compute node has the exception
> "oslo_messaging.exceptions.MessagingTimeout".
> 
> "rabbitmqctl list_bindings | grep compute-1" shows this.
> =================================
>         exchange        compute.compute-1       queue   compute.compute-1       []
> nova    exchange        compute.compute-1       queue   compute.compute-1       []
> =================================
> Is this a known issue? How did it happen?
> What causes it, and is there any way to prevent it from happening?
> 
> 
> Thanks!
> Tony
> > -----Original Message-----
> > From: Arnaud Morin <arnaud.morin at gmail.com>
> > Sent: Saturday, November 14, 2020 2:10 AM
> > To: Tony Liu <tonyliu0592 at hotmail.com>
> > Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> > Subject: Re: [nova-compute] not reconnect to rabbitmq?
> >
> > Hello,
> >
> > What we noticed in our case is that nova compute is actually
> > reconnecting, but cannot communicate with the conductor because the
> > queue binding is either absent or not working anymore.
> >
> > So, first, which version of nova are you running?
> > Which version of rabbitmq? (some bugs related to shadow bindings were
> > fixed after 3.7.x; I don't remember x)
> >
> > Can you check if you have any queue related to your compute?
> > Something like this:
> > rabbitmqctl list_queues | grep mycompute
> >
> > Also check the bindings, better using the management interface or
> > rabbitmqadmin:
> > rabbitmqadmin list bindings | grep mycompute
> >
> > What usually fixed our issue in the past was to delete and recreate
> > the binding (easy to do from the management interface).
> >
> > Cheers,
> >
> > --
> > Arnaud Morin
> >
> > On 14.11.20 - 00:34, Tony Liu wrote:
> > > Hi,
> > >
> > > I have a deployment with Ussuri on CentOS 8.
> > > I noticed that when the connection from nova-compute to rabbitmq
> > > is broken, nova-compute doesn't reconnect.
> > > I checked nova-conductor, which seems to keep trying to reconnect
> > > to rabbitmq when the connection is broken. But nova-compute doesn't
> > > seem to do the same. I've seen it a few times: after I fix rabbitmq
> > > and bring it back, nova-conductor reconnects, but nova-compute
> > > doesn't, and I have to restart it manually. Has anyone else had a
> > > similar experience?
> > > Is there anything I am missing?
> > >
> > >
> > > Thanks!
> > > Tony
> > >
> > >