Virtio memory balloon driver


I opened a bug:

https://bugs.launchpad.net/nova/+bug/1862425

-----Original Message-----
From: Sean Mooney <smooney at redhat.com> 
Sent: Wednesday, February 5, 2020 10:25 AM
To: Albert Braden <albertb at synopsys.com>; openstack-discuss at lists.openstack.org
Subject: Re: Virtio memory balloon driver

On Wed, 2020-02-05 at 17:33 +0000, Albert Braden wrote:
> When I start and stop the giant VM I don't see any evidence of OOM errors. I suspect that the #centos guys may be
> correct when they say that the "Virtio memory balloon" device is not capable of addressing that much memory, and that
> I must disable it if I want to create VMs with 1.4T RAM. Setting "mem_stats_period_seconds = 0" doesn't seem to
> disable it.
> 
> How are others working around this? Is anyone else creating CentOS 6 VMs with 1.4T or more RAM?
i suspect not.
spawning one giant vm that uses all the resources on the host is not a typical use case.
in general people move to ironic when they need a vm that large.
i unfortunately don't have time to look into this right now, but we can likely add a way to disable the balloon device,
and if you remind me in a day or two i can try to see why mem_stats_period_seconds = 0 is not working for you.
looking at https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L5842-L5852
it should work, but libvirt adds extra elements to the xml after we generate it and fills in some fields.
it's possible that libvirt is adding the device, and when we don't want it we need to explicitly disable it in some way.
if that is the case we could track this as a bug and potentially backport the fix.
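
One way to check whether libvirt added the device on its own is to inspect the live domain XML (for example the output of `virsh dumpxml instance-00002164`) for a memballoon element. Below is a minimal sketch, assuming the XML has already been captured as a string. The sample XML and the helper names are made up for illustration and are not taken from nova. In libvirt's domain XML, a memballoon element with model 'none' means the balloon is explicitly disabled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of live domain XML; real output from
# `virsh dumpxml` contains many more elements.
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-00002164</name>
  <devices>
    <memballoon model='virtio'>
      <stats period='10'/>
    </memballoon>
  </devices>
</domain>
"""


def balloon_models(domain_xml):
    """Return the model attribute of every memballoon device in the XML."""
    root = ET.fromstring(domain_xml)
    return [dev.get("model") for dev in root.findall("./devices/memballoon")]


def balloon_enabled(domain_xml):
    """True if any memballoon device has a model other than 'none'."""
    return any(model != "none" for model in balloon_models(domain_xml))


if __name__ == "__main__":
    print(balloon_models(SAMPLE_XML))
    print(balloon_enabled(SAMPLE_XML))
```

If the helper reports a virtio balloon even with mem_stats_period_seconds = 0, that would be consistent with libvirt filling in the device itself, although this check alone does not prove where the element came from.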
> 
> Console log: https://f.perl.bot/p/njvgbm
> 
> The error is at line 404: [   18.736435] BUG: unable to handle kernel paging request at ffff9ca8d9980000
> 
> Dmesg:
> [Tue Feb  4 17:50:42 2020] brq49cbe55d-51: port 1(tap039191ba-25) entered disabled state
> [Tue Feb  4 17:50:42 2020] device tap039191ba-25 left promiscuous mode
> [Tue Feb  4 17:50:42 2020] brq49cbe55d-51: port 1(tap039191ba-25) entered disabled state
> [Tue Feb  4 17:50:47 2020] brq49cbe55d-51: port 1(tap039191ba-25) entered blocking state
> [Tue Feb  4 17:50:47 2020] brq49cbe55d-51: port 1(tap039191ba-25) entered disabled state
> [Tue Feb  4 17:50:47 2020] device tap039191ba-25 entered promiscuous mode
> [Tue Feb  4 17:50:47 2020] brq49cbe55d-51: port 1(tap039191ba-25) entered blocking state
> [Tue Feb  4 17:50:47 2020] brq49cbe55d-51: port 1(tap039191ba-25) entered forwarding state
> 
> Syslog:
> 
> Feb  4 17:50:51 us01odc-p01-hv214 kernel: [2859840.751339] brq49cbe55d-51: port 1(tap039191ba-25) entered blocking state
> Feb  4 17:50:51 us01odc-p01-hv214 kernel: [2859840.751342] brq49cbe55d-51: port 1(tap039191ba-25) entered disabled state
> Feb  4 17:50:51 us01odc-p01-hv214 kernel: [2859840.751450] device tap039191ba-25 entered promiscuous mode
> Feb  4 17:50:51 us01odc-p01-hv214 systemd-networkd[781]: tap039191ba-25: Gained carrier
> Feb  4 17:50:51 us01odc-p01-hv214 libvirtd[37317]: 2020-02-05 01:50:51.386+0000: 37321: warning : qemuDomainObjTaint:5602 : Domain id=15 name='instance-00002164' uuid=33611060-887a-44c1-a3b8-1c36cb8f9984 is tainted: host-cpu
> Feb  4 17:50:51 us01odc-p01-hv214 systemd-udevd[238052]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
> Feb  4 17:50:51 us01odc-p01-hv214 networkd-dispatcher[1214]: WARNING:Unknown index 32 seen, reloading interface list
> Feb  4 17:50:51 us01odc-p01-hv214 dnsmasq[28739]: reading /etc/resolv.conf
> Feb  4 17:50:51 us01odc-p01-hv214 dnsmasq[28739]: using nameserver 127.0.0.53#53
> Feb  4 17:50:51 us01odc-p01-hv214 kernel: [2859840.751683] brq49cbe55d-51: port 1(tap039191ba-25) entered blocking state
> Feb  4 17:50:51 us01odc-p01-hv214 kernel: [2859840.751685] brq49cbe55d-51: port 1(tap039191ba-25) entered forwarding state
> Feb  4 17:50:51 us01odc-p01-hv214 dnsmasq[28739]: reading /etc/resolv.conf
> Feb  4 17:50:51 us01odc-p01-hv214 dnsmasq[28739]: using nameserver 127.0.0.53#53
> Feb  4 17:50:52 us01odc-p01-hv214 systemd-networkd[781]: tap039191ba-25: Gained IPv6LL
> Feb  4 17:50:52 us01odc-p01-hv214 dnsmasq[28739]: reading /etc/resolv.conf
> Feb  4 17:50:52 us01odc-p01-hv214 dnsmasq[28739]: using nameserver 127.0.0.53#53
> 
> 
> -----Original Message-----
> From: Jeremy Stanley <fungi at yuggoth.org> 
> Sent: Tuesday, February 4, 2020 4:01 AM
> To: openstack-discuss at lists.openstack.org
> Subject: Re: Virtio memory balloon driver
> 
> On 2020-02-03 23:57:28 +0000 (+0000), Albert Braden wrote:
> > We are reserving 2 CPU and 16G RAM for the hypervisor. I haven't
> > seen any OOM errors. Where should I look for those?
> 
> [...]
> 
> The `dmesg` utility on the hypervisor host should show you the
> kernel's log ring buffer contents (the -T flag is useful to
> translate its timestamps into something more readable than seconds
> since boot too). If the ring buffer has overwritten the relevant
> timeframe then look for signs of kernel OOM killer invocation in
> your syslog or persistent journald storage.
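
The search for OOM killer activity described above can be roughly automated. The sketch below assumes the log text (from `dmesg -T`, syslog, or the journal) has already been read into a string, and that the kernel reports OOM kills with lines containing "invoked oom-killer" or "Out of memory:". The exact wording can vary between kernel versions, so the pattern is an assumption to adjust. The sample lines and the hostname in them are invented for illustration.

```python
import re

# Assumed markers of OOM killer activity; wording may differ by kernel version.
OOM_PATTERN = re.compile(r"invoked oom-killer|out of memory:", re.IGNORECASE)

# Invented sample log text, not taken from the hypervisor in this thread.
SAMPLE_LOG = """\
Feb  4 17:50:51 hv-example kernel: device tap0 entered promiscuous mode
Feb  4 17:51:02 hv-example kernel: qemu-system-x86 invoked oom-killer: gfp_mask=0x0, order=0
Feb  4 17:51:02 hv-example kernel: Out of memory: Killed process 1234 (qemu-system-x86)
Feb  4 17:51:03 hv-example dnsmasq[1]: reading /etc/resolv.conf
"""


def find_oom_lines(log_text):
    """Return the log lines that look like OOM killer activity."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    for line in find_oom_lines(SAMPLE_LOG):
        print(line)
```

An empty result only means the searched text contains no matching lines. If the ring buffer has wrapped, the same function can be run over the syslog or journal contents instead.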