Neutron dhcp-agent scalability techniques
One other thing that comes to mind at 30 seconds is the spanning-tree
port forwarding delay. PXE firmware often assumes that once carrier is
up it can send and receive packets, but the switch may still be
blocking traffic while spanning tree settles on that port. Just to
rule out possible issues, it might be worth double-checking on the
network side that "portfast" is the operating mode for the physical
ports attached to that flat network. The symptom would be that the
machine appears to attempt DHCP, but its packets never actually reach
the DHCP server.
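
As a rough back-of-the-envelope check on why 30 seconds stands out: with
classic 802.1D spanning tree, a port that comes up spends one forward-delay
interval in the listening state and another in the learning state before it
forwards traffic, and the default forward delay is 15 seconds. The sketch
below only does that arithmetic. The function name is made up for
illustration, and it assumes classic STP timers (not RSTP) and treats
portfast as skipping both states. Your switches may be configured
differently.

```python
def stp_blocking_seconds(forward_delay=15, portfast=False):
    """Approximate seconds a freshly-up port drops traffic under classic STP.

    Assumes 802.1D behaviour: listening + learning, each lasting
    `forward_delay` seconds. With portfast the port is assumed to go
    straight to forwarding.
    """
    if portfast:
        return 0
    return 2 * forward_delay


if __name__ == "__main__":
    print("without portfast:", stp_blocking_seconds(), "seconds")
    print("with portfast:", stp_blocking_seconds(portfast=True), "seconds")
```

If the ports are not in portfast mode, a window of that size after link-up
could swallow the first DHCP attempts from the PXE firmware.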
On Tue, Oct 8, 2019 at 9:55 AM fsbiz at yahoo.com <fsbiz at yahoo.com> wrote:
> Thanks Julia. We have set the port_setup_delay to 30.
> # Delay value to wait for Neutron agents to setup sufficient
> # DHCP configuration for port. (integer value)
> # Minimum value: 0
> port_setup_delay = 30
> >We're hoping that in the U
> >cycle, we'll finally have things in place where neutron tells ironic
> >that the port setup is done and that the machine can be powered-on,
> >but not all the code made it during Train.
> This would be perfect.
> On Tuesday, October 8, 2019, 09:32:44 AM PDT, Julia Kreger <juliaashleykreger at gmail.com> wrote:
> While not necessarily direct scaling of that subnet, you may want to
> look at ironic.conf's [neutron]port_setup_delay option. The default
> value is zero seconds, but increasing that value will cause the
> process to pause a little longer, giving the neutron agent time to
> update its configuration. Because there are multiple steps within
> neutron, the agent may not even know about the configuration by the
> time the baremetal machine tries to PXE boot. We're hoping that in the U
> cycle, we'll finally have things in place where neutron tells ironic
> that the port setup is done and that the machine can be powered-on,
> but not all the code made it during Train.
> On Tue, Oct 8, 2019 at 9:15 AM fsbiz at yahoo.com <fsbiz at yahoo.com> wrote:
> > Hi folks,
> > We have a rather large flat network consisting of over 300 ironic baremetal nodes
> > and are constantly having the baremetals time out during their PXE boot because
> > the DHCP agent is not able to respond in time.
> > Looking for inputs on successful DHCP scaling techniques that would help mitigate this.
> > thanks,
> > Fred.