
[all][healthcheck]


On Mon, Nov 16, 2020 at 10:21 AM Lajos Katona <katonalala at gmail.com> wrote:

> Hi,
>
> I am sending this mail to summarize the discussion around the Healthcheck
> API at the Neutron PTG, and to start a discussion on how we can make it
> most valuable to operators.
>
> On the Neutron PTG etherpad, this topic starts at L114:
> https://etherpad.opendev.org/p/neutron-wallaby-ptg .
>
> Background: oslo_middleware provides a /healthcheck API path (see [1])
> that services like haproxy can poll. It also provides a plugin mechanism
> for adding more complicated checks, which can be switched on/off in the
> config.
>
> The main questions:
>
>    - Some common guidance on what to present to the operators (if you
>    check [2] and [3], the comments there contain really good
>    questions/concerns)
>       - Perhaps the API SIG has something about healthcheck, but I can't
>       find it.
>    - What to present with and without authentication (after checking
>    again, I am not sure that it is possible to use authentication for the
>    healthcheck)
>       - A way forward could be to make it configurable, defaulting to
>       authenticated, and leave the decision to the admin.
>    - During the discussion, the agreement was to separate the frontend
>    health from the backend health and to use direct indicators (like
>    working DB connectivity and MQ connectivity) instead of indirect
>    indicators (like agents' health).
>
> Thanks in advance for the feedback.
>
> [1]
> https://docs.openstack.org/oslo.middleware/latest/reference/healthcheck_plugins.html
> [2] https://review.opendev.org/731396
> [3] https://review.opendev.org/731554
>
> Regards
> Lajos Katona (lajoskatona)
>
>
Hi Lajos,

A bit of background, in case you don't know it. The oslo healthcheck
middleware is basically a combination of the healthcheck middlewares that
a few projects carried ages ago. It was then bloated with a plugin
framework that, as far as I know, nobody ever adopted. Swift (I think),
definitely Glance, and possibly some other projects carried those
middlewares before they were moved into oslo. Their main point was to give
load balancers a place to ping that does not need to be logged every few
seconds and does not cause excessive amounts of auth calls to Keystone.
If I recall correctly, you can already place it after the keystone
middleware if you prefer it to be authenticated, but I don't know of
anyone who does.
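To make the ordering point concrete, here is a minimal sketch in plain WSGI. It does not use the real oslo.middleware or keystonemiddleware classes. `make_healthcheck`, `make_fake_auth` (a stand-in that only checks for an `X-Auth-Token` header), `api` and the `call` helper are all hypothetical. If the healthcheck wraps the auth layer, a load balancer can poll without a token. If auth wraps the healthcheck, every poll becomes an authenticated request.

```python
def make_healthcheck(app, path="/healthcheck"):
    """Hypothetical healthcheck: answer `path` directly, pass the rest on."""
    def middleware(environ, start_response):
        if environ.get("PATH_INFO") == path:
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"OK"]
        return app(environ, start_response)
    return middleware


def make_fake_auth(app):
    """Stand-in for real auth middleware: reject requests without a token."""
    def middleware(environ, start_response):
        if "HTTP_X_AUTH_TOKEN" not in environ:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Unauthorized"]
        return app(environ, start_response)
    return middleware


def api(environ, start_response):
    """The actual API application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"API response"]


def call(app, path, headers=None):
    """Invoke a WSGI app with a minimal environ and return the status line."""
    environ = {"PATH_INFO": path, **(headers or {})}
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status

    b"".join(app(environ, start_response))
    return captured["status"]


# Healthcheck outermost: load balancers can poll it without a token.
unauthenticated = make_healthcheck(make_fake_auth(api))
# Healthcheck behind auth: every poll needs a valid token.
authenticated = make_fake_auth(make_healthcheck(api))

print(call(unauthenticated, "/healthcheck"))  # 200 OK
print(call(authenticated, "/healthcheck"))    # 401 Unauthorized
```

In both layouts, regular API paths still go through auth. Only the position of the healthcheck changes.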

The main purpose was to detect when the service stops responding. The
disable-by-file option added a second use: bleeding off the in-flight
connections for maintenance and dropping the node out of the pool for new
requests. I think the original implementations were somewhere around
10-20 lines of code and did just that job pretty reliably.
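The following is a rough sketch of such a small middleware, including the maintenance drain. It assumes plain WSGI and a hypothetical flag-file path. It is not oslo.middleware's `disable_by_file` backend, only an illustration of the idea. While the file exists, the healthcheck answers 503, so the load balancer takes the node out of the pool. Other requests are still served.

```python
import os
import tempfile


def make_healthcheck(app, disable_file, path="/healthcheck"):
    """Report 503 on `path` while `disable_file` exists, 200 otherwise."""
    def middleware(environ, start_response):
        if environ.get("PATH_INFO") != path:
            # Regular API traffic always passes through, so requests that
            # still reach this node are served while it drains.
            return app(environ, start_response)
        if os.path.exists(disable_file):
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"DISABLED"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    return middleware


def api(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"API response"]


def call(app, path):
    """Invoke a WSGI app with a minimal environ and return the status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"]


with tempfile.TemporaryDirectory() as tmp:
    flag = os.path.join(tmp, "healthcheck_disable")  # hypothetical flag file
    app = make_healthcheck(api, flag)
    before = call(app, "/healthcheck")
    open(flag, "w").close()  # operator starts maintenance
    during = call(app, "/healthcheck")
    other = call(app, "/v2.0/networks")

print(before, during, other, sep="\n")
```

This prints `200 OK`, then `503 Service Unavailable`, then `200 OK`. The check itself never touches the database or the message queue. That keeps it cheap enough to poll every few seconds.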

Given the plugin model, it's indeed very easy to leak information out of
that middleware, and I think the plugins need to take that into account
through careful design. I'd very much prefer not to break the current
healthcheck and its very well stabilized API, which has been in use for
years, just because someone feels it's a good idea to write leaky plugins
for it. Realizing that agent status might not be the right thing to check
is a good start. What you really want is an indication of whether the API
service is able to take in new requests. You do not need to know whether
every permutation of those requests will succeed given the current system
status. There are ways to chain multiple instances of this middleware
with different configs (on different endpoints). It might be worth
putting your plugins with detailed failure conditions on an admin-side
endpoint that is not exposed to the public, and keeping just a very simple
yes/no on your public endpoint. Good luck, and I hope you find the right
balance between detail from the API and usability.

Best,
Erno "jokke" Kuvaja