
Re: Customised alerts/notifications and enhancements to alerting/notifications on Airflow


FYI, I am on the Airflow Slack, but I mostly check it only on weekends.

Here is a gist with my implementation of a Slack callback, which attaches
icons/buttons/emoji/etc.:
https://gist.github.com/Eronarn/99408c0e5b0dd964487a5eea64b34f6d
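
For anyone who only wants the general shape of such a callback, here is a minimal sketch. It is not the code from the gist. It assumes a Slack incoming webhook whose URL is stored in an environment variable named SLACK_WEBHOOK_URL (a name chosen for illustration), and it assumes the callback receives the usual context dict with "task_instance" and "execution_date" entries. The function names are hypothetical.

```python
import json
import os
import urllib.request


def build_failure_message(context):
    """Turn a task callback context into a Slack webhook payload (a dict)."""
    ti = context["task_instance"]
    return {
        # The emoji shortcode is rendered by Slack as an icon.
        "text": ":red_circle: Task failed: {}.{}".format(ti.dag_id, ti.task_id),
        "attachments": [
            {
                "color": "danger",
                "fields": [
                    {
                        "title": "Execution date",
                        "value": str(context.get("execution_date")),
                        "short": True,
                    },
                    {
                        "title": "Log",
                        # Fall back to a placeholder if no log URL is available.
                        "value": getattr(ti, "log_url", "n/a"),
                        "short": False,
                    },
                ],
            }
        ],
    }


def slack_failure_callback(context):
    """Post the failure message to Slack; do nothing if no webhook is configured."""
    url = os.environ.get("SLACK_WEBHOOK_URL")  # assumed variable name
    if not url:
        return None
    body = json.dumps(build_failure_message(context)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=10)
    return None
```

A function like this could then be passed as `on_failure_callback` in a DAG's `default_args`, so each new DAG does not need its own notification tasks.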

On Wed, Nov 14, 2018 at 1:54 PM Sai Phanindhra <phani8996@xxxxxxxxx> wrote:

> Thanks, James, for the input.
> For the problems I specified above, I built hacky solutions: adding a
> `*slack_start_notification_operation*` at the beginning, a
> `*slack_end_notification_operator*` at the end, and a
> `*slack_failed_notification_operation*` for when upstream fails. This
> addresses the first 3 issues/feature requirements I described. I am
> maintaining lists of emails, and at the DAG level I'm joining all required
> emails to address point 5. Still, this is manual work that needs to be done
> every time a new DAG is onboarded in Airflow. I feel these are common
> problems that many Airflow users/developers face.
> @James <jmeickle@xxxxxxxxxxxxxx> let's catch up sometime on Slack/Hangouts
> to discuss how these enhancements can be done.
>
>
> On Thu, 15 Nov 2018 at 00:10, James Meickle
> <jmeickle@xxxxxxxxxxxxxx.invalid> wrote:
>
> > As the author of the first linked PR, I think your points are good.
> > Here is my attempt to address them:
> >
> > 1: It is possible to do this today if you write a Slack callback. I would
> > be happy to share my code for this if you're having trouble integrating
> > Slack. That being said, it would be great if Airflow provided several
> > "default" callbacks for common platforms like Slack and Pagerduty.
> >
> > 2/3: Yes, Airflow should add callbacks for the DAG lifecycle, too. DAG
> > "SLAs", on the other hand, I am not sure would provide any additional
> > value, and they have a high chance of being misused.
> >
> > 4: That's a great idea. My PR would make adding this very easy, because
> > it redefines the "SLAMiss" object as having a "type" of SLA miss. This
> > would involve adding a new type to the enum, and some logic to check
> > when to create an SLA miss of this type.
> >
> > 5: My interpretation is that you mean an email address that always gets
> > notified, regardless of any more specific users that a task says it
> > should email. (So not a default value to "emails", but instead an
> > additional value that is always added.) I think this makes a lot of
> > sense and would be easy to add to email. It would not be even remotely
> > possible for a Slack integration right now, since there's no unified
> > code for that.
> >
> > My preferred way of addressing this would be to get my PR merged as a
> > starting point, which isolates a lot of this functionality from the
> > scheduler code. Then have a broader AIP created, or possibly a pair of
> > them: switching to a more general evented system for Airflow model
> > lifecycles, and implementing pluggable notifiers (right now a lot of the
> > email functionality is hardcoded) the same way that there is already
> > pluggable logging.
> >
> > From an SRE perspective, two other pain points we run into: the statsd
> > integration is subpar (at least when we ingest it in Datadog it's hard to
> > actually alert on), and there's no /health or /healthz endpoints for the
> > scheduler and worker so it's hard to know if they are healthy in a
> > programmatic way.
> >
> > On Wed, Nov 14, 2018 at 1:06 PM Niels Zeilemaker <niels@xxxxxxxxxxxxx>
> > wrote:
> >
> > > I had a go once to introduce something similar, but never got it
> > > merged. Maybe you can use it as an inspiration.
> > >
> > > https://github.com/apache/incubator-airflow/pull/2412
> > >
> > > Niels
> > >
> > > On Wed, 14 Nov 2018 at 16:43, Sai Phanindhra <phani8996@xxxxxxxxx> wrote:
> > >
> > > > The above-mentioned PR addresses issues/bugs in the current
> > > > functionality. I want to add more mediums of alerting, including
> > > > for SLAs.
> > > >
> > > > On Wed, 14 Nov 2018 at 20:51, airflowuser
> > > > <airflowuser@xxxxxxxxxxxxxx.invalid> wrote:
> > > >
> > > > > There is a pending PR to refactor the SLA:
> > > > > https://github.com/apache/incubator-airflow/pull/3584
> > > > >
> > > > > But it requires more reviews from committers.
> > > > >
> > > > >
> > > > > Sent with ProtonMail Secure Email.
> > > > >
> > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > > On Wednesday, November 14, 2018 5:11 PM, Sai Phanindhra
> > > > > > <phani8996@xxxxxxxxx> wrote:
> > > > >
> > > > > > > Hello Airflow committers and maintainers,
> > > > > > > I came across SLAs in Airflow. It's a very good feature to
> > > > > > > begin with. I feel a few enhancements can be made. These
> > > > > > > enhancements are not limited to just SLAs; they are basically
> > > > > > > gaps I noticed while using Airflow. I'm listing a few of them
> > > > > > > here.
> > > > > >
> > > > > > > 1.  SLA alerts to Slack channel(s) along with emails.
> > > > > > > 2.  Alerts at the DAG level (starting, success and failure).
> > > > > > > 3.  Custom callbacks just like `*on_failure_callback*`,
> > > > > > >     `*on_retry_callback*` and `*on_success_callback*` at the
> > > > > > >     DAG level.
> > > > > > > 4.  Alerts if a task completes before a minimum run time.
> > > > > > >     (This is really a rare case. But there will be a few
> > > > > > >     long-running jobs that we know for sure run for at least
> > > > > > >     a few hours, and if they exit before that it means
> > > > > > >     something is wrong. We need warning alerts for such
> > > > > > >     cases.)
> > > > > > > 5.  Default/global alert config (default emails to send all
> > > > > > >     alerts to, and/or a Slack channel to send alerts to).
> > > > > > >
> > > > > > > Some of these might have already been solved, or someone may
> > > > > > > be working to solve them. Please share your thoughts and add
> > > > > > > anything else I missed to this list.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Sai Phanindhra,
> > > > Ph: +91 9043258999
> > > >
> > >
> >
>
>
> --
> Sai Phanindhra,
> Ph: +91 9043258999
>