Re: Joining bounded and unbounded data not working using non-global window


Hi Shrijit,

+dev@xxxxxxxxxxxxxxx to fact check my memory here and re-raise the issue

You have hit a known usability problem. It has been discussed but not yet addressed, partly because effort has gone into more holistic fixes and partly because of backwards-compatibility concerns, in case someone is counting on the very unfortunate current behavior.

You have this triggering set up:

    .triggering(
        AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardSeconds(20)))

In our original design for triggers, this will fire only one time and then "close" the window, so data arriving after that firing is dropped. I believe we agreed on the dev list that it should actually also emit the remaining data at window expiration time.

What you want is probably this:

    .triggering(
        Repeatedly.forever(
            AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(20))))
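
To make the difference between the two snippets above concrete, here is a toy model in plain Java. It is not Beam code and does not use Beam's API: the class TriggerToyModel and the method simulate are hypothetical names made up for illustration, and the model assumes a single window that stays open for the whole run. It only mimics the behavior described here: a pane fires 20 seconds after its first element, and without Repeatedly.forever the first firing closes the window.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of trigger firing in one window. Plain Java, NOT Beam; all names are hypothetical. */
public class TriggerToyModel {

    /**
     * arrivals: processing times (in seconds, ascending) at which elements arrive.
     * delay:    seconds to wait after the first element of a pane before firing.
     * repeat:   false models the one-shot trigger, true models Repeatedly.forever(...).
     * Returns the emitted panes; each pane lists the arrival times of its elements.
     */
    static List<List<Integer>> simulate(int[] arrivals, int delay, boolean repeat) {
        List<List<Integer>> panes = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        int fireAt = -1;        // time at which the current pane fires; -1 means no timer is set
        boolean closed = false; // true once a one-shot trigger has fired

        for (int t : arrivals) {
            // Fire the pending timer if it is due by the time this element arrives.
            if (fireAt >= 0 && t >= fireAt) {
                panes.add(buffer);
                buffer = new ArrayList<>();
                fireAt = -1;
                closed = !repeat; // a one-shot trigger closes the window after firing
            }
            if (closed) {
                continue;         // window is closed: the element is dropped
            }
            if (buffer.isEmpty()) {
                fireAt = t + delay; // first element in the pane starts the timer
            }
            buffer.add(t);
        }
        // Let the last pending timer fire.
        if (fireAt >= 0) {
            panes.add(buffer);
        }
        return panes;
    }

    public static void main(String[] args) {
        int[] arrivals = {0, 5, 25, 30, 60};
        System.out.println(simulate(arrivals, 20, false)); // prints [[0, 5]]
        System.out.println(simulate(arrivals, 20, true));  // prints [[0, 5], [25, 30], [60]]
    }
}
```

In the one-shot run, the elements arriving at 25, 30 and 60 seconds never appear in any output, which would look like a join that silently stops producing results after the first pane.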

But the recommended approach is to always use the AfterWatermark trigger with early/late firings, so it would look like this:

    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(
                AfterProcessingTime
                    .pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(20))))

This will emit a designated "on time" output as well as early firings according to the latency you have set up.

If this does not solve your problem, can you say more about what is going wrong?

Kenn

On Mon, Dec 10, 2018 at 7:41 PM Reza Ardeshir Rokni <rarokni@xxxxxxxxx> wrote:
Hi,

A couple of thoughts:

1- If the amount of data in HBase that you need to join with is small and does not change, could you use a side input? If it does change, you could try the "slowly changing lookup cache" pattern (ref below).
2- If the amount of data is large, would a direct HBase client call from a DoFn work to get the data you need to enrich the element? This is similar to the "calling external service" pattern (ref below).


Cheers

Reza

On Tue, 11 Dec 2018 at 00:12, Shrijit Pillai <pillaishrijit5022@xxxxxxxxx> wrote:
Hello,

I'm trying to join an unbounded data source and a bounded one using CoGroupByKey. The bounded data source is HBase and the unbounded one is Kafka.

The co-group works if the global window strategy is used but not with a non-global one. I've tried the accumulatingFiredPanes mode (using the non-global window), but that didn't help either. Am I missing something needed to make the co-group work with a non-global window like FixedWindows, or is the GlobalWindow the only way to go about it? I'm using Beam 2.8.0.

Here's the code snippet:
https://gist.github.com/shrijitpillai/5e9e642f92dd23b3b7bd60e3ce8056bb

Thanks
Shrijit