Re: Relational algebra and signal processing


Hi Michael,

yes, our workloads are usually streaming (although we also use batch, e.g. for replay).
But if I understand correctly, the same theory applies to both tables ("relations") and streaming tables, right?
I hope to find time soon to write a PLC4X - Calcite source that creates one or more streams from the readings of a PLC.

Julian

On 18.12.18, 03:19, "Michael Mior" <mmior@xxxxxxxxxx> wrote:

    Perhaps you've thought of this already, but it sounds like streaming
    relational algebra could be a good fit here.
    
    https://calcite.apache.org/docs/stream.html
    --
    Michael Mior
    mmior@xxxxxxxxxx
    
    
    On Sun, 16 Dec 2018 at 18:39, Julian Feinauer <j.feinauer@xxxxxxxxxxxxxxxxx>
    wrote:
    
    > Hi Calcite-devs,
    >
    > I just had a very interesting mail exchange with Julian (Hyde) on the
    > incubator list [1]. It was about our project CRUNCH (which is mostly about
    > time series analyses and signal processing) and its relation to relational
    > algebra and I wanted to bring the discussion to this list to continue here.
    > We already had some discussion about how time series would work in Calcite
    > [2], and it’s closely related to MATCH_RECOGNIZE.
    >
    > But, I have a more general question in mind, to ask the experts here on
    > the list.
    > I ask myself whether we can see signal processing and analysis tasks as a
    > proper application of relational algebra.
    > Disclaimer: I’m a mathematician, so I know the formalism of (relational)
    > algebra pretty well, but I lack a lot of experience and knowledge in
    > database theory. Most of my knowledge there comes from Calcite’s source
    > code and the book by Garcia-Molina and Ullman.
    >
    > So if we take, for example, a stream of signals from a sensor, then we can
    > of course do filtering or smoothing on it and this can be seen as a mapping
    > between the input relation and the output relation. But as we usually need
    > more than just one tuple at a time we lose many of the advantages of the
    > relational theory. And then, if we analyze the signal, we can again model
    > it as a mapping between relations, but the input relation is a “time
    > series” and the output relation consists of “events”, so these are in some
    > way different dimensions. In this situation it becomes most obvious where
    > the main differences between time series and relational algebra are. Think
    > of something simple: an event should be registered whenever the signal
    > switches from FALSE to TRUE (so not for every TRUE). This could also be
    > modelled with MATCH_RECOGNIZE pretty easily. But for me it feels
    > “unnatural” because we cannot use any indices (we don’t care about the
    > ratio of TRUE and FALSE in the DB, except for probably some very rough
    > outer bounds). And we are lacking the “right” information for the optimizer
    > like estimations on the number of analysis results.
    > It gets even more complicated when moving to continuous valued signals
    > (INT, DOUBLE, …), e.g., temperature readings or something.
    > If we want to analyze the number of times where we have a temperature
    > change of more than 5 degrees in under 4 hours, this should also be doable
    > with MATCH_RECOGNIZE but again, there is no index to help us and we have no
    > information for the optimizer, so it feels very “black box” for the
    > relational algebra.
    >
    > I’m not sure if you get my point, but for me, the elegance of relational
    > algebra was always the optimization, which starts from a declarative
    > query and ends in an “optimal” physical plan. And I do not see how we can
    > use much of this for the examples given above.
    >
    > Perhaps one solution would be to do the same as for spatial queries (or
    > the JSON / JSONB support in PostgreSQL [3]) and add specialized indices,
    > statistics and optimizer rules. Then, this would make it more “relational
    > algebra”-esque in the sense that there really is a possibility to apply
    > transformations to a given query.
    >
    > What do you think? Am I making things too complicated, or am I missing
    > something?
    >
    > Julian
    >
    > [1]
    > https://lists.apache.org/thread.html/1d5a5aae1d4f5f5a966438a2850860420b674f98b0db7353e7b476f2@%3Cgeneral.incubator.apache.org%3E
    > [2]
    > https://lists.apache.org/thread.html/250575a56165851ab55351b90a26eaa30e84d5bbe2b31203daaaefb9@%3Cdev.calcite.apache.org%3E
    > [3] https://www.postgresql.org/docs/9.4/datatype-json.html
    >
    >
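
To make the two examples from the quoted message concrete, here is a minimal sketch in plain Python (not Calcite and not MATCH_RECOGNIZE). It assumes each reading is a (timestamp, value) pair with timestamps in seconds, arriving in time order. The function names rising_edges and large_changes and their parameters are hypothetical, chosen only for illustration. Both operators need state across tuples, which is the "more than one tuple at a time" property discussed above.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rising_edges(samples: Iterable[Tuple[int, bool]]) -> Iterator[int]:
    """Yield the timestamp of every FALSE -> TRUE transition."""
    previous = None  # no sample seen yet, so the first TRUE is not an edge
    for timestamp, value in samples:
        if previous is False and value:
            yield timestamp
        previous = bool(value)


def large_changes(
    samples: Iterable[Tuple[int, float]],
    max_delta: float = 5.0,       # assumed threshold: 5 degrees
    window: int = 4 * 3600,       # assumed window: 4 hours, in seconds
) -> Iterator[int]:
    """Yield the timestamp of each reading that differs by more than
    max_delta from some earlier reading less than `window` seconds old."""
    recent = deque()  # readings younger than `window`, oldest first
    for timestamp, value in samples:
        # Evict readings that are `window` seconds old or older.
        while recent and timestamp - recent[0][0] >= window:
            recent.popleft()
        if any(abs(value - old) > max_delta for _, old in recent):
            yield timestamp
        recent.append((timestamp, value))


signal = [(0, False), (1, True), (2, True), (3, False), (4, True)]
edges = list(rising_edges(signal))

temps = [(0, 20.0), (3600, 22.0), (7200, 26.0), (18000, 27.0), (32400, 21.0)]
changes = list(large_changes(temps))
```

Either function works unchanged on a finite list (batch replay) or on a generator fed by a live source, since both consume their input one tuple at a time. The linear scan over the window in large_changes is the simplest choice here, not an optimized one.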