codehaus


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)


Hi Andrei,

I tried to have a look in the PR but I am not familiar with the Elastic
adapter code so I am missing a lot of context.

In general, I think that relying on the on the field names coming from a
RelDataType it is not safe. As you observed, it can get messed up quite
easily from many parts of Calcite.
It is usually better to try to reason using the position and/or the type of
the field rather than its name.

Other than that it seems to me _MAP['_id'] will become a RexCall similar to
ITEM($0, '_ID') so it appears to me that you can retrieve the field name by
implementing a RexVisitor.
I don't see clearly why you need to set explicitly aliases but probably is
because I am missing implementation details related with the adapter itself.

Best,
Stamatis

Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <jhyde@xxxxxxxxxx>
έγραψε:

> I added comments to the case. I did not look at the PR, sorry.
>
> > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> >
> > Hello,
> >
> > Can somebody pls take a look at the following PR:
> >
> > PR: https://github.com/apache/calcite/pull/982
> > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> >
> > Specifically, I would like to understand if adding explicit mapping
> > <
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> >
> > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
> > (and recommended) approach. The reason I need it is because the original
> > item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie
> I'm
> > not able to derive it from `input.getRowType()`)
> >
> > ```sql
> > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> 'foo.bar'
> > select _MAP['_id'], _MAP['foo.bar'] from elastic
> >
> > -- Query below preserves attribute information
> > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > \"foo.bar\" from elastic)
> > ```
> >
> > Many thanks,
> > Andrei.
> >
> >
> >
> >
> > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <zabetak@xxxxxxxxx>
> > wrote:
> >
> >> Hi Andrei,
> >>
> >> Many data management systems have internal columns which the users may
> be
> >> able to query (see for example Postgres [1]). It doesn't sound
> unnatural to
> >> allow users access these fields. On the other hand such system columns
> do
> >> not appear in SELECT * queries and I suppose they are pruned out by
> >> projections early on if they are not used by the query. I guess you
> could
> >> take a similar approach in the ElasticSearch adapter.
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
> >>
> >>
> >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> <andrei@xxxxxxxxx>
> >> έγραψε:
> >>
> >>> Hello,
> >>>
> >>> Upon indexing, elastic allocates a unique _id
> >>> <
> >>>
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> >>>>
> >>> for each document (unless specified by the user). Currently when
> mapping
> >>> between elastic result and calcite return type we query only _source or
> >>> fields attributes. _id is not part those but exposed at higher level in
> >>> query response (see SearchHit
> >>> <
> >>>
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> >>>>
> >>> ).
> >>>
> >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> fields.get('foo')).
> >>>
> >>> Do you think we should make _MAP['_id'] available implicitly ?
> >>>
> >>> I have a couple of use-cases where one needs to know document ID.
> >>>
> >>> Please note this makes sense only for non-aggregate queries.
> >>>
> >>> Regards,
> >>>
> >>> Andrei.
> >>>
> >>
>
>