Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System (Related to CALCITE-2169)


Hi Lekshmi,

your activity sounds very interesting.
One important thing to note is that performance testing in Java is always tricky because of JIT compilation and the "warm-up" phase of the JVM. It is therefore generally recommended to run such tests with JMH (https://openjdk.java.net/projects/code-tools/jmh/).

I would expect the time for sql2rel to drop drastically (perhaps by one or two orders of magnitude) when measured with JMH.
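The warm-up effect mentioned above can be illustrated with a minimal hand-rolled Java harness that runs a task a number of times untimed before timing it. This is a sketch under stated assumptions, not JMH and not how Calcite's tests measure anything: `WarmupHarness`, `medianNanos` and the workload are hypothetical. For numbers worth publishing, JMH itself should be used, since it also handles forking, dead-code elimination and error reporting.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/**
 * Minimal warm-up-then-measure harness. A hand-rolled stand-in to
 * illustrate the idea, NOT JMH: it does not fork JVMs, guard rigorously
 * against dead-code elimination, or report error bounds.
 */
public class WarmupHarness {

    /** Results are written here so the JIT cannot treat them as unused. */
    static volatile Object blackhole;

    /**
     * Runs {@code task} {@code warmup} times without timing (giving the JIT a
     * chance to compile hot paths), then times {@code iterations} further runs
     * and returns the median duration in nanoseconds (upper median if even).
     */
    public static <T> long medianNanos(Supplier<T> task, int warmup, int iterations) {
        if (warmup < 0 || iterations <= 0) {
            throw new IllegalArgumentException("need warmup >= 0 and iterations > 0");
        }
        for (int i = 0; i < warmup; i++) {
            blackhole = task.get();
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            T result = task.get();
            samples[i] = System.nanoTime() - start;
            blackhole = result;
        }
        Arrays.sort(samples);
        return samples[iterations / 2];
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for "parse + sql2rel + optimize" of one query.
        Supplier<Long> work = () -> {
            long acc = 0;
            for (int i = 0; i < 10_000; i++) {
                acc += Integer.toString(i).hashCode();
            }
            return acc;
        };
        long cold = medianNanos(work, 0, 1);       // first call: typically still interpreted
        long warm = medianNanos(work, 1_000, 101); // after warm-up
        System.out.println("cold (first call) ns: " + cold);
        System.out.println("warm median ns:       " + warm);
    }
}
```

Comparing the cold and warm figures on the same JVM usually shows the first call to be noticeably slower; the exact ratio depends on the JVM and hardware, so no particular numbers are implied here.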

Best
Julian

On 30.12.18, 23:12, "Lekshmi" <lekshmibg09@xxxxxxxxx> wrote:

    Hello Folks,
    
    For my research activities, I was trying to perform a benchmark comparison
    between Calcite and other database systems. As a first step, I tried it
    with *Calcite* and *PostgreSQL*, and I thought the TPC-H queries were the
    right place to start. I ran the TpchTest (
    https://github.com/apache/calcite/blob/master/plus/src/test/java/org/apache/calcite/adapter/tpch/TpchTest.java)
    with a *CalciteTimingTracer* added to the JUnit tests to measure the
    execution time. While doing so, I saw that the execution time in
    Calcite is significantly higher than in PostgreSQL. On further
    investigation, I found that the test generates the data required by
    these queries (around 150,000 rows for some tables), and I was under
    the impression that most of the time was spent on data generation and
    that the query execution itself could be faster. So, I modified the relevant
    schema class (
    https://github.com/apache/calcite/blob/master/plus/src/main/java/org/apache/calcite/adapter/tpch/TpchSchema.java)
    to perform the data generation and query execution separately. Then I
    traced the time taken by query execution alone. Even then, there was a
    significant difference compared to PostgreSQL.
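The idea of keeping data generation out of the measured interval can be sketched in plain Java as follows. `Row`, `generateRows` and `runQuery` are hypothetical stand-ins, not the classes in TpchSchema or TpchTest, and the 150,000-row count simply mirrors the figure mentioned above. A single cold timing like this is still affected by JVM warm-up, so it shows only where the timer boundaries go, not how many repetitions are needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Sketch: generate the input data once, outside the timed region, and time
 * only the query. Row, generateRows and runQuery are hypothetical stand-ins,
 * not Calcite's TPC-H adapter classes.
 */
public class SeparateSetupTiming {

    /** Hypothetical row with two TPC-H-like columns. */
    static final class Row {
        final int quantity;
        final double extendedPrice;

        Row(int quantity, double extendedPrice) {
            this.quantity = quantity;
            this.extendedPrice = extendedPrice;
        }
    }

    /** Hypothetical data generator; its cost should not be charged to the query. */
    static List<Row> generateRows(int n, long seed) {
        Random rnd = new Random(seed);
        List<Row> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(new Row(1 + rnd.nextInt(50), 1_000 + rnd.nextDouble() * 100_000));
        }
        return rows;
    }

    /** Hypothetical "query": SUM(extendedPrice) WHERE quantity < 24. */
    static double runQuery(List<Row> rows) {
        double sum = 0;
        for (Row r : rows) {
            if (r.quantity < 24) {
                sum += r.extendedPrice;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        List<Row> rows = generateRows(150_000, 42L); // setup: not part of the query cost
        long t1 = System.nanoTime();
        double result = runQuery(rows);              // measured: query execution only
        long t2 = System.nanoTime();
        System.out.printf("generation: %.1f ms, query: %.1f ms, result=%.2f%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, result);
    }
}
```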
    
    I also set *log4j.rootLogger* to *TRACE* to find the time spent in the
    sql2rel and optimization phases of the class Prepare
    <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/Prepare.java>.
    To my surprise, Calcite takes 355 ms for sql2rel and 352 ms for
    optimization in the JUnit test *testQuery01*. By contrast, the same query
    reported a planning time of 0.163 ms in Postgres.
    
    I would like to know whether this is the right way to test the performance
    of TPC-H queries using Apache Calcite, or whether there are better ways
    to do it.
    
    Also, while searching through JIRA, I found the ticket
    https://issues.apache.org/jira/browse/CALCITE-2169, which was created by
    Edmon Begoli to carry out a comparative performance study of the Calcite
    framework. I think it is related to my current problem, but I have no idea
    about the status of the ticket. It would be really great if someone
    could give me some information on it.
    
    Finally, on a personal note, I would like to continue my research with
    Calcite because of its simplicity and extensibility. But if I fail to
    make a good case study in favour of Calcite, I am afraid that I could
    lose the opportunity to work with Calcite.
    
    Thanks and Regards
    
    Lekshmi B.G
    Email: lekshmibg09@xxxxxxxxx