[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Question about Flink optimizer on Stream API


I was reading some FLIP documents related to the new design of the Flink
Schedule [1] and unification of batch and stream [2]. Then I created two
different programs to learn how Flink optimizes the Query Plan in Batch and
in Stream mode (and how much further it goes). One using batch [3] and one
using Stream [4]. During the code debugging and also as it is depicted on
the document [2], the batch program uses the
org.apache.flink.optimizer.Optimizer class which generates a
"org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses
the "org.apache.flink.streaming.api.graph.StreamGraph" and every
transformation inside the packet

When I am showing the execution plan with "env.getExecutionPlan()" I see
exactly I have written on the Flink program (which it is expected).
However, I was looking for where I can see the optimized plan. I mean
decisions of operators reordering based on cost or statistics. For batch I
could find the "org.apache.flink.optimizer.costs.CostEstimator" and
"org.apache.flink.optimizer.DataStatistics". But for Stream I only found
the creation of the plan. How can I debug that? Or have a better
understanding of what Flink is doing. Do you advise me to read some other
reference about this?

Kind Regards,

[1] Group-aware scheduling for Flink -
[2] Unified Core API for Streaming and Batch -

*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *