codehaus


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Hbase state backend in Flink


FWIW, one major advantage of adopting HBase as Flink statebackend is to
support direct read/write on DFS, so as to disaggregate storage and compute
(DisAgg).  DisAgg has several benefits, such as supporting elastic
computing in cloud, much better (order of magnitude) recovery speed when
rescaling up/down (as Gyula also mentioned), etc. and we could eliminate
the performance regression compared to local RW through techniques like
adding a local L2 cache. More information please refer to our talk
<https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf>
at this year's Flink Forward China, and we could discuss more in another
thread if interested.

Back to @Naveen's question here, we need to make HBase supporting embedded
mode first before adopting it as Flink statebackend. We have done some
initial work and please refer to HBASE-17743
<https://issues.apache.org/jira/browse/HBASE-17743> and the design doc
there for more details. And for sure we will upstream our work when ready
to (smile).

Best Regards,
Yu


On Fri, 28 Dec 2018 at 13:12, Chen Qin <qinnchen@xxxxxxxxx> wrote:

> Hi Naveen,
>
> AFAIK, there are two level of storage in typical statebackend
> (local/remote). I think it kinda similar to what PC main memory and disk
> analogy.
>
> Take RocksDB Statebackend as example, window state (typical very large
> ListState) persisted in partitioned local rocksdb files, adding element to
> window is localized and cheap.When checkpoint starts, each of those rocksdb
> do upload to corresponding HDFS directories separately.This is good in a
> sense when any intermediate states between two successful checkpoints can
> be overwritten and local snapshots can be done cheaply and asynchronously.
>
> I heard folks tried to build mysqlbackend(deprecated), remote rocksdb as
> service backend(hard to scale and performance bottleneck) , Cassandra(hard
> to snapshot). All of which shares same trait on lack of local
> parallelizable snapshot semantic.
>
> Hope this helps!
> Chen
>
> On Thu, Dec 27, 2018 at 8:27 AM miki haiat <miko5054@xxxxxxxxx> wrote:
>
> > Did try to use rocksdb[1] as state backend?
> >
> >
> > 1.
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
> >
> >
> > On Thu, 27 Dec 2018, 18:17 Naveen Kumar <naveenkumar.g@xxxxxxxxxxxx
> > .invalid
> > wrote:
> >
> > > Hi,
> > >
> > > I am exploring if we can plugin hbase as state backend in Flink. We
> have
> > > need for streaming jobs with large window states, high throughput and
> > > reliability.
> > >
> > > I wanted to know if implementing Flink backend in Hbase or other
> > > distributed KV store is possible. Any documentation or pointers will be
> > > helpful.
> > >
> > > Thanks,
> > > Naveen
> > >
> >
>