Cascading on Apache Flink

High Performance and Low-Latency Batch Processing on Apache Flink.

Currently maintained by the Apache Flink community.

Cascading applications that require high performance or low-latency batch processing modes can leverage the Apache Flink™ open source platform for distributed stream and batch data processing. This project was contributed by data Artisans and allows existing Cascading-MapReduce users to port their applications to Apache Flink with virtually no code changes.

About Cascading on Flink

Apache Flink™ is a replacement for MapReduce that supports large-scale batch workloads and streaming data flows. It eliminates the concepts of mappers and reducers and leverages in-memory storage, resulting in significant performance gains over MapReduce.

With Cascading on Flink, Cascading programs take advantage of Flink's unique set of runtime features:

Flexible network stack which supports low-latency pipelined data transfers as well as batch transfers for massive scale-out.
Active memory management and a custom serialization stack, which enable highly efficient operations on binary data and effectively prevent JVM OutOfMemoryErrors as well as frequent garbage collection pauses.
In-memory operators that gracefully go to disk in case of scarce memory resources.
Memory-safe execution, which means very little parameter tuning is necessary to reliably execute Cascading programs on Flink™.

Cascading users can port their MapReduce applications to run on Apache Flink with virtually no code changes.

Source and Documentation

Apache®, Apache Flink™, Flink™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.