Cascading on Apache Flink
High Performance and Low-Latency Batch Processing on Apache Flink.
Currently maintained by the Apache Flink community.
Cascading applications that require high performance or low-latency batch processing modes can leverage the Apache Flink™ open source platform for distributed stream and batch data processing. This project was contributed by data Artisans and allows existing Cascading-MapReduce users to port their applications to Apache Flink with virtually no code changes.
About Cascading on Flink
Apache Flink™ is a replacement for MapReduce that supports large-scale batch workloads and streaming data flows. It eliminates the concept of mappers and reducers and leverages in-memory storage, resulting in significant performance gains over MapReduce.
With Cascading on Flink, Cascading programs take advantage of Flink's unique set of runtime features:
- Flexible network stack which supports low-latency pipelined data transfers as well as batch transfers for massive scale-out.
- Active memory management and a custom serialization stack, which enable highly efficient operations on binary data and effectively prevent JVM OutOfMemoryErrors as well as frequent garbage collection pauses.
- In-memory operators that gracefully go to disk in case of scarce memory resources.
- Memory-safe execution, so very little parameter tuning is necessary to reliably execute Cascading programs on Flink™.
Cascading users can port their MapReduce applications to run on Apache Flink with virtually no code changes.
Source and Documentation
Apache®, Apache Flink™, Flink™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.