Cascading Community Projects
The Cascading ecosystem includes support for a variety of programming languages, data sources, serializers, and tools that extend the functionality of Cascading applications.
These extensions are contributed by the Cascading community. Many projects are available through Cascading GitHub and the Conjars Maven jar repository.
Note: Most projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects. Many are also under active development.
Supported Languages
Supported languages extend Cascading with the domain-specific features and functionality of another language.
Language | Project | Description | Resources | License |
---|---|---|---|---|
Clojure | Cascalog | Clojure for Cascading | GitHub Groups Issue Tracking Stack Overflow Docs Tutorials | Apache 2.0 |
Java | Cascading | | GitHub Groups Docs Tutorials | Apache 2.0 |
JRuby | Cascading.JRuby | From Etsy, JRuby for Cascading | GitHub Issue Tracking | LGPL 3 |
Clojure | PigPen | MapReduce for Clojure | GitHub | Apache 2.0 |
PMML | Pattern | PMML for Cascading | GitHub Groups Issue Tracking Docs Tutorials | Apache 2.0 |
PMML | JPMML-Cascading | From Openscoring, PMML for Cascading | GitHub Groups Issue Tracking | AGPL 3.0 |
Python | PyCascading | From Twitter, Python for Cascading | GitHub Issue Tracking Tutorial | Apache 2.0 |
Scala | Scalding | From Twitter, Scala for Cascading | GitHub Groups Issue Tracking Stack Overflow Docs Tutorials | Apache 2.0 |
SQL | Lingual | An ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub Groups Issue Tracking Docs Tutorials Binary | Apache 2.0 |
Data Source Connectivity (Taps)
In Cascading, a tap represents a physical data source. Taps can be used as inputs (sources) and outputs (sinks) in Cascading.
Data Source | Project | Description | Resources | License |
---|---|---|---|---|
Accumulo | Cascading.Accumulo | Accumulo data source for Cascading | GitHub Issue Tracking | Apache 2.0 |
Cassandra | Cascading-Cassandra | Cassandra data source for Cascading | GitHub Issue Tracking | Apache 2.0, Eclipse |
Derby | Cascading-JDBC | Derby data source for Cascading via JDBC | GitHub Issue Tracking | Apache 2.0 |
Elasticsearch | elasticsearch-hadoop | Elasticsearch data source for Cascading | GitHub Issue Tracking Tutorial | Apache 2.0 |
ElephantDB | ElephantDB | ElephantDB data source for Cascading | GitHub Issue Tracking | Custom |
ArangoDB | Guacaphant | Allows you to tap ArangoDB | GitHub Issue Tracking Conjars | MIT |
H2 | Cascading-JDBC | H2 data source for Cascading via JDBC | GitHub Issue Tracking | Apache 2.0 |
HBase | Cascading.HBase | HBase data source for Cascading | GitHub Tutorial | Apache 2.0 |
Hive | Cascading-Hive | Integrate and run Hive in Cascading | GitHub Issue Tracking | Apache 2.0 |
Hive | Cascading.Hive | Hive data source for Cascading | GitHub Issue Tracking | Apache 2.0 |
JDBC | Cascading-JDBC | Provides support for reading/writing data to/from an RDBMS via JDBC drivers | GitHub Issue Tracking | Apache 2.0 |
Kafka | Cascading-Local | Provides integration with Apache Kafka | GitHub Issue Tracking | Apache 2.0 |
Oracle | Cascading-JDBC | Oracle data source for Cascading via JDBC | GitHub Issue Tracking Tutorial | Apache 2.0 |
Memcached | Cascading.Memcached | Memcached data source for Cascading | GitHub | Apache 2.0 |
MongoDB | Cascading-Mongomigrate | MongoDB data source for Cascading | GitHub | Apache 2.0 |
MySQL | Cascading-JDBC | MySQL data source for Cascading via JDBC | GitHub Issue Tracking | Apache 2.0 |
Neo4j | Cascading.Neo4j | Neo4j data source for Cascading | GitHub Issue Tracking | Apache 2.0 |
OpenCSV | Cascading-OpenCSV | A robust CSV parser | GitHub Issue Tracking | Apache 2.0 |
Parquet | Parquet-mr | Parquet data source for Cascading | GitHub Groups Issue Tracking | Apache 2.0 |
PostgreSQL | Cascading-JDBC | PostgreSQL data source for Cascading via JDBC | GitHub Issue Tracking | Apache 2.0 |
Redshift | Cascading-JDBC | Amazon Redshift data source for Cascading via JDBC | GitHub Issue Tracking Tutorial | Apache 2.0 |
S3 | Cascading-Local | Provides integration with Amazon S3 | GitHub Issue Tracking | Apache 2.0 |
SimpleDB | Cascading.SimpleDB | From Scale Unlimited, SimpleDB data source for Cascading | GitHub Issue Tracking | Apache 2.0 |
Solr | Cascading.Solr | From Scale Unlimited, Solr data source for Cascading | GitHub Issue Tracking | Custom |
Splunk | Tbana | Splunk data source for Cascading | GitHub Issue Tracking | Apache 2.0 |
Teradata | Cascading-JDBC | Teradata data source for Cascading via JDBC | GitHub Issue Tracking Tutorial | Apache 2.0 |
Serializers
Serializers provide integration with Cascading by translating data objects into other formats that can be stored and reconstructed.
Serializer | Project | Description | Resources | License |
---|---|---|---|---|
Avro | Cascading.Avro | From Scale Unlimited, data serialization for Apache Avro | GitHub Issue Tracking | Apache 2.0 |
JSON | Cascading.JSON | JavaScript Object Notation (JSON) utility classes for Cascading | GitHub Issue Tracking | GNU |
Kryo | Cascading.Kryo | Provides a drop-in Kryo serialization for your Cascading (or Hadoop) workflow | GitHub Issue Tracking | Eclipse |
Protocol Buffers | Cascading2-protobufs | From Square, library for working with Protocol Buffers | GitHub Issue Tracking | MIT |
Thrift | Cascading-Thrift | Serializer and raw comparator for using TBase and TEnum objects in Hadoop | GitHub Issue Tracking | Custom |
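The store-and-reconstruct round trip described above can be sketched with Java's built-in object serialization. This is only an illustration of the general idea: it does not use any of the projects in the table, and the `PageView` class, its fields, and the `RoundTripSketch` helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTripSketch {

    // Hypothetical data object, used only for this illustration.
    static final class PageView implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final long count;

        PageView(String url, long count) {
            this.url = url;
            this.count = count;
        }
    }

    // Translate an object into a byte array that could be stored or shipped.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Reconstruct the object from the stored bytes.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PageView original = new PageView("/index.html", 42L);
        byte[] stored = toBytes(original);
        PageView restored = (PageView) fromBytes(stored);
        System.out.println(restored.url + " " + restored.count);
    }
}
```

The serializer projects listed above fill the same role with formats such as Avro, Kryo, Protocol Buffers, or Thrift; how each one is registered with a Cascading application is specific to that project and is covered in its own documentation.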
Tools
Cascading tools help create, debug, maintain, and otherwise support Cascading apps and functionality.
Project | Description | Resources | License |
---|---|---|---|
Activator Scalding | From Typesafe, an integration between Scalding and Typesafe Activator | GitHub Issue Tracking | Apache 2.0 |
Bixo | Web mining toolkit that runs as a series of Cascading pipes | GitHub Issue Tracking Tutorial | Apache 2.0 |
Cascading-helpers | From Square, functions, filters, and other tools for Cascading | GitHub Issue Tracking | Apache 2.0 |
Cascading-dbmigrate | Tool to migrate relational databases into Hadoop | GitHub Issue Tracking | Apache 2.0 |
Cascading_ext | From LiveRamp, a collection of tools to build, debug, and run data workflows | GitHub Issue Tracking | Apache 2.0 |
Cascading-simhash | Simhashing is an algorithm that calculates “group id” (minimum hash) content | GitHub Issue Tracking | GPL 3 |
Cascading-tube | Tiny wrapper around Hadoop for chaining operations | GitHub Issue Tracking | Apache 2.0 |
Cascading.utils | Set of utilities for Cascading workflows for various projects | GitHub Issue Tracking | Apache 2.0 |
Conjecture | From Etsy, a framework for building machine learning models in Hadoop using Scalding | GitHub Issue Tracking | MIT |
Fluid | A Fluent API for Cascading | GitHub Issue Tracking | Apache 2.0 |
Jading | From Etsy, a build and execution tool for Cascading.JRuby that handles packaging for execution on Hadoop | GitHub Issue Tracking Tutorial | Custom |
Lein-Cascading | Leiningen is for automating Clojure projects | GitHub Issue Tracking | Apache 2.0 |
Lingual | An ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub Groups Issue Tracking Binary | Apache 2.0 |
Load | A command line interface for load testing and benchmarking | GitHub Issue Tracking Binary | Apache 2.0 |
Multitool | A command line interface for building data processing jobs | GitHub Binary | Apache 2.0 |
Riffle | Library for executing collections of dependent processes as a single process | GitHub Issue Tracking | Apache 2.0 |
Plunger | From Hotels.com, this is a unit testing framework for Cascading applications to simplify automated tests for cascades, flows, assemblies and operations | GitHub Issue Tracking | Apache 2.0 |
ScaldingUnit | Scalding unit testing library for test-driven development | GitHub Issue Tracking | Apache 2.0 |
Scalding-REPL | From Twitter, REPL environment to prototype Scalding code and explore data sets with Scalding | GitHub Issue Tracking Tutorial | Apache 2.0 |
Scaldual | From Twitter, Scaldual makes it easier for Scalding users to take advantage of Lingual. | GitHub Issue Tracking | Apache 2.0 |
Have a related Cascading project that’s not listed?
Let us know by emailing the mailing list.