Cascalog
Cascalog is an extension to Cascading that enables application development with Clojure.
It is currently maintained by the Cascalog community.
Cascalog was created for developers who want to…
- Build data applications with Clojure or Java
- Query HDFS, databases, and local data from the Clojure REPL
- Easily run arbitrary Clojure code in your queries
- Leverage the benefits of the Cascading application framework
About Cascalog
Build Data Applications with Clojure
Use regular Clojure functions as operations or filters. Because Cascalog is itself a Clojure library, you can also use it from other Clojure code.
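In a Cascalog query, an ordinary function can serve either as an operation (deriving new values from a tuple's fields) or as a filter (keeping or dropping a tuple). The plain-Java sketch below is only an analogy of that idea over an in-memory list. It is not Cascalog or JCascalog code, and the `OperationsAndFilters` class, `Person` record, and `query` method are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class OperationsAndFilters {
    // Hypothetical two-field tuple: (name, age).
    record Person(String name, int age) {}

    // Keep tuples that pass the filter, then apply the operation to each survivor.
    // Both arguments are ordinary functions; nothing framework-specific is required.
    static List<String> query(List<Person> people,
                              Predicate<Person> filter,
                              Function<Person, String> operation) {
        return people.stream()
                .filter(filter)
                .map(operation)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("alice", 28),
                new Person("bob", 33),
                new Person("chris", 40),
                new Person("david", 25));
        // Filter: keep people under 30. Operation: report each person's age next year.
        List<String> result = query(people,
                p -> p.age() < 30,
                p -> p.name() + " " + (p.age() + 1));
        result.forEach(System.out::println);
    }
}
```

Running `main` prints `alice 29` and `david 26`. The point of the analogy is that the filter and the operation are plain functions passed in from outside, which is the property the paragraph above attributes to Cascalog.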
Built with the Cascading framework
Because Cascalog is built on top of the Cascading framework, it inherits the benefits Cascading brings to application development, including extensibility through the Cascading ecosystem, application portability, and test-driven development best practices.
Ad-hoc Queries
Cascalog queries run as a series of MapReduce jobs. Through Cascading's Tap abstraction, you can query data stored in HDFS, in various databases, or locally.
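Because a Cascalog query ultimately executes as MapReduce jobs planned by Cascading, it can help to picture what one such job does. The plain-Java sketch below runs the map, shuffle, and reduce phases of a word count entirely in memory. It is a conceptual illustration only: it does not use the Cascalog, Cascading, or Hadoop APIs, and the `MapReduceSketch` class and its `map`, `shuffle`, and `reduce` methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {
    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) { // skip the empty token produced by leading whitespace
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle phase: group emitted values by key; a TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog");
        System.out.println(reduce(shuffle(map(lines))));
    }
}
```

Running `main` prints `{brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}`. On a real cluster each phase is distributed across machines, and a more complex query may be split into several such jobs chained together.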
Resources
Videos
- Cascalog: Making Data Processing Fun Again (46 min)
- Streaming MapReduce in Clojure (37 min)
- Introducing Cascalog: Functional Data Processing for Hadoop (41 min)
Tutorials
- Cascalog Tutorial
- Introducing Cascalog - Part 1
- Introducing Cascalog - Part 2
- Cascalog Impatient Series
- JCascalog
- Developing and deploying a Cascalog query on Hadoop
- Methods for handling wide sources
- Predicate Macros
- Cascalog and Hadoop Security
- Testing Cascalog with Midje - Part 1
- Testing Cascalog with Midje - Part 2