Cascading Load and Multitool
After a bit of work, we have repackaged both Cascading Load and Multitool giving them helper bash wrappers for installing, running, and updating. The new packages are on the download page.
After unpacking, multitool for example, just run ./bin/multitool install
or ./bin/multitool help
for more information.
Multitool is a command line interface for running sed and grep like application on Apache Hadoop. It even supports joins across multiple files. It’s perfect for finding files or creating large test datasets from larger ones.
Cascading.Load is a command line tool for creating complex loads on a Apache Hadoop cluster for performance tuning.
Both tools are based on Cascading, of course.