News and Announcements
Memcached, Membase, and ElasticSearch Integration
We have added a link to the Cascading.Memcached project on GitHub to the Modules and Extensions page.
This sub-project provides Memcached API integration, allowing Cascading Flows to push data into applications that support the memcached API, such as ElasticSearch and Membase.
O’Reilly Strata Conference – Last Call
Only a short time remains to submit your proposal for the O’Reilly Strata Conference: the deadline is Sept 28th.
Bixo Hackathon
There will be a Bixo hackathon in Nevada City, CA this Sept 7th and 8th. Read more about it here.
Note that even if you’re not a hard-core Bixo user, fringe benefits from participating include learning a lot about the very useful underlying technologies (Cascading, Hadoop, HttpClient) as well as getting an excuse to visit beautiful Nevada City.
Hope to see you there.
O’Reilly Strata Conference
The new Strata Conference has just been announced with a Call for Proposals ending Sept 28.
This new conference is on the ‘business of data’ and is the sister conference to Velocity.
Hope to see lots of proposals coming in from Hadoop, Cascading, Bixo, and Cascalog users and developers.
Cascading 1.1.2
We are happy to announce that Cascading 1.1.2 is now publicly available for download.
This release features many bug fixes.
For a detailed list of changes see:
CHANGES.txt
This release will run against Hadoop 0.18.3, 0.19.x, and 0.20.x, including on Amazon Elastic MapReduce.
Note the tests will not compile or run against Hadoop 0.18.3 due to package changes since that version.
BigDataCamp 2010
Quick note that Chris will be at the BigDataCamp on June 28, 2010, the night before the Hadoop Summit. Register now before all the seats are taken.
Cascading 1.1.0 Available
We are happy to announce that Cascading 1.1.0 is now publicly available for download.
This release features many performance and usability enhancements while remaining backwards compatible with 1.0.
Specifically:
Performance optimizations with all join types
Numerous job planner optimizations
Dynamic optimizations when running in Amazon Elastic MapReduce and S3
API usability improvements around large number of field names
Support for TSV, CSV, and custom delimited text files
Support for manipulating and serializing non-Comparable custom Java types
Debug levels supported by the job planner
For a detailed list of changes see:
Interview on Parallel Programming
A very interesting interview with Billy Newport on InfoQ about “the need for higher level abstraction to do parallel programming with multi-core systems effectively.”
“Billy Newport is a Distinguished Engineer working on WebSphere eXtreme Scale (ObjectGrid) and on WebSphere high availability.”
Cascalog: An Interactive Query Language
Nathan Marz has just announced and released Cascalog.
Cascalog is an interactive query language for Hadoop with a focus on simplicity, expressiveness, and flexibility, intended for use by analysts and developers alike.
Cascalog eschews the SQL syntax for a simpler and more expressive syntax based on Datalog.
With this added expressiveness, Cascalog can query existing data stores “out of the box” with no data “importing” or “under the hood” configuration required.
Karmasphere Studio Ships with Cascading
The recently released Karmasphere Studio 1.2 now includes support for Cascading 1.0 in the free community download.
Karmasphere Studio is an IDE and Debugger for Hadoop MapReduce application developers that also includes integration with the Amazon Web Services platform.
And with Cascading support directly in the Debugger and IDE, developers can even more quickly develop and debug complex Hadoop jobs.
Also worthy of note, Karmasphere recently received $5M Series A funding.
Cascading 1.1 RC3 Available
Cascading 1.1 RC3 is now available from the downloads page.
Note we are no longer serving downloads from Google Code but from links off the download page.
Cascading-DBMigrate
Nathan at BackType has announced and released Cascading-DBMigrate.
In short, DBMigrate is a more flexible and reliable alternative to Sqoop for moving data to/from a relational data store.
Cascading.JDBC has been around for quite a while, but DBMigrate overcomes some of the limitations when dealing with MySQL servers (AsterData did not have the same limitations) and OFFSET/LIMIT queries.
Riffle: Lightweight Workflow
Riffle has been announced on the Mahout mailing list.
Riffle is a lightweight Java library for executing collections of dependent processes as a single process. It is Apache licensed so it can be included in non-GPL compatible projects.
The next major version of Cascading (1.2) will support the Riffle annotations so that projects like Mahout and Pig can participate in a Cascading Cascade execution.
Riffle can be found on its GitHub project page.
Cascading 1.1 RC1 Available
Cascading 1.1 RC1 is now available from the downloads page.
You can read about all the changes in the CHANGES.txt file.
Note we are no longer serving downloads from Google Code but from links off the download page.
Cascading at RazorFish and AWS
Check out the new Case Study published by Amazon on User Segmentation at RazorFish.
SimpleDB Support
Bixo Labs has recently announced a new project for integrating Hadoop and Cascading with Amazon Simple DB. Check it out on GitHub at cascading.simpledb.
This is in part a result of their Public Terabyte Dataset Project in AWS.
Cascading 1.1 User Guide Draft
In anticipation of the Cascading 1.1 release this month, we have published a draft of the 1.1 User Guide.
Please feel free to review and email in any comments or suggestions to the mailing list.
To download the most recent build of Cascading 1.1, please visit the download page at Concurrent. There are plans to have a 1.1 final release candidate available on the community site this week.
NoSQL East
If you are in Atlanta, check out Chris Curtin’s Cascading presentation at NoSQL East, Oct 28-30, 2009.