Cascading 4 Adds Native JSON Support
Work on Cascading 4 continues with new support for native JSON data types while opening the door for uniform support of other nested data types.
With the latest WIP release, native JSON data type support has been added through the new JSONCoercibleType class and a set of operations that allow for efficient transformations of JSON object types.
This includes operations for declaratively building new JSON objects from primitive values or other JSON objects found in argument Tuples. There are also operations for selectively copying child trees from parent objects into new JSON objects, optionally applying a lambda transform along the way (for example, coercing all value elements to a float type).
Chaining these Cascading Functions and Aggregators allows for complex yet maintainable and testable transformations, while the JSONCoercibleType class allows for transparent support of JSON text from various unstructured and structured data sources via the TextLine and TextDelimited Schemes.
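To make the copy-and-transform idea concrete, here is a minimal sketch in plain Java. It does not use the Cascading or JSONCoercibleType APIs: nested Maps and Lists stand in for JSON nodes, and the class name, method names, and the TO_FLOAT transform are all hypothetical. It only illustrates selecting a child tree by a pointer-style path and copying it while a lambda coerces every leaf value to a float.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative stand-in only: plain Maps and Lists play the role of JSON nodes.
class CopyTransformSketch {

  // Hypothetical leaf transform: coerce numbers and numeric strings to Float.
  // Non-numeric strings would throw NumberFormatException; other types pass through.
  public static final Function<Object, Object> TO_FLOAT = value -> {
    if (value instanceof Number) return Float.valueOf(((Number) value).floatValue());
    if (value instanceof String) return Float.valueOf((String) value);
    return value;
  };

  // Resolve a pointer-style path such as "/person/measures" against nested maps.
  // Assumes the path is empty or starts with "/"; escape sequences are not handled.
  public static Object resolve(Object root, String pointer) {
    if (pointer.isEmpty()) return root;
    Object current = root;
    for (String token : pointer.substring(1).split("/")) {
      if (!(current instanceof Map)) return null; // path leaves the tree
      current = ((Map<?, ?>) current).get(token);
    }
    return current;
  }

  // Deep-copy a tree into new containers, applying the transform to every leaf.
  public static Object copy(Object node, Function<Object, Object> transform) {
    if (node instanceof Map) {
      Map<String, Object> result = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet())
        result.put(String.valueOf(entry.getKey()), copy(entry.getValue(), transform));
      return result;
    }
    if (node instanceof List) {
      List<Object> result = new ArrayList<>();
      for (Object child : (List<?>) node) result.add(copy(child, transform));
      return result;
    }
    return transform.apply(node);
  }

  public static void main(String[] args) {
    Map<String, Object> measures = new LinkedHashMap<>();
    measures.put("height", "1.5");
    measures.put("weight", 70);
    Map<String, Object> person = new LinkedHashMap<>();
    person.put("name", "ann");
    person.put("measures", measures);
    Map<String, Object> root = new LinkedHashMap<>();
    root.put("person", person);

    // Copy only the "measures" child tree, coercing its values to floats.
    Object copied = copy(resolve(root, "/person/measures"), TO_FLOAT);
    System.out.println(copied);
  }
}
```

The source tree is never mutated here; the copy builds new containers, which is what makes such steps easy to chain and test in isolation.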
For a comprehensive set of use-cases, see the JSON test cases.
This functionality is powered by a new stand-alone project, Pointer-Path, where the JSON functionality has been abstracted away from the core APIs, allowing new nested data type providers to be implemented (e.g. XML DOM and POJO support). Pointer-Path will eventually supply the necessary providers, enabling Cascading to support newer nested types.
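The provider idea can be sketched as follows. This is not the Pointer-Path API: the NestedProvider interface, the MAPS and FIELDS providers, and the Point class are hypothetical names chosen for illustration. The point is only that path resolution is written once against a small contract, and each nested data type plugs in its own way of stepping from a parent to a child.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: one generic resolver, several interchangeable providers.
class ProviderSketch {

  // Hypothetical provider contract: how to step from a node to a named child.
  interface NestedProvider<N> {
    N child(N parent, String name); // returns null when the child is absent
  }

  // Generic path resolution, independent of the underlying data type.
  // Assumes the path is empty or starts with "/"; escape sequences are not handled.
  public static <N> N resolve(NestedProvider<N> provider, N root, String pointer) {
    if (pointer.isEmpty()) return root;
    N current = root;
    for (String token : pointer.substring(1).split("/")) {
      if (current == null) return null;
      current = provider.child(current, token);
    }
    return current;
  }

  // Provider for JSON-like trees built from Maps and Lists.
  public static final NestedProvider<Object> MAPS = (parent, name) -> {
    if (parent instanceof Map) return ((Map<?, ?>) parent).get(name);
    if (parent instanceof List) {
      List<?> list = (List<?>) parent;
      try {
        int index = Integer.parseInt(name);
        return index >= 0 && index < list.size() ? list.get(index) : null;
      } catch (NumberFormatException e) {
        return null; // a non-numeric token cannot index a list
      }
    }
    return null;
  };

  // Provider for plain Java objects, reading public fields via reflection.
  public static final NestedProvider<Object> FIELDS = (parent, name) -> {
    try {
      return parent.getClass().getField(name).get(parent);
    } catch (ReflectiveOperationException e) {
      return null; // no such public field
    }
  };

  // Sample POJO used to exercise the FIELDS provider.
  public static class Point {
    public int x;
    public int y;
  }
}
```

With this shape, `ProviderSketch.resolve(ProviderSketch.MAPS, tree, "/doc/tags/1")` and `ProviderSketch.resolve(ProviderSketch.FIELDS, point, "/x")` share the same traversal code, which is the kind of reuse the abstraction is meant to allow.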
Check out the JavaDoc for more information:
- http://docs.concurrentinc.com/cascading/4.0/javadoc/cascading-nested-json/
- http://docs.concurrentinc.com/cascading/4.0/javadoc/cascading-nested/
Note that the current WIP releases support JSON in local mode only, not on MapReduce or Tez.