Cascading’s Logparser example in Clojure
I’ve long been interested in working with Cascading but didn’t relish the thought of jumping back into Java. Thankfully, with the arrival of Clojure, I can now happily play in my less-typing world.
I wanted to get a basic Cascading demo running, so I took the Logparser example from the distribution and ported it over to Clojure. Here’s a zip (casclojure.zip) with the Ant build.xml, the Clojure source, the directory layout, etc. I’ll leave it up to you to get the required jars and libs; I’m using Hadoop 0.18.3 and Cascading 1.0.11. The build.xml has the usual config stuff for setting up where you keep your jars. I gave up trying to be smart about managing jars, so I now dump everything into a single dir. Sure makes things like Ant config easier.
Incidentally, this is a perfect example of standing on the shoulders of genius and/or the sufficiently motivated. Most of the build.xml I picked up from Kyle Burton’s “Creating Executable Jars For Your Clojure Application”. (He has some other interesting Clojure posts talking about Incanter and the Quartz scheduler, btw.)
Clojure is concise. This example isn’t particularly idiomatic Clojure however; you wouldn’t use CamelCase for instance. But it does the job. (Also, gross, definitely need a syntax highlighter now.)
Items to note:
1. Having to use into-array, int-array, etc. Just a few Java-isms.
2. Roughly, Hadoop needs a Jar with a Main. Clojure uses :gen-class to handle that sorta thing. :gen-class at first seemed really complicated. Turns out it is and it isn’t. Keep it simple and no problem! For my purposes I just needed to make sure I had a -main definition.
3. You’ll see logparser.app everywhere. It’s the tie that binds this all together. Most of this exercise was really about getting a common namespace set up for everything: the Clojure, the build.xml, the Jar contents, the runtime environment.
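To make the first two items concrete, here is a minimal sketch of the shape involved. It is not the code from the zip: it leaves out Cascading entirely, and the helper names (field-names, group-indexes) and the field list are made up for illustration. It only shows a namespace with :gen-class, a -main definition, and the into-array/int-array calls you need whenever a Java API wants a String[] or an int[].

```clojure
(ns logparser.app
  ;; :gen-class only does anything when the namespace is AOT-compiled;
  ;; it then produces a Java class named logparser.app whose static
  ;; main method calls the -main function below.
  (:gen-class))

(defn field-names
  "Hypothetical helper: builds a Java String[] from Clojure strings,
  the kind of argument a Java varargs constructor expects."
  [& names]
  (into-array String names))

(defn group-indexes
  "Hypothetical helper: builds a Java int[] from Clojure numbers."
  [& idxs]
  (int-array idxs))

(defn -main
  "Entry point. The command-line arguments arrive as strings."
  [& args]
  (let [fields (field-names "ip" "time" "method" "event" "status" "size")
        groups (group-indexes 1 2 3 4 5 6)]
    (println "args:" (vec args))
    (println "fields:" (vec fields))
    (println "groups:" (vec groups))))
```

In the real thing those arrays get handed to Cascading constructors instead of println, but the interop pattern is the same.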
From the build.xml the key tasks are the compile and the jar-with-manifest. They demonstrate what needs to happen to make Clojure compilation possible and to make a Hadoop-happy Jar.
If you’re following along with Cascading’s Gentle Introduction, you can use your newly generated Jar in place of the one mentioned there.
I’ve only run this in local Hadoop mode: no distribution, no cluster. Running it on a cluster will be for another post, perhaps.
I hope this helps. Let me know otherwise. I’ll try to help but I’m far from an expert on any of this.