Exploring LingPipe with Clojure & Cljr
Clojure’s REPL is already a great start when it comes to exploring new Java packages. In the past I’d throw together a disposable lein project with the unknown code, open a REPL and get exploring. But now I have a new tool in the toolbox that lets me skip this step. Cljr is focused on the workspace (for lack of a better name) rather than the lein-style project. In short, you get both a REPL and swank (the backend for Emacs’ Clojure integration) backed by global package management. Cljr can pull in the random jars you have locally, or it can pull them in from Clojars.
I’m still exploring various ideas I have for mining Twitter for artist & band related info. This time I’m looking at using LingPipe to do the heavy lifting. LingPipe is very comprehensive, very deep and comes with a steep learning curve. Where do I begin?
My process for learning a new package of code is to find the tutorial or demos, pick one and immediately start rewriting it. LingPipe has some great documentation with annotated code so the picking was the easy part. Here’s the Interesting Phrases tutorial converted to Clojure and using a sampling of Twitter search results:
(Keep in mind this isn’t truly executable code, so mind the laziness. It’s mostly a dump of my REPL session.)
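For readers who want something to paste into a REPL, here is a minimal sketch of how the tutorial's collocation step might look through Clojure interop. It is not the original session. It assumes LingPipe 4.x is on the classpath (older releases name the tokenizer factory singleton differently), and the `tweets` vector, `train-model` and `collocations` are hypothetical names and placeholder data made up for illustration.

```clojure
(import '(com.aliasi.lm TokenizedLM)
        '(com.aliasi.tokenizer IndoEuropeanTokenizerFactory))
(require '[clojure.pprint :refer [pprint]])

;; Placeholder input (an assumption): swap in real Twitter search results.
(def tweets
  ["Seeing Broken Social Scene tonight, can't wait"
   "Broken Social Scene played a great set last night"
   "New Lupe Fiasco track is out"
   "Lupe Fiasco and Broken Social Scene on the same bill?"])

(defn train-model
  "Builds a LingPipe token language model over texts, counting
  n-grams up to the given order."
  [texts order]
  (let [model (TokenizedLM. IndoEuropeanTokenizerFactory/INSTANCE order)]
    ;; doseq rather than map: training is a side effect, and a lazy
    ;; map would never run unless something consumed it.
    (doseq [t texts]
      (.train model t))
    model))

(defn collocations
  "Returns a lazy seq of {:score :phrase} maps for n-grams of length n
  that occur at least min-count times, capped at max-returned."
  [model n min-count max-returned]
  (for [scored (.collocationSet model n min-count max-returned)]
    {:score  (.score scored)
     :phrase (vec (.getObject scored))}))

(comment
  ;; At the REPL: bigrams seen at least twice, at most 10 of them.
  (pprint (collocations (train-model tweets 3) 2 2 10)))
```

Converting each `ScoredObject` into a plain map keeps the results easy to inspect and sort with ordinary Clojure functions, at the cost of copying the token arrays into vectors.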
The LingPipe code had a bunch more Java for displaying the results, but since I’m in a REPL I’m OK using Clojure’s built-in pretty printer (pprint).
Looking at the output, I can tell the tweets I collected obviously included Broken Social Scene and Lupe Fiasco, both music acts, which makes sense. But Betty White? There’s no escaping her. Still, according to LingPipe, this is what’s interesting in this mess of tweets.
The experiment was a success! With a minimal time investment and some throwaway code I’ve decided I want to spend some more time with LingPipe and dig deeper. Quick & easy thanks to Cljr.