Nerding out with Ruby, Tokyo Cabinet, Hpricot, Twitter, Sinatra, Haml & Passenger
[Update: The service described in this article has been relocated to http://gavinmcgovern.com]
I wanted to get back into Ruby, and since it’s been many years I went for total immersion. Only one rule: don’t use anything I’ve used in the past. That turned out to be easy, since pretty much everything is new to me now.
The project: grab the names of hot & happening bands and see which of them are getting any buzz on Twitter. What bands are getting talked about and where? Nothing fancy.
Hpricot
I use Hpricot to seed my list of bands. There are tons of sites out there that offer lists of new bands, releases, etc. Hpricot provides a nice CSS selector search method, kinda jQuery-ish. One reason I went this route rather than with something like RSS/Atom is that I just want names. Extracting a band name from a feed’s body content is a whole other research project.
Twitter Search API
Once I have the band list I search Twitter for each one. For every matching Tweet I’ll do another little Hpricot scrape for the Tweeter’s location (the selector is “ul.entry-author>li>span.adr”; Hpricot makes it so easy). I’m sure there’s some way of doing this using the Twitter API, but I figured it’s public stuff anyway, so there’s no need to authenticate for it. Plus more Hpricot practice!
Tokyo Cabinet
All the search results end up in a Tokyo Cabinet database. Very simply, TC is a really fast key-value store. For my purposes I went with the schema-less table-based option, just rows of maps. I’ve been wanting to spend time outside the SQL world so Tokyo Cabinet is perfect for me. Plus building & installing Tokyo Cabinet on my Mac was painless. (Ubuntu was a tiny bit more complicated: ldconfig /usr/local/lib did the trick.)
Sinatra, Haml & Passenger
I keep a lot of data but I only do daily reports. Thankfully Tokyo Cabinet has a query method that lets me do simple filtering. Once I have that sliced up I just use standard Ruby methods to collect, count & sort the results. I have a very simple Sinatra app running under Phusion Passenger & Apache. It handles presenting the report and uses Haml & Sass for the templating.
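A rough sketch of that collect, count & sort step, using plain Ruby hashes as stand-ins for the rows of maps coming out of the table database. The column names ("band", "location", "day") and the sample rows are made up for illustration and are not taken from the real app; the day filter stands in for what the database query would do.

```ruby
# Hypothetical rows, shaped like string-to-string maps from a table database.
# Column names and values are assumptions for illustration only.
ROWS = [
  { "band" => "Gomez",      "location" => "Chile",          "day" => "2009-03-01" },
  { "band" => "Gomez",      "location" => "Chile",          "day" => "2009-03-01" },
  { "band" => "Gomez",      "location" => "United Kingdom", "day" => "2009-03-01" },
  { "band" => "Gomez",      "location" => "Jalisco",        "day" => "2009-02-28" },
  { "band" => "Dolby Anol", "location" => "Brooklyn",       "day" => "2009-03-01" }
].freeze

# Keep only the rows for one day (the part a database query would handle).
def rows_for_day(rows, day)
  rows.select { |row| row["day"] == day }
end

# Count mentions per band, most mentioned first; ties are broken by name.
def mention_counts(rows)
  counts = Hash.new(0)
  rows.each { |row| counts[row["band"]] += 1 }
  counts.sort_by { |band, count| [-count, band] }
end

# Count locations for a single band and return the busiest few.
def top_locations(rows, band, limit = 3)
  counts = Hash.new(0)
  rows.each do |row|
    next unless row["band"] == band
    counts[row["location"]] += 1
  end
  counts.sort_by { |location, count| [-count, location] }.first(limit)
end

today = rows_for_day(ROWS, "2009-03-01")
mention_counts(today).each { |band, count| puts "#{band}: #{count}" }
```

Sorting on `[-count, name]` keeps the order stable when two bands have the same number of mentions, which makes a daily report easier to compare from one day to the next.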
Next Steps
As you can see there’s something a bit off (besides the old data). Band names that are also common names or phrases get a great deal more mentions than the truly unique band names: Gomez has 50 mentions vs. 8 for Dolby Anol. But there’s good stuff in there! The top 3 locations for Gomez are Chile, United Kingdom and Jalisco. Makes sense: Gomez is both a common Spanish name and a British band.
There are various approaches I can take with the search results to handle false positives. Thankfully the Ruby world is full of possibilities. More on this later.
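One possible approach, sketched here purely as an illustration and not as the method the report uses: only count a Tweet when it mentions the band name and also contains at least one music-related word. The word list and the helper name `likely_about_band?` are hypothetical.

```ruby
# Hypothetical list of words hinting that a Tweet is about music.
MUSIC_WORDS = %w[album band concert gig listening song tour].freeze

# True when the text mentions the band and also uses a music-related word.
def likely_about_band?(text, band)
  lowered = text.downcase
  return false unless lowered.include?(band.downcase)

  # Split into simple lowercase words and look for any overlap with the list.
  words = lowered.scan(/[a-z0-9']+/)
  (words & MUSIC_WORDS).any?
end

puts likely_about_band?("Seeing Gomez on tour tonight", "Gomez")
puts likely_about_band?("Happy birthday Selena Gomez", "Gomez")
```

A filter this crude will throw away real mentions that happen to use none of the listed words, so it trades recall for fewer false positives.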
One more addition. The report is just a snapshot in time. I’d like to add some history so I can get a better idea of activity. Is the band buzzing up? Buzzing down? What sorts of trends can we see?
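A minimal sketch of what that history could look like, assuming each day's report is saved as a hash of band name to mention count. The `trends` helper and the second day's numbers are hypothetical.

```ruby
# Compare two daily snapshots (band name => mention count) and label
# each band's direction. Bands missing from a snapshot count as zero.
def trends(previous, current)
  bands = (previous.keys | current.keys).sort
  bands.map do |band|
    delta = current.fetch(band, 0) - previous.fetch(band, 0)
    direction = if delta > 0
                  "up"
                elsif delta < 0
                  "down"
                else
                  "flat"
                end
    [band, delta, direction]
  end
end

yesterday = { "Gomez" => 50, "Dolby Anol" => 8 }
today     = { "Gomez" => 42, "Dolby Anol" => 11 } # made-up numbers

trends(yesterday, today).each do |band, delta, direction|
  puts "#{band}: #{direction} (#{delta})"
end
```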
I’m excited. My last project with Ruby was in 2003. The Ruby world of today is almost unrecognizable. In fact, the only thing still around that I remember is Rails!