Scraper
I'm writing a website scraper in Ruby for a freelance gig I picked up on Craigslist. I'm using Nokogiri to parse each page into a document tree, then searching that tree for the bits I need.
Turns out that when the page’s designers used consistent, well-structured markup, it’s much easier to grab the text I need. This should have been obvious.
So my workflow:
- Do the task by hand and note all the URLs involved.
- Change URL parameters and see if you can get the web app to respond appropriately.
- Build your script to automatically generate the parameters.
- Build the modules that can parse the results.
- Test to ensure you’re covering all cases with your parser.
- Build the modules that’ll extract the data you need.
- Test the extractor.
- Manipulate the data so that it’s usable.
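The parameter-generation step in that workflow could look something like the sketch below. The endpoint `https://example.com/search` and the `category` and `page` parameter names are placeholders I made up; the real ones come from whatever URLs you noted while doing the task by hand. It only uses Ruby's standard `uri` library:

```ruby
require 'uri'

# Hypothetical endpoint; swap in the URL noted during the manual run.
BASE_URL = 'https://example.com/search'

# Build one URL per combination of category and page number.
# The parameter names here are assumptions, not from a real site.
def search_urls(categories, pages)
  categories.product(pages.to_a).map do |category, page|
    uri = URI(BASE_URL)
    # encode_www_form escapes values, so spaces and symbols are safe.
    uri.query = URI.encode_www_form(category: category, page: page)
    uri.to_s
  end
end

search_urls(%w[bikes desks], 1..2).each { |url| puts url }
```

Keeping the URL building in its own method means the list of URLs can be checked before any requests go out, which fits the "test each piece" spirit of the workflow.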