Alfa Jango Blog Engineering, Software, and Entrepreneurship


Printable Format for Any Webpage
(and the “Meat” Algorithm)

Tuesday, March 23rd, 2010

Last week, we added functionality to one of our web apps to show just the main content of any web page, without the surrounding navigation, sidebars, and other clutter. You can think of this as creating a printable view of any web page, with all images, videos, ads, etc. removed. Here is an example of an original webpage vs. the printable view we create:

Feel free to skip straight to our “Meat” Algorithm, as we’ve so endearingly named it, if you’re not interested in the specifics of implementing it.

The Tools: Ruby and Nokogiri

Thanks to Ruby and a Ruby gem called Nokogiri, it’s far easier to create this printable view than you may think. If you haven’t heard of it before, Nokogiri is a gem that reads and parses HTML and XML (with DOM-style and SAX-style parsers), and allows you to easily search and manipulate these documents using CSS selectors and XPath.
