Description: Mechanize is a Ruby library that makes automated web interaction easy.
Home

Welcome to the mechanize wiki!

Using Nokogiri instead of Hpricot

Mechanize will be changing its HTML parser from Hpricot to Nokogiri. You can see how the change will affect your existing code today by setting Mechanize's HTML parser to Nokogiri yourself. What will this parser buy you? A few important features:

Speed

Nokogiri uses libxml2 to parse HTML and is much faster than Hpricot.

Better HTML recovery

Nokogiri handles broken HTML better than Hpricot.

XPath and CSS searching

Nokogiri correctly implements XPath queries and also lets you find elements by CSS selector. You can use selectors straight from Firebug, or even from your CSS files, to find elements in your HTML.

You can help by making the switch early and reporting any bugs you find.

How to make the switch early

First, install the nokogiri gem (gem install nokogiri).

Then set the HTML parser to Nokogiri, and use Mechanize as normal:

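require 'rubygems'
require 'nokogiri'
require 'mechanize'

# Use Nokogiri instead of the default Hpricot parser for every page fetched
WWW::Mechanize.html_parser = Nokogiri::HTML

agent = WWW::Mechanize.new
# Fetch the page and print each link found on it
agent.get('http://google.com/').links.each do |link|
  p link
end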

Last edited by tenderlove, Thu Oct 02 22:41:02 -0700 2008