Update: This has been merged into Hpricot core! mri commit, jruby commit
Ever run into Hpricot::ParseError, “ran out of buffer space on element”? Isn’t that annoying? You get punished because someone else used ASP.NET (and its beautifully long __VIEWSTATE elements) to build their website. Well, I did at least.
I opened up Hpricot’s C extensions (where the error was coming from) and hacked around a little. I realized that it only malloc()s once. After that, if it fills up that memory, it throws an error. So I created a patch so that Hpricot will use realloc() to keep allocating more memory as needed when it runs into an element larger than its default buffer size (16384).
And yes, I know you can set Hpricot.buffer_size yourself, but what size should it be set to? If you are dealing with other people’s websites you have no idea what kind of crazy elements you might run into. Hpricot should be able to handle an element of any size.
It was a little confusing because Hpricot uses the Ragel state machine compiler to generate the C code. So I had to make sure some of Ragel’s pointers stayed pointing at the correct memory after the buffer was realloc()ed.
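To illustrate the pointer problem, here’s a minimal sketch of the general technique. This is not the actual patch: the scanner struct and the scanner_init, scanner_grow and scanner_append functions are hypothetical names made up for this example, and the fields p, pe and ts just follow Ragel’s usual naming for the scan position, end of data and token start. The idea is to save each pointer as an offset from the start of the buffer before calling realloc(), then rebuild the pointers from the new base, since realloc() is allowed to move the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scanner state; NOT Hpricot's real structures. */
typedef struct {
    char  *buf;  /* start of the buffer */
    size_t cap;  /* allocated size in bytes */
    char  *p;    /* current scan position (points into buf) */
    char  *pe;   /* end of valid data (points into buf) */
    char  *ts;   /* start of the current token, or NULL if none */
} scanner;

/* Allocate the initial buffer. Returns 0 on success, -1 on failure. */
static int scanner_init(scanner *s, size_t cap)
{
    assert(cap > 0); /* doubling a zero capacity would never grow */
    s->buf = malloc(cap);
    if (s->buf == NULL)
        return -1;
    s->cap = cap;
    s->p = s->pe = s->buf;
    s->ts = NULL;
    return 0;
}

/* Double the buffer and re-point p, pe and ts into the new block. */
static int scanner_grow(scanner *s)
{
    /* Save offsets first: once realloc() moves the block, the old
     * pointers must not be used, not even for arithmetic. */
    size_t p_off  = (size_t)(s->p  - s->buf);
    size_t pe_off = (size_t)(s->pe - s->buf);
    int    has_ts = (s->ts != NULL);
    size_t ts_off = has_ts ? (size_t)(s->ts - s->buf) : 0;

    size_t new_cap = s->cap * 2;
    char  *new_buf = realloc(s->buf, new_cap);
    if (new_buf == NULL)
        return -1; /* the old buffer is still valid on failure */

    s->buf = new_buf;
    s->cap = new_cap;
    s->p   = new_buf + p_off;
    s->pe  = new_buf + pe_off;
    s->ts  = has_ts ? new_buf + ts_off : NULL;
    return 0;
}

/* Append len bytes of input, growing the buffer as often as needed. */
static int scanner_append(scanner *s, const char *data, size_t len)
{
    while ((size_t)(s->pe - s->buf) + len > s->cap) {
        if (scanner_grow(s) != 0)
            return -1;
    }
    memcpy(s->pe, data, len);
    s->pe += len;
    return 0;
}
```

Doubling the capacity keeps the number of realloc() calls small even for very large elements. A real implementation would also want to guard against the size overflowing, which this sketch skips.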
Here’s the C patch, apply to Hpricot 0.6 source: http://pastie.caboo.se/97453
Here’s the JRuby patch: http://pastie.caboo.se/169546
Here’s the gem with the patch applied: hpricot-0.6_bufoverflowfix.gem.
Here’s the JRuby gem: hpricot-0.6_bufoverflowfix-jruby.gem
Update: The patch has been added as a ticket on TRAC
Update: Since it doesn’t look like anyone cares about applying this patch to the trunk I’ve bundled it up into a new gem you can get here: hpricot-0.6_bufoverflowfix.gem.
Update: Added a patch to fix the issue for the jruby version of Hpricot as well. Also added a jruby version of the patched gem here.

Thank you for this! I’m grateful to not have to patch this up myself. :)
Comment by Chris Heald — March 21, 2008 @ 9:49 pm
thank you much! that patch is a saver
Comment by Matt Simpson — October 24, 2008 @ 9:03 pm
[...] up: Hpricot As documented by this post, over a year ago I patched the Hpricot C and Java extensions to support arbitrarily large HTML [...]
Pingback by Contributing to open source can be hard and frustrating « coderrr — December 4, 2008 @ 7:48 pm
after installing the gem I get the following error:
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `gem_original_require': no such file to load -- hpricot_scan (MissingSourceFile)
I’m working with hpricot-0.6 (installed it via sudo gem install hpricot --include-dependencies)
any thoughts? thanks!
Comment by david — January 13, 2009 @ 8:41 pm
This patch was merged into hpricot core so you should probably get the latest from why: http://github.com/why/hpricot
Comment by coderrr — January 13, 2009 @ 9:02 pm