Description: A Ruby-based parsing DSL based on parsing expression grammars.

Parsing UTF-8 input

It’s not difficult to parse UTF-8 input in Treetop.

Ruby 1.8 and older

If you’re running Ruby 1.8 or older, require 'active_support' and pass input.mb_chars to the parser instead of the raw string. String#mb_chars returns a multibyte-safe proxy for the string methods that would normally choke on multibyte characters. The proxy is not free, of course: expect your parser to be about 10% slower.
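A minimal sketch of that setup, assuming active_support has already been required by the caller. The helper name multibyte_safe and the parser class MyGrammarParser are hypothetical illustrations, not part of Treetop or ActiveSupport. The fallback to the raw string when mb_chars is unavailable is also an assumption added here.

```ruby
# Assumes `require 'active_support'` has already run on Ruby 1.8.
# multibyte_safe is a hypothetical helper name, not a Treetop or ActiveSupport API.
def multibyte_safe(input)
  # Use the multibyte proxy when String#mb_chars is defined;
  # otherwise pass the string through unchanged.
  input.respond_to?(:mb_chars) ? input.mb_chars : input
end

# Usage with a hypothetical Treetop-generated parser class:
#   parser = MyGrammarParser.new
#   tree   = parser.parse(multibyte_safe(source_text))
```

Wrapping the call in one place keeps the rest of the code unaware of which kind of string the parser receives.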

Ruby 1.9

If you have Ruby 1.9, you don’t have to do anything special. Strings in 1.9 are (mostly) encoding-aware. If you do require active_support, String#mb_chars just returns self. Thus, requiring active_support and always passing input.mb_chars is the easy way to run one version of your parser on multiple Ruby versions.
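The encoding awareness can be seen without any library. A small sketch in plain Ruby 1.9 or later, using an illustrative sample string (SAMPLE is an assumed name), suggests why no proxy is needed: String methods already work on characters rather than bytes.

```ruby
# SAMPLE is an illustrative constant. The \u escapes yield a UTF-8 string
# containing two accented letters (i with diaeresis, e with acute).
SAMPLE = "na\u00EFve caf\u00E9"

puts SAMPLE.encoding   # the encoding attached to the string
puts SAMPLE.length     # counts characters
puts SAMPLE.bytesize   # counts bytes; each accented letter takes two bytes in UTF-8
puts SAMPLE[2]         # indexes by character, not by byte
```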

Last edited by jgarber, Mon Sep 07 03:32:08 -0700 2009