Home
About
Neotoma is a packrat parser-generator for Erlang for Parsing Expression Grammars (PEGs). It consists of a parsing-combinator library with memoization routines, a parser for PEGs, and a utility to generate parsers from PEGs. It is inspired by treetop, a Ruby library with similar aims, and parsec, the parser-combinator library for Haskell.
Getting started
- Clone the repository:
$ git clone git://github.com/seancribbs/neotoma.git
- Build the library:
$ cd neotoma
$ make
- Start the Erlang shell and generate your parser:
$ erl -pa ebin
1> neotoma:file("mygrammar.peg").
ok
Writing a Grammar
Neotoma’s PEG grammars are based on the grammars from Bryan Ford’s thesis with some influences from Treetop. The basic format is thus:
nonterminal <- parsing_expression;
Where parsing_expression is any combination of nonterminals, terminals and sub-expressions (e, e1, e2 are parsing expressions) as described below:
| Non-terminal symbol | some_nonterminal | All nonterminals on the RHS must have a corresponding rule/reduction. |
| String | "Hello, world" | single- or double-quoted, quotes escaped with \\ |
| Character class | [a-zA-Z0-9] | just as in the re module |
| Any single character | . | |
| Sequence | e1 e2 | |
| Ordered choice | e1 / e2 | |
| Grouping | (e) | |
| Zero-width positive lookahead | &e | |
| Zero-width negative lookahead | !e | |
| Optional (zero-or-more) repetition | e* | |
| Mandatory (one-or-more) repetition | e+ | |
| Optional expression | e? | |
| Label | name:e | Helps in extracting sub-expressions, creates {name, SubTree} tuples in the AST. |
Currently all reductions must end with a semicolon (;). The first rule/reduction in your grammar is treated as the root of the parse tree.
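To show how the expression forms above combine, here is a minimal sketch of a grammar for integer arithmetic. The rule names (additive, multitive, primary, number) and the labels (left, right, expr) are hypothetical and chosen only for illustration. The rules recurse on the right-hand side because PEG parsers generally cannot handle left-recursive rules.

```peg
additive  <- left:multitive "+" right:additive / multitive;
multitive <- left:primary "*" right:multitive / primary;
primary   <- "(" expr:additive ")" / number;
number    <- [0-9]+;
```

Because additive is the first rule, it would be the root of the parse tree. Note that nothing here skips whitespace, so under this sketch an input such as 1+2*3 would match but 1 + 2 would not.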
Working with the AST
Without specifying any transformations, Neotoma will return a nested list of the results of its parse — essentially an S-expression. In this form, the AST is not very useful; one needs to transform and annotate the tree into a useful data structure. Neotoma provides hooks into the parsing process in the form of the transform/3 function (or the inline code blocks). Once you have generated your parser, you can edit this function in the generated file. The prototype is thus:
transform('nonterminal', Node, Index)
- 'nonterminal' is the nonterminal that was successfully parsed.
- Node is a list of the results from sub-expressions, which may be raw terminals or the transformations of other nonterminals.
- Index is a tuple representing the position of the parser at the start of this expression, in the form {{line, L}, {column, C}} where L and C are both integers.
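The following is a rough sketch of how the three arguments might be used together. The module name transform_sketch and the identifier rule are hypothetical; in a real parser these clauses would live in the generated file. The sketch uses iolist_to_binary/1 so that it works whether the matched characters arrive as strings or as binaries, since the exact terminal representation is an assumption here.

```erlang
-module(transform_sketch).
-export([transform/3]).

%% Hypothetical rule, for illustration only:
%%   identifier <- [a-z]+;

%% Tag an identifier with the line and column where it started.
transform(identifier, Node, {{line, L}, {column, C}}) ->
    {identifier, iolist_to_binary(Node), L, C};
%% Every other nonterminal keeps its untransformed parse result.
transform(_Nonterminal, Node, _Index) ->
    Node.
```

Dispatching on the first argument with one clause per nonterminal keeps each transformation small, and the catch-all clause leaves untouched any rule you have not handled yet.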
Using transform modules
While editing transform/3 inside the generated parser is easy, Neotoma will overwrite your changes if you regenerate the parser. Therefore, I recommend that you specify an external module in which to do your transformations (or use inline blocks, as described below). Doing so will allow you to develop your grammar and transformations independently. You can do this by passing the transform_module option to neotoma:file/2. The module will be generated for you if it does not exist already. An example:
1> neotoma:file("mygrammar.peg", [{transform_module, myast}]).
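As an illustration, a hand-edited myast module might look something like the sketch below. The grammar in the comment is hypothetical, and the module is assumed to only need to export transform/3 with the arguments described earlier; the stub that Neotoma actually generates may differ.

```erlang
-module(myast).
-export([transform/3]).

%% Hypothetical grammar this module is written against:
%%   sum    <- left:number "+" right:number;
%%   number <- [0-9]+;

%% Labels arrive as {Name, SubTree} tuples, so the operands can be
%% picked out by pattern matching. The operands are already integers
%% because the number rule was transformed first.
transform(sum, [{left, L}, _Plus, {right, R}], _Index) ->
    {add, L, R};
%% Collapse the matched digits into a single integer.
transform(number, Node, _Index) ->
    binary_to_integer(iolist_to_binary(Node));
%% Leave every other nonterminal as it was parsed.
transform(_Nonterminal, Node, _Index) ->
    Node.
```

Because this file is separate from the generated parser, it should survive regenerating the parser from the grammar.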
Inline AST transformations
As of version 1.3, Neotoma allows code inline with your grammar for AST transformation and additional support functions. A reduction may optionally be followed by either a code block enclosed in backticks (`) or a single tilde (~). The code block will become the body of the transformation function. The ~ will create an identity transformation, equivalent to `Node`. Example from the JSON parser:
number <- int frac? exp?
`
case Node of
  [Int, [], []] -> list_to_integer(lists:flatten([Int]));
  [Int, Frac, []] -> list_to_float(lists:flatten([Int, Frac]));
  [Int, [], Exp] -> list_to_float(lists:flatten([Int, ".0", Exp]));
  _ -> list_to_float(lists:flatten(Node))
end
`;
The Node and Idx variables are available to your code block; they correspond to the Node and Index arguments of transform/3.
To add additional support functions, just put another backtick-delimited block at the bottom of the grammar. All code will be added verbatim to the generated parser.
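A minimal sketch of a grammar that combines inline blocks with a trailing support block might look like this. The rules int_list and int and the helper to_integer/1 are hypothetical, and the shape assumed for Node (a list of {Label, SubTree} tuples, with each repetition yielding a list of sub-results) follows the descriptions above rather than a tested parser.

```peg
int_list <- head:int tail:("," int)*
`
[{head, Head}, {tail, Tail}] = Node,
[Head | [Int || [_Comma, Int] <- Tail]]
`;

int <- [0-9]+
`
to_integer(Node)
`;

`
%% Hypothetical support function, copied verbatim into the generated parser.
to_integer(Digits) ->
    binary_to_integer(iolist_to_binary(Digits)).
`
```

Keeping shared helpers in the final block avoids repeating the same conversion code in every rule that needs it.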
Future features
- Support for parsing in binary form/UTF.
- Support for LFE and Reia.







