-
Simple Top-Down Parsing in Python (effbot.org)
In the early seventies, Vaughan Pratt published an elegant improvement to recursive-descent in his paper Top-down Operator Precedence. Pratt's algorithm associates semantics with tokens instead of grammar rules, and uses a simple 'binding power' mechanism to handle precedence levels. /.../ In this article, I'll briefly explain how the algorithm works, discuss different ways to implement interpreters and translators with it in Python, and finally use it to implement a parser for Python's expression syntax.
Interesting article. BTW, the line `raise SyntaxError("unknown operator)' is lacking a closing double-quote.
Some errors :
At the start of the article, inside tokenize function, you have to read :
if number:
yield integer_literal_token(int(number))
and inside class end_token:
lbp = 0 # and not ldp = 0
Should be fixed now (I hope). Thanks! /F
Very nice article. However, I'm having trouble implementing postfix operators... is it possible with this approach?
More specifically, postfix operators with high precedence. For instance, say I want to write a parser of regular expressions. "ab*c" should be equivalent to "a(b*)c", not "(ab)*c"... the "b" somehow has to telepathically know that a "*" is coming up, and to combine with that, not with the "a" before it.
I stumbled upon that myself recently, but in that case, it was sufficient to treat the postfix operators as "modifiers" -- just let the operator's "led" method modify the left node, and then return that node instead of itself. More extensive tree modifications are also possible, but is more work.
Excellent suggestion, thanks. In my case the solution was to have the postfix "led" make a copy of its "left" argument, then modify the original to have a new "id", signifying the postfix operation, with the copy as its "first" argument. Works beautifully.
I've made a pdf from your article : http://kib2.free.fr/pdfs/SimpleTopDownParsing.pdf
Thanks Fredrick, very nice article.
Hello,
any chance we will see c?ElementTree for parsing?
As in an easy-to-use library for top-down parsing, or something else?
As in easy-to-use (and fast) library for top-down parsing, YES!
Now I feel like inventing my own language ;)
btw, thanks for the PDF!