• Login or register

discuss.effbot.org

  • Popular
  • Recent
  • Simple Top-Down Parsing in Python (effbot.org)

    In the early seventies, Vaughan Pratt published an elegant improvement to recursive-descent in his paper Top-down Operator Precedence. Pratt's algorithm associates semantics with tokens instead of grammar rules, and uses a simple 'binding power' mechanism to handle precedence levels. /.../ In this article, I'll briefly explain how the algorithm works, discuss different ways to implement interpreters and translators with it in Python, and finally use it to implement a parser for Python's expression syntax.

    3 points by effbot 4 months ago
    • 13 comments
  • 1 point by Ralph Corderoy 2 months ago 0 children

    Interesting article. BTW, the line `raise SyntaxError("unknown operator)' is lacking a closing double-quote.

    • link
    • reply
  • 2 points by kib2 4 months ago 1 child

    Some errors :

    At the start of the article, inside tokenize function, you have to read :

    if number:
    yield integer_literal_token(int(number))

    and inside class end_token:

    lbp = 0 # and not ldp = 0

    • link
    • reply
    • 1 point by effbot 4 months ago 0 children

      Should be fixed now (I hope). Thanks! /F

      • link
      • reply
  • 1 point by thenendo 3 months ago 3 children

    Very nice article. However, I'm having trouble implementing postfix operators... is it possible with this approach?

    • link
    • reply
    • 1 point by thenendo 3 months ago 2 children

      More specifically, postfix operators with high precedence. For instance, say I want to write a parser of regular expressions. "ab*c" should be equivalent to "a(b*)c", not "(ab)*c"... the "b" somehow has to telepathically know that a "*" is coming up, and to combine with that, not with the "a" before it.

      • link
      • reply
      • 2 points by effbot 3 months ago 1 child

        I stumbled upon that myself recently, but in that case, it was sufficient to treat the postfix operators as "modifiers" -- just let the operator's "led" method modify the left node, and then return that node instead of itself. More extensive tree modifications are also possible, but is more work.

        • link
        • reply
        • 1 point by thenendo 3 months ago 0 children

          Excellent suggestion, thanks. In my case the solution was to have the postfix "led" make a copy of its "left" argument, then modify the original to have a new "id", signifying the postfix operation, with the copy as its "first" argument. Works beautifully.

          • link
          • reply
  • 2 points by kib2 4 months ago 0 children

    I've made a pdf from your article : http://kib2.free.fr/pdfs/SimpleTopDownParsing.pdf

    • link
    • reply
  • 2 points by kib2 4 months ago 0 children

    Thanks Fredrick, very nice article.

    • link
    • reply
  • 1 point by ecirhana 3 months ago 2 children

    Hello,
    any chance we will see c?ElementTree for parsing?

    • link
    • reply
    • 1 point by effbot 3 months ago 1 child

      As in an easy-to-use library for top-down parsing, or something else?

      • link
      • reply
      • 1 point by ecirhana 3 months ago 0 children

        As in easy-to-use (and fast) library for top-down parsing, YES!

        • link
        • reply
  • 1 point by slyfox 4 months ago 0 children

    Now I feel like inventing my own language ;)
    btw, thanks for the PDF!

    • link
    • reply
  • Widget
  • Recent Comments
  • Leaders
Powered by slinkset.com