• Login or register

discuss.effbot.org

  • Popular
  • Recent
  • Simple Top-Down Parsing in Python (effbot.org)

    In the early seventies, Vaughan Pratt published an elegant improvement to recursive-descent in his paper Top-down Operator Precedence. Pratt's algorithm associates semantics with tokens instead of grammar rules, and uses a simple 'binding power' mechanism to handle precedence levels. /.../ In this article, I'll briefly explain how the algorithm works, discuss different ways to implement interpreters and translators with it in Python, and finally use it to implement a parser for Python's expression syntax.

    4 points by effbot 1 year ago
    • 16 comments
  • 1 point by Marcin Ciura 5 months ago 0 children

    It seems that a closing paren is indistinguishable from "(end)": test("1)") yields (literal 1). Similarly for brackets, braces, and commas. I'm not sure how to improve error detection in the parser.
    BTW, using ideas from this article, I've managed to write a regexp parser with postfix operators that wrap their arguments instead of modifying them.

    • link
    • reply
  • 1 point by ThomasH 1 year ago 1 child

    Excellent article - thank you very much! That was "the last straw" that got me going ...
    One thing that struck me: Both you and Crockford avoid introducing "," (comma) as an infix operator, to parse all kinds of lists as expressions, which would be very natural to me. Is there a general problem with doing so, or is it just a deliberate design decision?

    • link
    • reply
    • 1 point by ThomasH 1 year ago 0 children

      I probably found the answer myself, it's overgeneration, right?! You lose control over substructures, like the ":" pairs in dicts, or the initializer part in C-style for-loops, if you allow to recurse in generic expression parsing.

      • link
      • reply
  • 1 point by Ralph Corderoy 1 year ago 0 children

    Interesting article. BTW, the line `raise SyntaxError("unknown operator)' is lacking a closing double-quote.

    • link
    • reply
  • 1 point by thenendo 1 year ago 3 children

    Very nice article. However, I'm having trouble implementing postfix operators... is it possible with this approach?

    • link
    • reply
    • 1 point by thenendo 1 year ago 2 children

      More specifically, postfix operators with high precedence. For instance, say I want to write a parser of regular expressions. "ab*c" should be equivalent to "a(b*)c", not "(ab)*c"... the "b" somehow has to telepathically know that a "*" is coming up, and to combine with that, not with the "a" before it.

      • link
      • reply
      • 2 points by effbot 1 year ago 1 child

        I stumbled upon that myself recently, but in that case, it was sufficient to treat the postfix operators as "modifiers" -- just let the operator's "led" method modify the left node, and then return that node instead of itself. More extensive tree modifications are also possible, but is more work.

        • link
        • reply
        • 1 point by thenendo 1 year ago 0 children

          Excellent suggestion, thanks. In my case the solution was to have the postfix "led" make a copy of its "left" argument, then modify the original to have a new "id", signifying the postfix operation, with the copy as its "first" argument. Works beautifully.

          • link
          • reply
  • 1 point by ecirhana 1 year ago 2 children

    Hello,
    any chance we will see c?ElementTree for parsing?

    • link
    • reply
    • 1 point by effbot 1 year ago 1 child

      As in an easy-to-use library for top-down parsing, or something else?

      • link
      • reply
      • 1 point by ecirhana 1 year ago 0 children

        As in easy-to-use (and fast) library for top-down parsing, YES!

        • link
        • reply
  • 2 points by kib2 1 year ago 1 child

    Some errors :

    At the start of the article, inside tokenize function, you have to read :

    if number:
    yield integer_literal_token(int(number))

    and inside class end_token:

    lbp = 0 # and not ldp = 0

    • link
    • reply
    • 1 point by effbot 1 year ago 0 children

      Should be fixed now (I hope). Thanks! /F

      • link
      • reply
  • 1 point by slyfox 1 year ago 0 children

    Now I feel like inventing my own language ;)
    btw, thanks for the PDF!

    • link
    • reply
  • 2 points by kib2 1 year ago 0 children

    I've made a pdf from your article : http://kib2.free.fr/pdfs/SimpleTopDownParsing.pdf

    • link
    • reply
  • 2 points by kib2 1 year ago 0 children

    Thanks Fredrick, very nice article.

    • link
    • reply
  • Widget
  • Recent Comments
  • Leaders
Powered by