Python package Lark does not build the grammar correctly

I need to build a parse tree like the following with the Lark package:

start
  expr
    or_expr
      and_expr
        comp_expr
          identifier    Name
          comparator    eq
          value 'Milk'
        comp_expr
          identifier    Price
          comparator    lt
          value 2.55

This is the grammar I am using:

from lark import Lark

odata_grammar = r"""
    start: expr

    expr: or_expr

    or_expr: and_expr ("or" and_expr)*
    and_expr: comp_expr ("and" comp_expr)*
    comp_expr: identifier comparator value -> comp_expr

    comparator: "eq" | "lt" | "gt" | "le" | "ge" | "ne"

    value: STRING | NUMBER
    identifier: CNAME

    STRING: /'(''|[^'])*'/
    DATE: /\d{4}-\d{2}-\d{2}/
    NUMBER: /-?\d+(\.\d+)?/

    %import common.CNAME
    %import common.WS
    %ignore WS
"""

parser = Lark(odata_grammar, start='start', parser='lalr')
url_filter = "Name eq 'Milk' and Price lt 2.55"
tree = parser.parse(url_filter)
print(tree.pretty())

When I print the tree, this is what I actually get:

start
  expr
    or_expr
      and_expr
        comp_expr
          identifier    Name
          comparator
          value 'Milk'
        comp_expr
          identifier    Price
          comparator
          value 2.55

The comparator value is missing. Lark clearly recognizes it, since the input parses and a comparator node is created, but its text is not printed in the tree. Curiously, when I try to force it with an alias in the grammar, for example comparator: "eq" -> eq, the node is simply renamed to eq; I do not get comparator with the value eq.

Asked by francollado99

1 Answer

See the Tree Construction section of the Lark documentation (https://lark-parser.readthedocs.io/en/stable/tree_construction.html):

"Lark filters out certain types of terminals by default, considering them punctuation:

Terminals that won't appear in the tree are:

Unnamed literals (like "keyword" or "+")
Terminals whose name starts with an underscore (like _DIGIT)

Terminals that will appear in the tree are:

Unnamed regular expressions (like /[0-9]/)
Named terminals whose name starts with a letter (like DIGIT)"

So, option one: turn the string literals in your comparator rule into regexps:

odata_grammar = r"""
    start: expr

    expr: or_expr

    or_expr: and_expr ("or" and_expr)*
    and_expr: comp_expr ("and" comp_expr)*
    comp_expr: identifier comparator value -> comp_expr

    comparator: /eq/ | /lt/ | /gt/ | /le/ | /ge/ | /ne/

    value: STRING | NUMBER
    identifier: CNAME

    STRING: /'(''|[^'])*'/
    DATE: /\d{4}-\d{2}-\d{2}/
    NUMBER: /-?\d+(\.\d+)?/

    %import common.CNAME
    %import common.WS
    %ignore WS
"""

Option two: add rules for each comparator literal:

odata_grammar = r"""
    start: expr

    expr: or_expr

    or_expr: and_expr ("or" and_expr)*
    and_expr: comp_expr ("and" comp_expr)*
    comp_expr: identifier comparator value -> comp_expr

    comparator: eq | lt | gt | le | ge | ne
    eq: "eq"
    lt: "lt"
    gt: "gt"
    le: "le"
    ge: "ge"
    ne: "ne"
    value: STRING | NUMBER
    identifier: CNAME

    STRING: /'(''|[^'])*'/
    DATE: /\d{4}-\d{2}-\d{2}/
    NUMBER: /-?\d+(\.\d+)?/

    %import common.CNAME
    %import common.WS
    %ignore WS
"""

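Both options put the comparator into the parse tree: option one keeps it as a token under comparator, and option two adds a child rule (eq, lt, and so on) under comparator.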
