I'm using parsimmon
to parse a simple math expression and I'm failing to parse a simple math expression that follows order of operation (i.e. */
has higher precedence than +-
).
Even if you are not familiar with this library please help me solve precedence problem without left recursion and infinite recursion.
Thank you.
I used TypeScript:
"use strict";
// Run me with Node to see my output!
import * as P from "parsimmon";
import {Parser} from "parsimmon";
// @ts-ignore
import util from "util";
///////////////////////////////////////////////////////////////////////
// Use the JSON standard's definition of whitespace rather than Parsimmon's.
let whitespace = P.regexp(/\s*/m);
// JSON is pretty relaxed about whitespace, so let's make it easy to ignore
// after most text.
function token(parser: Parser<string>) {
return parser.skip(whitespace);
}
// Several parsers are just strings with optional whitespace.
function word(str: string) {
return P.string(str).thru(token);
}
let MathParser = P.createLanguage({
expr: r => P.alt(r.sExpr2, r.sExpr1, r.number),
sExpr1: r => P.seqMap(r.iExpr, P.optWhitespace, r.plusOrMinus, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
sExpr2: r => P.seqMap(r.iExpr, P.optWhitespace, r.multiplyOrDivide, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
iExpr: r => P.alt(r.iExpr, r.number), // Issue here! this causes infinite recursion
// iExpr: r => r.number // this will fix infinite recursion but yields invalid parse
number: () =>
token(P.regexp(/[0-9]+/))
.map(Number)
.desc("number"),
plus: () => word("+"),
minus: () => word("-"),
plusOrMinus: r => P.alt(r.plus, r.minus),
multiply: () => word("*"),
divide: () => word("/"),
multiplyOrDivide: r => P.alt(r.multiply, r.divide),
operator: r => P.alt(r.plusOrMinus, r.multiplyOrDivide)
});
///////////////////////////////////////////////////////////////////////
let text = "3 / 4 - 5 * 6 + 5";
let ast = MathParser.expr.tryParse(text);
console.log(util.inspect(ast, {showHidden: false, depth: null}));
This is my repo
I'm using parsimmon
to parse a simple math expression and I'm failing to parse a simple math expression that follows order of operation (i.e. */
has higher precedence than +-
).
Even if you are not familiar with this library please help me solve precedence problem without left recursion and infinite recursion.
Thank you.
I used TypeScript:
"use strict";
// Run me with Node to see my output!
import * as P from "parsimmon";
import {Parser} from "parsimmon";
// @ts-ignore
import util from "util";
///////////////////////////////////////////////////////////////////////
// Use the JSON standard's definition of whitespace rather than Parsimmon's.
let whitespace = P.regexp(/\s*/m);
// JSON is pretty relaxed about whitespace, so let's make it easy to ignore
// after most text.
function token(parser: Parser<string>) {
return parser.skip(whitespace);
}
// Several parsers are just strings with optional whitespace.
function word(str: string) {
return P.string(str).thru(token);
}
let MathParser = P.createLanguage({
expr: r => P.alt(r.sExpr2, r.sExpr1, r.number),
sExpr1: r => P.seqMap(r.iExpr, P.optWhitespace, r.plusOrMinus, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
sExpr2: r => P.seqMap(r.iExpr, P.optWhitespace, r.multiplyOrDivide, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
iExpr: r => P.alt(r.iExpr, r.number), // Issue here! this causes infinite recursion
// iExpr: r => r.number // this will fix infinite recursion but yields invalid parse
number: () =>
token(P.regexp(/[0-9]+/))
.map(Number)
.desc("number"),
plus: () => word("+"),
minus: () => word("-"),
plusOrMinus: r => P.alt(r.plus, r.minus),
multiply: () => word("*"),
divide: () => word("/"),
multiplyOrDivide: r => P.alt(r.multiply, r.divide),
operator: r => P.alt(r.plusOrMinus, r.multiplyOrDivide)
});
///////////////////////////////////////////////////////////////////////
let text = "3 / 4 - 5 * 6 + 5";
let ast = MathParser.expr.tryParse(text);
console.log(util.inspect(ast, {showHidden: false, depth: null}));
This is my repo
Share Improve this question edited Dec 28, 2019 at 7:11 Node.JS asked Dec 28, 2019 at 7:05 Node.JSNode.JS 1,6347 gold badges55 silver badges131 bronze badges 6-
One thing I forgot to mention is in
P.alt
order matters. So that's why I usedr.sExpr2
and thenr.sExpr1
– Node.JS Commented Dec 28, 2019 at 7:08 -
Sorry I had to fix a typo. My fix was
iExpr: r => r.number
which yields invalid parse tree – Node.JS Commented Dec 28, 2019 at 7:12 -
You have multiple circular references,
expr
is parsingsExpr1
that tries to mapexpr
, also happens withsExpr2
, andiExpr
is triying to parse itself... Could you please explain what is are results you want to aplish? – ZeroWorks Commented Dec 30, 2019 at 18:01 - I want to be able to parse simple Math expression: 1 + 2 / 3 + 4 * 5 – Node.JS Commented Dec 30, 2019 at 18:08
-
I meant what you expect to be the parsed result... why do you need
r.sExpr2
and thenr.sExpr1
, since you are parsing a Math expression you need number + [plus|minus|multiply|divide], ignoring spaces... could you please clarify that please? – ZeroWorks Commented Dec 30, 2019 at 18:13
2 Answers
Reset to default 7 +50Currently the grammar implemented by your parser looks like this (ignoring white space):
expr: sExpr2 | sExpr1 | number
sExpr1: iExpr plusOrMinus expr
sExpr2: iExpr multiplyOrDivide expr
// Version with infinite recursion:
iExpr: iExpr | number
// Version without infinite recursion:
iExpr: number
It's pretty easy to see that iExpr: iExpr
is a left-recursive production and the cause of your infinite recursion. But even if Parsimmon could handle left-recursion, there just wouldn't be any point in that production. If it didn't mess up the parser, it just wouldn't do anything at all. Just like an equation x = x
does not convey any information, a production x: x
doesn't make a grammar match anything it didn't match before. It's basically a no-op, but one that breaks parsers that can't handle left recursion. So removing it is definitely the right solution.
With that fixed, you now get wrong parse trees. Specifically you'll get parse trees as if all your operators were of the same precedence and right-associative. Why? Because the left-side of all of your operators is iExpr
and that can only match single numbers. So you'll always have leaves as the left child of an operator node and the tree always grows to the right.
An unambiguous grammar to correctly parse left-associative operators can be written like this:
expr: expr (plusOrMinus multExpr)?
multExpr: multExpr (multiplyOrDivide primaryExpr)?
primaryExpr: number | '(' expr ')'
(The | '(' expr ')'
part is only needed if you want to allow parentheses, of course)
This would lead to the correct parse trees because there's no way for a multiplication or division to have an unparenthesized addition or subtraction as a child and if there are multiple applications of operators of the same precedence ,such as 1 - 2 - 3
, the outer subtraction would contain the inner subtraction as its left child, correctly treating the operator as left-associative.
Now the problem is that this grammar is left recursive, so it's not going to work with Parsimmon. One's first thought might be to change the left recursion to right recursion like this:
expr: multExpr (plusOrMinus expr)?
multExpr: primaryExpr (multiplyOrDivide multExpr)?
primaryExpr: number | '(' expr ')'
But the problem with that is that now 1 - 2 - 3
wrongly associates to the right instead of the left. Instead, the mon solution is to remove the recursion altogether (except the one from primaryExpr
back to expr
of course) and replace it with repetition:
expr: multExpr (plusOrMinus multExpr)*
multExpr: primaryExpr (multiplyOrDivide primaryExpr)*
primaryExpr: number | '(' expr ')'
In Parsimmon you'd implement this using sepBy1
. So now instead of having a left operand, an operator and a right operand, you have a left operand and then arbitrarily many operator-operand pairs in an array. You can create a left-growing tree from that by simply iterating over the array in a for-loop.
If your want to learn how to deal with left recursion, you can start from https://en.wikipedia/wiki/Parsing_expression_grammar or more precisely https://en.wikipedia/wiki/Parsing_expression_grammar#Indirect_left_recursion
And then read more about PEG
online.
But basically a standard way is to use cycles:
Expr ← Sum
Sum ← Product (('+' / '-') Product)*
Product ← Value (('*' / '/') Value)*
Value ← [0-9]+ / '(' Expr ')'
Your can find examples of this grammar everywhere.
If your want to stick to a more pleasant left-recursive grammars, you can read about packrat
parser. And find another parser for yourself. Because, I'm no sure, but it looks like parsimmon
is not one of them.
If you just what a working code, than you can go to https://repl.it/repls/ObviousNavyblueFiletype
I've implemented the above grammar using parsimmon
API
Expr : r => r.AdditiveExpr,
AdditiveExpr: r => P.seqMap(
r.MultExpr, P.seq(r.plus.or(r.minus), r.MultExpr).many(),
left_association
),
MultExpr : r => P.seqMap(
r.UnaryExpr, P.seq(r.multiply.or(r.divide), r.UnaryExpr).many(),
left_association
),
UnaryExpr : r => P.seq(r.minus, r.UnaryExpr).or(r.PrimaryExpr),
PrimaryExpr : r => P.seqMap(
r.LPAREN, r.Expr, r.RPAREN, // without parens it won't work
(lp,ex,rp) => ex // to remove "(" and ")" from the resulting AST
).or(P.digits),
plus : () => P.string('+').thru(p => p.skip(P.optWhitespace)),
minus : () => P.string('-').thru(p => p.skip(P.optWhitespace)),
multiply : () => P.string('*').thru(p => p.skip(P.optWhitespace)),
divide : () => P.string('/').thru(p => p.skip(P.optWhitespace)),
We also need to use parentheses, or else why would you need recursion? Value ← [0-9]+
will suffice. Without parentheses, there is no need to reference Expr
inside grammar. And if we do, it would not consume any input, it would have no sense whatsoever, and it would hang in infinite recursion.
So let's also add:
LPAREN : () => P.string('(').thru(p => p.skip(P.optWhitespace)),
RPAREN : () => P.string(')').thru(p => p.skip(P.optWhitespace)),
I also added unary expressions, just for pleteness.
And now for the most interesting part - production functions. Without them the result for, let' say:
(3+4+6+(-7*5))
will look like this:
[[["(",[["3",[]],[["+",["4",[]]],["+",["6",[]]],["+",[["(",[[["-","7"],[["*","5"]]],[]],")"],[]]]]],")"],[]],[]]
With them, it'll be:
[[[["3","+","4"],"+","6"],"+",[["-","7"],"*","5"]]]
Much nicer.
So for left associative operators we will need this:
// input (expr1, [[op1, expr2],[op2, expr3],[op3, expr4]])
// -->
// output [[[expr1, op1, expr2], op2, expr3], op3, expr4]
const left_association = (ex, rest) => {
// console.log("input", ex, JSON.stringify(rest))
if( rest.length === 0 ) return ex
if( rest.length === 1 ) return [ex, rest[0][0], rest[0][1]]
let node = [ex]
rest.forEach(([op, ex]) => {
node[1] = op;
node[2] = ex;
node = [node]
})
// console.log("output", JSON.stringify(node))
return node
}