最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

javascript - Who is faster: PEG or GLR? - Stack Overflow

programmeradmin3浏览0评论

I'm trying to create some kind of lint tool for the C/AL programming language. So basically I need to perform syntax and lexical analysis against the source code. I've planned to write parser from the scratch, but then discovered that there are a lots of tools helping generate these parsers automatically.

I need performance, since checking 20 megabytes of code in one piece is the normal scenario, and I need that tool to be extendable by custom rules. So I decided to go with JavaScript.

As far I have found two generators I can use Jison and PEG.js.

Which of them give me more parsing performance? Maybe not comparing libraries, but algorithms?

Which one is better for my needs (parsing general purpose programming language)?

UPDATE: I have found similar Q&As:

  • Packrat parsing vs. LALR parsing
  • Performance of parsers: PEG vs LALR(1) or LL(k)

I'm trying to create some kind of lint tool for the C/AL programming language. So basically I need to perform syntax and lexical analysis against the source code. I've planned to write parser from the scratch, but then discovered that there are a lots of tools helping generate these parsers automatically.

I need performance, since checking 20 megabytes of code in one piece is the normal scenario, and I need that tool to be extendable by custom rules. So I decided to go with JavaScript.

As far I have found two generators I can use Jison and PEG.js.

Which of them give me more parsing performance? Maybe not comparing libraries, but algorithms?

Which one is better for my needs (parsing general purpose programming language)?

UPDATE: I have found similar Q&As:

  • Packrat parsing vs. LALR parsing
  • Performance of parsers: PEG vs LALR(1) or LL(k)
Share Improve this question edited May 23, 2017 at 11:46 CommunityBot 11 silver badge asked Aug 1, 2012 at 11:51 shytikovshytikov 9,5389 gold badges61 silver badges105 bronze badges
Add a comment  | 

3 Answers 3

Reset to default 10

"I need performance (for 20Mb) ... so I decided Javascript". These are contradictory.

Carefully coded recursive descent parsers can be pretty fast, but you have to write a lot of code. Generally, LALR(1) parsers (generated by Bison from a grammar etc.) are quite fast, and pretty easy to code. (There are technical papers showing how to compile LALR parsers directly to machine code; these parsers are blazingly fast but you need to implement a lot of custom machinery to build one).

If you want flat out high performance parsing with minimal sweat, you should consider LRStar. (I know and highly respect the author but otherwise have nothing to do this). This produces very fast LALR parsers. Downside: you have to make your grammar LALR compatible. You will have to extend your "rules" the same way you extend any other C program: by writing C code. That doesn't seem a lot worse than writing JavaScript IMHO, but the rules will likely execute a lot faster and at the scale you are contemplating this will matter.

GLR parsing is necessarily slower than LALR because it has more bookkeeping to do. But that's just a constant factor. It can be significant (e.g., 100x) over a high performance engine like LRStar. It may be worth the trouble, because it is much easier to pound your grammar into shape, and a less convoluted grammar will likely make writing custom rules easier to do. If you really have millions lines of code, these parsers will like be medium speed at best.

PEG basically is a backtracking. It has to try things, and backtrack when they don't work. They have to be slower than LALR parsers by at least the amount of backtracking they do. You also have the grammar shaping problem.

What you will discover, though, is that parsing simply isn't enough if you want to do anything the faintest bit sophisticated. In that case, you don't want to optimize on parsing; you want to optimize on infrastructure for program analysis. See my essay on Life After Parsing for another alternative.

In general you'd get very good parsing performance from a shift-reduce parser such as the one that Jison implements. It's a bit old-school perhaps, but it can work in very tight memory requirements and in linear time.

PEG produces a different kind of parser that is perhaps more capable, but would require more memory to produce the same result. I.e. PEG will require an amount of memory proportional to the input, while a LALR parser will do it in much less space (some tables and a small stack).

As far I have found two generators I can use Jison and PEG.js. Which of them give me more parsing performance?

According to the JavaScript Parser Libraries benchmark I have created It seems that PEG.js is faster (at least on Chrome/V8).

You can run it online here: http://sap.github.io/chevrotain/performance/

Note that this benchmark uses the very simple JSON grammar to compare the performance of the parsing libraries not a larger and more complex programing language grammar.

发布评论

评论列表(0)

  1. 暂无评论