I'm wondering if it's possible to use a CFG or PEG grammar as a basis for code completion directly, without modification. I have heard that code completion in IDEs is sometimes manipulated and massaged, or even hard-coded, so that it performs better.
I want to do code completion on a small DSL, so I fully understand that a grammar cannot help a code completion system with knowledge of library functions etc.
As far as I'm aware, the parser itself needs to at least provide a system for querying what it expects next.
In particular, I'm interested in a JavaScript code completion solution using peg.js or jison.
asked Aug 16, 2011 at 12:20 by BefittingTheorem; edited Mar 22, 2013 at 18:44 by Wooble
It is indeed possible to use PEG (with Packrat) for code completion - I did that with both Emacs and Visual Studio. The trick is to store a list of the failed "tokens" tried by the parser at the rightmost failure position - they can then be used for completion. If an identifier is expected, the parser can also give a hint. – SK-logic Commented Aug 16, 2011 at 14:21
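To make the rightmost-failure idea from the comment concrete, here is a minimal sketch using a hand-written matcher instead of a generated parser. Everything in it (createTracker, recordFailure, literal, parseSelect and the toy "SELECT" grammar) is hypothetical and only illustrates the bookkeeping; it is not how any particular parser generator implements it.

```javascript
// Hypothetical bookkeeping for the rightmost failure position.
function createTracker() {
    return { pos: -1, expected: [] };
}

// Record that `description` was tried and failed at `pos`.
function recordFailure(tracker, pos, description) {
    if (pos < tracker.pos) return;        // earlier than the rightmost failure: ignore
    if (pos > tracker.pos) {              // new rightmost failure: start a fresh list
        tracker.pos = pos;
        tracker.expected = [];
    }
    if (tracker.expected.indexOf(description) === -1) {
        tracker.expected.push(description);
    }
}

// Try to match a literal at `pos`; return the new position, or -1 on failure.
function literal(tracker, input, pos, text) {
    if (input.substr(pos, text.length) === text) return pos + text.length;
    recordFailure(tracker, pos, text);
    return -1;
}

// Toy grammar: "SELECT " followed by "name" or "id", then end of input.
function parseSelect(input) {
    var tracker = createTracker();
    var pos = literal(tracker, input, 0, 'SELECT ');
    if (pos !== -1) {
        var next = literal(tracker, input, pos, 'name');
        if (next === -1) next = literal(tracker, input, pos, 'id');
        pos = next;
    }
    if (pos !== -1 && pos !== input.length) {
        recordFailure(tracker, pos, 'end of input');
        pos = -1;
    }
    return { ok: pos !== -1, failedAt: tracker.pos, expected: tracker.expected };
}
```

For the input 'SELECT n' the result reports a failure at position 7 with 'name' and 'id' as the alternatives tried there, which is exactly the list a completion popup could offer.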
2 Answers
It is fairly easy to build a JavaScript editor with code completion from a PEG grammar. I will describe how to do it with PEG.js. You need to extend your grammar with some lax parsing rules that make it possible to provide suggestions when previous statements are broken. These lax rules need to be handled conditionally, or you will need two separate grammars: one for parsing source and a second for code completion. You can maintain a single grammar by using JavaScript predicates (available in PEG.js). A predicate looks like &{return laxParsing} and causes the whole rule to be processed only when the laxParsing flag is true. You can switch between lax and strict parsing easily by setting the parser's internal flag.
To provide suggestions to the user easily, you must slightly modify the generated PEG.js parser (version 0.5) so that the parsing error structure also carries the position (besides the column and line) and the list of expectations (besides the error message). You can copy a prepared fragment from https://gist.github.com/1281239.
Once you have the parser, you can attach it to the editor on, for example, a CTRL+SPACE keypress. When these keys are pressed in the source text, you put a special unparseable character at the cursor position (to cause a parsing error) and launch the parser in lax mode. You then receive an error with a list of suggestions.
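The keypress handler can be sketched roughly as follows. This is only an illustration under assumptions: CURSOR_MARKER, suggestAtCursor and the stand-in toyParse are hypothetical names, and the parser is assumed to throw an error carrying offset and expected fields (the modified error structure described above), which is not the stock PEG.js 0.5 behaviour.

```javascript
// Hypothetical marker: a character the grammar is assumed never to accept.
var CURSOR_MARKER = '\u0000';

// `parse` is assumed to throw an error with `offset` and `expected` fields.
function suggestAtCursor(parse, source, cursorOffset) {
    var marked = source.slice(0, cursorOffset) + CURSOR_MARKER + source.slice(cursorOffset);
    try {
        parse(marked);
    } catch (error) {
        // Use the expectations only if parsing stopped exactly at the marker;
        // an earlier stop means the text before the cursor did not parse.
        if (error.offset === cursorOffset && Array.isArray(error.expected)) {
            return error.expected;
        }
    }
    return [];
}

// Build an error object in the assumed shape.
function fail(offset, expected) {
    var error = new Error('Syntax error at offset ' + offset);
    error.offset = offset;
    error.expected = expected;
    return error;
}

// Stand-in parser for illustration only: accepts "let <lowercase name>".
function toyParse(text) {
    var keyword = 'let ';
    if (text.slice(0, keyword.length) !== keyword) throw fail(0, [keyword]);
    var rest = text.slice(keyword.length);
    var name = /^[a-z]*/.exec(rest)[0];
    if (name.length === 0) throw fail(keyword.length, ['VariableName']);
    if (name.length < rest.length) throw fail(keyword.length + name.length, ['end of input']);
    return { name: name };
}
```

Calling suggestAtCursor(toyParse, 'let ', 4) yields the single expectation 'VariableName'. Requiring the failure to sit exactly at the marker is a design choice of this sketch; with lax rules in the grammar, earlier breakage would be skipped by the parser itself.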
Some suggestions are not just syntax; they also refer to definitions (e.g. entities, variables). You can trigger a search for these when a particular expectation is found (e.g. VariableName). You can provide the list by parsing the same source in a different lax parsing mode (one that filters only variable names).
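As a rough sketch of that lookup step: the code below swaps the symbolic VariableName expectation for concrete names. The function names and the "let <name>" declaration syntax are made up for illustration, and a regular expression stands in for the second lax parsing pass described above.

```javascript
// Collect names declared with a hypothetical "let <name>" statement.
// In the approach above this list would come from a second lax parse;
// a regular expression stands in for that pass here.
function collectVariableNames(source) {
    var names = [];
    var pattern = /\blet\s+([A-Za-z_]\w*)/g;
    var match;
    while ((match = pattern.exec(source)) !== null) {
        if (names.indexOf(match[1]) === -1) names.push(match[1]);
    }
    return names;
}

// Replace the symbolic expectation "VariableName" with concrete names and
// pass every other expectation through unchanged.
function expandExpectations(expected, source) {
    var suggestions = [];
    expected.forEach(function (item) {
        if (item === 'VariableName') {
            suggestions = suggestions.concat(collectVariableNames(source));
        } else {
            suggestions.push(item);
        }
    });
    return suggestions;
}
```

With the source 'let total\nlet count\nprint ' and the expectations '+' and 'VariableName', the result is '+', 'total', 'count'.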
For a working example and the source for this approach, see https://github.com/mstefaniuk/Concrete-Freetext.
PEG.js gives you quite a bit of context when it generates a SyntaxError. For example, if you have a grammar for SQL and feed it something like:
FOO > 10 A
Then PEG.js will return this:
{
    "message": "Expected \"AND\", \"ORDER BY\" or end of input but \"A\" found.",
    "expected": [
        {
            "type": "literal",
            "value": "AND",
            "description": "\"AND\""
        },
        {
            "type": "literal",
            "value": "ORDER BY",
            "description": "\"ORDER BY\""
        },
        {
            "type": "end",
            "description": "end of input"
        }
    ],
    "found": "A",
    "offset": 9,
    "line": 1,
    "column": 10,
    "name": "SyntaxError"
}
What it's saying is that it parsed the first nine characters of the string (offsets 0–8, "FOO > 10 ") but then encountered an unexpected token at offset 9 (column 10). It also gives you a list of the next tokens it was expecting, which correspond to FOO > 10 AND, FOO > 10 ORDER BY, and FOO > 10 (end of input). If you tack these onto the valid portion of the query, you'll get a good set of possible completions:
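function getCompletions(pegParse, text) {
    // pegParse is assumed to return the SyntaxError object shown above
    // (e.g. a try/catch wrapper around the parser) rather than throw it.
    var parsedText = pegParse(text);
    var completions = [];
    if (parsedText.expected) {
        // The portion of the input that parsed successfully.
        var start = text.substr(0, parsedText.offset);
        parsedText.expected.forEach(function(expected) {
            // Only literal expectations carry text that can be appended.
            if (expected.type != 'literal') return;
            var completion = start + expected.value;
            // Keep only completions that extend what the user already typed.
            if (completion.substr(0, text.length) == text) {
                completions.push(completion);
            }
        });
    }
    return completions;
}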
This is quite simplistic -- a real autocomplete would match more than just literals and would need some way to take advantage of context not available to the grammar, e.g. the list of arguments to a function that the user is calling.