What would be the most efficient way of parsing a CSS selector input string that features any combination of:
- [key=value]: attributes, 0 to * instances
- #id: ids, 0 to 1 instances
- .class: classes, 0 to * instances
- tagName: tag names, 0 to 1 instances (found at start of string only)
(note: '*', or another applicable combinator, could be used in lieu of a tag?)
Such as:
div.someClass#id[key=value][key2=value2].anotherClass
Into the following output:
['div','.someClass','#id','[key=value]','[key2=value2]','.anotherClass']
Or for bonus points, into this form efficiently (read: a way not just based on using str[0] === '#', for example):
{
    tags : ['div'],
    classes : ['someClass','anotherClass'],
    ids : ['id'],
    attrs :
    {
        key : value,
        key2 : value2
    }
}
(note the removal of # . [ = ])
I imagine some combination of regex and .match(..) is the way to go, but my regex knowledge is nowhere near advanced enough for this situation.
Many thanks for your help.
asked Jul 26, 2013 at 18:03 by Darius
- Regex is rarely the right solution for parsing complex languages. You should have a look at the many libraries doing this (like Sizzle). – Denys Séguret, Jul 26, 2013 at 18:04
- I know Sizzle does it, but I'm looking to implement my own simple solution. The domain is not as complex as a language: there is no whitespace etc., and a limited format for delimiters (as listed in the question). – Darius, Jul 26, 2013 at 18:05
- I was suggesting looking at the source, not using it. If you want to parse CSS selectors, you should take whitespace into account. – Denys Séguret, Jul 26, 2013 at 18:06
- OK, I will consult the source, but I'm talking about tokens already split by whitespace. This question is about the next step after splitting the tokens delimited by whitespace. – Darius, Jul 26, 2013 at 18:07
- @dystroy I think this is about parsing the selector "sub-syntax" for a single element match; I'm not sure what that's called. Also, SCRIPTONITE, note that it's not just splitting on whitespace: whitespace is an operator in the CSS selector syntax, comparable to the + and ~ connectors. – Pointy, Jul 26, 2013 at 18:07
1 Answer
You might do the splitting using
var tokens = subselector.split(/(?=\.)|(?=#)|(?=\[)/)
which changes
div.someClass#id[key=value][key2=value2].anotherClass
to
["div", ".someClass", "#id", "[key=value]", "[key2=value2]", ".anotherClass"]
and after that you simply have to look at how each token starts (and, for tokens starting with [, check whether they contain a =).
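To see why the split uses lookaheads rather than the delimiters themselves, here is a minimal sketch comparing the two. The variable names are only illustrative, and the character class `[.#\[]` is used as a shorter equivalent of the three alternated lookaheads.

```javascript
// Zero-width lookaheads split *before* each delimiter, so the delimiter
// stays attached to the token that follows it.
var input = 'div.someClass#id[key=value][key2=value2].anotherClass';

// Splitting on the delimiters themselves consumes them, so the kind of
// each token (class, id, attribute) can no longer be told apart.
var consumed = input.split(/[.#\[]/);

// Splitting on a lookahead matches an empty string at each position that
// is followed by '.', '#' or '[', so nothing is removed from the input.
var kept = input.split(/(?=[.#\[])/);

console.log(consumed);
// ["div", "someClass", "id", "key=value]", "key2=value2]", "anotherClass"]
console.log(kept);
// ["div", ".someClass", "#id", "[key=value]", "[key2=value2]", ".anotherClass"]
```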
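Here's the complete working code, building the object you describe:
function parse(subselector) {
    // attrs is an object so the result matches the key/value form in the question
    var obj = {tags: [], classes: [], ids: [], attrs: {}};
    // Split before each '.', '#' or '[' so every token keeps its leading delimiter
    subselector.split(/(?=\.)|(?=#)|(?=\[)/).forEach(function (token) {
        switch (token[0]) {
            case '#':
                obj.ids.push(token.slice(1));
                break;
            case '.':
                obj.classes.push(token.slice(1));
                break;
            case '[':
                // Strip the brackets, then separate key and value at '='
                var pair = token.slice(1, -1).split('=');
                obj.attrs[pair[0]] = pair[1];
                break;
            default:
                obj.tags.push(token);
                break;
        }
    });
    return obj;
}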
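Since the question asks for a way that is not based on checking str[0], one possible alternative is a single global regex with one capture group per token kind, walked with exec. This is only a sketch under a few assumptions: `parseWithRegex` is a hypothetical name, attrs is built as a plain object as in the question, and an attribute without a value (such as [disabled]) is stored as true, which is my own choice rather than anything specified above.

```javascript
// Sketch of a single-pass alternative: the capture group that matched tells
// us the token kind, so no inspection of the first character is needed.
// parseWithRegex is a hypothetical name used only for this example.
function parseWithRegex(subselector) {
    var obj = {tags: [], classes: [], ids: [], attrs: {}};
    // Groups: 1 = tag (start of string only), 2 = class, 3 = id,
    //         4 = attribute key, 5 = attribute value (optional)
    var re = /^([^.#\[]+)|\.([^.#\[]+)|#([^.#\[]+)|\[([^=\]]+)(?:=([^\]]*))?\]/g;
    var m;
    // With the g flag, exec resumes from re.lastIndex on each call.
    while ((m = re.exec(subselector)) !== null) {
        if (m[1] !== undefined) {
            obj.tags.push(m[1]);
        } else if (m[2] !== undefined) {
            obj.classes.push(m[2]);
        } else if (m[3] !== undefined) {
            obj.ids.push(m[3]);
        } else {
            // Assumption: a bare [key] with no '=' is recorded as true.
            obj.attrs[m[4]] = m[5] !== undefined ? m[5] : true;
        }
    }
    return obj;
}

var result = parseWithRegex('div.someClass#id[key=value][key2=value2].anotherClass');
console.log(JSON.stringify(result));
// {"tags":["div"],"classes":["someClass","anotherClass"],"ids":["id"],"attrs":{"key":"value","key2":"value2"}}
```

One difference from the split-based approach: the attribute alternative consumes everything up to the closing ], so a value containing . or # (for example [href=a.b]) stays in one piece instead of being split into separate tokens.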