regex - Javascript efficient parsing of css selector

What would be the most efficient way of parsing a CSS selector input string that features any combination of:

[key=value] : attributes, 0 to * instances
#id : ids, 0 to 1 instances
.class : classes, 0 to * instances
tagName : tag names, 0 to 1 instances (found at start of string only)

(note: '*', or another applicable combinator, could be used in lieu of a tag?)

Such as:

div.someClass#id[key=value][key2=value2].anotherClass

Into the following output:

['div','.someClass','#id','[key=value]','[key2=value2]','.anotherClass']

Or for bonus points, into this form efficiently (read: a way not just based on using str[0] === '#' for example):

{
 tags : ['div'],
 classes : ['someClass','anotherClass'],
 ids : ['id'],
 attrs : 
   {
     key : value,
     key2 : value2
   }
}

(note the removal of #, ., [, = and ])

I imagine some combination of regex and .match(..) is the way to go, but my regex knowledge is nowhere near advanced enough for this situation.

Many thanks for your help.

Asked Jul 26, 2013 at 18:03 by Darius

Regex is rarely the right solution for parsing complex languages. You should have a look at the many libraries doing this (like sizzle). – Denys Séguret Commented Jul 26, 2013 at 18:04
I know sizzle does it, but I'm looking to implement my own simple solution. The domain is not as complex as a language: there is no whitespace etc., and only a limited set of delimiters (as listed in the question). – Darius Commented Jul 26, 2013 at 18:05
I was suggesting looking at the source, not using it. If you want to parse CSS selectors, you should take whitespace into account. – Denys Séguret Commented Jul 26, 2013 at 18:06
OK, I will consult the source, but I'm talking about tokens already split by whitespace. This question is about the next step, after splitting the selector into whitespace-delimited tokens. – Darius Commented Jul 26, 2013 at 18:07
@dystroy I think this is about parsing the selector "sub-syntax" for a single element match; I'm not sure what that's called. Also, SCRIPTONITE, note that it's not just splitting on whitespace: whitespace is an operator in the CSS selector syntax, comparable to the + and ~ connectors. – Pointy Commented Jul 26, 2013 at 18:07
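In the Selectors Level 4 terminology, the single-element "sub-syntax" Pointy describes is a compound selector, and whitespace, >, + and ~ are the combinators that join compound selectors into a complex selector. A minimal sketch of that first splitting step follows; splitComplexSelector is a hypothetical helper, and it assumes no whitespace or combinator characters appear inside attribute values, quoted strings or functional pseudo-classes such as :not(...).

```javascript
// Hypothetical helper: split a complex selector such as "ul.nav > li a + span"
// into compound selectors and the combinators between them.
// Assumption: whitespace and the characters >, + and ~ never occur inside
// attribute values, quoted strings or functional pseudo-classes like :not().
function splitComplexSelector(selector) {
  var parts = [];
  // Group 1 captures an explicit combinator (>, + or ~) with optional spaces
  // around it; a run of plain whitespace is the descendant combinator.
  var re = /\s*([>+~])\s*|\s+/g;
  var input = selector.trim();
  var last = 0;
  var match;
  while ((match = re.exec(input)) !== null) {
    parts.push(input.slice(last, match.index)); // compound selector before it
    parts.push(match[1] || ' ');                 // ' ' = descendant combinator
    last = re.lastIndex;
  }
  parts.push(input.slice(last));                 // final compound selector
  return parts;
}

console.log(splitComplexSelector('ul.nav > li a + span'));
// [ 'ul.nav', '>', 'li', ' ', 'a', '+', 'span' ]
```

Neither alternative of the regex can match an empty string, so the exec loop always advances. The even-indexed entries of the result are compound selectors that can then be handed to a single-element parser like the one in the answer.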

1 Answer

You might do the splitting using

var tokens = subselector.split(/(?=\.)|(?=#)|(?=\[)/)

which changes

div.someClass#id[key=value][key2=value2].anotherClass

into

["div", ".someClass", "#id", "[key=value]", "[key2=value2]", ".anotherClass"]
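To see what the lookaheads do, here is a small check (not part of the original answer). Each alternative is zero-width, so split cuts just before every '.', '#' or '[' and keeps the delimiter at the start of the next token; the single character-class lookahead /(?=[.#\[])/ is an equivalent, shorter spelling. The variable names are illustrative only.

```javascript
// The answer's splitter: every alternative is a zero-width lookahead, so the
// string is cut just before each '.', '#' or '[' and no character is dropped.
var splitter = /(?=\.)|(?=#)|(?=\[)/;
// Equivalent cut points, written as one lookahead over a character class.
var compactSplitter = /(?=[.#\[])/;

var selector = 'div.someClass#id[key=value][key2=value2].anotherClass';
var tokens = selector.split(splitter);
var compactTokens = selector.split(compactSplitter);

console.log(tokens);
console.log(JSON.stringify(tokens) === JSON.stringify(compactTokens)); // true
```

One caveat applies to either pattern: it also cuts inside attribute values, so [href=page.html] becomes "[href=page" and ".html]". The approach therefore assumes attribute values contain none of '.', '#' or '['.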

and after that you simply have to look at how each token starts (and, for tokens starting with [, check whether they contain an =).
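For the [ tokens, splitting on every = breaks values that themselves contain =, and quoted values keep their quotes. A minimal sketch of a more careful attribute step, splitting at the first = only and removing one pair of matching quotes; parseAttrToken is a hypothetical helper, and it assumes only the plain = operator (not ~=, |=, ^=, $= or *=) and represents a bare [name] with the value null.

```javascript
// Hypothetical helper: turn one attribute token such as "[key=value]",
// '[key="a b"]' or "[disabled]" into a [name, value] pair.
// Assumptions: only the plain "=" operator is handled, and a bare [name]
// (presence test) gets the value null.
function parseAttrToken(token) {
  var body = token.slice(1, -1);        // drop the surrounding [ and ]
  var eq = body.indexOf('=');           // split at the FIRST '=' only
  if (eq === -1) {
    return [body.trim(), null];         // [name] with no value
  }
  var name = body.slice(0, eq).trim();
  var value = body.slice(eq + 1).trim();
  // Strip one pair of matching single or double quotes, if present.
  var quote = value.charAt(0);
  if ((quote === '"' || quote === "'") && value.charAt(value.length - 1) === quote) {
    value = value.slice(1, -1);
  }
  return [name, value];
}

console.log(parseAttrToken('[key2="a b"]')); // [ 'key2', 'a b' ]
```

For example, parseAttrToken('[data-x=a=b]') gives ['data-x', 'a=b'], where split('=') would give three pieces.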

Here's the whole working code, building exactly the object you describe:

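function parse(subselector) {
  var obj = {tags:[], classes:[], ids:[], attrs:{}};
  // Cut just before every '.', '#' or '[' (zero-width lookaheads),
  // then dispatch on the first character of each token.
  subselector.split(/(?=\.)|(?=#)|(?=\[)/).forEach(function(token){
    switch (token[0]) {
      case '#':
        obj.ids.push(token.slice(1));
        break;
      case '.':
        obj.classes.push(token.slice(1));
        break;
      case '[':
        // "[key=value]" -> obj.attrs.key = "value"; a bare "[key]" gets undefined
        var pair = token.slice(1, -1).split('=');
        obj.attrs[pair[0]] = pair[1];
        break;
      default:
        // the leading tag name (or '*')
        obj.tags.push(token);
        break;
    }
  });
  return obj;
}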
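For the bonus form that doesn't dispatch on str[0], one option (a swapped-in technique, not the answer's) is a single global regex with one capture group per token kind, read in a RegExp.prototype.exec loop: whichever group is defined tells you the token type. parseCompound is a hypothetical name, and the sketch assumes tag, id and class names consist only of [\w-] characters and that attribute values contain no ].

```javascript
// Hypothetical alternative: classify tokens by which capture group matched.
// Groups: 1 = leading tag or '*', 2 = id, 3 = class, 4/5 = attribute name/value.
// Assumes names use only [\w-] and attribute values contain no ']'.
function parseCompound(selector) {
  var obj = { tags: [], classes: [], ids: [], attrs: {} };
  // Created per call so the global regex's lastIndex is never shared.
  var re = /^([\w-]+|\*)|#([\w-]+)|\.([\w-]+)|\[([^\]=]+)(?:=([^\]]*))?\]/g;
  var m;
  while ((m = re.exec(selector)) !== null) {
    if (m[1] !== undefined) {
      obj.tags.push(m[1]);               // '^' only matches at position 0
    } else if (m[2] !== undefined) {
      obj.ids.push(m[2]);
    } else if (m[3] !== undefined) {
      obj.classes.push(m[3]);
    } else {
      obj.attrs[m[4]] = m[5] !== undefined ? m[5] : null; // [name] -> null
    }
  }
  return obj;
}

console.log(JSON.stringify(parseCompound('div.someClass#id[key=value][key2=value2].anotherClass')));
// {"tags":["div"],"classes":["someClass","anotherClass"],"ids":["id"],"attrs":{"key":"value","key2":"value2"}}
```

Because the attribute alternative consumes everything up to the closing ], a value like [href=page.html] stays intact here, unlike with the lookahead split. Anything that matches none of the alternatives (for example a :hover pseudo-class) is silently skipped, so a stricter version would check that consecutive matches are contiguous.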
