regex to find tag id and content JavaScript - Stack Overflow

Hey, I'm trying to do something quite specific with regex in JavaScript and my regex-fu is shaky at best. I wondered if there were any pros out there who could point me in the right direction. So I have some text...

<item id="myid1">myitem1</item>
<item id="myid2">myitem2</item>

...etc

And I would like to strip it out into an array that reads myid1, myitem1, myid2, myitem2, ....etc

There will never be nested elements so there is no recursive nesting problem. Anyone able to bash this out quickly? Thanks for your help!

Asked Jul 17, 2010 at 10:18 by Thomas; edited Jul 17, 2010 at 10:20 by Darin Dimitrov.
  • Can you write a better explanation of the structure of myitem1 myitem2 ...etc, or is it a simple space delimited list of strings? – kzh Commented Jul 17, 2010 at 10:21

4 Answers

Here's a regex that will:

  • Match the starting and ending tag element names
  • Extract the value of the id attribute
  • Extract the inner html contents of the tag

Note: I am being lazy in matching the attribute value here. It needs to be enclosed in double quotes, and there needs to be no spaces between the attribute name and its value.

<([^\s]+).*?id="([^"]*?)".*?>(.+?)</\1>

Running the regex in javascript would be done like so:

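var search = '<item id="item1">firstItem</item><item id="item2">secondItem</item>';
var regex = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;
var results = {};
var parts;
// With the g flag, exec() resumes from regex.lastIndex on every call,
// so loop over the original string instead of re-running exec on each match.
while ((parts = regex.exec(search)) !== null) {
    results[parts[2]] = parts[3]; // parts[2] = id value, parts[3] = inner content
}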

At the end of this, results would be an object that looks like:

{
    "item1": "firstItem",
    "item2": "secondItem"
}

YMMV if the <item> elements contain nested HTML.
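
The question asks for a flat array (id, content, id, content, ...) rather than an object. The same pattern can be looped with exec() and each pair pushed in order. This is only a minimal sketch, assuming input shaped like the question's sample; the function name extractIdsAndContents is made up for illustration.

```javascript
// Hypothetical helper: collects [id, content, id, content, ...] from flat, non-nested tags.
function extractIdsAndContents(text) {
    // A fresh regex per call, so lastIndex always starts at 0.
    var regex = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;
    var flat = [];
    var parts;
    while ((parts = regex.exec(text)) !== null) {
        flat.push(parts[2], parts[3]); // id attribute value, then inner content
    }
    return flat;
}

var flat = extractIdsAndContents('<item id="myid1">myitem1</item>\n<item id="myid2">myitem2</item>');
// flat is now ["myid1", "myitem1", "myid2", "myitem2"]
console.log(flat);
```

Because "." does not match newlines, items separated by line breaks are still matched one at a time.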

If you really want or need to use a regex to get an HTML tag by id (as in the question title), you can use my code:

function GetTagByIdUsingRegex(tag,id,html) {
    return new RegExp("<" + tag + "[^>]*id[\\s]?=[\\s]?['\"]" + id + "['\"][\\s\\S]*?<\/" + tag + ">").exec(html);
}

I also made one to get an element by class name:

function GetTagByClassUsingRegex(tag,cls,html) {
    return new RegExp("<" + tag + "[^>]*class[\\s]?=[\\s]?['\"]" + cls + "[^'\"]*['\"][\\s\\S]*?<\/" + tag + ">").exec(html);
}
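
One caveat with building a pattern from strings: an id such as "a.b" contains a regex metacharacter, so the unescaped version would also match "axb". The sketch below escapes the inputs first. The names escapeRegExp and getTagByIdEscaped are hypothetical, and the whitespace handling around "id" is slightly stricter than in the functions above (it requires whitespace before the attribute name), so treat it as one possible variant rather than a drop-in replacement.

```javascript
// Hypothetical helper: escapes characters that have special meaning in a regex.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Variant of GetTagByIdUsingRegex that escapes tag and id before building the pattern.
function getTagByIdEscaped(tag, id, html) {
    var safeTag = escapeRegExp(tag);
    var pattern = "<" + safeTag + "[^>]*\\sid\\s*=\\s*['\"]" + escapeRegExp(id) +
        "['\"][\\s\\S]*?</" + safeTag + ">";
    return new RegExp(pattern).exec(html);
}

var html = '<item id="axb">second</item><item id="a.b">first</item>';
var match = getTagByIdEscaped("item", "a.b", html);
// match[0] is '<item id="a.b">first</item>'; the "axb" element is skipped
console.log(match && match[0]);
```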

I always use this site to build my regexes:

http://www.pagecolumn.com/tool/regtest.htm

This is the regex I came up with:

(<[^>]+>)([^<]+)(<[^>]+>)

And this is the result that the page gives me for JavaScript

Using RegExp object:

var str = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item><ssdad<sdasda><>dfsf';
var re = new RegExp("(<[^>]+>)([^<]+)(<[^>]+>)", "g");
var myArray = str.match(re);

Using literal:

var myArray = str.match(/(<[^>]+>)([^<]+)(<[^>]+>)/g)

if (myArray !== null) {
    for (var i = 0; i < myArray.length; i++) {
        var result = "myArray[" + i + "] = " + myArray[i];
        console.log(result); // print each full match
    }
}
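
Note that str.match() with the g flag returns only the full matched strings, not the three capture groups. To get at the text between the tags (the second group), one option is to loop with exec() instead. A short sketch, assuming the same pattern and a trimmed-down sample string; the variable names are illustrative only.

```javascript
var sample = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item>';
var tagPattern = /(<[^>]+>)([^<]+)(<[^>]+>)/g;
var contents = [];
var m;
// m[1] = opening tag, m[2] = text between the tags, m[3] = closing tag
while ((m = tagPattern.exec(sample)) !== null) {
    contents.push(m[2]);
}
// contents is now ["myitem1", "myitem2"]
console.log(contents);
```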

This is an XML string. An XML parser seems best suited for this kind of task, in my opinion. Do the following:

var items = document.getElementsByTagName("item") ; //<> use the parent element if document is not
var dataArray = [ ] ;

for(var n = 0 ; n < items.length ; n++) {

     var id = items[n].getAttribute("id") ;
     var text = items[n].childNodes[0].nodeValue ; // string value of the first text node, not the node itself

         dataArray.push(id,text) ;

}

If your problem is that you cannot convert the XML string to an XML document object, you will have to use a DOM parser beforehand:

var xmlString = "" ; //!! your xml string
var xmlDoc = null ; // not named "document": the browser's global document cannot be reassigned

    if (window.ActiveXObject) { //!! for old versions of Internet Explorer

            xmlDoc = new ActiveXObject("Microsoft.XMLDOM") ;
            xmlDoc.async = false ;
            xmlDoc.loadXML(xmlString) ;

    } else { //!! for everything else

        var parser = new DOMParser() ;
            xmlDoc = parser.parseFromString(xmlString,"text/xml") ;

    }

Then use the above script, calling xmlDoc.getElementsByTagName("item") instead of document.getElementsByTagName("item").
