I am using cheerio in Node.js to parse some RSS feeds. I grab all the items and put them into an array. I am using three test feeds, and each of them has a "description" child element for every "item" element. In one of the feeds the whole "description" is wrapped in CDATA, and I can't get its value. Here is an abbreviated code snippet:
// Open the XML document with cheerio
var $ = cheerio.load(arrXmlDocs[i], { ignoreWhitespace: true, xmlMode: true });
// Loop through every item
$('item').each(function (i, xmlItem) {
    // Array to hold each item being converted into an array
    var tempArray = [];
    // Loop through each child of <item>
    $(xmlItem).children().each(function (j, child) {
        // Use the child's tag name as the key and its text as the value
        tempArray[$(this)[0].name] = $(this).text();
    });
});
As expected, the two RSS feeds that don't use CDATA give me an array like this:
[
[
name: 'name of episode',
description:'description of episode',
pubdate: 'published date'
],
[
name: 'name of episode',
description:'description of episode',
pubdate: 'published date'
]
]
whereas each item from the feed with the CDATA description is missing its description and looks like this:
[
name: 'name of episode',
pubdate: 'published date'
],
So my question is: why is cheerio not returning values wrapped in CDATA, and how can I make it return those values?
Asked Mar 18, 2013 at 8:12 by PaulParton; edited Mar 18, 2013 at 10:57.
Comments:
- Can you make it more clear exactly what you are asking here? – neelsg, Mar 18, 2013 at 8:28
- Updated to more clearly ask the question. – PaulParton, Mar 18, 2013 at 8:50
1 Answer
This is a known issue (related) with cheerio: it cannot yet build a correct tree from XML that contains CDATA, as in your case. I know this is a disappointing answer, but a fix is a work in progress. In the meantime, you can remove the CDATA wrappers with a regular expression:
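// replace() returns a new string, so assign the result back before calling cheerio.load()
arrXmlDocs[i] = arrXmlDocs[i].replace(/<!\[CDATA\[([\s\S]*?)\]\]>(?=\s*<)/gi, "$1");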
Here is a link to an example jsfiddle.
While this is not an ideal solution, it should suffice until they work this issue out.
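To see what the regular expression does in isolation, here is a minimal sketch that runs it on a small feed string without involving cheerio at all. The helper name `stripCdata` and the sample feed are made up for illustration; only the pattern itself comes from the answer.

```javascript
// Hypothetical helper (not part of cheerio): unwraps CDATA sections so the
// wrapped text becomes ordinary element content before the XML is parsed.
function stripCdata(xml) {
  // Same pattern as above: lazily capture everything between "<![CDATA["
  // and "]]>", but only when the section is followed by another tag.
  return xml.replace(/<!\[CDATA\[([\s\S]*?)\]\]>(?=\s*<)/gi, "$1");
}

// Made-up sample feed, for illustration only.
const sampleFeed =
  "<rss><channel><item>" +
  "<title>Episode 1</title>" +
  "<description><![CDATA[All about parsing feeds]]></description>" +
  "</item></channel></rss>";

// Strings are immutable, so keep the returned value.
const cleanedFeed = stripCdata(sampleFeed);
console.log(cleanedFeed);
```

The cleaned string is what you would then pass to `cheerio.load(...)`. One caveat: if a CDATA section contains markup of its own (for example HTML in a description), unwrapping it turns that markup into real child elements in the parsed tree, so the regex is best treated as a stopgap for plain-text content.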