I am trying to scrape just Jung Ho Kang and 5 from this HTML and put them into an object. I want to exclude the (R) and the SS.
<td id="lineup-table-top">
<b class="text-muted pad-left-10">5</b>
Jung Ho Kang
<small class="text-muted">(R)</small>
<small class="text-muted">SS</small>
</td>
Here is my code:
var someObjArr = [];
$('td#lineup-table-top').each(function(i, element){
    //Get the text from cheerio.
    var text = $(this).text();
    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){
        someObjArr[i] = {};
    };
    //Update the name property of our object with the text value.
    someObjArr[i].name = text;
    $('b.pad-left-10').each(function(i, element){
        //Get the text from cheerio.
        var text = $(this).text();
        //if undefined, create the object inside of our array.
        if(someObjArr[i] == undefined){
            someObjArr[i] = {};
        };
        //Update the batting property of our object with the text value.
        someObjArr[i].batting = text;
    });
});
The exact output from the code is as follows:
{ batting: '5',
name: '5 Jung Ho Kang (R) SS 3B' }
{ name: '5 Jung Ho Kang (R) SS' },
The Expected output:
{ batting: '5',
name: 'Jung Ho Kang' }
I don't know why it appears to be looping twice and I can't figure out how to isolate just the name without it having a class/id associated with it.
Any direction is enthusiastically appreciated.
asked Aug 11, 2015 at 18:38 by CiscoKidx, edited Aug 11, 2015 at 18:55
- In your output you show values that aren't in the HTML you posted. Can you edit your question to include all of the HTML? – Jordan Running, Aug 11, 2015 at 18:51
- @Jordan Done. My bad. – CiscoKidx, Aug 11, 2015 at 18:58
1 Answer
Looks like you want to scrape only the text nodes in the markup.
https://github.com/cheeriojs/cheerio/issues/359
I'm not sure if nodeType is supported yet, but you should try to use that first. (nodeType docs)
$('td#lineup-table-top').each(function(i, cell){
    someObjArr[i] = someObjArr[i] || {};
    $(cell).contents().each(function(j, element){
        // The <b class="pad-left-10"> child of #lineup-table-top holds the batting order
        if ( $(element).hasClass('pad-left-10') ) {
            someObjArr[i].batting = $(element).text().trim();
        }
        // The raw text inside of #lineup-table-top is the player name (skip whitespace-only nodes)
        if ( element.nodeType === 3 && $(element).text().trim() ) {
            someObjArr[i].name = $(element).text().trim();
        }
    });
});
If it's not supported, you can fall back to using element.type:
if ( element.type === 'text' && $(element).text().trim() ) {
    someObjArr[i] = someObjArr[i] || {};
    someObjArr[i].name = $(element).text().trim();
}
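To make the type check concrete without depending on a particular cheerio version, here is a minimal sketch that runs on hand-built node objects. It assumes they have the same shape as the nodes cheerio passes to .each() (type, name, attribs, data, children); extractPlayer and cellChildren are hypothetical names used only for illustration.

```javascript
// Hand-built stand-ins for the child nodes of <td id="lineup-table-top">.
// Assumption: they mirror the node shape cheerio exposes
// (type, name, attribs, data, children).
var cellChildren = [
  { type: 'text', data: '\n' },
  { type: 'tag', name: 'b', attribs: { class: 'text-muted pad-left-10' },
    children: [{ type: 'text', data: '5' }] },
  { type: 'text', data: '\nJung Ho Kang\n' },
  { type: 'tag', name: 'small', attribs: { class: 'text-muted' },
    children: [{ type: 'text', data: '(R)' }] },
  { type: 'text', data: '\n' },
  { type: 'tag', name: 'small', attribs: { class: 'text-muted' },
    children: [{ type: 'text', data: 'SS' }] },
  { type: 'text', data: '\n' }
];

// Hypothetical helper: build one { batting, name } object from a cell's child nodes.
function extractPlayer(children) {
  var player = {};
  children.forEach(function (node) {
    if (node.type === 'text') {
      // Direct text children: keep the first non-empty one as the name.
      var text = node.data.trim();
      if (text && player.name === undefined) {
        player.name = text;
      }
    } else if (node.type === 'tag' && node.name === 'b') {
      // The <b> child holds the batting order.
      player.batting = node.children[0].data.trim();
    }
  });
  return player;
}

var player = extractPlayer(cellChildren);
console.log(player);
```

The (R) and SS values never reach the result because they sit inside <small> elements, not in direct text children of the cell.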
I used this in the past to scrape only the text within an entire page of markup.
// For each DOM element in the page
$('*').each(function(i, element) {
    // Scrape only the text nodes
    $(element).contents().each(function(i, element) {
        if (element.type === 'text') {
        }
    });
});
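The empty if is where each text node would be handled. As a rough sketch of that idea, the following collects every non-empty text node from a small hand-built tree. The tree shape is an assumption modelled on cheerio's nodes, and collectText is a hypothetical helper, not part of cheerio.

```javascript
// Hypothetical helper: walk a node tree depth-first and gather trimmed text nodes.
function collectText(node, out) {
  out = out || [];
  if (node.type === 'text') {
    var text = node.data.trim();
    // Skip whitespace-only nodes that come from line breaks in the markup.
    if (text) {
      out.push(text);
    }
  }
  (node.children || []).forEach(function (child) {
    collectText(child, out);
  });
  return out;
}

// Assumed stand-in for a parsed <td>, shaped like cheerio's node objects.
var tree = { type: 'tag', name: 'td', children: [
  { type: 'tag', name: 'b', children: [{ type: 'text', data: '5' }] },
  { type: 'text', data: '\nJung Ho Kang\n' },
  { type: 'tag', name: 'small', children: [{ type: 'text', data: '(R)' }] },
  { type: 'tag', name: 'small', children: [{ type: 'text', data: 'SS' }] }
] };

var texts = collectText(tree);
console.log(texts);
```

Walking the whole tree returns text from nested elements too, so this suits page-wide scraping more than picking out a single field.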