I am using the cheerio library in a Node.js project. I am trying to get the date that follows each a tag, in addition to the a tag itself, so that I can filter by date, but I can't get it working.
This is my HTML:
<a href="opensuse-su-2016_1623-1.json">opensuse-su-2016_1623-1.json</a> 12-May-2024 14:53 40K
<a href="opensuse-su-2016_1623-1.json.asc">opensuse-su-2016_1623-1.json.asc</a> 05-Dec-2024 12:32 819
<a href="opensuse-su-2016_1623-1.json.sha256">opensuse-su-2016_1623-1.json.sha256</a> 14-May-2024 15:09 95
<a href="opensuse-su-2016_1769-1.json">opensuse-su-2016_1769-1.json</a> 12-May-2024 14:53 124K
<a href="opensuse-su-2016_1769-1.json.asc">opensuse-su-2016_1769-1.json.asc</a> 05-Dec-2024 12:32 819
<a href="opensuse-su-2016_1769-1.json.sha256">opensuse-su-2016_1769-1.json.sha256</a> 14-May-2024 15:09 95
<a href="opensuse-su-2016_1778-1.json">opensuse-su-2016_1778-1.json</a> 12-May-2024 14:53 124K
<a href="opensuse-su-2016_1778-1.json.asc">opensuse-su-2016_1778-1.json.asc</a> 05-Dec-2024 12:32 819
<a href="opensuse-su-2016_1778-1.json.sha256">opensuse-su-2016_1778-1.json.sha256</a> 14-May-2024 15:09 95
<a href="opensuse-su-2016_1868-1.json">opensuse-su-2016_1868-1.json</a> 12-May-2024 14:53 47K
<a href="opensuse-su-2016_1868-1.json.asc">opensuse-su-2016_1868-1.json.asc</a> 05-Dec-2024 12:32 819
<a href="opensuse-su-2016_1868-1.json.sha256">opensuse-su-2016_1868-1.json.sha256</a> 14-May-2024 15:09 95
This is my code. The a tags are selected successfully, but not the text following them:
const $: CheerioAPI = cheerio.load(content);
const fileInfoList: FileInfo[] = [];
let links = $('a').toArray();
links.forEach(l => {
  const el = $(l);
  const url = el.attr('href');
  if (!url.endsWith('.json')) {
    return;
  }
  if (minDate) {
    // Get the date text next to the anchor
    const dateText = $(l).next().text().trim();
    ...
  }
dateText is null. How can I get the text after the tag?
Asked Mar 18 at 21:54 by chaya D; edited Mar 19 at 6:44 by traynor.
Comment: It's a good idea to add the enclosing tag here so the example is complete. What is your expected output? Related: How to get next text node with cheerio, cheerio: Get normal + text nodes, How to get a text that's separated by different HTML tags in Cheerio, etc. – ggorlen, Mar 19 at 4:35
1 Answer
There are a few possible approaches, but the most straightforward is using .nextSibling.nodeValue (from my Cheerio recipes/cheatsheet and this answer). Cheerio's .next() only returns the next sibling element, so it skips the text node that holds the date:
const cheerio = require("cheerio"); // ^1.0.0-rc.12
const html = `<Your HTML>`;
const $ = cheerio.load(html);
const result = [...$("a")]
  .map(e => ({
    url: $(e).attr("href"),
    text: $(e)[0]
      .nextSibling.nodeValue.trim()
      .split(/\s{2,}/),
  }))
  .filter(({url}) => url.endsWith(".json"));
console.log(result);
Output:
[
  {
    url: 'opensuse-su-2016_1623-1.json',
    text: [ '12-May-2024 14:53', '40K' ]
  },
  {
    url: 'opensuse-su-2016_1769-1.json',
    text: [ '12-May-2024 14:53', '124K' ]
  },
  {
    url: 'opensuse-su-2016_1778-1.json',
    text: [ '12-May-2024 14:53', '124K' ]
  },
  {
    url: 'opensuse-su-2016_1868-1.json',
    text: [ '12-May-2024 14:53', '47K' ]
  }
]
If you don't like my additional splitting, you may remove it.
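One caveat with reading .nextSibling.nodeValue directly: it throws if an anchor is the last node in its parent, or if the next sibling is an element rather than text. A minimal sketch of a guard, assuming the nodes expose nextSibling, nodeType, and nodeValue as in the snippet above; trailingText is a hypothetical helper name, not part of Cheerio, and the plain objects below only stand in for parsed nodes:

```javascript
// Return the trimmed text that directly follows `node`, or "" when the next
// sibling is missing or is not a text node (nodeType 3).
function trailingText(node) {
  const next = node.nextSibling;
  if (!next || next.nodeType !== 3) return "";
  return (next.nodeValue || "").trim();
}

// Plain objects standing in for parsed nodes, purely for illustration.
const withText = {nextSibling: {nodeType: 3, nodeValue: "  12-May-2024 14:53   40K\n"}};
const lastNode = {nextSibling: null};
const beforeElement = {nextSibling: {nodeType: 1, nodeValue: null}};

console.log(trailingText(withText).split(/\s{2,}/)); // [ '12-May-2024 14:53', '40K' ]
console.log(trailingText(lastNode) === "");          // true
console.log(trailingText(beforeElement) === "");     // true
```

In the snippet above it would be called as trailingText($(e)[0]) in place of the chained .nextSibling.nodeValue.trim().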
Here's another way, using .contents() on the enclosing parent element (I assume it's a <div> here):
// cheerio and html as in the first snippet, with the links wrapped in a <div>
const $ = cheerio.load(html);
const contents = [...$("div").contents()]
  .map(e => $(e).text().trim())
  .filter(Boolean);
const result = contents
  .flatMap((e, i) =>
    i % 2 === 0 && e.endsWith(".json")
      ? {url: e, text: contents[i + 1].split(/\s{2,}/)}
      : []
  );
console.log(result);
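Either result can then be filtered by date, which was the goal in the question. A minimal sketch, assuming timestamps like 12-May-2024 14:53 are UTC (the listing does not say) and that entries have the {url, text} shape shown in the output above; parseListingDate and filterByMinDate are hypothetical helper names, not Cheerio APIs:

```javascript
// Month abbreviations as they appear in the listing (e.g. "12-May-2024").
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Parse "12-May-2024 14:53" into a Date. Assumption: the time is UTC.
// Returns null when the text does not match the expected layout.
function parseListingDate(text) {
  const m = /^(\d{1,2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2})$/.exec(text.trim());
  if (!m) return null;
  const month = MONTHS.indexOf(m[2]);
  if (month === -1) return null;
  return new Date(Date.UTC(Number(m[3]), month, Number(m[1]), Number(m[4]), Number(m[5])));
}

// Keep only entries whose listing date is on or after minDate.
function filterByMinDate(entries, minDate) {
  return entries.filter(({text}) => {
    const date = parseListingDate(text[0]);
    return date !== null && date >= minDate;
  });
}

// Made-up sample entries in the same shape as the result array.
const sample = [
  {url: "a.json", text: ["12-May-2024 14:53", "40K"]},
  {url: "b.json", text: ["05-Dec-2024 12:32", "819"]},
];
const recent = filterByMinDate(sample, new Date(Date.UTC(2024, 5, 1))); // 1 June 2024
console.log(recent.map(e => e.url)); // [ 'b.json' ]
```

Parsing the pieces by hand avoids relying on how new Date(string) treats this non-ISO format, which can vary between JavaScript engines.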