html - Get an element with the text after it with Cheerio - Stack Overflow

I am using the cheerio library in a Node.js project. I am trying to get the date that follows each a tag, in addition to the a tag before it, so that I can filter by date, but I can't get it to work.

This is my HTML:

    <a href="opensuse-su-2016_1623-1.json">opensuse-su-2016_1623-1.json</a>                       12-May-2024 14:53     40K
<a href="opensuse-su-2016_1623-1.json.asc">opensuse-su-2016_1623-1.json.asc</a>                   05-Dec-2024 12:32     819
<a href="opensuse-su-2016_1623-1.json.sha256">opensuse-su-2016_1623-1.json.sha256</a>                14-May-2024 15:09      95
<a href="opensuse-su-2016_1769-1.json">opensuse-su-2016_1769-1.json</a>                       12-May-2024 14:53    124K
<a href="opensuse-su-2016_1769-1.json.asc">opensuse-su-2016_1769-1.json.asc</a>                   05-Dec-2024 12:32     819
<a href="opensuse-su-2016_1769-1.json.sha256">opensuse-su-2016_1769-1.json.sha256</a>                14-May-2024 15:09      95
<a href="opensuse-su-2016_1778-1.json">opensuse-su-2016_1778-1.json</a>                       12-May-2024 14:53    124K
<a href="opensuse-su-2016_1778-1.json.asc">opensuse-su-2016_1778-1.json.asc</a>                   05-Dec-2024 12:32     819
<a href="opensuse-su-2016_1778-1.json.sha256">opensuse-su-2016_1778-1.json.sha256</a>                14-May-2024 15:09      95
<a href="opensuse-su-2016_1868-1.json">opensuse-su-2016_1868-1.json</a>                       12-May-2024 14:53     47K
<a href="opensuse-su-2016_1868-1.json.asc">opensuse-su-2016_1868-1.json.asc</a>                   05-Dec-2024 12:32     819
<a href="opensuse-su-2016_1868-1.json.sha256">opensuse-su-2016_1868-1.json.sha256</a>                14-May-2024 15:09      95

This is my code. The a tags are selected successfully, but not the text following them:

        const $: CheerioAPI = cheerio.load(content);
        const fileInfoList: FileInfo[] = [];

        let links = $('a').toArray();

        links.forEach(l => {
            const el = $(l);
            const url = el.attr('href');
            if (!url.endsWith('.json')) {
                return;
            }
            if (minDate) {
                // Get the date text next to the anchor
                const dateText = $(l).next().text().trim();
...
}

dateText is null. How can I get the text after the tag?

asked Mar 18 at 21:54 by chaya D; edited Mar 19 at 6:44 by traynor
  • It's a good idea to add the enclosing tag here so the example is complete. What is your expected output? Related: "How to get next text node with cheerio", "cheerio: Get normal + text nodes", "How to get a text that's separated by different HTML tags in Cheerio", etc. – ggorlen, Mar 19 at 4:35
1 Answer

There are a few possible approaches, but the most straightforward is to use .nextSibling.nodeValue (from my Cheerio recipes/cheatsheet and this answer):

const cheerio = require("cheerio"); // ^1.0.0-rc.12

const html = `<Your HTML>`;

const $ = cheerio.load(html);
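const result = [...$("a")]
  .map(e => ({
    url: $(e).attr("href"),
    // The date and size live in the text node right after each <a>.
    // Splitting on runs of 2+ whitespace characters separates them
    // while keeping the single space inside the date.
    text: e.nextSibling.nodeValue.trim().split(/\s{2,}/),
  }))
  // Keep only the .json links, as in the question.
  .filter(({url}) => url.endsWith(".json"));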
console.log(result);

Output:

[
  {
    url: 'opensuse-su-2016_1623-1.json',
    text: [ '12-May-2024 14:53', '40K' ]
  },
  {
    url: 'opensuse-su-2016_1769-1.json',
    text: [ '12-May-2024 14:53', '124K' ]
  },
  {
    url: 'opensuse-su-2016_1778-1.json',
    text: [ '12-May-2024 14:53', '124K' ]
  },
  {
    url: 'opensuse-su-2016_1868-1.json',
    text: [ '12-May-2024 14:53', '47K' ]
  }
]

If you don't like my additional splitting, you may remove it.
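
With the split in place, `text[0]` holds the date string, so the date filtering the question is after could be sketched roughly as below. This is only an illustration: `parseListingDate` and `filterByMinDate` are hypothetical helper names (not part of Cheerio), the `DD-Mon-YYYY HH:MM` format is inferred from the listing in the question, and times are treated as UTC because the listing gives no timezone.

```javascript
// Hypothetical helpers for illustration; not part of Cheerio.
const MONTHS = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

// Parses a string like "12-May-2024 14:53" into a Date.
// Returns null if the string does not match that format.
// Assumption: the time is interpreted as UTC.
function parseListingDate(s) {
  const m = /^(\d{1,2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2})$/.exec(s.trim());
  if (!m) return null;
  const month = MONTHS.indexOf(m[2]);
  if (month === -1) return null;
  return new Date(
    Date.UTC(Number(m[3]), month, Number(m[1]), Number(m[4]), Number(m[5]))
  );
}

// Keeps entries whose date (text[0]) is on or after minDate.
// Entries with a missing or unparseable date are dropped.
function filterByMinDate(entries, minDate) {
  return entries.filter(({text}) => {
    const date = parseListingDate(text[0] ?? "");
    return date !== null && date >= minDate;
  });
}

// Two entries copied from the output above.
const sample = [
  {url: "opensuse-su-2016_1623-1.json", text: ["12-May-2024 14:53", "40K"]},
  {url: "opensuse-su-2016_1769-1.json", text: ["12-May-2024 14:53", "124K"]},
];

// Month indices are zero-based, so 4 is May and 5 is June.
const sinceMay = filterByMinDate(sample, new Date(Date.UTC(2024, 4, 1)));
const sinceJune = filterByMinDate(sample, new Date(Date.UTC(2024, 5, 1)));
console.log(sinceMay.length, sinceJune.length); // 2 0
```

If the server's times are actually local rather than UTC, the `Date.UTC` call would need adjusting; the comparison logic stays the same.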

Here's another way, using .contents() on the enclosing parent element (I assume it's a <div> here):

const $ = cheerio.load(html);
const contents = [...$("div").contents()]
  .map(e => $(e).text().trim())
  .filter(Boolean);
const result = contents
  .flatMap((e, i) =>
    i % 2 === 0 && e.endsWith(".json")
      ? {url: e, text: contents[i + 1].split(/\s{2,}/)}
      : []
  );
console.log(result);
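
To see why the `i % 2` check works, it may help to look at what `contents` holds after trimming and filtering: link text and trailing text alternate. The sketch below replays the pairing step on a plain array copied from the first three rows of the question's listing, so it runs without Cheerio. Note that `url` here is the link text rather than the `href` attribute; the two happen to be identical in this listing. The variable name `pairs` is only for illustration.

```javascript
// What `contents` looks like for the first three rows of the listing:
// even indices hold link text, odd indices hold the text node after it.
const contents = [
  "opensuse-su-2016_1623-1.json",
  "12-May-2024 14:53     40K",
  "opensuse-su-2016_1623-1.json.asc",
  "05-Dec-2024 12:32     819",
  "opensuse-su-2016_1623-1.json.sha256",
  "14-May-2024 15:09      95",
];

// Same pairing logic as above: take each even-indexed .json entry
// together with the text that follows it, and skip everything else.
const pairs = contents.flatMap((e, i) =>
  i % 2 === 0 && e.endsWith(".json")
    ? [{url: e, text: contents[i + 1].split(/\s{2,}/)}]
    : []
);
console.log(pairs);
// [ { url: 'opensuse-su-2016_1623-1.json', text: [ '12-May-2024 14:53', '40K' ] } ]
```

This pairing assumes every link is followed by a non-empty text node; if a row had no trailing text, the even/odd alignment would shift and the first approach would be the safer choice.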