javascript - Replace HTML nodes with Cheerio - Stack Overflow

I'm using Cheerio JS to simplify some ancient HTML code and transform it into HTML5. Among other things, I'm replacing some markup-heavy quotes that look like the following:

Node to be replaced:

<div style="margin:20px; margin-top:5px; ">
    <div class="smallfont" style="margin-bottom:2px">Quote:</div>
    <table cellpadding="6" cellspacing="0" border="0" width="100%">
        <tbody>
            <tr>
                <td class="alt2" style="border:1px solid #999">
                    <div>
                        Originally Posted by <strong>Username</strong>
                    </div>
                    <div style="font-style:italic">Lorem ipsum dolor sit amet</div>
                </td>
            </tr>
        </tbody>
    </table>
</div>

The transformed output is supposed to look like this:

<blockquote>Lorem ipsum dolor sit amet</blockquote>

Here's the code I'm currently using:

$(`table[id^='post']`).each( (i, el) => {
    // Get the post
    let postBody = $(el).find(`div[id^='post_message_']`).html().trim();

    // Replace quotes with blockquotes
    cheerio.load(postBody)('div[style^="margin:20px; margin-top:5px; "]').each( (i, el) => {
        if ($(el).html().trim().startsWith('<div class="smallfont" style="margin-bottom:2px">Quote')) {
            let tbody = $(el).find('tbody > tr > td').html();
            let quote = $(el).find('tbody > tr > td > div');

            if (quote.html() && quote.text().trim().startsWith('Originally Posted by')) {
                let replacement = $('<blockquote>Hello</blockquote>');
                quote.parent().html().replace(quote.html(), replacement);
            }

            // Looks all good
            console.log($(el).html())
        }

        postBody = $(el).html();
    });
});

And lastly, more HTML for some context:

<div id="post_message_123456">
    As Username has previously written
    <br>
    <div style="margin:20px; margin-top:5px; ">
        <div class="smallfont" style="margin-bottom:2px">Quote:</div>
        <table cellpadding="6" cellspacing="0" border="0" width="100%">
            <tbody>
                <tr>
                    <td class="alt2" style="border:1px solid #999">

                        <div>
                            Originally Posted by <strong>Username</strong>
                        </div>
                        <div style="font-style:italic">Lorem ipsum dolor sit amet</div>
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
    <br>
    I think he has a point!
    <img src="smile-with-sunglasses.gif" />
</div>

The replacement itself seems to work; the output of the console.log() statement looks fine. The problem lies in the last line, where I'm trying to replace the original content with the replacement. However, postBody still looks the same as before. What am I doing wrong?

Asked Oct 6, 2018 at 13:37 by idleberg (edited Oct 8, 2018 at 6:28)
  • There's no div[id^='post_message_'] element in that html – pguardiario Commented Oct 6, 2018 at 22:58
  • @pguardiario How do you know what the markup looks like? I've only posted the part I'd like to replace and mentioned that it's working up until the last line. – idleberg Commented Oct 7, 2018 at 12:42
  • Post html that works with your code please – pguardiario Commented Oct 8, 2018 at 0:14
  • Okay, I've updated my question – idleberg Commented Oct 8, 2018 at 6:28

3 Answers

Try it like this:

let $ = cheerio.load(html)

$('.alt2 div:contains("Originally Posted by")')
  .replaceWith('<blockquote>Lorem ipsum dolor sit amet</blockquote>')

console.log($.html())
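
As for why the original loop leaves postBody untouched: as far as I can tell, html() called without arguments returns a plain string, and String.prototype.replace() returns a new string instead of changing anything in place, so the result of quote.parent().html().replace(...) is simply thrown away. The following is a minimal, Cheerio-free sketch of that pitfall. makeNode is a hypothetical stand-in for a Cheerio element (it is not part of Cheerio) and only mimics the getter/setter behaviour of html().

```javascript
// Hypothetical stand-in for a Cheerio element (NOT a Cheerio API):
// html() with no argument reads the markup, html(value) writes it.
function makeNode(markup) {
  return {
    html(value) {
      if (value === undefined) return markup;
      markup = value;
      return this;
    },
  };
}

const cell = makeNode('<div style="font-style:italic">Lorem ipsum dolor sit amet</div>');
const pattern = /<div[^>]*>(.*)<\/div>/;

// What the question does: replace() returns a new string that is discarded.
cell.html().replace(pattern, '<blockquote>$1</blockquote>');
const afterDiscard = cell.html();

// Writing the result back through the setter is what changes the node.
cell.html(cell.html().replace(pattern, '<blockquote>$1</blockquote>'));
const afterWriteBack = cell.html();

console.log(afterDiscard);
console.log(afterWriteBack);
```

Methods such as replaceWith() avoid this problem because they modify the parsed document directly, so the change is visible the next time the document is serialized.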

Replace items based on individual context

This example shows how you could swap insecure URLs for secure ones, and how to make programmatic decisions per element, which is much easier for most people than doing the same with a regex.

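const $ = cheerio.load(html)
// Example: rewrite every http:// image URL to https://
$('img[src^="http://"]').replaceWith(function() {
  // `let`, because src is reassigned below
  let src = $(this).attr('src')
  // includes() returns a boolean; indexOf() returns -1 (truthy) when nothing matches
  if (src.includes('s3.amazon.')) {
    src = src.replace('s3.amazon.', 'storage.azure.')
  }
  // Return a modified copy of the element as the replacement
  return $(this).clone().attr('src', src.replace('http://', 'https://'))
})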

If you only want to change attributes of HTML nodes, such as a link's href or an image's src, or their text content, I suggest using Cheerio's each instead of replaceWith. In my experience, replaceWith is more problematic in some edge cases. You do not need to replace the element as a whole; you can mutate its attributes and children as you wish.

Example:

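    $('img').each(function () {
        const $this = $(this);
        let src = $this.attr('src');
        // Skip images that have no src attribute
        if (!src) return;
        // includes() returns a boolean, unlike indexOf()
        if (src.includes('s3.amazon.')) {
           src = src.replace('s3.amazon.', 'storage.azure.')
        }
        $this.attr('src', src)
    });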