I'm using Cheerio JS to simplify some ancient HTML code and transform it into HTML5. Among other things, I'm replacing some markup-heavy quotes that look like the following:
Node to be replaced:
<div style="margin:20px; margin-top:5px; ">
<div class="smallfont" style="margin-bottom:2px">Quote:</div>
<table cellpadding="6" cellspacing="0" border="0" width="100%">
<tbody>
<tr>
<td class="alt2" style="border:1px solid #999">
<div>
Originally Posted by <strong>Username</strong>
</div>
<div style="font-style:italic">Lorem ipsum dolor sit amet</div>
</td>
</tr>
</tbody>
</table>
</div>
The transformed output is supposed to look like this:
<blockquote>Lorem ipsum dolor sit amet</blockquote>
Here's the code I'm currently using:
$(`table[id^='post']`).each( (i, el) => {
// Get the post
let postBody = $(el).find(`div[id^='post_message_']`).html().trim();
// Replace quotes with blockquotes
cheerio.load(postBody)('div[style^="margin:20px; margin-top:5px; "]').each( (i, el) => {
if ($(el).html().trim().startsWith('<div class="smallfont" style="margin-bottom:2px">Quote')) {
let tbody = $(el).find('tbody > tr > td').html();
let quote = $(el).find('tbody > tr > td > div');
if (quote.html() && quote.text().trim().startsWith('Originally Posted by')) {
let replacement = $('<blockquote>Hello</blockquote>');
quote.parent().html().replace(quote.html(), replacement);
}
// Looks all good
console.log($(el).html())
}
postBody = $(el).html();
});
});
And lastly, more HTML for some context:
<div id="post_message_123456">
As Username has previously written
<br>
<div style="margin:20px; margin-top:5px; ">
<div class="smallfont" style="margin-bottom:2px">Quote:</div>
<table cellpadding="6" cellspacing="0" border="0" width="100%">
<tbody>
<tr>
<td class="alt2" style="border:1px solid #999">
<div>
Originally Posted by <strong>Username</strong>
</div>
<div style="font-style:italic">Lorem ipsum dolor sit amet</div>
</td>
</tr>
</tbody>
</table>
</div>
<br>
I think he has a point!
<img src="smile-with-sunglasses.gif" />
</div>
The replacement itself seems to work; the output of the console.log() statement looks good. The problem lies in the last line, where I try to replace the original content with the replacement: postBody still looks like it did before. What am I doing wrong?
- There's no div[id^='post_message_'] element in that html – pguardiario Commented Oct 6, 2018 at 22:58
- @pguardiario How do you know what the markup looks like? I've only posted the part I'd like to replace and mentioned that it's working up until the last line. – idleberg Commented Oct 7, 2018 at 12:42
- Post html that works with your code please – pguardiario Commented Oct 8, 2018 at 0:14
- Okay, I've updated my question – idleberg Commented Oct 8, 2018 at 6:28
3 Answers
Try it like this:
let $ = cheerio.load(html)
$('.alt2 div:contains("Originally Posted by")')
.replaceWith('<blockquote>Lorem ipsum dolor sit amet</blockquote>')
console.log($.html())
Replace items based on individual context
This demonstrates how you could swap insecure URLs for secure ones, as a useful real-world example. It also shows how to make programmatic decisions per element, which most people find much easier than doing the same with a regex.
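const $ = cheerio.load(html)
// Example: upgrade every http:// image URL to https://
$('img[src^="http://"]').replaceWith(function () {
  // `let`, because src is reassigned below
  let src = $(this).attr('src')
  // includes() instead of a bare indexOf(), which returns -1 (truthy) when there is no match
  if (src.includes('s3.amazon.')) {
    src = src.replace('s3.amazon.', 'storage.azure.')
  }
  // Return a modified copy to replace the original element
  return $(this).clone().attr('src', src.replace('http://', 'https://'))
})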
If you only want to transform attributes of the HTML nodes, such as a link's href or an image's src, or their text content, I suggest using Cheerio's each instead of replaceWith. In my experience, replaceWith is somewhat more problematic in some edge cases. You do not need to replace the element as a whole; you can mutate its attributes and children as you wish.
Example:
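$('img').each(function () {
  const $this = $(this);
  // Read the existing src attribute and skip images without one
  let src = $this.attr('src');
  if (!src) return;
  // includes() rather than a bare indexOf(), which is truthy (-1) when nothing matches
  if (src.includes('s3.amazon.')) {
    src = src.replace('s3.amazon.', 'storage.azure.');
  }
  $this.attr('src', src);
});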
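The URL logic used in the snippets above can also be pulled out into a small pure function, which makes it easy to check without loading any HTML. This is only a sketch: the name rewriteSrc is hypothetical, the hostnames are the illustrative placeholders from the answers (with a trailing dot added to the replacement so the rest of the hostname stays intact, which is an assumption), and includes() is used because indexOf() returns -1, a truthy value, when the substring is absent.

```javascript
// Hypothetical helper: returns the rewritten URL, or the input unchanged.
function rewriteSrc(src) {
  // attr('src') yields undefined when the attribute is missing
  if (typeof src !== 'string') return src;
  let out = src;
  // Swap the storage host (placeholder hostnames from the answers)
  if (out.includes('s3.amazon.')) {
    out = out.replace('s3.amazon.', 'storage.azure.');
  }
  // Upgrade the scheme only when the URL starts with http://
  if (out.startsWith('http://')) {
    out = 'https://' + out.slice('http://'.length);
  }
  return out;
}

console.log(rewriteSrc('http://s3.amazon.example/cat.gif'));
// → https://storage.azure.example/cat.gif
```

Inside an each callback, the helper would be applied as $this.attr('src', rewriteSrc($this.attr('src'))), after checking that the image actually has a src attribute.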