I can get the full HTML of a page with Puppeteer, but how can I get only the plain text, without the tags?
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.');
  console.log(await page.content()); // Get all code
  await browser.close();
})();
asked Oct 18, 2017 at 14:48 by pypypu
2 Answers
I haven't tried it, but $eval might work for you:
await page.$eval('*', el => el.innerText);
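For context, here is a minimal sketch of how that call might fit into a complete script. It is untested against any particular site. The helper name getVisibleText, the 'body' selector, the command-line URL argument, and the networkidle2 wait are my own assumptions, not part of the question. Note that $eval only passes the first element matching the selector to the callback, so '*' resolves to the root html element.

```javascript
// Hypothetical helper (not part of Puppeteer's API): returns the rendered
// text of the first element that matches `selector`.
async function getVisibleText(page, selector = 'body') {
  // page.$eval runs document.querySelector(selector) inside the page and
  // passes the matched element to the callback.
  return page.$eval(selector, (el) => el.innerText);
}

async function main(url) {
  // Required lazily so the helper above can be reused without Puppeteer loaded.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumption: waiting for the network to go quiet gives scripts time to render.
    await page.goto(url, { waitUntil: 'networkidle2' });
    console.log(await getVisibleText(page));
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}

// Usage: node get-text.js <url>
if (process.argv[2]) {
  main(process.argv[2]).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Passing 'body' instead of '*' should give much the same text while skipping anything that sits outside the body element.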
I've gathered a few possible variants in my article, "How to get all text from a webpage using Puppeteer?"
To keep things short:
- innerText variant. Works with most webpages, but not all of them
await page.$eval('*', el => el.innerText);
- Select text variant. Works with more webpages
await page.$eval('*', (el) => {
  const selection = window.getSelection();
  const range = document.createRange();
  range.selectNode(el);
  selection.removeAllRanges();
  selection.addRange(range);
  return window.getSelection().toString();
});
- Use a third-party library of your choice (like html-to-text)
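The third variant could look roughly like the sketch below. It assumes html-to-text version 8 or later, which exports a convert function. The helper name pageToText and the command-line URL argument are hypothetical, and the converter is passed in as a parameter so another library can be swapped in.

```javascript
// Hypothetical helper: fetches the page's HTML and hands it to an
// HTML-to-text function supplied by the caller.
async function pageToText(page, convert) {
  const html = await page.content(); // full serialized HTML of the page
  // `wordwrap: false` is html-to-text's option for disabling line wrapping.
  return convert(html, { wordwrap: false });
}

async function main(url) {
  // Required lazily so the helper above stays usable on its own.
  const puppeteer = require('puppeteer');
  // Assumption: html-to-text 8+ is installed and exports `convert`.
  const { convert } = require('html-to-text');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    console.log(await pageToText(page, convert));
  } finally {
    // Close the browser even if navigation or conversion throws.
    await browser.close();
  }
}

// Usage: node page-to-text.js <url>
if (process.argv[2]) {
  main(process.argv[2]).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Unlike the innerText and selection variants, this converts the raw markup in Node rather than in the browser, so the result depends on the library's formatting rules and not on what the page actually renders.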