javascript - Get all plain text with Puppeteer

I can get the full HTML of a page with Puppeteer, but how can I get only the plain text, without the tags?

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  console.log(await page.content()); // Prints the full HTML, tags included
  await browser.close();
})();

asked Oct 18, 2017 at 14:48 by pypypu

2 Answers

I haven't tried it, but $eval might work for you:

await page.$eval('*', el => el.innerText);

I've gathered a few possible variants in my article, "How to get all text from a webpage using Puppeteer?"

To keep things short:

innerText variant. Works with most webpages, but not all of them

await page.$eval('*', el => el.innerText);

Select text variant. Works with more webpages

await page.$eval('*', (el) => {
  const selection = window.getSelection();
  const range = document.createRange();
  range.selectNode(el);
  selection.removeAllRanges();
  selection.addRange(range);
  return window.getSelection().toString();
});

Use a third-party library of your choice (like html-to-text)