
javascript - Get all plain text with Puppeteer - Stack Overflow


I can get the full source of a page with Puppeteer, but how can I get only the plain text, without tags?

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  console.log(await page.content()); // Prints the full HTML source
  await browser.close();
})();

Asked Oct 18, 2017 at 14:48 by pypypu

2 Answers


I haven't tried it, but $eval might work for you:

await page.$eval('*', el => el.innerText);
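For context, here is a minimal sketch of how that one-liner might fit into a complete script. The helper names getInnerText and main, the 'body' selector, and the example URL are assumptions for illustration, not part of the original answer. Note that page.$eval passes the first element matching the selector to the callback, so '*' resolves to the root <html> element, while 'body' narrows it to the document body.

```javascript
// Sketch: read the rendered text of a page through Puppeteer's page.$eval.
// getInnerText and main are hypothetical helper names, not part of Puppeteer.

// page.$eval runs document.querySelector(selector) inside the page and passes
// the first match to the callback, which here returns the element's innerText.
async function getInnerText(page, selector = 'body') {
  return page.$eval(selector, (el) => el.innerText);
}

async function main(url) {
  // Required lazily so the helper above can be reused without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    console.log(await getInnerText(page));
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}

// Example call (the URL is an assumption): main('https://example.com');
```

Keeping the extraction in its own function makes it easy to swap the selector or reuse it on a page you have already opened.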

I've gathered a few possible variants in my article, "How to get all text from a webpage using Puppeteer?"

To keep things short:

  1. innerText variant. Works with most webpages, but not all of them:
     await page.$eval('*', el => el.innerText);
  2. Select-text variant. Works with more webpages:
     await page.$eval('*', (el) => {
         const selection = window.getSelection();
         const range = document.createRange();
         range.selectNode(el);
         selection.removeAllRanges();
         selection.addRange(range);
         return window.getSelection().toString();
     });
  3. Use a third-party library of your choice (like html-to-text).
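The third variant has no code above, so here is a rough sketch of what it could look like. It assumes the html-to-text package is installed in a version that exports convert (v8 or later, as far as I know). The helper name getTextViaHtmlToText and the injectable convert parameter are my own additions for illustration.

```javascript
// Sketch: convert the page's HTML to plain text outside the browser.
// getTextViaHtmlToText is a hypothetical helper name. The default converter
// assumes the html-to-text package (v8+) is installed and exports `convert`.
async function getTextViaHtmlToText(
  page,
  convert = require('html-to-text').convert
) {
  // page.content() returns the full HTML of the page, as in the question.
  const html = await page.content();
  // wordwrap: false disables html-to-text's line wrapping.
  return convert(html, { wordwrap: false });
}

// Example use with an already opened Puppeteer page:
//   const text = await getTextViaHtmlToText(page);
//   console.log(text);
```

Unlike the two in-page variants, this one works on the serialized HTML, so it does not depend on how the browser renders or selects the content.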