javascript - How to make puppeteer load websites faster? - Stack Overflow

I'm using Puppeteer to automate some tasks and it works fine, but pages take noticeably longer to load than when I open the site normally. I tried to enable caching with this:

const puppeteer = require('puppeteer');
const time = new Date();

async function test() {
    const browser = await puppeteer.launch({
        headless: true,
        executablePath: "D:\\Desktop\\node_modules\\puppeteer\\.local-chromium\\win64-848005\\chrome-win\\chrome.exe",
        args: ['--no-sandbox'],
    });
    const page = await browser.newPage();
    const response = await page.goto('https://example.com/');
    // Elapsed milliseconds since the script started, browser launch included
    console.log(`${new Date() - time}`);
    console.log(response);
    await browser.close();
}

test();

This worked for the example site: the cache was stored and the page loaded faster. My target website, however, does not seem to allow caching.

Is there another way to speed up the process?

Asked Mar 10, 2021 at 10:34 by DUMBUSER

1 Answer

If you just want the site to load faster while scraping, and you do not depend on some of its images or JavaScript, you can block those resources.

Blocking by Resource Type

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (req) => {
    if (req.resourceType() === 'image') {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto('https://bbc.com');
  await page.screenshot({path: 'no-images.png', fullPage: true});
  await browser.close();
})();
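
Images are often not the only heavy resources. As a rough sketch of the same idea extended to several resource types, the helper below aborts images, media, fonts and stylesheets. The names `BLOCKED_TYPES`, `shouldBlock` and `blockHeavyResources` are made up for illustration, and the choice of types is an assumption: a site whose content or layout depends on CSS or fonts may need a shorter list.

```javascript
// Hypothetical list of resource types to skip; adjust to what the scrape needs.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Pure check, kept separate so it can be tested without a browser.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Enables request interception on a Puppeteer page and aborts blocked types.
async function blockHeavyResources(page) {
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (shouldBlock(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
```

Call `await blockHeavyResources(page)` once, before `page.goto(...)`, so that interception is already active when the first request goes out.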

Blocking by Domain

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  const options = {
    waitUntil: 'networkidle2',
    timeout: 30000,
  };

  // Before: Normal navigation
  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'before.png', fullPage: true});
  const metrics = await page.metrics();
  console.info(metrics);

  // After: Navigation with some domains blocked

  // Array of third-party domains to block
  const blockedDomains = [
    'https://pagead2.googlesyndication.com',
    'https://creativecdn.com',
    'https://www.googletagmanager.com',
    'https://cdn.krxd',
    'https://adservice.google.com',
    'https://cdn.concert.io',
    'https://z.moatads.com',
    'https://cdn.permutive.com'];
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const url = request.url();
    if (blockedDomains.some((d) => url.startsWith(d))) {
      request.abort();
    } else {
      request.continue();
    }
  });

  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'after.png', fullPage: true});

  const metricsAfter = await page.metrics();
  console.info(metricsAfter);

  await browser.close();
})();

Source: https://github.com/addyosmani/puppeteer-webperf
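
One caveat with a `startsWith` check on full URLs: it only matches the exact scheme and subdomain listed, so `http://` requests or other subdomains of the same network slip through. A possible variation, assuming you would rather match on hostname, is sketched below. `BLOCKED_HOSTS` and `isBlockedUrl` are hypothetical names, and the two hosts are only examples.

```javascript
// Hypothetical list of registrable domains to block (examples only).
const BLOCKED_HOSTS = ['googlesyndication.com', 'googletagmanager.com'];

// Returns true when the URL's hostname equals a blocked host or is a subdomain of it.
function isBlockedUrl(rawUrl, blockedHosts = BLOCKED_HOSTS) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname;
  } catch {
    return false; // not a parseable absolute URL, so let it through
  }
  return blockedHosts.some(
    (host) => hostname === host || hostname.endsWith('.' + host)
  );
}
```

Inside the `request` handler this would replace the `startsWith` test: abort when `isBlockedUrl(request.url())` is true, otherwise continue. Matching on the parsed hostname also avoids false positives when a blocked domain merely appears in another URL's query string.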
