So I'm working with Puppeteer to automate some tasks, and it works fine, but when I load the website it takes noticeably longer than in my normal browser. I tried enabling the cache using this:
const puppeteer = require('puppeteer');

let time = new Date();

async function test() {
  const browser = await puppeteer.launch({
    headless: true,
    executablePath: "D:\\Desktop\\node_modules\\puppeteer\\.local-chromium\\win64-848005\\chrome-win\\chrome.exe",
    args: ['--no-sandbox'],
  });
  const page = await browser.newPage();
  const response = await page.goto('https://example.com/');
  console.log(`${new Date() - time}`); // elapsed milliseconds since script start
  console.log(response);
  await browser.close();
}

test();
and it worked for the example: the cache was stored and the page loaded faster. But my target website doesn't seem to allow caching. Is there another way to speed up the process?
asked Mar 10, 2021 at 10:34 by DUMBUSER

1 Answer
If you just want the site to load faster when scraping and you don't rely on some of the images or JavaScript, you have the possibility to block these resources.
Blocking by Resource Type
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (req.resourceType() === 'image') {
      // Skip downloading images entirely
      req.abort();
    } else {
      req.continue();
    }
  });
  await page.goto('https://bbc.com');
  await page.screenshot({path: 'no-images.png', fullPage: true});
  await browser.close();
})();
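The same interception hook can skip more than one resource type. A minimal sketch, with the decision factored into a small pure function so it's easy to reuse and test (the particular set of blocked types below is my own choice, not from the answer; the valid type strings are the ones Puppeteer's `request.resourceType()` returns):

```javascript
// Resource types we choose not to download; adjust to taste.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

// Pure helper: should a request of this resource type be aborted?
function shouldAbort(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Usage inside the interception handler:
// page.on('request', (req) => {
//   shouldAbort(req.resourceType()) ? req.abort() : req.continue();
// });
```

Blocking stylesheets and fonts saves more bandwidth but can change how the page renders, so keep them if you take screenshots.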
Blocking by Domain
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  const options = {
    waitUntil: 'networkidle2',
    timeout: 30000,
  };

  // Before: normal navigation
  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'before.png', fullPage: true});
  const metrics = await page.metrics();
  console.info(metrics);

  // After: navigation with some domains blocked
  // Array of third-party domains to block
  const blockedDomains = [
    'https://pagead2.googlesyndication.com',
    'https://creativecdn.com',
    'https://www.googletagmanager.com',
    'https://cdn.krxd.net',
    'https://adservice.google.com',
    'https://cdn.concert.io',
    'https://z.moatads.com',
    'https://cdn.permutive.com'];
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const url = request.url();
    if (blockedDomains.some((d) => url.startsWith(d))) {
      request.abort();
    } else {
      request.continue();
    }
  });
  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'after.png', fullPage: true});
  const metricsAfter = await page.metrics();
  console.info(metricsAfter);
  await browser.close();
})();
Source: https://github.com/addyosmani/puppeteer-webperf
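Since the question was specifically about caching: `puppeteer.launch()` also accepts a `userDataDir` option, which points Chromium at a persistent profile directory so its disk cache (and cookies) survive between runs. A minimal sketch; the `./chrome-profile` path is an arbitrary example, any writable directory works:

```javascript
// Launch options reusing a persistent Chromium profile so the
// HTTP disk cache carries over between script runs.
const launchOptions = {
  headless: true,
  userDataDir: './chrome-profile', // arbitrary example path
};

async function run(url) {
  const puppeteer = require('puppeteer'); // loaded lazily
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();
  const response = await page.goto(url);
  await browser.close();
  return response;
}
```

Note this only helps for resources the server marks as cacheable; if the target site sends `Cache-Control: no-store`, the browser won't cache them no matter where the profile lives, and blocking requests as shown above is the more effective option.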