javascript - How to cache file with puppeteer - Stack Overflow

I want to know how to cache a file with Puppeteer so that it doesn't have to be loaded again each time the script starts. Assume I have this script:

const puppeteer = require('puppeteer');

async function run () {
 const browser = await puppeteer.launch();
 const page = await browser.newPage();
 await page.goto("https://www.amazon.com/");
 // Wait for the browser to shut down before the script exits.
 await browser.close();
}
run();

If I wanted to save the HTML so that it doesn't have to be loaded again, how would I do it? I found "How can I disable cache in puppeteer?", but neither the question nor the answer gave much detail. Could someone explain how to save the HTML in the cache, for example?

asked Jun 6, 2021 at 9:02 by user15594988; edited Jun 6, 2021 at 15:26 by ggorlen
  • Could you explain a bit more? Puppeteer emulates the browser's behavior, so it caches resources the same way a browser does. What problem are you trying to solve? – Drag13 Commented Jun 6, 2021 at 9:15
  • @Drag13 I'm not sure whether it stores the HTML. And if, for example, the HTML references a separate JavaScript file, how could I save that file so it can be used again without loading it again? – user15594988 Commented Jun 6, 2021 at 9:26
  • @Drag13 Say I want to cache this test.js file so it can be reused without reloading it; how can I do that? I want to keep the file saved and simply use it when needed. – user15594988 Commented Jun 6, 2021 at 9:26
  • If you run the tests within one session, don't disable the cache manually, and your static resources carry cache headers, caching happens automatically, the same way a browser does it. If you want resources cached between launches, you have to "warm" the page (load it once) before the tests. – Drag13 Commented Jun 6, 2021 at 9:29
  • @Drag13 I believe the HTML, for example, is not cached: if I open a page online, stop the script, and then try to open the same page offline, it doesn't load. If it doesn't load, it isn't stored in the cache, correct? – user15594988 Commented Jun 6, 2021 at 9:37

2 Answers

Puppeteer drives Chrome (or Firefox) under the hood, so if:

  • This is not the first visit (the cache is filled)
  • The resources have proper cache headers and have not expired (cache-control, etc.)
  • You didn't disable the cache manually, using either of:
await page.setCacheEnabled(false);
// pageSession is assumed to be a CDP session, e.g. from page.target().createCDPSession()
await pageSession.send('Network.setCacheDisabled', { cacheDisabled: true });

then the resources will already be cached and you don't need to do anything manually.
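One caveat for caching between script launches: by default Puppeteer starts the browser with a temporary profile directory that is removed when the browser closes, so the HTTP cache does not carry over to the next run. A minimal sketch, assuming a persistent profile is acceptable, is to pass `userDataDir` to `puppeteer.launch`. The helper name `persistentLaunchOptions` and the `./puppeteer-profile` path are hypothetical, chosen only for illustration.

```javascript
const path = require('path');

// Hypothetical helper: launch options that reuse one profile directory,
// so the browser's on-disk HTTP cache is kept between script runs.
function persistentLaunchOptions(profileDir = './puppeteer-profile') {
  return { userDataDir: path.resolve(profileDir) };
}

async function run(url) {
  // Loaded lazily so the helper above works even without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(persistentLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
  } finally {
    await browser.close();
  }
}

// Example call (the URL is a placeholder):
// run('https://example.com/').catch(console.error);
```

Whether a given resource is actually reused on the second run still depends on its cache headers, as listed above.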

However, if you want to test against a cached page, you need to warm it up by visiting it once before the tests, as in this example:

async function warmingBrowser(url: URL, pageInstance: Page) {
    await pageInstance.goto(url.href, { waitUntil: 'networkidle0' });
    await pageInstance.close();
}

The code is taken from perfrunner.
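To check that a warm-up visit had any effect, one option is to count how many responses on the second load report `fromCache()`, which Puppeteer exposes on its response objects. This is a rough sketch rather than part of perfrunner; `countCacheHits` and `warmAndMeasure` are hypothetical names, and `browser` is assumed to be an already launched Puppeteer browser.

```javascript
// Hypothetical helper: count how many responses were served from the browser cache.
// Each item only needs a fromCache() method, as Puppeteer's response objects have.
function countCacheHits(responses) {
  const cached = responses.filter((res) => res.fromCache()).length;
  return { total: responses.length, cached };
}

// Visit the URL once to warm the cache, then load it again and report the hits.
async function warmAndMeasure(browser, url) {
  const warmPage = await browser.newPage();
  await warmPage.goto(url, { waitUntil: 'networkidle0' });
  await warmPage.close();

  const page = await browser.newPage();
  const responses = [];
  page.on('response', (res) => responses.push(res));
  await page.goto(url, { waitUntil: 'networkidle0' });
  await page.close();
  return countCacheHits(responses);
}
```

A result with `cached` at zero suggests the resources lack usable cache headers or the cache was disabled.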

If you want it to work completely offline, Puppeteer will not help with that; you need to implement your own caching strategy using a ServiceWorker.

Be aware that this step has pitfalls, particularly around caching and invalidating the cache.

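You can use setRequestInterception and respond manually with whatever you have saved:

const cache = new Map(); // url -> { status, contentType, body }

await page.setRequestInterception(true);
page.on('request', (req) => {
  const cached = cache.get(req.url());
  if (cached) {
    // Serve the saved copy instead of going to the network.
    return req.respond(cached);
  }
  req.continue();
});

And save each response that is not cached yet:

page.on('requestfinished', async (req) => {
  const res = req.response();
  if (res && res.ok() && !cache.has(req.url())) {
    // Save what respond() needs to replay this response later.
    cache.set(req.url(), { status: res.status(), contentType: res.headers()['content-type'], body: await res.buffer() });
  }
});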
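An interception cache like the one above only helps across script launches if the saved responses are written to disk. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper names (`cachePath`, `saveEntry`, `loadEntry`, `useDiskCache`) and the one-JSON-file-per-URL layout are made up for illustration, and it only handles successful GET responses.

```javascript
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// One JSON file per URL, named by a hash of the URL (hypothetical layout).
function cachePath(dir, url) {
  const name = crypto.createHash('sha256').update(url).digest('hex');
  return path.join(dir, `${name}.json`);
}

// Store status, content type, and the body (base64, so binary files survive JSON).
function saveEntry(dir, url, { status, contentType, body }) {
  fs.mkdirSync(dir, { recursive: true });
  const entry = { status, contentType, body: Buffer.from(body).toString('base64') };
  fs.writeFileSync(cachePath(dir, url), JSON.stringify(entry));
}

// Return a saved entry in the shape request.respond() accepts, or null.
function loadEntry(dir, url) {
  const file = cachePath(dir, url);
  if (!fs.existsSync(file)) return null;
  const entry = JSON.parse(fs.readFileSync(file, 'utf8'));
  return { ...entry, body: Buffer.from(entry.body, 'base64') };
}

// Serve GET requests from disk when possible; save successful responses otherwise.
async function useDiskCache(page, dir) {
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    const cached = req.method() === 'GET' ? loadEntry(dir, req.url()) : null;
    return cached ? req.respond(cached) : req.continue();
  });
  page.on('requestfinished', async (req) => {
    const res = req.response();
    if (req.method() !== 'GET' || !res || !res.ok() || loadEntry(dir, req.url())) return;
    try {
      const contentType = res.headers()['content-type'];
      saveEntry(dir, req.url(), { status: res.status(), contentType, body: await res.buffer() });
    } catch {
      // Some responses have no retrievable body; skip them.
    }
  });
}
```

Usage would be `await useDiskCache(page, './http-cache')` before calling `page.goto(...)`. Entries never expire in this sketch, so delete the directory to force a fresh load.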