javascript - playwright disable caching of webpage so I can fetch new elements after scrolling down - Stack Overflow

I'm using Playwright (Firefox browser) to scrape some websites. Many of the websites load more content when I scroll down the page. The problem is that the newly loaded content is not picked up by the await page.$$("") methods.

But if I run document.querySelectorAll("") in the console after the scroll, I do get the newer content as well.

I see that Puppeteer has a setting, page.setCacheEnabled(enabled), that allows disabling the cache, but I can't find a similar thing in Playwright.

Asked Jul 25, 2021 at 20:10 by Curious101 (edited Jul 25, 2021 at 20:16)

3 Answers

You are quite correct that there is no method like setCacheEnabled in Playwright. One workaround is to set up a route for all requests:

page.route('**', route => route.continue());

You can see here that:

Enabling routing disables http cache.

This should accomplish the same thing.
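For illustration, the routing workaround can be wrapped in a small helper. This is a minimal sketch: `disableHttpCache` is a hypothetical name, not a Playwright API, and it assumes `page` is a Playwright Page object.

```javascript
// Hypothetical helper (not part of Playwright): registers a pass-through
// route for every request. Per the quoted docs, enabling routing disables
// the HTTP cache.
async function disableHttpCache(page) {
  // '**' matches every URL; route.continue() lets the request proceed unmodified.
  await page.route('**', (route) => route.continue());
}
```

Call it once, right after creating the page and before navigating, for example `await disableHttpCache(page)` followed by `await page.goto(url)`.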

According to this PR, you can avoid the cache by using a browserContext.
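One way to read this suggestion is to run each scrape in its own freshly created context, since contexts are isolated from one another. The sketch below assumes that reading. `withFreshContext` and `task` are hypothetical names, and `browser` is assumed to be a Playwright Browser object.

```javascript
// Hypothetical helper: runs `task` against a page in a brand-new browser
// context and always closes the context afterwards.
async function withFreshContext(browser, task) {
  // Each context is isolated, so it does not share state with earlier contexts.
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    return await task(page);
  } finally {
    // Close the context even if the task throws.
    await context.close();
  }
}
```

A call might look like `await withFreshContext(browser, async (page) => { await page.goto(url); return page.$$(selector); })`, where `url` and `selector` are your own values.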

We had the same issue where Playwright 'remembers' the state of the browser from one session to another. The workaround that worked for us was to delete the test-data-dir directory before running the script.

Playwright creates this directory the first time you run it, so we defined this script in our package.json:

"cleanup:chrome": "rm -rf ./path/to/test-data-dir/* || true",

Then run Playwright as usual.
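If `rm -rf` is not available (for example on Windows), a similar cleanup can be sketched in Node. `resetDataDir` is a hypothetical helper, and the directory you pass in is whatever data directory your own script uses. Unlike the shell one-liner, this removes the directory itself, hidden files included, and then recreates it empty.

```javascript
const fs = require('fs');

// Hypothetical helper: empties a browser data directory before a run.
function resetDataDir(dataDir) {
  // force: true makes a missing directory a no-op, similar to `|| true`.
  fs.rmSync(dataDir, { recursive: true, force: true });
  // Recreate the directory so later steps can rely on it existing.
  fs.mkdirSync(dataDir, { recursive: true });
}
```

Run it at the top of the script, before launching the browser.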
