I'm using Playwright (Firefox browser) to scrape some websites. Many of the websites load more content when I scroll down the page. The problem is that the newly loaded content is not being picked up by the await page.$$("") calls.
But if I do a document.querySelectorAll("") in the console after the scroll, then I am able to get the newer content as well.
I see that Puppeteer has a setting, page.setCacheEnabled(enabled), that allows disabling the cache, but I can't find a similar thing in Playwright.
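For concreteness, a rough sketch of what I'm doing (the URL and the .item selector are just placeholders):

const { firefox } = require('playwright');

(async () => {
  const browser = await firefox.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com'); // placeholder URL

  // Scroll to the bottom so the page loads more items, then give it a moment.
  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
  await page.waitForTimeout(2000);

  // This call only returns the items that were present before the scroll.
  const items = await page.$$('.item'); // placeholder selector
  console.log(items.length);

  await browser.close();
})();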
3 Answers
You are quite correct that there is no method like setCacheEnabled in Playwright. One workaround is to set up a route for all requests:
page.route('**', route => route.continue());
You can see here that:
Enabling routing disables http cache.
Which should accomplish the same thing.
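A minimal sketch of where that line goes, assuming a plain Firefox launch (the URL is a placeholder); the route is registered before the navigation so it applies to that page's requests:

const { firefox } = require('playwright');

(async () => {
  const browser = await firefox.launch();
  const page = await browser.newPage();

  // Registering any route turns off the HTTP cache for this page's requests;
  // continue() just lets every request through unmodified.
  await page.route('**', route => route.continue());

  await page.goto('https://example.com'); // placeholder URL
  // ... scroll and query as before ...
  await browser.close();
})();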
According to this PR, to avoid the cache, use a browserContext.
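A minimal sketch of that approach, assuming each scrape session gets its own context (newContext() creates an isolated, incognito-like session, so nothing is cached across sessions):

const { firefox } = require('playwright');

(async () => {
  const browser = await firefox.launch();

  // A fresh context has its own (empty) cache, cookies and storage.
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('https://example.com'); // placeholder URL
  // ... scrape ...
  await context.close(); // throw the context (and its cache) away when done
  await browser.close();
})();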
We had the same issue where Playwright 'remembers' the state of the browser from one session to another. The workaround that worked for us was to delete the test-data-dir directory before running the script.
Playwright creates this directory once you run it. So, we have this script defined in our package.json:
"cleanup:chrome": "rm -rf ./path/to/test-data-dir/* || true",
And then run Playwright as usual.
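For reference, a sketch of the kind of launch this applies to, assuming the script uses a persistent context whose profile lives in the same placeholder directory that the cleanup script wipes:

const { firefox } = require('playwright');

(async () => {
  // The profile directory below is the placeholder path from the cleanup script.
  const context = await firefox.launchPersistentContext('./path/to/test-data-dir', {
    headless: true,
  });
  const page = await context.newPage();
  await page.goto('https://example.com'); // placeholder URL
  // ... scrape ...
  await context.close();
})();

You can also chain the cleanup into another package.json script (scrape.js is a placeholder name for the scraping script):

"scrape": "npm run cleanup:chrome && node scrape.js"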