
javascript - Cannot get querySelectorAll to work with puppeteer (returns undefined) - Stack Overflow


I'm trying to practice some web scraping with prices from a supermarket, using Node.js and Puppeteer. I can navigate through the website at the beginning, accepting cookies and clicking a "load more" button. But when I try to read the divs containing the products with querySelectorAll, I get stuck: it returns undefined even though I wait for a specific div to be present. What am I missing?

The problem is at the end of the code block.

const { product } = require("puppeteer");

const scraperObjectAll = {
    url: 'https://www.bilkatogo.dk/s/?query=',
    async scraper(browser) {
        let page = await browser.newPage();
        console.log(`Navigating to ${this.url}`);
        await page.goto(this.url);

        // accept cookies
        await page.evaluate(_ => {
            CookieInformation.submitAllCategories();
        });

        var productsRead = 0;
        var productsTotal = Number.MAX_VALUE;

        while (productsRead < 100) {
            // Wait for the required DOM to be rendered
            await page.waitForSelector('button.btn.btn-dark.border-radius.my-3');
            // Click button to read more products
            await page.evaluate(_ => {
                document.querySelector("button.btn.btn-dark.border-radius.my-3").click()
            });
            // Wait for it to load the new products
            await page.waitForSelector('div.col-10.col-sm-4.col-lg-2.text-center.mt-4.text-secondary');
            // Get number of products read and total
            const loadProducts = await page.evaluate(_ => {
                let p = document.querySelector("div.col-10.col-sm-4.col-lg-2").innerText.replace("INDLÆS FLERE", "").replace("Du har set ","").replace(" ", "").replace(/(\r\n|\n|\r)/gm,"").split("af ");
                return p;
            });

            console.log("Products (read/total): " + loadProducts);
            productsRead = loadProducts[0];
            productsTotal = loadProducts[1];

            // Now waiting for a div element
            await page.waitForSelector('div[data-productid]');

            const getProducts = await page.evaluate(_ => {
                return document.querySelectorAll('div');
            });

            // PROBLEM HERE!
            // Cannot convert undefined or null to object
            console.log("LENGTH: " + Array.from(getProducts).length);
        }
    }
};
Asked Dec 12, 2020 at 23:20 by Kasper Hansen

1 Answer


The callback passed to page.evaluate runs in the browser page's context, not in the scope of your Node script. Values can't be passed between the page and the Node script without careful consideration: most importantly, if a value isn't serializable (convertible to plain JSON), it can't be transferred. In the question's code, that is why getProducts arrives in Node as undefined.
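To get a feel for which values survive the trip, here is a rough stand-in for the page-to-Node hand-off using a JSON round trip. This is only an approximation for illustration (Puppeteer does not literally call JSON.stringify), and the helper name roundTrip and the Item class are hypothetical.

```javascript
// Hypothetical helper: approximates the page -> Node hand-off with a JSON round trip.
// Not Puppeteer's actual mechanism, just a way to see what "serializable" means.
function roundTrip(value) {
  const json = JSON.stringify(value);
  // JSON.stringify returns undefined for undefined, functions and symbols.
  return json === undefined ? undefined : JSON.parse(json);
}

// Hypothetical class standing in for a rich object such as an HTMLElement.
class Item {
  constructor(n) {
    this.n = n;
  }
  double() {
    return this.n * 2;
  }
}

// Plain data (numbers, strings, arrays, plain objects) comes back intact.
console.log(roundTrip({ read: 60, total: 240, labels: ["a", "b"] }));

// A class instance comes back as a plain object: own fields only, no methods.
const copy = roundTrip(new Item(2));
console.log(copy instanceof Item, typeof copy.double); // false undefined

// A Map has no enumerable own properties, so it collapses to an empty object.
console.log(roundTrip(new Map([["milk", 12]]))); // {}

// A function cannot be transferred at all.
console.log(roundTrip(() => 42)); // undefined
```

A NodeList behaves like the class instance and the Map here: whatever makes it across is not the live list of elements you wanted.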

querySelectorAll returns a NodeList, and NodeLists exist only in the browser (the front end), not in Node (the back end). Likewise, a NodeList contains HTMLElements, which also exist only in the browser.
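If you do want to refer to the elements themselves from the Node side, Puppeteer's page.$$(selector) resolves to an array of ElementHandle objects: Node-side references to elements that stay in the page, rather than serialized copies. A minimal sketch, assuming `page` is a Puppeteer Page; countProductDivs is a hypothetical helper name, and the selector is the one the question already waits for.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
// page.$$ returns ElementHandles, i.e. references to elements living in the page.
async function countProductDivs(page) {
  const handles = await page.$$("div[data-productid]");
  const count = handles.length;
  // Release the page-side references once they are no longer needed.
  await Promise.all(handles.map((h) => h.dispose()));
  return count;
}

// Possible use inside the question's loop, instead of returning the NodeList:
//   console.log("LENGTH: " + (await countProductDivs(page)));
```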

Put all the logic that needs front-end-only data inside the .evaluate callback, and return only serializable results such as numbers, strings, or plain arrays and objects. For example:

const numberOfDivs = await page.evaluate(_ => {
  return document.querySelectorAll('div').length;
});

or

const firstDivText = await page.evaluate(_ => {
  return document.querySelector('div').textContent;
});
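The same idea extends to collecting per-product data: do the DOM work in the page and hand back an array of plain objects. Puppeteer's page.$$eval(selector, pageFunction) runs querySelectorAll on the page and passes the matched elements to pageFunction there. A minimal sketch, assuming the div[data-productid] selector from the question; the helper name toProductRecords and the { id, text } record shape are hypothetical choices. Because Puppeteer ships the function to the page as source text, it must not reference variables from the surrounding Node scope.

```javascript
// Runs in the page: receives real DOM elements, returns only plain, serializable data.
// toProductRecords and the { id, text } shape are hypothetical, for illustration.
function toProductRecords(elements) {
  return Array.from(elements, (el) => ({
    id: el.getAttribute("data-productid"),
    text: el.textContent.trim(),
  }));
}

// Possible use in the scraper (Node side), after waitForSelector:
//   const products = await page.$$eval("div[data-productid]", toProductRecords);
//   console.log("LENGTH: " + products.length);
```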