
javascript - Puppeteer - waitForResponse() timeout, but page.on('response') finds response - Stack Overflow


I'm trying to get an XHR response from a webpage. I found the

await page.waitForResponse(url);

or

await page.waitForResponse((res) => {
  if (res.url() === myUrl) return true;
});

method, but it always times out for the response I'm trying to get.

However, if I set

page.on('response', (res) => {
  if (res.url() === myUrl) {
    // do what I want with the response
  }
})

the correct response is found and I can retrieve the data.

After some debugging, it seems like waitForResponse() isn't returning any XHR requests or responses.

Any ideas?

EDIT: Example. In this case, the puppeteer-extra and puppeteer-extra-plugin-stealth packages are required; otherwise, this URL returns status code 403:

import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import UserAgent from 'user-agents';
import puppeteer from 'puppeteer-extra';
import { Page } from 'puppeteer';

const wantedUrl = 'https://www.nike.com.br/DataLayer/dataLayer';

const workingFunction = async (page: Page) => {
    let reqCount = 0;
    let resCount = 0;

    page.on('request', req => {
        reqCount++;
        if (req.url() == wantedUrl) {
            console.log('The request I need: ', req.url());
            console.log(reqCount);
        }
    });
    page.on('response', async res => {
        resCount++;
        if (res.url() == wantedUrl) {
            console.log('The response I need:', await res.json());
            console.log(resCount);
        }
    });

    await page.goto('https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741', {
        timeout: 0,
    });
};

const notWorkingFunction = async (page: Page) => {
    let resCount = 0;
    await page.goto('https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741');
    const res = await page.waitForResponse(
        res => {
            resCount++;
            console.log(res.url());
            console.log(resCount);
            if (res.url() === wantedUrl) {
                return true;
            }
            return false;
        },
        { timeout: 0 }
    );

    return res;
};

(async () => {
    puppeteer.use(StealthPlugin());
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();
    const userAgent = new UserAgent({ deviceCategory: 'desktop' });
    await page.setUserAgent(userAgent.random().toString());

    try {
        // workingFunction(page);
        const res = await notWorkingFunction(page);
    } catch (e) {
        console.log(e);
    }
})();
asked Mar 16, 2022 at 19:29 by mtbossa, edited Mar 17, 2022 at 14:08

1 Answer

The page.on version works because it registers the request/response handlers before navigation begins. The waitForResponse version, on the other hand, first waits for page.goto() to resolve, which by default happens when the "load" event fires, and only then starts tracking responses with the call to page.waitForResponse. MDN says of the load event:

The load event is fired when the whole page has loaded, including all dependent resources such as stylesheets and images. This is in contrast to DOMContentLoaded, which is fired as soon as the page DOM has been loaded, without waiting for resources to finish loading.

Based on this, we can infer that by the time the load event fires and the waitForResponse function finally starts listening to traffic, it's already missed the desired response, so it just waits forever!

The solution is to create the promise for page.waitForResponse before (or at the same time as) the goto call, so that no traffic is missed when you kick off navigation.
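The ordering problem can be reproduced without a browser. The sketch below is only an illustration using Node's EventEmitter, not Puppeteer itself: fakeNavigate, nextResponse, and the 10/50/100 ms timings are hypothetical stand-ins for navigation, waitForResponse, and its timeout.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical stand-in for navigation: a "response" event fires early,
// and the returned promise resolves later (like goto resolving on "load").
function fakeNavigate(emitter) {
  return new Promise(resolve => {
    setTimeout(() => emitter.emit("response", "dataLayer"), 10);
    setTimeout(resolve, 50);
  });
}

// Hypothetical stand-in for waitForResponse: resolves with the next
// "response" event, or with "timed out" if none arrives in time.
function nextResponse(emitter, timeoutMs) {
  return new Promise(resolve => {
    const timer = setTimeout(() => {
      emitter.off("response", onResponse);
      resolve("timed out");
    }, timeoutMs);
    function onResponse(value) {
      clearTimeout(timer);
      resolve(value);
    }
    emitter.once("response", onResponse);
  });
}

// Mirrors the question's failing code: navigate first, listen second.
async function listenAfterNavigation() {
  const emitter = new EventEmitter();
  await fakeNavigate(emitter); // the event fires during this await
  return nextResponse(emitter, 100); // listener attached too late
}

// Mirrors the fix: start listening first, then navigate.
async function listenBeforeNavigation() {
  const emitter = new EventEmitter();
  const pending = nextResponse(emitter, 100); // listener attached first
  await fakeNavigate(emitter);
  return pending;
}

listenAfterNavigation().then(result => console.log("after:", result)); // "timed out"
listenBeforeNavigation().then(result => console.log("before:", result)); // "dataLayer"
```

The only difference between the two functions is whether the listener exists at the moment the event is emitted, which is the same difference as between the two Puppeteer versions in the question.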

I also suggest using "domcontentloaded" on the goto call. "domcontentloaded" is underused in Puppeteer: there's no sense in waiting for all resources to arrive when you're just looking for one. The default "load" or the often-used "networkidle0"/"networkidle2" settings are better for use cases like screenshotting the page, where you want the whole thing to look the way a user would see it. To be clear, this isn't the fix to the problem, just an optimization, and it's not too apparent from the docs which setting is suitable when.

Here's a minimal example (I used JS, not TS):

const puppeteer = require("puppeteer-extra"); // ^3.2.3
const StealthPlugin = require("puppeteer-extra-plugin-stealth"); // ^2.9.0
const UserAgent = require("user-agents"); // ^1.0.958

puppeteer.use(StealthPlugin());

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const userAgent = new UserAgent({deviceCategory: "desktop"});
  await page.setUserAgent(userAgent.random().toString());
  const url = "https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741";
  const wantedUrl = "https://www.nike.com.br/DataLayer/dataLayer";
  const [res] = await Promise.all([
    page.waitForResponse(res => res.url() === wantedUrl, {timeout: 90_000}),
    page.goto(url, {waitUntil: "domcontentloaded"}),
  ]);
  console.log(await res.json());
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

(Note that the site has changed since this was posted, so the code no longer works as written, but the fundamental ideas still apply.)
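The "before" ordering mentioned earlier, as opposed to the Promise.all form, might look like the sketch below. captureResponse is a hypothetical helper name, and page is assumed to be a Puppeteer Page created as in the example above.

```javascript
// Sketch: start listening, then navigate, then await the captured response.
// `captureResponse` is a hypothetical helper; `page` is assumed to be a
// Puppeteer Page (or anything exposing waitForResponse and goto).
async function captureResponse(page, url, wantedUrl) {
  // Calling waitForResponse starts listening right away; don't await it yet.
  const responsePromise = page.waitForResponse(res => res.url() === wantedUrl);

  // If goto throws, the pending wait may reject later with nobody awaiting it.
  // This no-op handler avoids an unhandled rejection; awaiting or returning
  // responsePromise still surfaces the error to the caller.
  responsePromise.catch(() => {});

  await page.goto(url, { waitUntil: "domcontentloaded" });
  return responsePromise;
}

// Usage, inside an async function with an open page:
//   const res = await captureResponse(page, url, wantedUrl);
//   console.log(await res.json());
```

Compared with Promise.all, this form makes the ordering explicit at the cost of the extra catch line; both rely on the wait being created before navigation starts.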
