I am using Puppeteer in Node.js to scrape a website for real-time data. Instead of scraping the page, I watch the backend requests and capture the JSON/text responses, which gives me more structured data (and more data than the browser displays). Everything works except that some of the data is updated through a Firestore request. I can capture the response to that request, but I only get the data once the request is complete.
If I monitor the request/response in the browser, I can see that there are several numbered "messages" in the response packet:
16 [[732,["noop"]]]
16 [[733,["noop"]]]
123 [[734,[{ "targetChange": { "resumeToken": "CgkIgqb+n7TFiwM=", "readTime": "2025-02-15T09:53:39.558146Z" } } ]]]
16 [[735,["noop"]]]
with each message coming in over several seconds. At some point the request completes and a new Firestore request is issued. The problem is that my app only sees these messages once the response is complete, not in real time as each message comes in (which it needs in order to display changes in real time).
Is there a way to see the response data as it is received and not just when the request is completed?
page.on('response', async (response) => {
  if (response.request().resourceType() === 'xhr') {
    console.log('Firestore Response URL:', response.url());
    // response.text() resolves only after the whole response body has arrived
    const theResponse = await response.text();
    console.log('response.text: ', theResponse);
  }
});
asked Feb 15 at 13:26
EdE
- Probably not possible since Firestore pipelines batches of query data over a single socket connection, which puppeteer probably views as a single xhr request/response cycle rather than multiple separate requests. It sounds like you're hoping that puppeteer would understand how to separate out the results of each of those queries, and that's asking a bit much of it since this is highly customized use of a connection and isn't "standard" web tech. – Doug Stevenson Commented Feb 15 at 13:36
- There should be a cdp event for chunks of response data but I haven't tried it and it's probably a pain to work with. – pguardiario Commented Feb 18 at 3:03
1 Answer
The following request intercept handler can be used to record each chunk of the responses to certain requests as it arrives (while also passing the completed response to the browser). You can process the recorded chunks as you need, in your case by extracting the numbered messages from a Firestore response. This example simply logs the chunks.
Doug Stevenson's warning still applies: the chunks that are recorded for one request may belong to several queries; they arrive just as Firestore sends them over the connection.
const stream = require("node:stream"); // needed for stream.Readable.fromWeb

await page.setRequestInterception(true);
page.on("request", async function (request) {
  if («condition on the request») {
    // Re-issue the request from Node.js so the response can be read as a stream.
    let body;
    if (request.hasPostData()) {
      body = request.postData();
      if (body === undefined) body = await request.fetchPostData();
    }
    const response = await fetch(request.url(), {
      method: request.method(),
      headers: request.headers(),
      body
    });
    const chunks = [];
    stream.Readable.fromWeb(response.body)
      .on("data", function (chunk) {
        // Called for each chunk as soon as it arrives.
        console.log(chunk.toString()); // or some other kind of processing
        chunks.push(chunk);
      })
      .on("end", function () {
        // Pass the completed response on to the browser.
        const headers = {};
        for (const [h, v] of response.headers.entries()) headers[h] = v;
        request.respond({
          status: response.status,
          headers,
          body: Buffer.concat(chunks)
        });
      });
  } else request.continue();
});
For example, the response to https://httpbin./drip consists of 10 asterisks that are sent one by one. If Puppeteer loads this URL with a request that fulfills the «condition on the request» (in your case request.resourceType() === "xhr"), you will see the chunks in the browser's network trace as well as on your console.
What I could not achieve is passing the response chunks to the browser one by one. Puppeteer's HTTPRequest.respond() method only allows sending the complete response to an intercepted request. This means that your processing sees the Firestore response chunks earlier than your browser does.
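To extract the numbered messages from the recorded chunks, something like the following incremental parser might be a starting point. It is only a sketch under assumptions that are not confirmed by the question: that each message is a decimal length, a newline (or other whitespace), and then a JSON payload, and that the length counts the characters of the payload (as it does for the 16-character "noop" messages in the question). The name createMessageParser is made up for this illustration. Because a chunk boundary can fall anywhere, the parser buffers incomplete data until a whole message is available.

```javascript
// Sketch of an incremental parser for length-prefixed messages.
// ASSUMPTION: the stream looks like "<length>\n<payload><length>\n<payload>...",
// where <length> is the number of characters in the JSON <payload>.
function createMessageParser(onMessage) {
  const decoder = new TextDecoder("utf-8");
  let pending = ""; // text received so far that is not yet a complete message

  return function feed(chunk) {
    // Accept strings as well as Buffer/Uint8Array chunks.
    pending += typeof chunk === "string"
      ? chunk
      : decoder.decode(chunk, { stream: true });

    for (;;) {
      pending = pending.replace(/^\s+/, ""); // skip whitespace between messages
      const match = /^(\d+)\s/.exec(pending);
      if (match === null) {
        // Either the length prefix is incomplete or the data is not in the assumed format.
        if (/^\d*$/.test(pending)) return;
        throw new Error("Unexpected data where a length prefix was expected");
      }
      const length = Number.parseInt(match[1], 10);
      const start = match[0].length;
      if (pending.length < start + length) return; // payload not complete yet

      onMessage(JSON.parse(pending.slice(start, start + length)));
      pending = pending.slice(start + length);
    }
  };
}
```

In the handler above, you would create one parser per intercepted request, for example const feed = createMessageParser((message) => console.log(message)); and call feed(chunk) in the "data" callback instead of logging the raw chunk.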