node.js - Bypassing Cloudflare with Puppeteer and FlareSolverr - Stack Overflow

For the past few weeks we have been working on web scraping on /. Initially we used Puppeteer alone, but the browser frequently landed on a Cloudflare challenge page displaying the message:

"Waiting for the website to respond."

To overcome this, we tried several alternative approaches:

  • Puppeteer + rotating proxy
  • puppeteer-extra-plugin-stealth
  • puppeteer-real-browser + rotating proxy
  • puppeteer-real-browser + FlareSolverr pre-request + rotating proxy

Current Approach

To address the issue, we now make a pre-request to the site through FlareSolverr, extract the cookies and user agent it returns, and pass them to Puppeteer before navigating to the page.
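The pre-request step depends on reading two values out of FlareSolverr's JSON reply: `solution.cookies` and `solution.userAgent`. A minimal sketch of validating that reply before anything is handed to Puppeteer is shown below. It assumes the response fields documented in FlareSolverr's README (`status`, `message`, `solution`); the interfaces and the `extractClearance` helper are hypothetical names declared here for illustration, not part of FlareSolverr.

```typescript
// Subset of FlareSolverr's reply that the question's code reads.
// Field names follow FlareSolverr's documented response; other fields are omitted.
interface FlareSolverrCookie {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  expiry?: number; // seconds since the Unix epoch; absent for session cookies
  httpOnly?: boolean;
  secure?: boolean;
  sameSite?: string;
}

interface FlareSolverrResponse {
  status: string; // "ok" on success, "error" on failure
  message?: string;
  solution?: {
    cookies?: FlareSolverrCookie[];
    userAgent?: string;
  };
}

// Hypothetical helper: validates a parsed reply and returns the two values
// that get handed to Puppeteer.
function extractClearance(data: FlareSolverrResponse): {
  cookies: FlareSolverrCookie[];
  userAgent: string;
} {
  if (data.status !== "ok") {
    throw new Error(data.message ?? `FlareSolverr returned status "${data.status}"`);
  }
  const userAgent = data.solution?.userAgent;
  if (!userAgent) throw new Error("User agent not found");
  // A missing cookie list is treated as empty rather than crashing on .length.
  return { cookies: data.solution?.cookies ?? [], userAgent };
}

// Usage with a hand-written sample reply (not real FlareSolverr output):
const sample: FlareSolverrResponse = {
  status: "ok",
  solution: { userAgent: "Mozilla/5.0 (sample)", cookies: [{ name: "a", value: "1" }] },
};
console.log(extractClearance(sample).cookies.length); // prints 1
```

Checking `status !== "ok"` instead of `=== "error"` also rejects any unexpected status value, not only the documented error one.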

However, we encountered two key issues:

FlareSolverr fails to solve the Cloudflare challenge

  • When FlareSolverr detects a Cloudflare challenge, it fails to bypass it, logging the error:

Error solving the challenge. Timeout after X seconds.

  • When FlareSolverr does not detect a challenge and successfully completes the pre-request, we extract the cookies and user agent, set them in Puppeteer, and navigate to the target page.

Even so, Puppeteer still lands on the Cloudflare challenge page. This suggests that FlareSolverr may not be detecting the challenge properly and therefore never retrieves the necessary cookies.
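Setting FlareSolverr's cookies in Puppeteer involves a field mismatch: FlareSolverr's README shows cookies with an `expiry` field, while Puppeteer's cookie parameters use `expires` (seconds since the Unix epoch). A minimal sketch of an explicit conversion follows. The interfaces are hypothetical local mirrors of a subset of each library's cookie shape, declared here so the example compiles without importing Puppeteer, and `toPuppeteerCookie` is a hypothetical helper name.

```typescript
// Minimal local copy of FlareSolverr's cookie shape (subset).
interface FlareSolverrCookie {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  expiry?: number; // seconds since the Unix epoch
  httpOnly?: boolean;
  secure?: boolean;
  sameSite?: string;
}

const SAME_SITE_VALUES = ["Strict", "Lax", "None"] as const;
type SameSite = (typeof SAME_SITE_VALUES)[number];

// Mirrors a subset of Puppeteer's cookie parameter fields.
interface PuppeteerCookieParam {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  expires?: number; // seconds since the Unix epoch; omitted => session cookie
  httpOnly?: boolean;
  secure?: boolean;
  sameSite?: SameSite;
}

function isSameSite(value: unknown): value is SameSite {
  return typeof value === "string" && (SAME_SITE_VALUES as readonly string[]).includes(value);
}

// Copies only the listed fields, renaming expiry -> expires and leaving
// expires unset when FlareSolverr gave no expiry.
function toPuppeteerCookie(c: FlareSolverrCookie): PuppeteerCookieParam {
  const cookie: PuppeteerCookieParam = { name: c.name, value: c.value };
  if (c.domain !== undefined) cookie.domain = c.domain;
  if (c.path !== undefined) cookie.path = c.path;
  if (c.expiry !== undefined) cookie.expires = c.expiry;
  if (c.httpOnly !== undefined) cookie.httpOnly = c.httpOnly;
  if (c.secure !== undefined) cookie.secure = c.secure;
  // Unrecognised sameSite values are dropped rather than forwarded.
  if (isSameSite(c.sameSite)) cookie.sameSite = c.sameSite;
  return cookie;
}
```

The converted list could then be passed the same way the question's code does, e.g. `await browser.setCookie(...cookies.map(toPuppeteerCookie))`. Leaving `expires` unset for cookies without an expiry follows the DevTools protocol convention that a cookie with no expiration is a session cookie, instead of forcing a timestamp of 0.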

Question

What are we doing wrong? FlareSolverr seems to reduce how often we hit the challenge, but it fails whenever it actually encounters one.

What would be the best approach to ensure Puppeteer can bypass Cloudflare protection?

Code Snippet (TypeScript)

let flaresolverrData: any;
let attempts = 0;
const maxAttempts = 5;

// Pre-request through FlareSolverr; on failure, retry up to maxAttempts times.
while (attempts < maxAttempts) {
  try {
    const response = await fetch("http://localhost:8191/v1", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        cmd: "request.get",
        url: actionParams.url,
        maxTimeout: 5000, // milliseconds; FlareSolverr's documented default is 60000
        session: flaresolverrSessionId,
      }),
    });

    flaresolverrData = await response.json();

    // FlareSolverr reports "ok" on success; treat anything else as a failure.
    if (flaresolverrData?.status !== "ok") {
      throw new Error(flaresolverrData?.message ?? "FlareSolverr error");
    }

    break;
  } catch (error: any) {
    attempts++;
    if (attempts === maxAttempts) {
      throw new Error(`Failed to fetch after ${attempts} attempts: ${error.message}`);
    }
    await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait 1 second between retries
  }
}

if (!flaresolverrData) throw new Error("FlareSolverr data not found");

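// FlareSolverr may omit the cookie list; default to an empty array.
const cookies: any[] = flaresolverrData.solution?.cookies ?? [];
const userAgent = flaresolverrData.solution?.userAgent;

if (!userAgent) throw new Error("User agent not found");

if (cookies.length !== 0) {
  await browser.setCookie(
    // FlareSolverr names the field `expiry`, Puppeteer expects `expires`.
    // Omit it for session cookies instead of forcing expires: 0 (the epoch).
    ...cookies.map(({ expiry, ...cookie }: any) =>
      expiry !== undefined ? { ...cookie, expires: expiry } : cookie
    )
  );
}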

await browserPage.setUserAgent(userAgent);

await delay(Math.random() * 10000 + 1000);
await browserPage.goto(actionParams.url, { waitUntil: "networkidle0" });

const content = await browserPage.content();
return content;
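The retry loop at the top of the snippet can be factored into a small reusable helper so the FlareSolverr call itself stays short. This is a minimal sketch with a fixed pause between attempts, as in the original loop; `withRetries` is a hypothetical name and not part of FlareSolverr or Puppeteer.

```typescript
// Hypothetical generic helper: runs `operation` up to `maxAttempts` times,
// pausing `pauseMs` between failures, and rethrows with the last error message.
async function withRetries<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts = 5,
  pauseMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // No pause after the final failure.
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, pauseMs));
      }
    }
  }
  const reason = lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`Failed to fetch after ${maxAttempts} attempts: ${reason}`);
}
```

For example, the FlareSolverr POST could be wrapped as `await withRetries(() => callFlareSolverr(url), 5, 1000)`, where `callFlareSolverr` is a hypothetical function that performs the request shown above and throws when `status` is not `"ok"`. Note that retrying with the same `maxTimeout` repeats an identical attempt; it does not give FlareSolverr more time per attempt.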