
python - Can't take a Screenshot using Crawl4ai - Stack Overflow


I am currently trying to take a screenshot of a given web page using Crawl4ai, but each time I try I either get an error or nothing at all.

Here is the code I used, which is the same as in their own documentation:

import os, asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode

async def main():
  async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url=";,
        cache_mode=CacheMode.BYPASS,
        pdf=True,
        screenshot=True
    )

    if result.success:
        # Save screenshot
        if result.screenshot:
            with open("wikipedia_screenshot.png", "wb") as f:
                f.write(b64decode(result.screenshot))

        # Save PDF
        if result.pdf:
            with open("wikipedia_page.pdf", "wb") as f:
                f.write(result.pdf)

        print("[OK] PDF & screenshot captured.")
    else:
        print("[ERROR]", result.error_message)

if __name__ == "__main__":
   asyncio.run(main())

And the error that I get:

Error: crawl4ai.async_webcrawler.AsyncWebCrawler.aprocess_html() got multiple values for keyword argument 'screenshot'


edited Mar 25 at 7:04 by Basheer Jarrah; asked Mar 24 at 19:49 by Bernardo
  • As an aside, depending on the environment of course: you could use firefox --screenshot https://en.wikipedia.org/wiki/List_of_common_misconceptions. Calling that from Python would require import subprocess (see the sketch after these comments). Just an idea; there are many alternatives. – NVRM Commented Mar 25 at 7:10
  • I am not sure why it's so large, but you could simply compress it to .jpg in your script so it's automatic. You get a picture of about 1 MB, then delete the original to save space. This might not be the best way if you have limited processing power, but doing it locally should be almost instantaneous. There are many ways; find what is best for your use case, that is how we build our tooling. Later you will appreciate having spent time on this, it will be easier. GL – NVRM Commented Mar 25 at 19:56
  • Gotta love ghost accounts. – NVRM Commented 2 days ago
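
A minimal sketch of the firefox --screenshot idea from the first comment, assuming Firefox is installed and on PATH; by default headless Firefox writes the shot to screenshot.png in the current working directory:

import subprocess

url = "https://en.wikipedia.org/wiki/List_of_common_misconceptions"
# Headless Firefox saves screenshot.png into the current working directory
subprocess.run(["firefox", "--headless", "--screenshot", url], check=True)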

1 Answer


I have tested your script, and it mostly works on my machine. It's possible that your Crawl4ai browser setup is misconfigured.

Using that particular URL (a long scrolling page), the output PNG is indeed about 154 MB, which is far too large for this format to open comfortably.

I have added a compression step using Pillow (pip install pillow) and changed the output format to JPEG with fairly aggressive compression (quality=20); this results in a usable 4.8 MB JPEG.

Going lower in quality is very destructive, but you can try adjusting that value.

#!/usr/bin/python
import os
import asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode
from PIL import Image
import io

async def main():
    # Ensure the output directory exists (a no-op when saving to the
    # current working directory, as here)
    save_dir = os.path.dirname("wikipedia_screenshot.png")
    if save_dir and not os.path.exists(save_dir):
        os.makedirs(save_dir)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia./wiki/List_of_common_misconceptions",
            cache_mode=CacheMode.BYPASS,
            pdf=True,
            screenshot=True
        )

        if result.success:
            if result.screenshot:
                # Decode the base64 screenshot, then re-encode it as a
                # compressed JPEG with Pillow to keep the file size manageable
                image_data = b64decode(result.screenshot)
                image = Image.open(io.BytesIO(image_data))
                compressed_image = image.convert("RGB")
                compressed_image_io = io.BytesIO()
                compressed_image.save(compressed_image_io, format='JPEG', optimize=True, quality=20)
                compressed_image_io.seek(0)
                with open("wikipedia_screenshot.jpg", "wb") as f:
                    f.write(compressed_image_io.getvalue())
                print("[OK] Screenshot saved successfully (compressed).")


            if result.pdf:
                with open("wikipedia_page.pdf", "wb") as f:
                    f.write(result.pdf)
                print("[OK] PDF saved successfully.")

            print("[OK] PDF & compressed screenshot captured.")
        else:
            print(f"[ERROR] Failed to retrieve content: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())
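
As for the original "got multiple values for keyword argument 'screenshot'" error: that usually points at a mismatch between the installed crawl4ai version and the documentation snippet. Newer releases pass per-crawl options through a CrawlerRunConfig object instead of as bare keyword arguments to arun(); here is a minimal sketch, assuming your installed version exposes CrawlerRunConfig:

import asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

async def main():
    # Per-crawl options live in the config object rather than in arun() itself
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        screenshot=True,
        pdf=True
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia.org/wiki/List_of_common_misconceptions",
            config=run_config
        )
        if result.success and result.screenshot:
            with open("wikipedia_screenshot.png", "wb") as f:
                f.write(b64decode(result.screenshot))

if __name__ == "__main__":
    asyncio.run(main())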