
python - Adding user agent in chrome options in selenium - Stack Overflow


I'm performing data crawling on a webpage using Selenium. This is my code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from time import sleep
from fake_useragent import UserAgent


# Random desktop Chrome-on-Linux user-agent string
ua = UserAgent(browsers=['Chrome'], os=['Linux'], platforms=['desktop'])
CHROMEDRIVER_PORT = 9515

# ChromeDriver pinned to a fixed port, logging to /tmp/chromedriver.log
chrome_service = Service(executable_path="/path/to/chromedriver_linux64", log_output="/tmp/chromedriver.log", port=CHROMEDRIVER_PORT)
chrome_options = Options()
chrome_options.binary_location = "/path/to/google-chrome"
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--headless=new")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--dns-prefetch-disable")
chrome_options.add_argument(f"user-agent={ua.random}")  # inject the random user agent

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)
print("Chrome Browser Invoked")

driver.get("<url>")
sleep(2)

print("Page Title:", driver.title)

driver.quit()

I run this code on my local machine with internet access and it works fine, but on my server (no GUI, no general internet access) it times out:

urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9515): Read timed out. (read timeout=120)

In general:
- if I delete the user-agent argument, the script runs but cannot scrape the webpage
- if I keep the user-agent argument, it times out

asked Mar 24 at 7:01 by midmash36
  • What exactly do you mean by "no internet"? – margusl Commented Mar 24 at 10:39
  • @margusl The server is only allowed to connect to the specific domain it crawls. – midmash36 Commented Mar 25 at 11:33

1 Answer


This isn't really about the user-agent. The root issue is that your server doesn't have outbound internet access, so driver.get() hangs while trying to fetch the page; the ChromeDriver timeout is a side effect of that failed network call.
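One way to make the network problem visible quickly, instead of waiting 120 seconds for the ChromeDriver read timeout, is to set an explicit page-load timeout and catch the resulting exception. A minimal sketch (the chromedriver path and the URL are placeholders as in the question, and the 30-second value is just an example):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException, WebDriverException

chrome_options = Options()
chrome_options.add_argument("--headless=new")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

chrome_service = Service(executable_path="/path/to/chromedriver_linux64")  # placeholder path
driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

driver.set_page_load_timeout(30)  # give up after 30 s instead of hanging

try:
    driver.get("<url>")  # placeholder URL, as in the question
    print("Page Title:", driver.title)
except (TimeoutException, WebDriverException) as exc:
    # With no outbound access, the failure surfaces here instead of as a
    # urllib3 ReadTimeoutError against the local ChromeDriver port.
    print("Navigation failed:", exc)
finally:
    driver.quit()

If this fails even for the domain your server is allowed to reach, the problem is connectivity or DNS from the server, not the user-agent string.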

A few quick tips:

  • Remove the custom port=9515 unless there’s a specific reason — Selenium will manage it just fine.

  • Make sure Chrome and ChromeDriver versions are compatible.

  • Check whether headless Chrome runs correctly in isolation by loading a page with --headless --dump-dom, which tests the page load without Selenium in the loop (see the sketch after this list).

  • Look into DNS or firewall issues — outbound HTTP(S) needs to be open.
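A minimal way to run that isolation check from Python, assuming google-chrome points at your Chrome binary and the URL is one the server is actually allowed to reach (both are placeholders):

import subprocess

# Ask headless Chrome to load the page and print the rendered DOM.
# If this hangs or returns nothing, the problem is the browser or the
# network, not Selenium.
result = subprocess.run(
    [
        "google-chrome",        # placeholder: path to your Chrome binary
        "--headless",
        "--disable-gpu",
        "--no-sandbox",
        "--dump-dom",
        "<url>",                # placeholder: a domain the server may reach
    ],
    capture_output=True,
    text=True,
    timeout=60,
)
print(result.stdout[:500])      # first part of the rendered HTML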

If your server setup is locked down (e.g., air-gapped or with strict egress rules), you won't get far running a real browser on it. In those cases, it's better to offload the crawling to an external service. Platforms such as Crawlbase, Apify, or Zyte handle the headless browsing and scraping for you via an API, so you don't need to run browsers on your own infrastructure.
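For illustration only, offloading typically looks something like the sketch below. The endpoint https://api.example-scraper.com/render, the token and url parameter names, and the response shape are all assumptions standing in for whichever provider you pick; check that provider's documentation for the real API.

import requests

# Hypothetical scraping API: the provider runs the headless browser on
# their side and returns the rendered HTML. The endpoint and parameter
# names below are placeholders, not any real service's API.
API_ENDPOINT = "https://api.example-scraper.com/render"  # assumed endpoint
API_TOKEN = "your-api-token"                             # placeholder credential

resp = requests.get(
    API_ENDPOINT,
    params={"token": API_TOKEN, "url": "<url>"},         # assumed parameter names
    timeout=60,
)
resp.raise_for_status()
print("Fetched", len(resp.text), "bytes of rendered HTML")

Note that this still requires the server to reach the provider's API endpoint, so the egress rules would need to allow that one domain.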
