I'm crawling data from a webpage using Selenium. This is my code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from time import sleep
import tempfile  # currently unused
from fake_useragent import UserAgent

# random desktop Chrome-on-Linux user agent
ua = UserAgent(browsers=['Chrome'], os=['Linux'], platforms=['desktop'])

CHROMEDRIVER_PORT = 9515
chrome_service = Service(executable_path="/path/to/chromedriver_linux64", log_output="/tmp/chromedriver.log", port=CHROMEDRIVER_PORT)

chrome_options = Options()
chrome_options.binary_location = "/path/to/google-chrome"
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--headless=new")  # headless mode for the GUI-less server
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--dns-prefetch-disable")
chrome_options.add_argument(f"user-agent={ua.random}")

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)
print("Chrome Browser Invoked")
driver.get("<url>")
sleep(2)
print("Page Title:", driver.title)
driver.quit()
I run this code on my local machine with internet access and it works completely fine, but on my server (no GUI, no internet) it gets a timeout:
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9515): Read timed out. (read timeout=120)
In general:
- if I delete the user-agent -> it runs but cannot scan the webpage
- if I keep the user-agent -> I get the timeout
- What exactly do you mean by "no internet"? – margusl, Mar 24 at 10:39
- @margusl the server is only allowed to connect to the specific domain we crawl – midmash36, Mar 25 at 11:33
1 Answer
This isn't really about the user-agent; the root issue is that your server doesn't have outbound internet access, so driver.get() hangs trying to fetch the page. The ChromeDriver timeout is a side effect of that failed network call.
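One way to confirm this quickly is to set explicit timeouts so the failure surfaces as a Selenium exception in seconds instead of a 120-second urllib3 read timeout. A minimal sketch against the driver from the question (the 15-second value is arbitrary):

from selenium.common.exceptions import TimeoutException, WebDriverException

# Fail fast: driver.get() will raise instead of hanging for minutes
driver.set_page_load_timeout(15)
try:
    driver.get("<url>")
except (TimeoutException, WebDriverException) as exc:
    print("Page load failed (likely no outbound connectivity):", exc)

If that raises almost immediately on the server but not locally, the network path is the problem, not the browser configuration.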
A few quick tips:
- Remove the custom port=9515 unless there's a specific reason to pin it; Selenium will manage the port just fine.
- Make sure your Chrome and ChromeDriver versions are compatible.
- Check whether headless Chrome runs correctly in isolation using --headless --dump-dom to test the page load manually (a sketch follows this list).
- Look into DNS or firewall issues; outbound HTTP(S) needs to be open (a quick egress check is also sketched below).
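For the isolation test, a sketch along these lines bypasses Selenium and ChromeDriver entirely by shelling out to Chrome (the binary path and <url> are the same placeholders as in the question):

import subprocess

# If this also hangs or prints nothing, the environment is the problem, not Selenium
result = subprocess.run(
    ["/path/to/google-chrome", "--headless=new", "--no-sandbox",
     "--disable-dev-shm-usage", "--dump-dom", "<url>"],
    capture_output=True, text=True, timeout=60,
)
print(result.stdout[:500])  # start of the rendered DOM, if the page loaded
print(result.stderr)        # Chrome's own error output

And for the DNS/firewall check, a quick egress probe from Python, independent of the browser (allowed-domain.example is a placeholder; swap in the one domain your server is allowed to reach):

import socket

# Can we even open a TCP connection to the target host on port 443?
try:
    socket.create_connection(("allowed-domain.example", 443), timeout=5).close()
    print("outbound TCP to port 443 works")
except OSError as exc:
    print("no outbound connectivity:", exc)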
If your server setup is locked down (e.g., air-gapped, strict egress rules), you won't get far running a real browser. In those cases it's better to offload crawling to external services. Something like Crawlbase, Apify, Zyte, or any other popular platform works well: it handles headless browsing and scraping via API, so you don't need to run browsers on your own infrastructure.