I am a social scientist with a very limited background in computational methods and Python (I am mostly self-taught). This is my first time posting here, so please bear with me if I misuse any technical terms or my descriptions are verbose.
tldr: I want to scrape news articles from a website using Selenium and ChromeDriver in Google Colab, and I need undetected-chromedriver to avoid detection by anti-bot mechanisms. But the script I used earlier no longer works, and I haven't been able to fix it for weeks. According to AI, the problem might be related to "a TypeError related to the executable_path argument, which seems to be a compatibility issue between undetected-chromedriver and Selenium 4.10.0."
Now some details:
I'm trying to scrape news texts from a website that blocks bots, so the Beautiful Soup approach I was able to use for other sites does not work here. About a month ago, I finally managed to run a script in Google Colab that successfully scraped some data overnight and stored it as .txt files on my Drive before it stopped. (This happened despite having the Caffeine extension activated, so maybe Colab kicked me out at some point? This was my first time using Colab, so I might be wrong.)
I’m using Google Colab with the following packages:
selenium==4.10.0, undetected-chromedriver==3.5.5
# Install required packages
!pip install selenium==4.10.0 undetected-chromedriver
# Download and set up Chrome and ChromeDriver
!wget -q -O /tmp/chrome-linux64.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/linux64/chrome-linux64.zip
!unzip -o /tmp/chrome-linux64.zip -d /tmp
!mv /tmp/chrome-linux64/chrome /usr/local/bin/chrome
!chmod +x /usr/local/bin/chrome
!chrome --version
!wget -q -O /tmp/chromedriver-linux64.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/linux64/chromedriver-linux64.zip
!unzip -o /tmp/chromedriver-linux64.zip -d /tmp
!mv /tmp/chromedriver-linux64/chromedriver /usr/local/bin/chromedriver
!chmod +x /usr/local/bin/chromedriver
!chromedriver --version
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
import time
import csv
def setup_driver():
    options = uc.ChromeOptions()
    options.headless = False  # Set to True for headless mode
    driver = uc.Chrome(options=options)
    return driver

def scrape_news():
    driver = setup_driver()
    try:
        url = "https://www.akparti..tr/haberler/"
        driver.get(url)
        # Wait until at least one article card is present
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a.news-page__card'))
        )
        print("Page loaded successfully.")
    finally:
        driver.quit()

if __name__ == '__main__':
    scrape_news()
Anyway, after that initial successful run, I've tried to run it again, but it has not worked since, due to persistent compatibility issues between Selenium, ChromeDriver, and undetected-chromedriver.
I've also tried manually installing lower versions of Chrome and ChromeDriver using wget and unzip, but that did not work either (or maybe I'm not doing it properly?).
I've also tried the google-colab-selenium package to simplify the setup, but it failed with an error in the DriverFinder module, shown below. I've also tried downgrading Selenium to 4.6.0 and 4.10.0 to ensure compatibility with google-colab-selenium, but the issue persists:
Requirement already satisfied: h11<1,>=0.9.0 in /usr/local/lib/python3.11/dist-packages (from wsproto>=0.14->trio-websocket~=0.9->selenium->google-colab-selenium[undetected]) (0.14.0)
Downloading google_colab_selenium-1.0.14-py3-none-any.whl (8.2 kB)
Installing collected packages: google-colab-selenium
Successfully installed google-colab-selenium-1.0.14
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-c1a995ad582d> in <cell line: 0>()
173
174 if __name__ == '__main__':
--> 175 scrape_news()
4 frames
/usr/local/lib/python3.11/dist-packages/google_colab_selenium/colab_selenium_manager.py in <module>
9 from selenium.webdriver.chrome.service import Service
10 from selenium.webdriver.chrome.options import Options
---> 11 from selenium.webdriver.common.driver_finder import DriverFinder
12
13
ModuleNotFoundError: No module named 'selenium.webdriver.common.driver_finder'
---------------------------------------------------------------------------
Is there a way to resolve the ModuleNotFoundError: No module named 'selenium.webdriver.common.driver_finder' error when using google-colab-selenium? Or, is there a better way to set up Selenium and ChromeDriver in Google Colab that avoids these compatibility issues in general? If there are no solutions, what are some alternative tools or libraries I can use for web scraping in Colab that are more reliable? I am quite stuck, and all suggestions and help are very welcome.
asked Mar 10 at 12:10 by Hedda Gabler

1 Answer
Your issue is due to compatibility problems between Selenium 4.10.0 and undetected-chromedriver. Here's a reliable way to set up Selenium and ChromeDriver in Google Colab:
1. Install Required Packages
!pip install selenium==4.9.1 undetected-chromedriver==3.4.6
2. Setup Chrome & ChromeDriver
!apt-get update
!apt-get install -y unzip wget
!wget -q -O /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_linux64.zip
!unzip -o /tmp/chromedriver.zip -d /usr/local/bin/
!chmod +x /usr/local/bin/chromedriver
!wget -q -O /tmp/google-chrome-stable_current_amd64.deb https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
!dpkg -i /tmp/google-chrome-stable_current_amd64.deb
!apt-get -fy install
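After installing Chrome and ChromeDriver, it's worth confirming that the two binaries report the same major version (a mismatch is the most common cause of the session-creation errors described above). A minimal sketch of that check; the version strings below are examples of what the two `--version` commands print:

```python
import re

def major_version(version_output: str) -> int:
    """Extract the leading major version from a --version string
    like 'Google Chrome 114.0.5735.90' or 'ChromeDriver 114.0.5735.90'."""
    match = re.search(r"\d+", version_output)
    if match is None:
        raise ValueError(f"No version number found in: {version_output!r}")
    return int(match.group())

# Example output strings from the binaries installed above
chrome_out = "Google Chrome 114.0.5735.90"
driver_out = "ChromeDriver 114.0.5735.90"
assert major_version(chrome_out) == major_version(driver_out)  # majors must match
```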
3. Use undetected-chromedriver
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def setup_driver():
    options = uc.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    driver = uc.Chrome(options=options, version_main=114)  # Match the installed Chrome 114
    return driver

def scrape_news():
    driver = setup_driver()
    try:
        url = "https://www.akparti..tr/haberler/"
        driver.get(url)
        # Wait until at least one article card is present
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a.news-page__card'))
        )
        print("Page loaded successfully.")
    finally:
        driver.quit()

if __name__ == '__main__':
    scrape_news()
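Since your original run stored each article as a .txt file on Drive, you'll eventually want to persist the scraped text from inside `scrape_news()`. A minimal sketch of a save helper; the output directory and the title-based naming scheme are my assumptions (in Colab, mount Drive first and point `out_dir` somewhere under `/content/drive/MyDrive/`):

```python
import os
import re

def save_article(text: str, title: str, out_dir: str) -> str:
    """Write one article's text to <out_dir>/<slug-of-title>.txt and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    # Turn the title into a filesystem-safe stem: lowercase, non-word runs -> dashes
    stem = re.sub(r"[^\w-]+", "-", title.lower()).strip("-")[:80] or "untitled"
    path = os.path.join(out_dir, f"{stem}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

Calling this once per scraped article keeps file names predictable and avoids overwriting when titles differ.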
Why This Works

- Downgrades Selenium to 4.9.1, which still works with undetected-chromedriver 3.4.6.
- Installs ChromeDriver version 114 (compatible with Selenium and undetected-chromedriver).
- Uses version_main=114 in uc.Chrome() to ensure compatibility.
- Avoids google-colab-selenium, which has module issues.
Alternatives to Selenium

- Scrapy (better for large-scale scraping)
- Playwright (more reliable anti-bot evasion)
- Requests + BeautifulSoup (if JS rendering is not needed)
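If the listing page turns out to be served statically, the Requests + BeautifulSoup route is the simplest of these. A minimal sketch of the parsing half, reusing the `a.news-page__card` selector from your Selenium script (whether that selector matches the static HTML is an assumption):

```python
from bs4 import BeautifulSoup

def extract_article_links(html: str) -> list[str]:
    """Return the href of every article-card link in a listing page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("a.news-page__card") if a.has_attr("href")]
```

You would feed this the body of a `requests.get(...)` response; if it returns an empty list while the browser shows cards, the cards are rendered by JavaScript and you are back to Selenium or Playwright.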
I bet this will fix your issue!