
web scraping - Selenium and ChromeDriver Setup Issue in Google Colab: Compatibility Problems with undetected-chromedriver (and n


I am a social scientist with a very limited background in computational methods and Python (I am mostly self-taught). This is my first time posting here, so please bear with me if I misuse any technical terms or my descriptions are verbose.

TL;DR: I want to scrape news articles from a website using Selenium and ChromeDriver in Google Colab. I need undetected-chromedriver to avoid the site's anti-bot detection. But the script I used earlier no longer works, and I haven't been able to fix it for weeks. According to AI, the problem might be "a TypeError related to the executable_path argument, which seems to be a compatibility issue between undetected-chromedriver and Selenium 4.10.0."

Now some details:

I'm trying to scrape news texts from a website that blocks bots, so the Beautiful Soup approach I used for other sites doesn't work there. About a month ago, I finally managed to run a script in Google Colab that successfully scraped some data overnight and stored it as .txt files on my Drive before it stopped. (This happened despite the activated Caffeine extension, so maybe Colab kicked me out at some point? This was my first time using Colab, so I might be wrong.)

I’m using Google Colab with the following packages:

selenium==4.10.0, undetected-chromedriver==3.5.5

# Install required packages
!pip install selenium==4.10.0 undetected-chromedriver

# Download and set up Chrome and ChromeDriver
!wget -q -O /tmp/chrome-linux64.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/linux64/chrome-linux64.zip
!unzip -o /tmp/chrome-linux64.zip -d /tmp
!mv /tmp/chrome-linux64/chrome /usr/local/bin/chrome
!chmod +x /usr/local/bin/chrome
!chrome --version

!wget -q -O /tmp/chromedriver-linux64.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/linux64/chromedriver-linux64.zip
!unzip -o /tmp/chromedriver-linux64.zip -d /tmp
!mv /tmp/chromedriver-linux64/chromedriver /usr/local/bin/chromedriver
!chmod +x /usr/local/bin/chromedriver
!chromedriver --version

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
import time
import csv

def setup_driver():
    options = uc.ChromeOptions()
    options.headless = False  # Set to True for headless mode
    driver = uc.Chrome(options=options)
    return driver

def scrape_news():
    driver = setup_driver()
    try:
        url = "https://www.akparti..tr/haberler/"
        driver.get(url)
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a.news-page__card'))
        )
        print("Page loaded successfully.")
    finally:
        driver.quit()

if __name__ == '__main__':
    scrape_news()

Anyway, after that initial successful run, I've tried to run it again, but it has not worked since, due to persistent compatibility issues between Selenium, ChromeDriver, and undetected-chromedriver.

I've also tried manually installing lower versions of Chrome and ChromeDriver using wget and unzip, but that did not work either (or maybe I'm not doing it properly?).

I've also tried the google-colab-selenium package to simplify the setup, but it failed with the DriverFinder error shown below. I also tried downgrading Selenium to 4.6.0 and 4.10.0 to ensure compatibility with google-colab-selenium, but the issue persists:

Requirement already satisfied: h11<1,>=0.9.0 in /usr/local/lib/python3.11/dist-packages (from wsproto>=0.14->trio-websocket~=0.9->selenium->google-colab-selenium[undetected]) (0.14.0)
Downloading google_colab_selenium-1.0.14-py3-none-any.whl (8.2 kB)
Installing collected packages: google-colab-selenium
Successfully installed google-colab-selenium-1.0.14
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-c1a995ad582d> in <cell line: 0>()
    173 
    174 if __name__ == '__main__':
--> 175     scrape_news()

4 frames
/usr/local/lib/python3.11/dist-packages/google_colab_selenium/colab_selenium_manager.py in <module>
      9 from selenium.webdriver.chrome.service import Service
     10 from selenium.webdriver.chrome.options import Options
---> 11 from selenium.webdriver.common.driver_finder import DriverFinder
     12 
     13 

ModuleNotFoundError: No module named 'selenium.webdriver.common.driver_finder'

---------------------------------------------------------------------------

Is there a way to resolve the ModuleNotFoundError: No module named 'selenium.webdriver.common.driver_finder' error when using google-colab-selenium? Or is there a better way to set up Selenium and ChromeDriver in Google Colab that avoids these compatibility issues altogether? If not, what alternative tools or libraries for web scraping in Colab are more reliable? I am quite stuck, and all suggestions and help are very welcome.

asked Mar 10 at 12:10 by Hedda Gabler

1 Answer

Your issue is due to compatibility problems between Selenium 4.10.0 and undetected-chromedriver. Here’s a reliable way to set up Selenium and ChromeDriver in Google Colab:

1. Install Required Packages

!pip install selenium==4.9.1 undetected-chromedriver==3.4.6

2. Setup Chrome & ChromeDriver

!apt-get update
!apt-get install -y unzip wget
!wget -q -O /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_linux64.zip
!unzip -o /tmp/chromedriver.zip -d /usr/local/bin/
!chmod +x /usr/local/bin/chromedriver

!wget -q -O /tmp/google-chrome-stable_current_amd64.deb https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
!dpkg -i /tmp/google-chrome-stable_current_amd64.deb
!apt-get -fy install

3. Use undetected-chromedriver

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def setup_driver():
    options = uc.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    
    driver = uc.Chrome(options=options, version_main=114)  # Ensure compatibility
    return driver

def scrape_news():
    driver = setup_driver()
    try:
        url = "https://www.akparti..tr/haberler/"
        driver.get(url)
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a.news-page__card'))
        )
        print("Page loaded successfully.")
    finally:
        driver.quit()

if __name__ == '__main__':
    scrape_news()

Why This Works

  1. Downgrades Selenium to 4.9.1 (the last version that pairs cleanly with undetected-chromedriver 3.4.6).

  2. Installs ChromeDriver version 114 (compatible with Selenium and undetected-chromedriver).

  3. Uses version_main=114 in uc.Chrome() to ensure compatibility.

  4. Avoids google-colab-selenium, which has module issues.
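As a guard against future breakage of the kind in points 2 and 3, you can sanity-check that Chrome and ChromeDriver share a major version before launching the driver. A minimal sketch, where `major_version` is a hypothetical helper and the sample strings stand in for real `--version` output:

```python
import re

def major_version(version_output: str) -> int:
    """Pull the major version out of output like 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version string found in {version_output!r}")
    return int(match.group(1))

# In Colab you would capture real output, e.g.:
#   chrome_out = subprocess.check_output(["google-chrome", "--version"], text=True)
#   driver_out = subprocess.check_output(["chromedriver", "--version"], text=True)
chrome_out = "Google Chrome 114.0.5735.90"   # sample output
driver_out = "ChromeDriver 114.0.5735.90"    # sample output

assert major_version(chrome_out) == major_version(driver_out), \
    "Chrome and ChromeDriver major versions must match"
```

If the assertion fires, pass the Chrome major version explicitly via `version_main=` in `uc.Chrome()` as shown above, or reinstall a matching ChromeDriver.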

Alternatives to Selenium

  • Scrapy (Better for large-scale scraping)

  • Playwright (More reliable anti-bot evasion)

  • Requests + BeautifulSoup (If JS rendering is not needed)
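For the Requests + BeautifulSoup route, the parsing side can be sketched against the `a.news-page__card` selector from the question. This is illustrative only: the sample HTML is made up, and a site that blocks bots will likely also block plain `requests` traffic.

```python
from bs4 import BeautifulSoup

def extract_card_links(html: str) -> list[str]:
    """Collect href values from anchors matching the question's CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("a.news-page__card") if a.has_attr("href")]

# Made-up HTML standing in for requests.get(url).text:
sample = ('<div><a class="news-page__card" href="/haberler/1">One</a>'
          '<a class="news-page__card" href="/haberler/2">Two</a></div>')
print(extract_card_links(sample))  # ['/haberler/1', '/haberler/2']
```

The same function would work on a fetched page whenever the article links are present in the initial HTML rather than rendered by JavaScript.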

I bet this will fix your issue!
