
python - How can I scrape prices and titles of this website? - Stack Overflow


I am not able to scrape the page : or the following pages, I believe because of the cookie banner, or maybe something else is blocking the scraping. You can find my code below. Can you help, please?

import csv
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# -------------------- SETUP SELENIUM --------------------
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
wait = WebDriverWait(driver, 10)  # Explicit wait for elements

# -------------------- FUNCTION TO CLICK COOKIE POPUP --------------------
def accept_cookies():
    try:
        # Wait for the cookie button & click (Corrected CSS Selector)
        cookie_button = wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, ".c-button.c-button--primary"))
        )
        cookie_button.click()
        print("✅ Cookies accepted!")

        # ⏳ Add short delay after clicking to let the page adjust
        time.sleep(3)

    except Exception:
        print("⚠️ No cookie popup found or already accepted.")

# -------------------- PREPARE CSV FILE --------------------
csv_filename = "cata.csv"

with open(csv_filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Product Name", "Price"])  # Write header

    # -------------------- LOOP THROUGH PAGES --------------------
    for page_num in range(1, 2):  # Page 1 only; use range(1, 14) for pages 1 to 13
        url = f"={page_num}"
        driver.get(url)

        # ✅ Accept cookies on every page
        accept_cookies()

        try:
            # ⏳ Wait until products are fully loaded before scraping
            name_elements = wait.until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".u-typography-h6.c-lot-card__title.u-truncate-2-lines"))
            )

            price_elements = wait.until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".u-typography-h5.c-lot-card__price"))
            )

            # Extract text and store in CSV
            for name, price in zip(name_elements, price_elements):
                writer.writerow([name.text.strip(), price.text.strip()])

            print(f"✅ Page {page_num} scraped successfully!")

        except Exception as e:
            print(f"⚠️ Error on page {page_num}: {e}")
