
python - Scraping Amazon Seller Info with Selenium: Can’t Extract Business Name/Phone Number - Stack Overflow


Problem Description: I’m trying to extract the seller’s business name and phone number from Amazon product pages using Selenium and BeautifulSoup. My code navigates to the seller profile page, but it fails to retrieve the business name and phone number.

Code Attempt:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

def get_page_content(url):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    time.sleep(3)  # Wait for the page to load

    page_content = driver.page_source
    driver.quit()
    return BeautifulSoup(page_content, 'html.parser')

def extract_seller_info(product_url):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    driver.get(product_url)
    time.sleep(3)  # Wait for the page to load

    try:
        sold_by_element = driver.find_element(By.ID, 'sellerProfileTriggerId')
        sold_by_url = '' + sold_by_element.get_attribute('href')
        driver.get(sold_by_url)
        time.sleep(3)  # Wait for the page to load

        soup = BeautifulSoup(driver.page_source, 'html.parser')
        business_name = soup.find('div', class_='a-row a-spacing-none').find('span', class_='a-text-bold', text='Business Name:').find_next_sibling('span').text.strip()
        phone_number = soup.find('div', class_='a-row a-spacing-none').find('span', class_='a-text-bold', text='Phone Number:').find_next_sibling('span').text.strip()

        return business_name, phone_number
    except Exception as e:
        print(f"Error extracting seller info: {e}")
        return None, None
    finally:
        driver.quit()

# Function to search for earbuds on Amazon
def search_earbuds():
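    url = "..."  # search-results URL was lost from the original post
    soup = get_page_content(url)
    product_urls = [a['href'] for a in soup.select('a.a-link-normal.s-no-outline')]
    return ["..." + url for url in product_urls]  # site base URL was lost from the original post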

# Main function
def main():
    product_urls = search_earbuds()
    for url in product_urls:
        business_name, phone_number = extract_seller_info(url)
        print(f"Product URL: {url}")
        print(f"Business Name: {business_name}")
        print(f"Phone Number: {phone_number}")
        print("-" * 80)

if __name__ == "__main__":
    main()

What Happens:

  • The script navigates to the seller profile page successfully.

  • No errors are thrown, but business_name and phone_number return None.

  • Manually checking the seller page shows the data exists (e.g., in the "Business Details" section).
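
One possible reading of the symptoms above, sketched under assumptions: `soup.find` returns only the first matching element, so if the first `div` with the classes `a-row a-spacing-none` is not the one that holds the label, the chained lookup ends in `None` and the `except` branch returns `(None, None)`. The HTML below is a made-up stand-in used only to show that behaviour; it is not Amazon's actual markup, and `Example Trading Co.` is a placeholder value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two divs share the same classes, and only the
# second one holds the label. This is NOT Amazon's real page structure.
SAMPLE_HTML = """
<div class="a-row a-spacing-none"><span>Detailed Seller Information</span></div>
<div class="a-row a-spacing-none">
  <span class="a-text-bold">Business Name:</span><span>Example Trading Co.</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first div with these classes, which has no bold label.
first_div = soup.find("div", class_="a-row a-spacing-none")
label_in_first_div = first_div.find("span", class_="a-text-bold")

# Searching the whole document for the label still succeeds.
label_anywhere = soup.find("span", class_="a-text-bold", string="Business Name:")
value = label_anywhere.find_next_sibling("span").get_text(strip=True)

print(label_in_first_div)  # None
print(value)
```

If the real page behaves like this sample, calling a further method on `label_in_first_div` raises `AttributeError`, which the broad `except Exception` in `extract_seller_info` turns into a printed message and a `(None, None)` result.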

Specific Questions:

  • Are my selectors outdated for Amazon’s seller profile page?
  • How can I reliably locate the "Business Name" and "Phone Number"
    fields?
  • Is there a permission/anti-scraping mechanism blocking this data?
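
For the second question, a minimal sketch of a label-driven lookup: instead of depending on a specific wrapper `div`, it searches the whole document for a `span` whose text matches the label and reads the next sibling `span`. The helper name `find_labelled_value` and the sample HTML are hypothetical, and the sketch assumes the label and value are adjacent `span` elements, as in the code attempt above. It does not address whether the page source Selenium receives contains the data at all.

```python
import re
from bs4 import BeautifulSoup


def find_labelled_value(soup, label):
    """Return the text of the span following a label span, or None if absent."""
    # Match the label loosely: ignore surrounding whitespace, letter case,
    # and whether a trailing colon is present.
    pattern = re.compile(rf"^\s*{re.escape(label)}\s*:?\s*$", re.IGNORECASE)
    label_tag = soup.find("span", string=pattern)
    if label_tag is None:
        return None
    value_tag = label_tag.find_next_sibling("span")
    return value_tag.get_text(strip=True) if value_tag else None


# Hypothetical markup with placeholder values, not Amazon's real page.
SAMPLE_HTML = """
<div><span class="a-text-bold"> Business Name: </span><span>Example Trading Co.</span></div>
<div><span class="a-text-bold">Phone number:</span><span>000-000-0000</span></div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(find_labelled_value(sample_soup, "Business Name"))
print(find_labelled_value(sample_soup, "Phone Number"))
print(find_labelled_value(sample_soup, "Email"))
```

Returning `None` for a missing label, rather than raising, makes it easier to tell "field not on the page" apart from other failures when looping over many sellers.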