javascript - Get element from website with python without opening a browser

I'm trying to write a python script which parses one element from a website and simply prints it.

I couldn't figure out how to achieve this without Selenium's webdriver, which opens a browser to run the scripts the website needs in order to display properly.

from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://groceries.asda.com/asda-webstore/pages/landing/home.shtml#!product/910000800509')
content = browser.page_source
print(content[42000:43000])
browser.close()

This is just a rough draft which will print the contents, including the element of interest <span class="prod-price-inner">£13.00</span>.

How could I get the element of interest without the browser opening, or even without a browser at all?

Edit: I previously tried urllib and, in bash, wget. Both lack the required JavaScript interpretation.

asked Oct 13, 2015 at 0:19 by boolean.is.null, edited Oct 13, 2015 at 0:31

I'm planning to create a small Python script. – boolean.is.null Commented Oct 13, 2015 at 0:30
OK, I'm working on it :) I'll post my answer in a bit. Just to make sure I got it right: you need the price element, right? – Pedro Lobito Commented Oct 13, 2015 at 0:32
You want to hide the browser? Duplicate of stackoverflow.com/questions/5370762/… – RobertB Commented Oct 13, 2015 at 0:36
In the meantime, you can take a look at crummy.com/software/BeautifulSoup/bs4/doc. To install it, use pip install BeautifulSoup4 – Pedro Lobito Commented Oct 13, 2015 at 0:51
You can only parse that page with a browser. The page doesn't display anything if javascript isn't enabled. Selenium is the way to go. – Pedro Lobito Commented Oct 13, 2015 at 0:57

2 Answers

As other answers mentioned, this webpage requires JavaScript to render its content, so you can't simply fetch and process the page with lxml, Beautiful Soup, or a similar library. But there's a much simpler way to get the information you want.

I noticed that the link you provided fetches data from an internal API in a structured fashion. Based on the URL, the product number appears to be 910000800509. If you look at the network tab in Chrome dev tools (or your browser's equivalent dev tools), you'll see that a GET request is being made to the following URL: http://groceries.asda.com/api/items/view?itemid=910000800509.

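You can make the request like this with just the requests module (r.json() does the JSON decoding, so the json module isn't needed):

import requests

# Internal API endpoint that the page itself calls for product data
url = 'http://groceries.asda.com/api/items/view?itemid=910000800509'
r = requests.get(url)
r.raise_for_status()  # fail early on an HTTP error
price = r.json()['items'][0]['price']

print(price)
# £13.00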

This also gives you access to lots of other information about the product, since the request returns some JSON with product details.
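
If you want to reuse this for other products, here is a minimal sketch that pulls the product ID out of the page URL's fragment and builds the API URL from it. The helper names (item_id_from_page_url, api_url_for, price_from_payload, fetch_price) are my own, and the items[0]['price'] layout is assumed from the response described above. The API is internal and undocumented, so it may change without notice.

```python
from urllib.parse import urlsplit

# Endpoint observed in the browser's network tab; internal and undocumented.
API_TEMPLATE = 'http://groceries.asda.com/api/items/view?itemid={item_id}'


def item_id_from_page_url(page_url):
    """Return the product ID from a fragment such as '#!product/910000800509'."""
    fragment = urlsplit(page_url).fragment  # e.g. '!product/910000800509'
    prefix = '!product/'
    if not fragment.startswith(prefix):
        raise ValueError('no product ID in URL fragment: %r' % fragment)
    return fragment[len(prefix):]


def api_url_for(item_id):
    """Build the API URL for one product ID."""
    return API_TEMPLATE.format(item_id=item_id)


def price_from_payload(payload):
    """Extract the price, assuming the {'items': [{'price': ...}]} layout."""
    items = payload.get('items') or []
    if not items:
        raise LookupError('response contains no items')
    return items[0]['price']


def fetch_price(page_url, timeout=10):
    """Fetch the price for the product shown at page_url (needs network access)."""
    import requests  # third-party; imported lazily so the helpers above work without it

    response = requests.get(api_url_for(item_id_from_page_url(page_url)), timeout=timeout)
    response.raise_for_status()
    return price_from_payload(response.json())
```

Calling fetch_price with the page URL from the question should give the same price string as the snippet above, as long as the endpoint still responds the same way.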

How could I get the element of interest without the browser opening, or even without a browser at all?

After inspecting the page you're trying to parse:

http://groceries.asda.com/asda-webstore/pages/landing/home.shtml#!product/910000800509

I realized that it only displays the content if JavaScript is enabled. Based on that, you need to use a real browser.

Conclusion:

The way to go, if you need to automate this, is:

Selenium
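
If the only concern is the visible browser window, the browser can also be run headless. Below is a minimal sketch, assuming Selenium 4 with Firefox and geckodriver installed (the question doesn't confirm this setup). The function name and timeout are illustrative, and the CSS selector is taken from the span shown in the question.

```python
PRICE_SELECTOR = 'span.prod-price-inner'  # class taken from the question's markup


def fetch_price_headless(page_url, timeout=15):
    """Render page_url in headless Firefox and return the price element's text."""
    # Imported lazily: selenium is a third-party package.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument('-headless')  # run Firefox without opening a window

    browser = webdriver.Firefox(options=options)
    try:
        browser.get(page_url)
        # Wait until the page's JavaScript has rendered the price element.
        element = WebDriverWait(browser, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, PRICE_SELECTOR))
        )
        return element.text
    finally:
        browser.quit()  # always shut the browser process down
```

Waiting for the element instead of slicing page_source by character offset avoids depending on where the price happens to sit in the markup.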
