How can I render JavaScript HTML to HTML in python? - Stack Overflow

I have looked around and only found solutions that render a URL to HTML. However, I need a way to render a webpage (that I already have, and that contains JavaScript) to proper HTML.

Want: Webpage (with JavaScript) ---> HTML

Not: URL --> Webpage (with JavaScript) ---> HTML

I couldn't figure out how to make the other code work the way I wanted.

This is the code I was using that renders URLs: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/

For clarity, the code above takes the URL of a webpage that has some parts rendered by JavaScript, so if I scrape the page normally, using say urllib2, then I won't get all the links etc. that only appear after the JavaScript has run.
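To illustrate the problem described above, here is a minimal sketch that parses raw markup without running any scripts. The page content, the `LinkCollector` class, and the `/static` and `/dynamic` paths are all made up for illustration, and the sketch uses Python 3's standard `html.parser` rather than the asker's urllib2 setup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that are present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: one link is in the markup, a second is added by a script.
RAW_PAGE = """
<html><body>
<a href="/static">static link</a>
<script>
  var a = document.createElement('a');
  a.href = '/dynamic';
  document.body.appendChild(a);
</script>
</body></html>
"""

collector = LinkCollector()
collector.feed(RAW_PAGE)
print(collector.links)  # only the link that exists before any script runs
```

The `/dynamic` link never shows up because nothing executed the script, which is why a browser engine is needed to get the final HTML.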

However, I want to be able to scrape a page, say again with urllib2, and then render that page and get the resulting HTML. (This is different from the code above, since that takes a URL as its argument.)

Any help is appreciated, thanks guys :)

asked Apr 2, 2015 at 4:11 by user3928006; edited Apr 2, 2015 at 7:50
  • I find what you want unclear. Perhaps you can give an example of what you mean by "render a webpage to proper HTML". Do you want the actual DOM? Do you want the textual HTML? Rendering can be done when you "feed the webpage into a browser" (i.e., open this text file with a browser), so it's not clear what else you want to achieve that is not already done by the browser. – barak manos Commented Apr 2, 2015 at 4:20
  • Now that you've made it clearer - I would go with Selenium Web Driver. Have you considered that? If you give a more concrete example of your urllib2 code, then I might be able to refer to it with a corresponding Selenium code. – barak manos Commented Apr 2, 2015 at 4:36
  • Now it's completely unclear what it is that you want: "I want this part but in a way like the first example" - But the first example doesn't do any of that. It just says in a comment "I want to render text and get the pure HTML". So do you want to render the URL or not??? What difference does it make if you first fetch the data from the URL into a file using urllib2? In either case you have to send an HTTP request at some point. You can take the text file and feed it into Selenium (or any other scraping utility), but it's not going to be any different than using the URL directly. – barak manos Commented Apr 2, 2015 at 4:56
  • The URL is protected by Cloudflare, and I don't know how to fetch the bypassed URL, because fetching the URL directly gives me the Cloudflare block page. I do have a way to get the bypassed HTML, however. – user3928006 Commented Apr 2, 2015 at 5:08
  • So you can fetch it only with urllib2? How is that possible??? – barak manos Commented Apr 2, 2015 at 5:16

3 Answers

You can pip install selenium from a command line, and then run something like:

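# Python 2 (urllib2); on Python 3, import urlopen from urllib.request instead
from selenium import webdriver
from urllib2 import urlopen

url = 'http://www.google.com'
# Use an .html extension so the browser treats the file as HTML, not plain text
file_name = 'C:/Users/Desktop/test.html'

# Fetch the raw page source
conn = urlopen(url)
data = conn.read()
conn.close()

# Save it to a local file (binary mode, so the bytes are written unchanged)
with open(file_name, 'wb') as out_file:
    out_file.write(data)

# Load the local file in a real browser so its JavaScript runs
browser = webdriver.Firefox()
browser.get('file:///' + file_name)
html = browser.page_source  # HTML of the DOM after rendering
browser.quit()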
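If the HTML is already in memory (as in the question), the download step can be skipped. The following is a rough sketch of that variant in Python 3, not the answerer's code: the helper names `save_html_for_browser` and `render_saved_html` are made up here, and it assumes Selenium plus Firefox and its driver are installed.

```python
import tempfile
from pathlib import Path


def save_html_for_browser(html_text):
    """Write already-fetched HTML to a temporary .html file and return its file:// URI."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html_text)
    # as_uri() builds a correctly escaped file:// URI on both Windows and POSIX
    return Path(handle.name).resolve().as_uri()


def render_saved_html(file_uri):
    """Open the file in Firefox via Selenium and return the HTML after scripts have run."""
    # Third-party import kept local so the helper above works without Selenium
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        browser.get(file_uri)
        return browser.page_source
    finally:
        browser.quit()  # always close the browser, even if loading fails


# Usage (needs Selenium, Firefox and its driver):
#   page = "<html><body><script>document.write('<a href=\"/x\">x</a>')</script></body></html>"
#   print(render_saved_html(save_html_for_browser(page)))
```

One caveat with this approach: relative URLs in the saved page resolve against the local file rather than the original site, so scripts referenced by relative paths may not load.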

The module I use for this is requests_html. The first time you call render(), it automatically downloads a Chromium browser; after that you can render any webpage (with JavaScript).

requests_html also supports HTML parsing, so it is basically an alternative to Selenium.

Example:

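from requests_html import HTMLSession

URL = 'https://example.com'  # placeholder; replace with the page you want to render

session = HTMLSession()
r = session.get(URL)

# Run the page's JavaScript in headless Chromium
r.html.render()  # you can use r.html.render(sleep=1) to wait a second after loading

html = r.html.html  # the HTML after rendering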
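Since the question is about HTML that has already been fetched, the same library can also be used without a session. Below is a minimal sketch, assuming requests_html is installed and its Chromium download succeeds; the function name `render_html_string` and its `wait_seconds` parameter are made up for illustration.

```python
def render_html_string(html_text, wait_seconds=0):
    """Run the scripts in an HTML string with requests_html and return the resulting HTML."""
    # Third-party import kept inside the function; the first render() call
    # downloads Chromium, so expect a delay on first use.
    from requests_html import HTML

    doc = HTML(html=html_text)
    # reload=False renders the markup held in memory instead of re-fetching a URL
    doc.render(reload=False, sleep=wait_seconds)
    return doc.html


# Usage (needs requests_html):
#   page = "<html><body><script>document.write('<a href=\"/x\">x</a>')</script></body></html>"
#   print(render_html_string(page))
```

Because the markup is loaded from memory, scripts or links with relative URLs have no original site to resolve against, so this is likely to work best for pages whose scripts are inline or use absolute URLs.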


Try webdriver.Firefox().get('url'), where 'url' is the address of the page.
