
How can Python work with javascript - Stack Overflow


I am working on a Scrapy app to scrape some data from a web page.

But some of the data is loaded by AJAX, and Python cannot execute the JavaScript needed to get it.

Is there any library that simulates the behavior of a browser?

asked Jul 26, 2011 at 7:24 by enzo
  • stackoverflow.com/questions/6682503/click-a-button-in-scrapy – warvariuc, commented Jul 26, 2011 at 11:58

5 Answers


For that you'd have to use a full-blown JavaScript engine (like Google V8 in Chrome) to get the real functionality of the browser and how it interacts with the page. However, you could possibly get some information by looking up all URLs in the source and making a request to each, hoping for some valid data. But overall, you're stuck without a full JavaScript engine.

Something like python-spidermonkey, a wrapper around Mozilla's JavaScript engine, would do. Using it might be rather complicated, but that depends on your specific application.

You'd basically have to build a browser, but it seems Python people have made that simple. With PyWebkitGtk you'd get the DOM, and using either python-spidermonkey mentioned before or PyV8 mentioned by Duncan you'd theoretically get the full functionality needed for a browser/web scraper.
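The fallback mentioned above, looking up URLs in the page source and requesting each one, could be sketched roughly as follows. This is only an illustration using the standard library: the `candidate_urls` helper and its regular expression are assumptions made for this example, not part of Scrapy, and a pattern this simple will miss URLs that the script assembles at runtime.

```python
import re
from urllib.parse import urljoin

# Quoted strings that look like absolute URLs or root-relative paths.
# (Hypothetical pattern for illustration; real pages may need a stricter one.)
URL_PATTERN = re.compile(r"""["']((?:https?://|/)[^"'\s<>]+)["']""")


def candidate_urls(page_source, base_url):
    """Return unique URLs found in quoted strings, resolved against base_url."""
    found = []
    for match in URL_PATTERN.finditer(page_source):
        absolute = urljoin(base_url, match.group(1))
        if absolute not in found:  # keep first-seen order, drop duplicates
            found.append(absolute)
    return found
```

Each returned URL can then be requested in turn and kept if the response looks like the data you are after.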

The problem is that you don't just have to be able to execute some Javascript (that's easy), you also have to emulate the browser DOM and that's a lot of work.

If you want to be able to run JavaScript then you can use PyV8. Install it with easy_install PyV8 and then you can execute any standalone JavaScript:

>>> import PyV8
>>> ctxt = PyV8.JSContext()
>>> ctxt.enter()
>>> ctxt.eval("(function(a,b) { return [a+b, a*b, a/b, a-b] })(13,29)")
<_PyV8.JSArray object at 0x01F26A30>
>>> list(_)
[42, 377, 0.4482758620689655, -16]

You can also pass in classes defined in Python, so in principle you might be able to emulate enough of the DOM for your purposes.

An AJAX request is a normal web request which is executed asynchronously. All you need is the URL which the JavaScript code sends to the server. Use that URL with urllib to get at the same data.
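As a minimal sketch of that approach, assuming the endpoint returns JSON (the URL itself has to be found by hand, for example in the browser's network inspector): the `fetch_ajax_json` name is made up for this example, and the `X-Requested-With` header is only a common convention that some servers check, not something every site requires.

```python
import json
import urllib.request


def fetch_ajax_json(url, timeout=10):
    """Request the URL the page's JavaScript would call and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            # Assumption: some servers only answer requests that look like AJAX calls.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Inside a Scrapy spider the same idea applies: yield a request for the AJAX URL directly instead of the page that triggers it.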

The simplest way to get this done is to use PyExecJS: https://pypi.python.org/pypi/PyExecJS

PyExecJS is a port of ExecJS from Ruby. PyExecJS automatically picks the best runtime available to evaluate your JavaScript program, then returns the result to you as a Python object.

I use a MacBook with Node.js installed, so PyExecJS can use the Node.js JavaScript runtime.

pip install PyExecJS

Test code:

import execjs
execjs.eval("'red yellow blue'.split(' ')")

Good luck!

A little update for 2020

I reviewed the results recommended by Google Search. It turns out the best choice is #4, and #1 and #2 are deprecated!

  • #4 https://github.com/sqreen/PyMiniRacer actually has the most straightforward installation.
  • #3 https://github.com/kovidgoyal/dukpy is based on a lightweight JS engine. I could not find any limitations compared with V8. No significant benefit in terms of performance either.

Deprecated

  • #1 https://github.com/sony/v8eval has received very little maintenance for 2 years. It takes 20 minutes to build and then fails (I filed a bug report for that: https://github.com/sony/v8eval/issues/34).
  • #2 https://github.com/doloopwhile/PyExecJS has been discontinued by the owner.
