java - Getting Jsoup to support dynamically generated html by JavaScript - Stack Overflow

Right now I'm working on a web crawler. It should parse some specific sites and write the output to an XML file. Up to this point, no problem: the crawler works, and you can customize it really quickly via a cfg file. I use Jsoup to parse the HTML content.

I just added a few more sites and noticed a huge problem with HTML content that is created via JavaScript. Is there a way to make Jsoup support JavaScript? Or at least a way to get the full HTML content I can see in my browser?

I already tried HtmlUnit, but this one didn't do well. It did not give me the content I would get in my browser.

Sincerely,

Ogofo

asked Sep 27, 2012 at 15:37 by Ogofo

1 Answer

Jsoup does not support JavaScript and it does not emulate a browser. Just forget about it if you're planning to execute JavaScript. In my experience HtmlUnit, which is a headless browser, has given me the best results (always talking about Java frameworks).

One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / Internet Explorer / Firefox) when creating the WebClient instance. Some sites respond differently depending on the browser, and sometimes just changing that value gives you the results you expect.
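
As a rough sketch of that idea (not a definitive implementation): let HtmlUnit load the page and run its scripts under a chosen BrowserVersion, then hand the resulting markup to Jsoup so the existing parsing code can stay as it is. The class name `RenderedPageFetcher`, the helper names, the 5-second wait, and the example URL are all assumptions made for illustration. The imports assume HtmlUnit 3.x or later (package `org.htmlunit`); on 2.x the package is `com.gargoylesoftware.htmlunit`. Both HtmlUnit and Jsoup need to be on the classpath.

```java
import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.List;

public class RenderedPageFetcher {

    // Browser profiles to try in order; which one works best depends on the site.
    static final List<BrowserVersion> PROFILES =
            List.of(BrowserVersion.CHROME, BrowserVersion.FIREFOX);

    // Loads the URL in HtmlUnit under the given profile and returns the DOM
    // serialized as markup after the page's scripts have had a chance to run.
    static String renderHtml(String url, BrowserVersion profile) throws IOException {
        try (WebClient client = new WebClient(profile)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setCssEnabled(false);
            // Many real-world pages contain script errors; keep going anyway.
            client.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = client.getPage(url);
            // Assumed timeout: give background scripts (AJAX, timers) up to 5 s.
            client.waitForBackgroundJavaScript(5_000);
            return page.asXml();
        }
    }

    // Parses already-rendered markup with Jsoup so existing selectors keep working.
    static Document toJsoup(String html, String baseUri) {
        return Jsoup.parse(html, baseUri);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        for (BrowserVersion profile : PROFILES) {
            Document doc = toJsoup(renderHtml(url, profile), url);
            System.out.println(profile.getNickname() + ": " + doc.title());
        }
    }
}
```

Comparing the output across profiles shows quickly whether a site serves different content per browser. If the content is still missing, a longer wait for background scripts may be needed, since HtmlUnit returns from `getPage` before asynchronous requests have necessarily finished.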
