c# - Running Scripts in HtmlAgilityPack - Stack Overflow

I'm trying to scrape a particular webpage which works as follows.

First the page loads, then it runs some sort of JavaScript to fetch the data it needs to populate the page. I'm interested in that data.

If I get the page with HtmlAgilityPack, the script doesn't run, so I get what is essentially a mostly blank page.

Is there a way to force it to run a script, so I can get the data?

asked Jul 9, 2012 at 10:17 by Aabela

Comments:
  • Have a look at phantomjs.org – Mahmoud Farahat, Mar 30, 2017 at 9:56
  • Also consider investigating Selenium. – mjwills, Nov 18, 2018 at 22:57

2 Answers

You are getting what the server returns, the same as a web browser does. A web browser, of course, then runs the scripts. Html Agility Pack is an HTML parser only: it has no way to interpret the JavaScript or bind it to its internal representation of the document. If you want to run the script, you need a web browser. The perfect answer to your problem would be a complete "headless" web browser, that is, something that incorporates an HTML parser, a JavaScript interpreter, and a model that simulates the browser DOM, all working together. Basically, that's a web browser without the rendering part. At this time there isn't such a thing that works entirely within the .NET environment.
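To make the "parser only" point concrete, here is a minimal sketch. The sample markup and the names `StaticParseDemo`, `SampleHtml`, and `GetDataText` are made up for illustration; the only Html Agility Pack calls used are `HtmlDocument.LoadHtml`, `HtmlDocument.GetElementbyId`, and `HtmlNode.InnerText`.

```csharp
using System;
using HtmlAgilityPack;

static class StaticParseDemo
{
    // Hypothetical page: the <div> is empty until the inline script fills it in.
    public const string SampleHtml =
        "<html><body><div id='data'></div>" +
        "<script>document.getElementById('data').innerText = 'loaded';</script>" +
        "</body></html>";

    // Parses the markup and returns the text inside the element with id="data".
    public static string GetDataText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode div = doc.GetElementbyId("data");
        return div == null ? null : div.InnerText;
    }

    static void Main()
    {
        // The script is parsed as inert text and never executed,
        // so the <div> is still empty here.
        Console.WriteLine("data = \"" + GetDataText(SampleHtml) + "\"");
    }
}
```

The `<script>` element is present in the parsed tree, but nothing ever evaluates it, which is why a page that fills itself in with JavaScript looks mostly blank to the parser.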

Your best bet is to use a WebBrowser control and actually load and run the page in Internet Explorer under programmatic control. This won't be fast or pretty, but it will do what you need to do.
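A rough sketch of that approach, under several assumptions: a reference to System.Windows.Forms is available, the page finishes its script-driven loading within a fixed delay (the `settleMs` parameter is a crude guess, not something the page guarantees), and the URL is a placeholder. `RenderedPageLoader` and `GetRenderedHtml` are hypothetical names. The control needs an STA thread with a message loop, so the sketch creates one.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class RenderedPageLoader
{
    // Loads the URL in a WebBrowser control, waits settleMs after the document
    // completes so scripts can populate the page, then returns the live DOM as HTML.
    public static string GetRenderedHtml(string url, int settleMs = 2000)
    {
        string html = null;
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                bool scheduled = false; // DocumentCompleted can fire more than once
                browser.DocumentCompleted += (s, e) =>
                {
                    if (scheduled) return;
                    scheduled = true;
                    var timer = new System.Windows.Forms.Timer { Interval = settleMs };
                    timer.Tick += (ts, te) =>
                    {
                        timer.Stop();
                        timer.Dispose();
                        var roots = browser.Document.GetElementsByTagName("html");
                        if (roots.Count > 0) html = roots[0].OuterHtml;
                        Application.ExitThread(); // ends the message loop below
                    };
                    timer.Start();
                };
                browser.Navigate(url);
                Application.Run(); // message loop required by the control
            }
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}

static class Program
{
    static void Main()
    {
        // Placeholder URL; substitute the page being scraped.
        string html = RenderedPageLoader.GetRenderedHtml("http://example.com/");
        if (html == null) { Console.WriteLine("No document was loaded."); return; }

        // Fully qualified: System.Windows.Forms also defines an HtmlDocument type.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        Console.WriteLine(doc.DocumentNode.OuterHtml.Length);
    }
}
```

The HTML returned here reflects the DOM after the scripts have run, so it can be handed to Html Agility Pack as usual. A fixed delay is the simplest thing that could work; polling the DOM for the element you expect would be more robust.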

Also see my answer to a similar question, "Load a DOM and Execute javascript, server side, with .Net", which discusses the available technology in .NET to do this. Most of the pieces exist right now, but unfortunately they aren't quite there yet or haven't been integrated in the right way.

You can use Awesomium for this: http://www.awesomium.com/. It works fairly well but has no support for x64 and is not thread safe. I'm using it to scan some web sites 24x7; it runs fine for at least a couple of days in a row, but then it usually crashes.
