I would like to load a DOM from a document (in string form) or a URL, and then execute JavaScript functions (including jQuery selectors) against it. This would be entirely server side and in process, with no client or browser.
Basically, I need to load the DOM and then use jQuery selectors and functions like text() and val() to extract strings from it. I don't really need to manipulate the DOM.
I have looked at .NET JavaScript engines such as Jurassic and Jint, but neither supports loading a DOM, so neither can do what I need.
I would be willing to consider non-.NET solutions (node.js, Ruby, etc.) if they exist, but would really prefer .NET.
Edit: The answer below is a good one, but I'm currently trying a different route: I'm attempting to port envjs to Jurassic. If I can get that working, I think it will do what I want. Stay tuned...
Asked Jun 4, 2012 by Brook; edited Jun 11, 2012.

Comment from aikeru (Jun 27, 2012): How is it coming? I would love to benefit from, or contribute to, such a project, since I made my own attempt but have stalled for the time being. If you want, just add @gmail.com to my SO name and you can contact me there. I have a JavaScript project that adds ActiveX to Jurassic here: jurascript.codeplex.com
1 Answer
The answer depends on what you are trying to do. If your goal is basically a complete web browser simulation, or a "headless browser," there are a number of solutions, but none that I know of exist cleanly in .NET. To mimic a browser, you need a JavaScript engine and a DOM. You've identified a few engines; I've found Jurassic to be both the most robust and the fastest. Google Chrome's V8 engine is also very popular; the Noesis Javascript.NET project provides a .NET wrapper for it. It's not quite pure .NET, since you take on a non-.NET dependency, but it integrates cleanly and is not much trouble to use.
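To make the "engine without a DOM" point concrete, here is a minimal C# sketch, assuming the Jurassic NuGet package (namespace `Jurassic`, class `ScriptEngine`). Plain ECMAScript evaluates fine inside the .NET process, but browser globals such as `document` simply don't exist, so the DOM has to come from another layer.

```csharp
using System;
using Jurassic;

class Program
{
    static void Main()
    {
        // Jurassic compiles and runs plain ECMAScript inside the .NET process.
        var engine = new ScriptEngine();

        // Core language features work as expected.
        int result = engine.Evaluate<int>("1 + 2 * 3");
        Console.WriteLine(result); // prints: 7

        // There is no browser environment: no window, no document, no DOM.
        // typeof on an undeclared identifier returns "undefined" rather than throwing.
        string documentType = engine.Evaluate<string>("typeof document");
        Console.WriteLine(documentType); // prints: undefined
    }
}
```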
But as you've noted, you still need a DOM. In pure C# there is XBrowser, but it looks a bit stale. There are also JavaScript-based implementations of the entire browser DOM, such as jsdom. You could probably run jsdom in Jurassic, giving you a DOM simulation without a browser, all within .NET (though likely very slowly!). It would definitely run just fine in V8. If you go outside the .NET realm, there are other, better-supported solutions. Another Stack Overflow question discusses HtmlUnit, and there's Selenium for automating actual web browsers.
Also, bear in mind that much of the work done around these tools is for testing. While that doesn't mean you couldn't use them for something else, they may not perform or integrate well enough for stable use in production code. If you are basically trying to do real-time HTML manipulation, then a solution mixing many technologies that aren't widely used outside of testing might be a poor choice.
If your need is actually HTML manipulation, and it doesn't really have to use JavaScript (you are mostly drawn to the wealth of such tools available in JS), then I would look at C# tools designed for this purpose: for example, HTML Agility Pack, or my own project CsQuery, which is a C# port of jQuery.
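For the extraction described in the question (select elements, then read `text()` and `val()`), a minimal C# sketch might look like the following. It assumes the CsQuery NuGet package and its `CQ.Create`, selector indexer, `Text()` and `Val()` members; the sample markup is made up for illustration. No JavaScript engine or browser is involved: the HTML is parsed directly into a .NET DOM.

```csharp
using System;
using CsQuery;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup; in practice this would be your document string.
        string html = @"<html><body>
            <h1 class='title'>Order #42</h1>
            <input id='email' type='text' value='someone@example.com' />
        </body></html>";

        // Parse the markup into a CsQuery DOM.
        CQ dom = CQ.Create(html);

        // jQuery-style CSS selectors, then the equivalents of jQuery's text() and val().
        string title = dom["h1.title"].Text();
        string email = dom["#email"].Val();

        Console.WriteLine(title); // prints: Order #42
        Console.WriteLine(email); // prints: someone@example.com
    }
}
```

HTML Agility Pack covers similar ground but queries with XPath (`SelectNodes` / `SelectSingleNode`) rather than CSS selectors, so CsQuery is the closer fit if you want to reuse existing jQuery selector strings.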
If you are basically trying to take code that was written for the client and run it on a server (e.g. for sophisticated or accelerated web scraping), I'd search around using those terms. For example, another Stack Overflow question discusses exactly this, with answers including PhantomJS, a headless WebKit browser stack, as well as some of the testing tools I have already mentioned. For web scraping, I would imagine you can live without it all being in .NET, and that may be the only reasonable answer anyway.