
Accessing HTML generated by JavaScript with HtmlUnit (Java) - Stack Overflow


I am trying to test a website that uses JavaScript to render most of its HTML. With the HtmlUnit browser, how can I access the HTML generated by the JavaScript? I looked through the documentation but wasn't sure what the best approach would be.

WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("some url");
String source = currentPage.asXml();
System.out.println(source);

This is an easy way to get the HTML of the page back, but would you use the DomNode, or some other way, to access the HTML generated by the JavaScript?


asked Jun 2, 2010 at 22:05 by rush66

2 Answers


You have to give the JavaScript some time to execute.

See the working sample code below. The 'bucket' divs aren't in the original page source.

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class GetPageSourceAfterJS {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); /* turns off the noisy HtmlUnit warnings; comment this line out to see them */
        WebClient webClient = new WebClient();
        String url = "http://www.futurebazaar.com/categories/Home--Living-Luggage--Travel-Airbags--Duffel-bags/cid-CU00089575.aspx";
        System.out.println("Loading page now: "+url);
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(30 * 1000); /* waits up to 30s for background JavaScript to finish */

        String pageAsXml = page.asXml();
        System.out.println("Contains bucket? --> "+pageAsXml.contains("bucket"));

        //get divs which have a 'class' attribute of 'bucket'
        List<?> buckets = page.getByXPath("//div[@class='bucket']");
        System.out.println("Found "+buckets.size()+" 'bucket' divs.");

        //System.out.println("#FULL source after JavaScript execution:\n "+pageAsXml);
    }
}

Output:

Loading page now: http://www.futurebazaar.com/categories/Mobiles-Mobile-Phones/cid-CU00089697.aspx?Rfs=brandZZFly001PYXQcurtrayZZBrand
Contains bucket? --> true
Found 3 'bucket' divs.

HtmlUnit version used:

<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.12</version>
</dependency>
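
A fixed 30-second budget works, but if you know which element the JavaScript is supposed to add, you can also poll for it and stop as soon as it shows up. The following is only a rough sketch of that idea in plain Java. `PollUntil` and `waitUntil` are hypothetical names, not part of HtmlUnit, and the demo condition in `main` is just a stand-in for a page whose content appears after a short delay.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HtmlUnit): polls a condition until it holds or a timeout expires. */
final class PollUntil {

    /**
     * Checks the condition immediately, then every pollMillis, until it returns true
     * or timeoutMillis has elapsed. Returns true if the condition was met in time.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page whose content shows up about 200 ms after loading.
        long readyAt = System.currentTimeMillis() + 200;
        boolean found = waitUntil(() -> System.currentTimeMillis() >= readyAt, 2_000, 50);
        System.out.println("Condition met? --> " + found);
    }
}
```

With the `page` object from the sample above, the condition could be something like `() -> !page.getByXPath("//div[@class='bucket']").isEmpty()`. This assumes the page's background JavaScript keeps running while the calling thread sleeps; if it doesn't on your version, calling `webClient.waitForBackgroundJavaScript(pollMillis)` in place of the sleep may be worth trying.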

Assuming the issue is HTML generated by JavaScript as a result of AJAX calls, have you tried the 'AJAX does not work' section in the HtmlUnit FAQ?

There's also a section in the howtos about how to use HtmlUnit with JavaScript.

If your question isn't answered here, I think we'll need some more specifics to be able to help.
