
php - Detect if a web page has a javascript redirect - Stack Overflow


I'm using cURL to access a number of different pages. I want an elegant way of checking whether a page has a JavaScript redirect. I could check for the presence of window.location in the body, but because the redirect may be inside a .js file or use a library like jQuery, it seems like no solution would be perfect. Does anyone have any ideas?


Asked Nov 26, 2012 at 19:47 by madphp; edited Nov 26, 2012 at 19:51 by hopper.
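The naive check described in the question amounts to a text search over the HTML that cURL returns. Below is a minimal sketch in JavaScript, the language of the only code in this thread. findRedirectHints and its regular expressions are hypothetical and do not come from any library. The sketch shows the weakness the question points out: a redirect in an external .js file never appears in the page body.

```javascript
// Hypothetical patterns for a naive static check; they only look at the raw text.
const REDIRECT_PATTERNS = [
  // window.location = ..., location.href = ..., document.location = ... (but not == or ===)
  { name: "location assignment", re: /\blocation(?:\.href)?\s*=(?!=)/ },
  // location.replace(...) and location.assign(...)
  { name: "location.replace/assign", re: /\blocation\.(?:replace|assign)\s*\(/ },
];

// Return the names of the patterns that occur anywhere in the HTML text.
function findRedirectHints(html) {
  return REDIRECT_PATTERNS.filter((p) => p.re.test(html)).map((p) => p.name);
}

// Finds the inline redirect
console.log(findRedirectHints('<script>window.location = "/home";</script>'));
// Finds nothing: whatever redirect.js does is invisible in this text
console.log(findRedirectHints('<script src="redirect.js"></script>'));
```

The check also reports false positives. For example, a local variable declared as `var location = 1;` matches, so a match is only a hint.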
  • Not (easily) possible with simple cURL requests, since cURL doesn't support JavaScript. – PeeHaa Commented Nov 26, 2012 at 19:49
  • Yes, I was thinking more of running the markup through a parser, rather than executing it. – madphp Commented Nov 26, 2012 at 19:50
  • If you are using a parser (or writing one), you can compile a list of the .js files referenced in the requested page. With that list, you can download those files and parse them for the presence of a redirect as well. Since your parser has the document's source, you can resolve the links it contains against the base URL (derived from the URL you originally requested) in order to download them. – renab Commented Nov 26, 2012 at 19:52
  • @popnoodles cURL won't fire the JavaScript redirect, so there will be no URL to resolve. – renab Commented Nov 26, 2012 at 19:57
  • Maybe you could use something like Capybara/Selenium: christopherbloom.com/2012/03/12/… – sroes Commented Nov 26, 2012 at 20:01
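renab's suggestion is to collect the .js files a page references and resolve them against the base URL. A rough sketch of that step follows. externalScriptUrls is a hypothetical helper. It uses a regular expression instead of a real HTML parser and ignores any <base href> element. The WHATWG URL constructor resolves the relative paths.

```javascript
// Collect the URLs of external scripts referenced by a page, resolved to absolute form.
// Rough sketch: a regex instead of a real HTML parser, and <base href> is ignored.
function externalScriptUrls(html, pageUrl) {
  const urls = [];
  const scriptSrc = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = scriptSrc.exec(html)) !== null) {
    // new URL(relative, base) resolves "lib/a.js" or "/js/b.js" against the page address
    urls.push(new URL(match[1], pageUrl).href);
  }
  return urls;
}

const html = `<script src="/js/app.js"></script>
<script>var inline = true;</script>
<script type="text/javascript" src='lib/jquery.js'></script>`;
console.log(externalScriptUrls(html, "http://www.example.com/dir/page.html"));
```

Each returned URL could then be fetched (with cURL, in the question's PHP setting) and run through the same kind of text scan as the page itself. That scan still misses redirects whose code is assembled at run time.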

4 Answers


Thanks to Ikstar for pointing out PhantomJS, I worked out the following example:

test.js

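var page = require('webpage').create();
var testUrls = [
    "http://www.google.nl",
    "http://www.example.com"
];

// Open each URL in turn and report whether the page ended up somewhere else.
function testNextUrl()
{
    var testUrl = testUrls.shift();
    page.open(testUrl, function(status) {
        if (status !== 'success') {
            console.log(testUrl + ": failed to load (" + status + ")");
        } else {
            // page.url is the page's current address once loading has finished;
            // if it no longer starts with the requested URL, a redirect happened
            var hasRedirect = page.url.indexOf(testUrl) !== 0;
            console.log(testUrl + ": " + hasRedirect.toString());
        }
        if (testUrls.length) {
            testNextUrl();
        } else {
            phantom.exit();
        }
    });
}

testNextUrl();

Result:

D:\Tools\phantomjs-1.7.0-windows>phantomjs test.js
http://www.google.nl: false
http://www.example.com: true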
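The check page.url.indexOf(testUrl) !== 0 above is a prefix test. It would therefore report no redirect for a hop from http://example.com to a host that merely starts with the same characters, such as http://example.com.evil.test/. A stricter comparison is sketched below. isRedirected is a hypothetical helper that relies on the WHATWG URL class. Node and current browsers provide that class, but the old WebKit inside PhantomJS 1.x may not.

```javascript
// Compare the requested and final URLs after normalizing both with the URL parser,
// so "http://www.google.nl" and "http://www.google.nl/" are treated as the same page.
function isRedirected(requestedUrl, finalUrl) {
  const from = new URL(requestedUrl);
  const to = new URL(finalUrl);
  // Ignore #fragments: changing only the fragment does not load a new page
  from.hash = "";
  to.hash = "";
  return from.href !== to.href;
}

console.log(isRedirected("http://www.google.nl", "http://www.google.nl/"));       // false
console.log(isRedirected("http://example.com", "http://example.com.evil.test/")); // true
```

Unlike the prefix test, this version also counts a change of query string on the same path as a redirect. Whether that is desirable depends on what the check is for.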

You cannot do it by parsing the script alone. Only executing it will show you the true flow of the page's JS.
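The idea that only execution reveals the real flow can be sketched in Node.js with the built-in vm module. The sketch runs the script against fake location objects that record assignments instead of navigating. detectRedirectByExecution and its fake objects are hypothetical, and this is nothing like a full browser. Scripts that need the DOM, setTimeout or external files will simply stop with an error. Node's documentation also states that vm is not a security mechanism and must not be used for untrusted code. Treat this strictly as an illustration; a headless browser, as in the PhantomJS answers, is the practical route.

```javascript
const vm = require("vm");

// Hypothetical helper: run an inline script against fake location objects that
// record navigation attempts instead of navigating. Only window.location,
// document.location and location are provided; setTimeout, the DOM and external
// files are missing, so scripts that need them stop with an error.
function detectRedirectByExecution(scriptSource, timeoutMs = 1000) {
  const redirects = [];
  const record = (url) => { redirects.push(String(url)); };

  const fakeLocation = {
    get href() { return "about:blank"; },
    set href(url) { record(url); },
    assign: record,
    replace: record,
  };
  // Objects whose `location` property records assignments, like window.location = url
  const withLocation = (obj) =>
    Object.defineProperty(obj, "location", { get: () => fakeLocation, set: record });

  const sandbox = {
    window: withLocation({}),
    document: withLocation({}),
    location: fakeLocation, // a bare `location = url` replaces this and is NOT recorded
  };
  vm.createContext(sandbox);

  let error = null;
  try {
    // The timeout stops scripts that never finish
    vm.runInContext(scriptSource, sandbox, { timeout: timeoutMs });
  } catch (err) {
    error = String(err && err.message ? err.message : err);
  }
  return { redirects, error };
}

// The property name is built at run time, so a text search for "location" finds nothing
console.log(detectRedirectByExecution('window["loc" + "ation"] = "https://example.com/a";'));
```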

One way to imitate execution is to classify each redirect by the level of code it appears at. The topmost level is code directly inside a <script> tag, and any redirect found there is a straight redirect. If a redirect is found inside a function, you have to track the structure of the program (whether and when that function is called) and make a guess.
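A crude version of this "code level" idea uses brace depth as the level. classifyRedirects is a hypothetical helper and a heuristic, not a parser. It treats any brace block (an if body as well as a function body) as nested. It also miscounts braces that appear inside strings, comments or regex literals.

```javascript
// Hypothetical heuristic: find redirect-looking statements and record the brace depth
// at which each one appears. Depth 0 is the top level of the script, the closest thing
// to a straight redirect; anything deeper is inside a function or block and may never run.
// Braces inside strings, comments or regex literals are not skipped and will confuse it.
function classifyRedirects(script) {
  const redirectRe = /\blocation(?:\.href)?\s*=(?!=)|\blocation\.(?:replace|assign)\s*\(/g;
  const results = [];
  let depth = 0;
  let pos = 0; // how far the brace counter has scanned
  let match;
  while ((match = redirectRe.exec(script)) !== null) {
    // Count braces between the previous match and this one
    for (; pos < match.index; pos++) {
      if (script[pos] === "{") depth++;
      else if (script[pos] === "}") depth = Math.max(0, depth - 1);
    }
    results.push({ index: match.index, kind: depth === 0 ? "top-level" : "nested" });
  }
  return results;
}

const sample = 'window.location = "/a";\nfunction later() { location.replace("/b"); }';
console.log(classifyRedirects(sample).map((r) => r.kind));
```

A "nested" result is exactly the case where, as the answer says, you have to follow the program's structure and guess.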

Depending on why you are using cURL and whether you actually need the redirect on the page to happen, you could incorporate a headless browser such as PhantomJS (http://phantomjs.org/) to do the necessary browsing. You would then be able to see whether a redirect happens, as well as track any other JavaScript executing on the page.

It is impossible, in general, to detect the presence of a redirect just by analyzing the web page's source code.

The undecidable halting problem can be encoded in JavaScript: a script can run a computation that either halts, after which it performs the redirect, or runs forever. Since we cannot decide in general whether the code will halt, it is also impossible to decide whether the redirect will be executed.
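To make the argument concrete, here is a hypothetical page script, rewritten as a plain function so it runs outside a browser. The redirect callback stands in for window.location = url. The redirect happens only if the loop reaches 1. If the start value came from something like user input, a static analyzer would have to prove in advance that the loop terminates. For this particular loop, proving that for every start value is the open Collatz conjecture. The halting-problem result is stronger: no analyzer can settle the question for every possible script.

```javascript
// Hypothetical page script, written as a function so it runs outside a browser:
// `redirect(url)` stands in for `window.location = url`.
function collatzThenRedirect(start, redirect) {
  let n = start;
  let steps = 0;
  // Proving this loop reaches 1 for every positive start value is the open Collatz conjecture
  while (n !== 1) {
    n = n % 2 === 0 ? n / 2 : 3 * n + 1;
    steps++;
  }
  redirect("/done?steps=" + steps);
}

collatzThenRedirect(27, (url) => console.log("redirect to", url)); // redirect to /done?steps=111
```

Running the code settles the question for one particular input, but only if it halts. That is why both the execution-based answers and this one conclude that static analysis cannot be exact.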
