
javascript - Prevent cURL requests from my website - Stack Overflow


I have a website containing a large DB of products and prices.
I am being constantly cURLed for prices.

I thought of preventing it with a <noscript> tag, but all I can do with that is hide the content; bots would still be able to scrape it.

Is there a way to run a JS test to see whether JavaScript is disabled (to detect bots) and redirect those requests, maybe into a blacklist?

Will doing so block Google from crawling my website?
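For illustration, the kind of test meant here could look something like this (a rough sketch; the js_ok cookie name is just an example):

// Real browsers execute the script and come back with a cookie;
// clients with JS disabled (like a bare cURL call) never do.
if (!isset($_COOKIE['js_ok'])) {
    // No cookie yet: serve a tiny challenge page instead of the data.
    echo '<script>document.cookie="js_ok=1;path=/";location.reload();</script>';
    exit;
}
// Cookie present: render products and prices as usual.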


asked Jun 8, 2014 at 7:21 by Nir Tzezana
  • You can deny requests without a User-Agent header (though with cURL you can bypass this), or whitelist the User-Agent strings of the Google, Facebook, Twitter bots, etc. – Adam Azad Commented Jun 8, 2014 at 7:26
  • As long as the data is public, there really is no easy automated solution. The bots can always be rewritten to bypass your checks. – John V. Commented Jun 8, 2014 at 7:26
  • Why not use .htaccess to block bots by IP or location? – Vincent Decaux Commented Jun 8, 2014 at 7:29
  • You may want to use some authentication, or track users with cookies. – source.rar Commented Jun 8, 2014 at 7:35
  • @VincentDecaux they just change their IP, it won't last long – Nir Tzezana Commented Jun 8, 2014 at 7:38

3 Answers


Since cURL just issues a plain HTTP request, your server can't tell it apart from a regular browser unless you limit access to certain URLs, or check the Referer header and filter out anything not referred locally. An example of how to build such a check can be found here:

Checking the referrer
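For instance, a minimal sketch of such a filter (assuming your site is example.com; note that the Referer header is client-supplied and easily spoofed):

// Reject requests whose Referer does not point back to this site.
// The header may be missing entirely, so default to an empty string.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (stripos($referer, 'example.com') === false) {
    http_response_code(403); // Forbidden
    exit;
}

Keep in mind that direct visits and search engine crawlers send no local Referer either, so a hard block here would affect them too.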

You can block unspoofed cURL requests in PHP by checking the User-Agent. As far as I know, none of the search engine crawlers have "curl" in their User-Agent string, so this shouldn't block them.

// Reject any client whose User-Agent contains "curl"
// (cURL sends "curl/x.y.z" by default; the header may also be absent).
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($ua, 'curl') !== false) {
    http_response_code(403); // FORBIDDEN
    exit;
}

Note that changing the User Agent string of a cURL request is trivial, so someone could easily bypass this.
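If the worry is accidentally blocking Google, the method Google documents for verifying a claimed Googlebot is a reverse-plus-forward DNS lookup. A minimal sketch (isRealGooglebot is just an illustrative name):

// Reverse-resolve the IP, check the domain, then forward-confirm it.
function isRealGooglebot($ip) {
    $host = gethostbyaddr($ip); // e.g. "crawl-66-249-66-1.googlebot.com"
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    return gethostbyname($host) === $ip; // forward lookup must match
}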

You would need to create a block list and block those IPs from accessing the content. All headers, including Referer and User-Agent, can be set in cURL very easily, as in the following simple code:

$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)'; // spoofed browser UA
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $agent);                  // fake the User-Agent header
curl_setopt($ch, CURLOPT_URL, 'http://www.yoursite.com?data=anydata');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);                  // return the body as a string
curl_setopt($ch, CURLOPT_REFERER, 'http://www.yoursite.com'); // fake the Referer header
$html = curl_exec($ch);

The above will make the cURL request look like a normal connection from a browser (here, an old Internet Explorer User-Agent).
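As for the block list itself, a minimal sketch (assuming a plain-text file of banned IPs, one per line; blocklist.txt is a hypothetical path):

// Load the banned IPs and refuse the request if the client is listed.
$blocked = file('blocklist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($blocked !== false && in_array($_SERVER['REMOTE_ADDR'], $blocked, true)) {
    http_response_code(403); // Forbidden
    exit;
}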
