Visitors on my page have the option to save their preferred settings as a cookie (I know some are against it, but that is not the point of this discussion).
If the user does not have a cookie, they are asked whether they want to set up their settings, and if yes they are redirected with JavaScript.
Can I detect non-human traffic and not ask the "question" of it? I have noticed Google's speed analytics bots are always being redirected to my settings page, which gives me wrong data in the analytics reports.
So can I detect the non-human traffic, in PHP or JavaScript?
EDIT: I would prefer to detect them in PHP, as I have plans to phase out the JavaScript as much as possible.
asked Apr 11, 2016 at 7:15 by Andreas

- Use a captcha. Generating a captcha with client-side scripting is quite simple, but make sure that JavaScript is enabled. – Prabhat-VS Commented Apr 11, 2016 at 7:22
- @Prabhat, thank you for the suggestion. I will look into it. I have no previous knowledge of captchas. – Andreas Commented Apr 11, 2016 at 7:24
- All the search engines use specific user agents, which can be detected and dealt with in PHP. Also, you can set up a robots.txt excluding your settings page. – Thomas B Commented Apr 11, 2016 at 7:25
- It's worth noting that the "robots.txt" file does not prevent crawlers from visiting the listed URIs, it merely asks them politely not to do so. As a matter of fact, most illegitimate crawlers do not even look for a robots.txt file, let alone obey it. – kalatabe Commented Apr 11, 2016 at 7:51
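The user-agent approach mentioned in the comments could be sketched like this (the pattern list is a non-exhaustive assumption; the same string test works in PHP with `preg_match`, and robust detection should verify crawlers via reverse DNS rather than trusting the header):

```javascript
// Known crawlers identify themselves in the User-Agent request header,
// e.g. "Googlebot". This pattern is an illustrative, non-exhaustive
// assumption; the header can be spoofed by illegitimate bots.
const BOT_PATTERN = /bot|crawl|spider|slurp|lighthouse/i;

function looksLikeBot(userAgent) {
  // Treat a missing User-Agent as suspicious input but not a match.
  return BOT_PATTERN.test(userAgent || '');
}
```

On the server you would feed this the `User-Agent` header and skip the settings question (and the redirect) when it returns true.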
3 Answers
Use a honeypot - an empty, non-visible (but not hidden) field that bots will likely fill in. You can also try to catch the click event, since bots like Google's are unlikely to emulate it while crawling your page. Overall, though, your best option is using your .htaccess file (or robots.txt) to disable crawling of unwanted pages - check this out: Block all bots/crawlers/spiders for a special directory with htaccess
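For the robots.txt route, a minimal sketch (the `/settings` path is an assumption; substitute your actual settings-page URL). Note that, as pointed out in the comments, this only asks well-behaved crawlers to stay away:

```
User-agent: *
Disallow: /settings
```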
It is quite easy to do this, even so there are many options, depending on your specific needs.
Here is a simple solution:

- On each page, style the first link to be "invisible" (opacity: 0) and place it off-screen (top: -999px); point it at something that either triggers some JavaScript or leads to a page you want for robots.
- Set a timeout (like 500 ms) on page load to give a robot some time to "click" the link.
- After the timeout, if the "trap" was not triggered, it should be a human user.
- Optionally you can also check for mouse activity, but the above should suffice.

This should work well, because a "human user" cannot click the link, but a bot can, because it reads the HTML. Beware of using "display: none", else the bot may skip it.
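The steps above could be sketched as follows (all names and the 500 ms delay are illustrative; the DOM and timer wiring is shown only in comments):

```javascript
// Sketch of the invisible-link trap. The hidden link's click handler
// calls trigger(); classify() is run after the ~500 ms timeout, e.g.:
//   link.addEventListener('click', trap.trigger);
//   setTimeout(() => handle(trap.classify()), 500);
function createBotTrap() {
  let trapTriggered = false;
  return {
    // Wired to the invisible link; a crawler following the HTML fires this.
    trigger() { trapTriggered = true; },
    // Called after the timeout: untriggered trap suggests a human visitor.
    classify() { return trapTriggered ? 'bot' : 'human'; }
  };
}
```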
I'd recommend using honeypots to detect them.
Here's an interesting article about this.
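A minimal sketch of the server-side half of a honeypot check (the field name `website` and the shape of the submitted form data are assumptions; in PHP the same test would run against `$_POST`):

```javascript
// The form contains an extra field (here named "website") that is
// visually hidden from humans but present in the HTML. Humans leave it
// empty; naive bots auto-fill every field they find.
function isLikelyBot(formData) {
  // Any non-empty value in the honeypot field marks the request as a bot.
  return typeof formData.website === 'string' && formData.website.trim() !== '';
}
```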