I have a WP multisite install, integrated with Ezoic and Cloudflare, and I have implemented firewall rules. However, I have an issue blocking bots that have X-Middleton in the User-Agent, because my origin server must allow such requests in order to detect real user IPs. The X-Middleton token is appended when bots pass through Ezoic, which acts as a reverse proxy. I inserted a rule in the .htaccess, but it is not working, because the origin server allows every request that has X-Middleton.
The rule is as follows:
# BLOCK BOTS
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(ADmantX|Proximic|Barkrowler).*$ [NC]
RewriteRule .* - [F,L]
Is it possible to write a rule that blocks by user agent when the "bad bot" has X-Middleton in the User-Agent?
- Not sure that I follow? Have you tried modifying that rule for "X-Middleton"? – MrWhite Commented Mar 2, 2021 at 23:33
- Thanks Mr White I don't know how. – Irene Commented Mar 3, 2021 at 14:58
1 Answer
Try it like this:
RewriteCond %{HTTP_USER_AGENT} (ADmantX|Proximic|Barkrowler|X-Middleton) [NC]
RewriteRule ^ - [F]
This will block any request where the User-Agent string contains either ADmantX, Proximic, Barkrowler or X-Middleton. The NC flag makes this a case-insensitive match. Whether that is strictly required in this example I don't know, but generally this should be a case-sensitive match, since User-Agent strings (even from bad bots) are usually consistent with regard to case.
The regex prefix ^.* and suffix .*$ are superfluous.
The regex pattern (A|B|C|D) is called alternation; it essentially means A or B or C or D will match.
The RewriteRule pattern .* can be simplified to ^, which is also marginally more efficient, since you don't need to actually match anything here, just be successful.
The L flag is not required when the F flag is used - it is implied.
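To see concretely what the condition matches, here is a small Python sketch. Python's re syntax is close enough to the PCRE that Apache uses for this simple pattern; the sample User-Agent strings are made up for illustration:

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} (ADmantX|Proximic|Barkrowler) [NC]
# An unanchored, case-insensitive substring match - which is why the
# ^.* prefix and .*$ suffix in the original rule add nothing.
bad_bots = re.compile(r"(ADmantX|Proximic|Barkrowler)", re.IGNORECASE)

# [NC] equivalent: lowercase "proximic" still matches.
blocked = bool(bad_bots.search("Mozilla/5.0 (compatible; proximic crawler)"))
# An ordinary browser UA contains none of the tokens.
allowed = not bad_bots.search("Mozilla/5.0 (Windows NT 10.0) Chrome/88.0")

print(blocked, allowed)  # True True
```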
UPDATE:
It appears that X-Middleton (or rather X-Middleton/1) is appended to all User-Agent strings that reach your site, as they pass through the Ezoic reverse proxy. So, simply blocking based on the presence of this string in the User-Agent header (as above) is not going to work, since it will block all requests!
If X-Middleton is simply appended to the UA string and no further processing occurs, then you could theoretically block the request when X-Middleton appears twice (or more times) in the UA string, in order to block any request where X-Middleton occurred in the original request.
To handle this situation you would create an additional rule. For example:
RewriteCond %{HTTP_USER_AGENT} (X-Middleton).*\1 [NC]
RewriteRule ^ - [F]
\1 is an internal backreference that matches the first matched subpattern, ie. "X-Middleton". So, the condition is only successful when the string "X-Middleton" occurs at least twice, separated by any number of characters (or none).
The above will block blah X-Middleton blah X-Middleton/1, but not blah blah X-Middleton/1 (case-insensitive match).
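You can check the backreference behaviour outside Apache with a quick Python sketch (the UA strings are the made-up examples used above):

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} (X-Middleton).*\1 [NC]
# \1 requires the first captured "X-Middleton" to occur a second time
# later in the string; under re.IGNORECASE the backreference is also
# matched case-insensitively, like Apache's [NC].
pattern = re.compile(r"(X-Middleton).*\1", re.IGNORECASE)

# The original request already contained X-Middleton, and the proxy
# appended another copy - this one should be blocked.
double = "blah X-Middleton blah X-Middleton/1"
# Only the proxy-appended copy is present - this one should pass.
single = "blah blah X-Middleton/1"

print(bool(pattern.search(double)))  # True
print(bool(pattern.search(single)))  # False
```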
However, I would want to see an example access-log entry (User-Agent string) of such a request before going live with this. It shouldn't block real user requests, but it might not block fake requests either. If you don't have an actual fake request, you can mock one up by customising the User-Agent string your browser sends (in Chrome's "Inspector": Customize menu > More tools > Network conditions, or install a User-Agent switcher plugin), or use curl to make the request, eg. curl -A "<custom-user-agent>" <siteurl> - where you'd set <custom-user-agent> to blah X-Middleton blah or something.
I would also be interested to see the complete list of HTTP request headers that are reaching your application server, as there may be a better way to solve this. (I find it unusual that an intermediate proxy would modify the User-Agent, without also providing the original value. Although, maybe there are no additional options?)