I have a WP multisite install, integrated with Ezoic and Cloudflare, and I have implemented firewall rules. However, I have an issue blocking bots that have X-Middleton in the User-Agent, because my origin server must allow such requests in order to detect real user IPs. The X-Middleton token is appended when bots pass through Ezoic, which acts as a reverse proxy. I inserted a rule in the .htaccess, but it is not working, because the origin server allows every request that has X-Middleton.
The rule is as follows:
# BLOCK BOTS
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(ADmantX|Proximic|Barkrowler).*$ [NC]
RewriteRule .* - [F,L]
Is it possible to write a rule that blocks by user agent when the "bad bot" has X-Middleton in the User-Agent?
- Not sure that I follow? Have you tried modifying that rule for "X-Middleton"? – MrWhite Commented Mar 2, 2021 at 23:33
- Thanks Mr White I don't know how. – Irene Commented Mar 3, 2021 at 14:58
1 Answer
Try it like this:
RewriteCond %{HTTP_USER_AGENT} (ADmantX|Proximic|Barkrowler|X-Middleton) [NC]
RewriteRule ^ - [F]
This will block any request where the User-Agent string contains either ADmantX, Proximic, Barkrowler or X-Middleton. The NC flag makes this a case-insensitive match. Whether that is strictly required in this example I don't know, but generally this should be a case-sensitive match, since User-Agent strings (even from bad bots) are usually consistent with regard to case.
The regex prefix ^.* and suffix .*$ are superfluous.
The regex pattern (A|B|C|D) is called alternation; it essentially means A or B or C or D will match.
The RewriteRule pattern .* can be simplified to ^, which is also marginally more efficient, since you don't need to actually match anything here, just be successful.
The L flag is not required when the F flag is used - it is implied.
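To see concretely what the condition matches, here is a small Python sketch. Python's re syntax is close enough to the PCRE that Apache uses for this simple pattern; the sample User-Agent strings are made up for illustration:

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} (ADmantX|Proximic|Barkrowler) [NC]
# An unanchored, case-insensitive substring match - which is why the
# ^.* prefix and .*$ suffix in the original rule add nothing.
bad_bots = re.compile(r"(ADmantX|Proximic|Barkrowler)", re.IGNORECASE)

# [NC] equivalent: lowercase "proximic" still matches.
blocked = bool(bad_bots.search("Mozilla/5.0 (compatible; proximic crawler)"))
# An ordinary browser UA contains none of the tokens.
allowed = not bad_bots.search("Mozilla/5.0 (Windows NT 10.0) Chrome/88.0")

print(blocked, allowed)  # True True
```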
UPDATE:
It appears that X-Middleton (or rather X-Middleton/1) is appended to all User-Agent strings that reach your site, as they pass through the Ezoic reverse proxy. So, simply blocking based on the presence of this string in the User-Agent header (as above) is not going to work, since it will block all requests!
If X-Middleton is simply appended to the UA string and no further processing occurs, then you could theoretically block the request when X-Middleton appears twice (or more times) in the UA string, in order to block any request where X-Middleton occurred in the original request.
To handle this situation you would create an additional rule. For example:
RewriteCond %{HTTP_USER_AGENT} (X-Middleton).*\1 [NC]
RewriteRule ^ - [F]
\1 is an internal backreference that matches the first matched subpattern, ie. "X-Middleton". So, the condition is only successful when the string "X-Middleton" occurs at least twice, separated by any number of characters (or none).
The above will block blah X-Middleton blah X-Middleton/1, but not blah blah X-Middleton/1 (case-insensitive match).
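You can check the backreference behaviour outside Apache with a quick Python sketch (the UA strings are the made-up examples used above):

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} (X-Middleton).*\1 [NC]
# \1 requires the first captured "X-Middleton" to occur a second time
# later in the string; under re.IGNORECASE the backreference is also
# matched case-insensitively, like Apache's [NC].
pattern = re.compile(r"(X-Middleton).*\1", re.IGNORECASE)

# The original request already contained X-Middleton, and the proxy
# appended another copy - this one should be blocked.
double = "blah X-Middleton blah X-Middleton/1"
# Only the proxy-appended copy is present - this one should pass.
single = "blah blah X-Middleton/1"

print(bool(pattern.search(double)))  # True
print(bool(pattern.search(single)))  # False
```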
However, I would want to see an example access-log entry (User-Agent string) of such a request before going live with this. It shouldn't block real user requests, but it might not block fake requests either. If you don't have an actual fake request, you can mock one up by customising the User-Agent string your browser sends (in Chrome's "Inspector": Customize menu > More tools > Network conditions, or install a User-Agent switcher plugin), or use curl to make the request, eg. curl -A "<custom-user-agent>" <siteurl> - where you'd set <custom-user-agent> to blah X-Middleton blah or something.
I would also be interested to see the complete list of HTTP request headers that are reaching your application server, as there may be a better way to solve this. (I find it unusual that an intermediate proxy would modify the User-Agent, without also providing the original value. Although, maybe there are no additional options?)