Are there any security concerns if I run a user-defined regular expression on my server against a user-defined input string? I'm not asking about a single language, but any language really, with PHP as one of the main languages I would like to know about.
For example, if I have the code below:
<?php
if (isset($_POST['regex'])) {
    preg_match($_POST['regex'], $_POST['match'], $matches);
    var_dump($matches);
}
?>
<form action="" method="post">
    <input type="text" name="regex">
    <textarea name="match"></textarea>
    <input type="submit">
</form>
Provided this is not a controlled environment (i.e. the user can't be trusted), what are the risks of the above code? If similar code is written in other languages, are there risks in those languages too? If so, which languages are affected?
I have already read about 'evil regular expressions'; however, no matter what I try on my computer, they seem to work fine (see below).
PHP
<?php
php > preg_match('/^((ab)*)+$/', 'ababab', $matches);var_dump($matches);
array(3) {
[0] =>
string(6) "ababab"
[1] =>
string(0) ""
[2] =>
string(2) "ab"
}
php > preg_match('/^((ab)*)+$/', 'abababa', $matches);var_dump($matches);
array(0) {
}
JavaScript
phantomjs> /^((ab)*)+$/g.exec('ababab');
{
"0": "ababab",
"1": "ababab",
"2": "ab",
"index": 0,
"input": "ababab"
}
phantomjs> /^((ab)*)+$/g.exec('abababa');
null
This leads me to believe that PHP and JavaScript have a fail-safe mechanism for evil regexes. Based on that, I would have thought that other languages have similar features.
Is this a correct assumption?
Finally, for any or all of the languages that may be vulnerable, is there a way to make sure the regular expressions don't cause damage?
asked Jan 5, 2014 at 0:48 by GManz
- Those regexes are evil when used on maliciously crafted, very long strings. – SLaks Commented Jan 5, 2014 at 0:50
- With the e modifier (in PHP) something will be evaluated (what you probably don't want), see the manual – kero Commented Jan 5, 2014 at 0:56
- @SLaks, when you say "long strings", any idea how long we're talking about? – GManz Commented Jan 5, 2014 at 1:30
- @kingkero, I know about that, but that's been deprecated and only works with preg_replace() – GManz Commented Jan 5, 2014 at 1:31
- Very closely related: stackoverflow./a/4579675 – tchrist Commented Jan 5, 2014 at 18:10
1 Answer
When you run a user-defined regex against a user-defined string on your server, the user can craft a regex that backtracks catastrophically, usually paired with a failing input, to cause a denial of service on your system.
Using your example ^((ab)*)+$, you need a slightly longer, failing input for catastrophic backtracking to take effect: "ababababababababababababababababababababababd".
- For the PHP version, a call to preg_last_error should return PREG_BACKTRACK_LIMIT_ERROR.
- For the JS version, the code above does not cause catastrophic backtracking in Firefox 26 and the browser returns false. On Chrome 31.0.1650.63 m and Internet Explorer 11, catastrophic backtracking can be observed.
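To see why the short inputs in the question looked harmless, here is a minimal sketch for Node.js that times the same regex against failing inputs of increasing length. The helper names failingInput and timeMatch are made up for this illustration, and the sizes are kept deliberately small; in a backtracking engine the time is expected to grow very quickly as pairs are added, but the actual numbers depend on the engine and version.

```javascript
// The regex from the question; nested quantifiers make it "evil".
const EVIL = /^((ab)*)+$/;

// Hypothetical helper: "ab" repeated `pairs` times, then a trailing "d"
// so that the overall match has to fail.
function failingInput(pairs) {
  return 'ab'.repeat(pairs) + 'd';
}

// Hypothetical helper: run one match and report how long it took (ms).
function timeMatch(pairs) {
  const input = failingInput(pairs);
  const start = process.hrtime.bigint();
  const matched = EVIL.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { pairs, matched, elapsedMs };
}

// Small sizes only: larger values may take a very long time to finish.
for (let pairs = 8; pairs <= 16; pairs += 2) {
  console.log(timeMatch(pairs));
}
```

Raising the upper bound of the loop a few pairs at a time is a safer way to explore this than jumping straight to the 22-pair input quoted above.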
Depending on the language or library, the API may provide an option to limit the number of backtracking attempts or to set a time-out on the operation; it is strongly recommended that you set such a limit to prevent DoS on your server.
- PCRE defaults to stop after 10 million backtracking attempts, and the number can be configured.
- The .NET Regex class comes with an API to limit the time taken for matching.
If the language doesn't come with such a convenient API, it is strongly recommended that you implement your own time-out mechanism to abort the match.
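As one possible shape for such a home-made time-out, here is a sketch for Node.js that runs the match through the built-in vm module with its timeout option. The function name matchWithTimeout and the 100 ms budget are assumptions for this example, not part of any library. Note that vm is not a security sandbox; it is used here only for the time limit, and the script text is fixed, so the user supplies data (pattern, flags, input) rather than code.

```javascript
const vm = require('node:vm');

// Hypothetical helper: run pattern.exec(input) but give up after timeoutMs.
function matchWithTimeout(pattern, flags, input, timeoutMs) {
  // Only plain strings are handed to the new context.
  const sandbox = { pattern, flags, input, result: null };
  try {
    vm.runInNewContext(
      'result = new RegExp(pattern, flags).exec(input);',
      sandbox,
      { timeout: timeoutMs }
    );
  } catch (err) {
    // Node reports an exceeded vm timeout with this error code.
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { timedOut: true, match: null };
    }
    throw err; // e.g. a SyntaxError for an invalid pattern
  }
  const match = sandbox.result ? Array.from(sandbox.result) : null;
  return { timedOut: false, match };
}

// A short input finishes normally; a long failing one should hit the limit.
console.log(matchWithTimeout('^((ab)*)+$', '', 'ababab', 100));
console.log(matchWithTimeout('^((ab)*)+$', '', 'ab'.repeat(40) + 'd', 100));
```

The call blocks the event loop until the match finishes or the budget runs out, so on a busy server a worker thread or separate process with its own limit may be a better fit.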
Unless the specification of the regex engine includes a requirement to prevent catastrophic backtracking (e.g. PCRE has a default backtracking limit), you shouldn't rely on the behavior of a specific implementation (as in the case of Firefox described above).