javascript - User defined regular expression security concerns - Stack Overflow

Are there any security concerns if I run a user defined regular expression on my server with a user defined input string? I'm not asking about a single language, but any language really, with PHP as one of the main languages I would like to know about.

For example, if I have the code below:

<?php

if(isset($_POST['regex'])) {
    preg_match($_POST['regex'], $_POST['match'], $matches);
    var_dump($matches);
}

?>
<form action="" method="post">
<input type="text" name="regex">
<textarea name="match"></textarea>
<input type="submit">
</form>

Provided this is not a controlled environment (i.e. the user can't be trusted), what are the risks of the above code? If similar code is written in other languages, are there risks in those languages as well? If so, which languages pose a threat?

I already found out about 'evil regular expressions'; however, no matter what I try on my computer, they seem to work fine (see below).

PHP

<?php
php > preg_match('/^((ab)*)+$/', 'ababab', $matches);var_dump($matches);
array(3) {
  [0] =>
  string(6) "ababab"
  [1] =>
  string(0) ""
  [2] =>
  string(2) "ab"
}
php > preg_match('/^((ab)*)+$/', 'abababa', $matches);var_dump($matches);
array(0) {
}

JavaScript

phantomjs> /^((ab)*)+$/g.exec('ababab');
{
   "0": "ababab",
   "1": "ababab",
   "2": "ab",
   "index": 0,
   "input": "ababab"
}
phantomjs> /^((ab)*)+$/g.exec('abababa');
null

This leads me to believe that PHP and JavaScript have a fail-safe mechanism for evil regexes. Based on that, I would assume that other languages have similar features.

Is this a correct assumption?

Finally, for any or all of the languages that may be harmful, are there any ways to make sure the regular expressions don't cause damage?

Asked Jan 5, 2014 at 0:48 by GManz
  • Those regexes are evil when used on maliciously crafted, very long strings. – SLaks Commented Jan 5, 2014 at 0:50
  • With the e modifier (in PHP), something will be evaluated (which you probably don't want); see the manual. – kero Commented Jan 5, 2014 at 0:56
  • @SLaks, when you say "long strings", any idea how long we're talking about? – GManz Commented Jan 5, 2014 at 1:30
  • @kingkero, I know about that, but that's been deprecated and only works with preg_replace() – GManz Commented Jan 5, 2014 at 1:31
  • Very closely related: stackoverflow.com/a/4579675 – tchrist Commented Jan 5, 2014 at 18:10

1 Answer


When you run a user-defined regex against a user-defined string on your server, a malicious user can craft a regex prone to catastrophic backtracking. It is usually paired with an input that fails to match, and the combination can cause a denial of service on your system.
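Why a failing input is so costly: in ^((ab)*)+$, the outer (...)+ can divide a run of n copies of "ab" into consecutive non-empty groups in 2^(n-1) different ways. When the overall match fails, a backtracking engine may try every one of those groupings before giving up. The sketch below only counts the groupings. It models the size of the search space, not the internals of any particular engine, and `countGroupings` is a hypothetical helper name.

```javascript
// Count the ways the outer (...)+ in ^((ab)*)+$ can split n copies of "ab"
// into consecutive non-empty groups (each group is one iteration of (ab)*).
// This models the search space only, not any engine's internals.
function countGroupings(n) {
  // ways[i] = number of ways to split the first i copies of "ab"
  const ways = new Array(n + 1).fill(0);
  ways[0] = 1;
  for (let i = 1; i <= n; i++) {
    // The last group covers the final k copies, for k = 1..i.
    for (let k = 1; k <= i; k++) {
      ways[i] += ways[i - k];
    }
  }
  return ways[n];
}

for (const n of [5, 10, 20, 22]) {
  console.log(`${n} x "ab": ${countGroupings(n)} groupings`);
}
```

Each extra "ab" doubles the count. That is why a modestly longer failing input is enough to stall a server.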

With your example ^((ab)*)+$, all it takes is a slightly longer input that fails to match for catastrophic backtracking to take effect: "ababababababababababababababababababababababd".

  • In PHP, preg_match returns false, and a subsequent call to preg_last_error() should return PREG_BACKTRACK_LIMIT_ERROR.
  • In JavaScript, the input above does not cause catastrophic backtracking in Firefox 26; the browser simply returns false. In Chrome 31.0.1650.63 m and Internet Explorer 11, catastrophic backtracking can be observed.
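To observe the effect on a V8-based runtime yourself, here is a rough sketch for Node.js. Note that running it under Node is an assumption; the answer tested browsers, not Node. It times the same regex on growing failing inputs. Absolute numbers depend on the machine and the V8 version. The point is the trend: each extra "ab" should roughly double the time.

```javascript
const { performance } = require("perf_hooks");

// Time ^((ab)*)+$ against n copies of "ab" followed by "d". The trailing "d"
// guarantees failure, so the engine has to exhaust its alternatives first.
function timeFailingMatch(n) {
  const input = "ab".repeat(n) + "d";
  const start = performance.now();
  const matched = /^((ab)*)+$/.test(input);
  return { n, matched, ms: performance.now() - start };
}

// Keep n small: every extra "ab" roughly doubles the running time.
for (let n = 14; n <= 20; n++) {
  const { matched, ms } = timeFailingMatch(n);
  console.log(`n=${n} matched=${matched} ${ms.toFixed(1)} ms`);
}
```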

Depending on the language or library, the API may provide an option to limit the number of backtracking attempts or to set a time-out on the operation. It is strongly recommended that you set such a limit to prevent DoS on your server.

  • By default, PCRE stops after 10 million backtracking attempts, and the limit can be configured (in PHP, through the pcre.backtrack_limit setting).
  • The .NET Regex class comes with an API to limit the time taken for matching.
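JavaScript's standard RegExp offers neither a backtracking limit nor a timeout option. To make the idea of a backtracking limit concrete, here is a toy backtracking matcher. It is explicitly not how PCRE is implemented. It works on a hand-built syntax tree that supports only literal characters, sequences, `*` and `+`, and it aborts once a step budget is spent. Its step count covers every node visit, which is cruder than PCRE's own accounting. All names (`matchWithLimit`, `BacktrackLimitError`, the node shapes) are hypothetical.

```javascript
// Toy backtracking matcher with a step budget, to illustrate the idea behind
// PCRE's match limit. It is NOT how PCRE works internally.
// Hypothetical AST node shapes:
//   { type: "char", c }  { type: "seq", items }  { type: "star", node }  { type: "plus", node }
class BacktrackLimitError extends Error {}

function matchWithLimit(ast, input, limit) {
  let steps = 0;

  // Try to match `node` at `pos`; on success pass the end position to the
  // continuation `k`. A false result makes the caller backtrack.
  function m(node, pos, k) {
    if (++steps > limit) {
      throw new BacktrackLimitError(`gave up after ${limit} steps`);
    }
    switch (node.type) {
      case "char":
        return pos < input.length && input[pos] === node.c && k(pos + 1);
      case "seq": {
        const go = (i, p) =>
          i === node.items.length ? k(p) : m(node.items[i], p, (q) => go(i + 1, q));
        return go(0, pos);
      }
      case "star": {
        // Greedy: try one more non-empty iteration first, otherwise stop here.
        const loop = (p) => m(node.node, p, (q) => q > p && loop(q)) || k(p);
        return loop(pos);
      }
      case "plus":
        // One mandatory iteration, then zero or more like "star".
        return m(node.node, pos, (q) => m({ type: "star", node: node.node }, q, k));
      default:
        throw new Error(`unknown node type: ${node.type}`);
    }
  }

  // Anchored at both ends, like ^...$: start at 0 and require the full input.
  return m(ast, 0, (end) => end === input.length);
}

// ^((ab)*)+$ as a hand-built tree.
const ab = { type: "seq", items: [{ type: "char", c: "a" }, { type: "char", c: "b" }] };
const evil = { type: "plus", node: { type: "star", node: ab } };

console.log(matchWithLimit(evil, "ababab", 100000)); // true
console.log(matchWithLimit(evil, "abababa", 100000)); // false
try {
  matchWithLimit(evil, "ab".repeat(30) + "d", 100000);
} catch (e) {
  if (!(e instanceof BacktrackLimitError)) throw e;
  console.log("rejected:", e.message); // rejected: gave up after 100000 steps
}
```

The design choice mirrors the PCRE approach: rather than trying to detect "evil" patterns up front, the engine simply refuses to work past a fixed budget and reports an error.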

If the language doesn't come with such a convenient API, it is strongly recommended that you implement your own time-out mechanism to abort the execution.
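For Node.js, one way to build such a time-out is the `timeout` option of the built-in `vm` module. This is a sketch under assumptions. When the budget runs out, `vm` terminates the script and throws. In current Node.js versions that termination also reaches a regex that is stuck backtracking, but verify this on the version you deploy. `safeExec` is a hypothetical helper, and the 100 ms default is an arbitrary choice. Running the match in a worker thread or child process that you kill on time-out is an alternative with stronger isolation.

```javascript
const vm = require("vm");

// Hypothetical helper: run a user-supplied pattern against user-supplied input
// and give up after `timeoutMs` milliseconds. Returns the match array or null,
// and throws on an invalid pattern or when the time budget is exceeded.
function safeExec(pattern, flags, input, timeoutMs = 100) {
  // These values become globals inside the fresh context.
  const sandbox = { pattern, flags, input };
  try {
    // Compiling and matching both happen inside the script, so both are
    // covered by the timeout.
    return vm.runInNewContext("new RegExp(pattern, flags).exec(input)", sandbox, {
      timeout: timeoutMs,
    });
  } catch (err) {
    if (err && err.code === "ERR_SCRIPT_EXECUTION_TIMEOUT") {
      throw new Error(`regex timed out after ${timeoutMs} ms`);
    }
    throw err; // e.g. a SyntaxError for an invalid pattern
  }
}

console.log(safeExec("^((ab)*)+$", "", "ababab")[0]); // ababab
try {
  safeExec("^((ab)*)+$", "", "ab".repeat(40) + "d");
} catch (e) {
  console.log(e.message);
}
```

Creating a new context per call is slow but keeps the sketch simple. A production version might reuse one context and also cap the pattern and input lengths before matching.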

Unless the regex engine's specification requires protection against catastrophic backtracking (e.g. PCRE, which has a default backtracking limit), you shouldn't rely on the behavior of a specific implementation (such as Firefox, described above).
