
php - Validating items in CSV with regex - Stack Overflow


I have a CSV string that I am trying to validate via regex to ensure it only has N items. I've tried the following pattern (which looks for 2 items):

/([^,]+){2}/

But it doesn't seem to work, I am guessing because the inner pattern isn't greedy enough.

Any ideas? Ideally it should work with both the PHP and JavaScript regex engines.

Update:

For technical reasons I really want to do this via regex rather than another solution. The CSV is not quoted and the values will not contain commas, so that isn't a problem.

/([^,]*[,]{1}[^,]*){1}/

Is where I am at now, which sort of works but is still a bit ugly, and has issues matching one item.

CSV looks like:

apples,bananas,pears,oranges,grapefruit

Asked May 29, 2011 at 10:12 by Meep3D; edited May 29, 2011 at 13:27.

7 Answers


In PHP, you'll be much better off using this function:

http://www.php.net/manual/en/function.str-getcsv.php

It will deal with the likes of:

a,"b,c"

... which contains two items rather than three.

I'm not aware of an equivalent function for JavaScript.

Untested, because I don't know what your input looks like:

/^([^,]+,){1}([^,]+$)/

This requires two fields (one comma, so no comma after the last field).
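To check for N fields rather than two, the same anchored shape can be built from a number. This is only a sketch under the question's assumptions (unquoted values, no commas inside values, no empty fields); `exactFieldsPattern` is a made-up helper name, not something from the answer.

```javascript
// Hypothetical helper: builds an anchored pattern that accepts exactly n
// non-empty, comma-free fields with no trailing comma.
function exactFieldsPattern(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // (n - 1) "field," groups, then one final field that runs to the end.
  return new RegExp('^(?:[^,]+,){' + (n - 1) + '}[^,]+$');
}

const twoFields = exactFieldsPattern(2);
console.log(twoFields.test('apples,bananas'));        // true
console.log(twoFields.test('apples'));                // false
console.log(twoFields.test('apples,bananas,pears'));  // false
console.log(exactFieldsPattern(1).test('apples'));    // true
```

The anchors are what make the count exact: without `^` and `$` the pattern can match a two-field slice of a longer string.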

How about using the g (global) modifier to make the RegExp greedier?

var foobar = 'foo,bar',
    foobarbar = 'foo,bar,"bar"',
    foo = 'foo,',
    bar = 'bar';
foo.match(/([^,]+)/g).length === 2; //=> false
bar.match(/([^,]+)/g).length === 2; //=> false
foobar.match(/([^,]+)/g).length === 2; //=> true
foobarbar.match(/([^,]+)/g).length === 2; //=> false
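Two caveats on the snippet above: the `g` flag does not change greediness, it makes `match()` return every match instead of only the first, and `match()` returns `null` (not an empty array) when nothing matches, so `.length` throws on an empty string. A null-safe sketch of the same counting idea, with `countItems` as a hypothetical name:

```javascript
// Hypothetical helper: counts the non-empty, comma-free runs in a string.
function countItems(csv) {
  const matches = csv.match(/[^,]+/g); // null when there is no match at all
  return matches === null ? 0 : matches.length;
}

console.log(countItems('foo,bar'));   // 2
console.log(countItems('foo,'));      // 1
console.log(countItems(''));          // 0
console.log(countItems('foo,,bar'));  // 2: the empty middle field is skipped
```

Because empty fields are skipped, this counts non-empty values rather than comma-separated positions, which may or may not be what the validation needs.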
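function hasAtMost(csv, maxlength) {
    // split(',') keeps empty strings, so 'a,,b' counts as three items
    var vals = csv.split(',');
    return vals.length <= maxlength;
}

hasAtMost("something,sthelse,anotherone,woohoo", 4); //=> true

should work in JS.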

Depending on how the CSV is formatted, you may be able to split on /\",\"/ (i.e. double quote, comma, double quote) and get the length of the resulting array.

Regular expressions aren't very good for parsing, so if the string is complex you may need to parse it some other way.
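As a rough illustration of the split-on-quotes idea, assuming every field is wrapped in double quotes and no value contains the sequence `","` or an escaped quote (the sample line is invented):

```javascript
// Every field is double-quoted; a comma inside a value does not split it,
// because the separator is the three-character sequence `","`.
const line = '"apples","bananas, ripe","pears"';
const parts = line.split(/","/);

console.log(parts.length);  // 3
// The outer quotes stay attached to the first and last parts:
console.log(parts[0]);      // "apples
console.log(parts[2]);      // pears"
```

If only the count matters, the leftover quotes on the first and last parts are harmless; they would need trimming before the values are used.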

Got it.

/^([^,]+([,]{1}|$)){1}$/

Set the last {N} to the number of items to require, or to a range such as {1,3}.
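A small sketch of plugging a count or a range into that pattern. `itemCountPattern` is a hypothetical helper, and the inner groups are made non-capturing here, which does not change what matches. Note that this shape also accepts a trailing comma after the last item.

```javascript
// Hypothetical helper: each item is one or more non-comma characters
// followed by either a comma or the end of the string.
function itemCountPattern(min, max = min) {
  const quantifier = min === max ? `{${min}}` : `{${min},${max}}`;
  return new RegExp(`^(?:[^,]+(?:,|$))${quantifier}$`);
}

console.log(itemCountPattern(2).test('apples,bananas'));          // true
console.log(itemCountPattern(2).test('apples'));                  // false
console.log(itemCountPattern(2).test('apples,bananas,pears'));    // false
console.log(itemCountPattern(1, 3).test('apples,bananas,pears')); // true
console.log(itemCountPattern(1, 3).test('a,b,c,d'));              // false
console.log(itemCountPattern(1).test('apples,'));                 // true (trailing comma accepted)
```

The same pattern source should also work in PHP's preg functions, since it only uses character classes, groups, and counted quantifiers, though that is untested here.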

Take a look at this answer.

To quote:

re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^                                   # Anchor to start of string.
\s*                                 # Allow whitespace before value.
(?:                                 # Group for value alternatives.
  '[^'\\]*(?:\\[\S\s][^'\\]*)*'     # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"     # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*    # or Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Allow whitespace after value.
(?:                                 # Zero or more additional values
  ,                                 # Values separated by a comma.
  \s*                               # Allow whitespace before value.
  (?:                               # Group for value alternatives.
    '[^'\\]*(?:\\[\S\s][^'\\]*)*'   # Either Single quoted string,
  | "[^"\\]*(?:\\[\S\s][^"\\]*)*"   # or Double quoted string,
  | [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*  # or Non-comma, non-quote stuff.
  )                                 # End group of value alternatives.
  \s*                               # Allow whitespace after value.
)*                                  # Zero or more additional values
$                                   # Anchor to end of string.
"""

Or the usable form (since JS can't handle multi-line regex strings):

var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

It can be called using the RegExp test() method:

if (!re_valid.test(text)) return null;

The first match looks for valid single-quoted strings. The second match looks for valid double-quoted strings, the third looks for unquoted strings.
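To see the three alternatives in action, here is a quick check using the one-line pattern quoted above; the sample strings are invented for illustration.

```javascript
// Pattern copied from the answer above, unchanged.
const re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

console.log(re_valid.test('apples,bananas,pears')); // true: unquoted values
console.log(re_valid.test('a,"b,c"'));              // true: double-quoted value containing a comma
console.log(re_valid.test("a,'b,c'"));              // true: single-quoted value containing a comma
console.log(re_valid.test('a,"b'));                 // false: unterminated double quote
```

Note that this pattern only says whether the whole string is well formed; it does not report how many values it contains, so counting items still needs a separate step.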

If you remove the single-quote matches it is an almost 100% implementation of a working IETF RFC 4180 spec CSV validator.

Note: It might be 100% but I can't remember whether it can handle newline chars in values (the [\S\s] is a JavaScript idiom for matching any character, newlines included, since . does not match newlines by default).

Note: This is a JavaScript-only implementation, there are no guarantees that the RegEx source string will work in PHP.

If you're planning on doing anything non-trivial with CSV data, I suggest you adopt an existing library. It gets pretty ugly if you're looking for an RFC-compliant implementation.
