
regex - perl s metacharacter not matching when empty lines exists - Stack Overflow


I have two similar files, each containing the following:

hello


  peppa

The only difference is on line 3: in ./c/d/b.txt it is a blank line containing a space, while in ./c/d/m.txt it is an empty line.

Running find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello\s+peppa/msi);' {} \; prints ./c/d/b.txt but not ./c/d/m.txt. I expected it to print ./c/d/m.txt as well.

Below are hexdumps of the two files, in case they help:

[user@host]$ hexdump -v -C ./c/d/m.txt
00000000  68 65 6c 6c 6f 0a 20 20  20 20 0a 0a 20 20 20 20  |hello.    ..    |
00000010  70 65 70 70 61 0a                                 |peppa.|
00000016

[user@host]$ hexdump -v -C ./c/d/b.txt
00000000  68 65 6c 6c 6f 0a 20 20  20 20 0a 20 0a 20 20 20  |hello.    . .   |
00000010  20 70 65 70 70 61 0a                              | peppa.|
00000017


I verified that this occurs with both perl 5.16 and perl 5.38.

Asked Mar 19 at 7:00 by Demeter P. Chen
  • Are you sure you want /m? See Perldoc Modifiers – David C. Rankin Commented Mar 19 at 8:26
  • 2 @David C. Rankin, m has no effect since neither ^ nor $ is used in the pattern. s also has no effect since . isn't used. They could be left out since they're useless, but it's also harmless to keep them. – ikegami Commented Mar 19 at 10:28
  • Is your input file separated by null characters (\0 hex 00, NUL), or is your use of -00 for some other reason? – TLP Commented Mar 19 at 16:25
  • 1 @TLP - The use of -00 was out of copy and pasting without understanding. :-) – Demeter P. Chen Commented Mar 20 at 7:31

2 Answers

-00 is special: it doesn't mean "separated by a null byte". From perldoc perlrun:

The special value 00 will cause Perl to slurp files in paragraph mode.

Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose. The "-g" flag is a simpler alias for it.

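In paragraph mode, Perl doesn't read the whole file as one record, because an empty line separates two paragraphs. The empty line 3 of m.txt puts hello and peppa into different records, so the pattern is never tested against both words at once. In b.txt, line 3 contains a space, so it is not a paragraph break and the file is read as a single record.

By the way, to use the null byte as the record separator, use just -0 with no digits.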

So your solution is to use -0777 instead of -00.

Ahoy!

It seems like you have some special characters in your file. I had some trouble converting your file from hex back to ASCII, but I think the problem is that \s is not matching your special characters. Hopefully I replicated the file correctly; I was able to get it to work by replacing \s with [\s\0].

I realized some special characters were there because [\s.] would not match but [\w\W] would, meaning there was something there that is matched by neither \s nor the dot. I looked at the other answer and figured it was a null byte, which is matched by either \x00 or \0.

Assuming I replicated the file properly, changing the regular expression from

/hello\s+peppa/msi

to

/hello[\s\0]+peppa/

worked. So did

/hello[\s\x00]+peppa/

I converted the hexdump to ASCII by putting your hexdump in a file and running

$ xxd -r b.hexdump.txt > b.convert.txt

My hexdump is a little different from yours, so I don't know whether that could have caused something to go wrong:

$ hexdump -v -C m.txt
00000000  68 65 6c 6c 6f 0a 20 20  00 00 00 00 00 00 00 00  |hello.  ........|
00000010  70 65 70 70 61 0a                                 |peppa.|
00000016

$ hexdump -v -C b.txt
00000000  68 65 6c 6c 6f 0a 20 20  00 00 00 00 00 00 00 00  |hello.  ........|
00000010  20 70 65 70 70 61 0a                              | peppa.|
00000017

Here is the code I ran. Try it out and let me know if it works.

$ find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello[\s\x00]+peppa/);' {} \;
./b.txt
./m.txt

This one works for me as well:

$ find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello[\s\0]+peppa/);' {} \;
./b.txt
./m.txt