I have two similar files that contain the following:
hello
peppa
The only difference is that ./c/d/b.txt has a blank line containing a space, while ./c/d/m.txt has an empty line at line 3.
Running

find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello\s+peppa/msi);' {} \;

prints ./c/d/b.txt but not ./c/d/m.txt. I expected it to print ./c/d/m.txt as well.
Below are the hexdumps of both files, in case they help:
[user@host]$ hexdump -v -C ./c/d/m.txt
00000000  68 65 6c 6c 6f 0a 20 20  20 20 0a 0a 20 20 20 20  |hello.    ..    |
00000010  70 65 70 70 61 0a                                 |peppa.|
00000016
[user@host]$ hexdump -v -C ./c/d/b.txt
00000000  68 65 6c 6c 6f 0a 20 20  20 20 0a 20 0a 20 20 20  |hello.    . .   |
00000010  20 70 65 70 70 61 0a                              | peppa.|
00000017
I verified that this happens with both Perl 5.16 and 5.38.
Asked Mar 19 at 7:00 by Demeter P. Chen

2 Answers
-00 is special; it doesn't mean "separated by a null byte". From perldoc perlrun:
The special value 00 will cause Perl to slurp files in paragraph mode.
Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose. The "-g" flag is a simpler alias for it.
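In paragraph mode, Perl doesn't read the whole file at once: an empty line (two or more consecutive newlines) ends a paragraph. In m.txt the empty line 3 splits the text into two records, one containing hello and one containing peppa, so neither record matches. In b.txt that line contains a space, so it isn't empty and the whole file is a single paragraph.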
By the way, to split records on the null byte, use plain -0 with no digits.
So your solution is to use -0777 instead of -00 (or -g, which is available from Perl 5.36 onwards).
Ahoy!
It seems like you have some special characters in your file. I had some trouble converting your file from hex back to ASCII, but I think the problem is that \s is not matching those special characters. Hopefully I replicated the file correctly; I was able to get it to work by replacing \s with [\s\0].
I realized some special characters were there because [\s.] would not match, but [\w\W] would, meaning there was something there that is neither whitespace nor a literal period (inside a character class, . matches only a period, not any character). Looking at the other answer, I figured it was a null byte, which is matched by either \x00 or \0.
Assuming I replicated the file properly, changing the regular expression from

/hello\s+peppa/msi

to

/hello[\s\0]+peppa/

worked, and

/hello[\s\x00]+peppa/

worked as well. I converted the hexdump to ASCII by putting your hexdump in a file and running

$ xxd -r b.hexdump.txt > b.convert.txt
My hexdump is a little different from yours, so I don't know whether that caused something to go wrong:
$ hexdump -v -C m.txt
00000000  68 65 6c 6c 6f 0a 20 20  00 00 00 00 00 00 00 00  |hello.  ........|
00000010  70 65 70 70 61 0a                                 |peppa.|
00000016
$ hexdump -v -C b.txt
00000000  68 65 6c 6c 6f 0a 20 20  00 00 00 00 00 00 00 00  |hello.  ........|
00000010  20 70 65 70 70 61 0a                              | peppa.|
00000017
Here is the code I ran; try it out and let me know if it works.
$ find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello[\s\x00]+peppa/);' {} \;
./b.txt
./m.txt
This one works for me as well:
$ find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello[\s\0]+peppa/);' {} \;
./b.txt
./m.txt
/m? See Perldoc Modifiers. – David C. Rankin, commented Mar 19 at 8:26

m has no effect since neither ^ nor $ is used in the pattern. s also has no effect since . isn't used. They could be left out since they're useless, but it's also harmless to keep them. – ikegami, commented Mar 19 at 10:28

… \0 hex 00, NUL), or is your use of -00 for some other reason? – TLP, commented Mar 19 at 16:25