perl - How can I split a string on the first occurrence of a digit - Stack Overflow

I have strings which consist of a name and two digits. I would like to extract the name and the digits into one variable for each. The problem I have is that some names have spaces in them. When I split on /\s+/ the name is split into two.

my (${st_name}, $val1, $val2) = split(/\s+/, $line, 3);

I have tried to split on /\d+/, but then I do not get the digits. I have also tried to get the index of the first digit, though I am not sure this is really how to do it:

my $index = index ($line, \d);

I would appreciate any assistance. The code I tried:

use strict;
use warnings;

while (my $line = <DATA>){
my (${st_name}, $val1, $val2) = split(/\s+/, $line, 3);   #doesn't work

my $index = index ($line, \d);
${st_name}=$line(0, $index);
my ($val1, $val2) = $line($index)


__DATA__
Maputsoe 2       1
Butha-Buthe (Butha-Buthe District) 2       1

Asked 16 hours ago by Zilore Mumba; edited 16 hours ago by Robert.

Comments:
  • Is your data tab-separated by any chance? It kind of looks like it is, and then you should split on tab \t+ and not whitespace \s+. That would solve all your problems. – TLP Commented 14 hours ago
  • @TLP, \t, not \t+ – ikegami Commented 4 hours ago
3 Answers


You can make a regular expression match and capture the pieces you want. Looks like you want some text, then a space, then a number, more space(s), and another number?

use strict;
use warnings;

while (my $line = <DATA>) {
    my ($st_name, $val1, $val2) = $line =~ m/^(.+)\s+(\d+)\s+(\d+)/;
    print "$st_name, $val1, $val2\n";
}

__DATA__
Maputsoe 2       1
Butha-Buthe (Butha-Buthe District) 2       1

This prints

Maputsoe, 2, 1
Butha-Buthe (Butha-Buthe District), 2, 1

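The regular expression matches one or more (+) characters (.), followed by one or more whitespace characters (\s+), followed by one or more digits (\d+), and then whitespace and digits again. The .+ is greedy, but it backtracks until the two trailing numbers can still match, so the name keeps its internal spaces.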

The expression /^(.*?)\s+(\d+)\s+(\d+)$/ should work.

Explanation:

  • ^(.*?): This captures the name part. The .*? is a non-greedy match, so it captures as little as possible while still letting the rest of the pattern match
  • \s+: Matches one or more whitespace characters
  • (\d+): Captures the first group of digits
  • \s+: Matches one or more whitespace characters
  • (\d+)$: Captures the second sequence of digits at the end of the line

use strict;
use warnings;

while (my $line = <DATA>) {
    if ($line =~ /^(.*?)\s+(\d+)\s+(\d+)$/) {
        my $st_name = $1;
        my $val1 = $2;
        my $val2 = $3;
        print "Name: $st_name, Val1: $val1, Val2: $val2\n";
    } else {
        warn "Line does not match the expected pattern: $line";
    }
}

__DATA__
Maputsoe 2       1
Butha-Buthe (Butha-Buthe District) 2       1
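
To see how the anchored, non-greedy pattern behaves when a name itself contains digits, here is a minimal sketch. The helper name parse_line and the sample name "Area 51" are made up for illustration and do not come from the question's data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, val1, val2) in list context,
# or an empty list when the line does not have the expected shape.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^(.*?)\s+(\d+)\s+(\d+)$/;
}

# "Area 51" is an invented name, used only to show the backtracking.
for my $line ("Maputsoe 2       1", "Area 51 2 1", "no numbers here") {
    my @fields = parse_line($line);
    if (@fields) {
        print join('|', @fields), "\n";
    } else {
        print "no match: $line\n";
    }
}
# prints:
# Maputsoe|2|1
# Area 51|2|1
# no match: no numbers here
```

Because of the $ anchor, the engine has to extend the name until exactly two numbers remain at the end of the line, so "51" stays part of the name.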

Your code is filled with nonsense. For example:

while (my $line = <DATA>){
my (${st_name}, $val1, $val2) = split(/\s+/, $line, 3);   #doesn't work

  • You never finished the while loop. How can it work?
  • You don't have to use the curly braces for variables in regular code. $st_name is fine, and it doesn't do anything different than ${st_name}.

my $index = index ($line, \d);

You can't use \d unquoted. That results in an error. index takes a string as an argument, so you would need to quote it, e.g. '\d'. But even quoted, index only searches for a literal substring, not a regex, and \d is a regex character class. So all in all, this is just a mess.

${st_name}=$line(0, $index);
my ($val1, $val2) = $line($index)

The idea that you could put parentheses on a variable to make it do something is quite strange, and certainly not a Perl idiom. That's just not how Perl works.

But the thing you are trying to do can be done with index and substr. With the exception that index can't search for a regex. So you would have to use a pattern match and pos instead. Then it would look something like:

my $line = "Butha-Buthe (Butha-Buthe District) 2       1";
if ($line =~ /\d/g) {      # have to use /g
    my $pos   = pos $line;
    $pos--;                # back up one
    my $start = substr $line, 0, $pos;
    my $end   = substr $line, $pos;
    print "Pos: $pos, start: '$start', end: '$end'\n";
}
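
As a possible variation on the pos approach, the built-in @- array holds the start offset of the last successful match, so neither /g nor the decrement is needed. This is only a sketch; the helper name split_at_first_digit is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a string in two at its first digit.
# Returns (before, from_digit_onward), or an empty list if there is no digit.
sub split_at_first_digit {
    my ($line) = @_;
    return unless $line =~ /\d/;
    my $pos = $-[0];    # offset at which the match started
    return (substr($line, 0, $pos), substr($line, $pos));
}

my ($start, $end) = split_at_first_digit("Butha-Buthe (Butha-Buthe District) 2       1");
print "start: '$start', end: '$end'\n";
```

Note that the first piece still ends with the space that preceded the digit, just as in the pos version.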

Although it could be done much more simply like this:

my $line = "Butha-Buthe (Butha-Buthe District) 2       1";
if ($line =~ /(.+?)(\d.+)/) {   # non-greedy, so it stops at the first digit
    my $start = $1;
    my $end = $2;
    print "Start: '$start', end: '$end'\n";
}

Of course you could simplify that down to

my ($start, $end) = ($line =~ /(.+?)(\d.+)/);

But my suspicion is that your data is actually tab-separated, because it kinda looks like it. I can prepare my code by changing your data to have tabs, but sadly Stackoverflow will not keep literal tabs in the code due to formatting, so they are written as \t here. Then it would look like this:

use Data::Dumper;

my $line = "Butha-Buthe (Butha-Buthe District)\t2\t1";  # <- \t is a tab
my @fields = split /\t/, $line;
print Dumper \@fields;

And print this:

$VAR1 = [
          'Butha-Buthe (Butha-Buthe District)',
          '2',
          '1'
        ];

You can try using this split statement with your original code and see how that works.
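
Put together with the loop from the question, that could look roughly like the sketch below. It assumes the input really is name<TAB>val1<TAB>val2, which the question does not confirm; the helper name parse_tab_line is hypothetical, and the sample records use \t escapes so the tabs survive formatting.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-separated record into its fields.
sub parse_tab_line {
    my ($line) = @_;
    chomp $line;                 # drop the trailing newline, if any
    return split /\t/, $line;
}

# Assumed tab-separated copies of the sample data.
my @lines = (
    "Maputsoe\t2\t1\n",
    "Butha-Buthe (Butha-Buthe District)\t2\t1\n",
);

for my $line (@lines) {
    my ($st_name, $val1, $val2) = parse_tab_line($line);
    print "$st_name, $val1, $val2\n";
}
# prints:
# Maputsoe, 2, 1
# Butha-Buthe (Butha-Buthe District), 2, 1
```

If the file turns out to use runs of spaces instead, the split returns the whole line as a single field, which is an easy way to tell the two cases apart.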
