linux - How to use perl script for multiple files

I have a Perl script.

I can run it from the Linux command line on a single file by specifying the input file and the name of the output file.

perl removesmalls.pl 500 1.fasta > 1_500.fasta

In this example, 500 is the cutoff: the minimum sequence length a record must have to be kept.

How can I use it for multiple FASTA files in one folder?

I would still like to be able to specify the cutoff number.

The script:

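#!/usr/bin/perl
## removesmalls.pl
use strict;
use warnings;

my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{
    local $/=">";                # read one FASTA record at a time
    while(<>) {
        chomp;                   # drop the trailing ">" separator
        next unless /\w/;        # skip the empty record before the first ">"
        s/>$//gs;
        my @chunk = split /\n/;
        my $header = shift @chunk;              # first line is the header
        my $seqlen = length join "", @chunk;    # sequence length without newlines
        print ">$_" if($seqlen >= $minlen);
    }
    local $/="\n";
}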

So let's say I have a folder "test_A" full of FASTA files:

1.fasta 2.fasta 3.fasta etc.

I would like to specify the cutoff number as 500.

I would like the output files to be written to the same test_A directory, named:

1_500.fasta 2_500.fasta 3_500.fasta etc.

Asked Mar 6 at 12:38 by k_a_r_o_l; edited Mar 7 at 14:39 by Alphin Thomas.


3 Answers

Perl gets its command-line arguments in @ARGV. This is the same array that <> (the default line-read operator, shorthand for <ARGV>) uses to find its input files. If you pass multiple files on the command line, <> reads them all in turn. If you don't care that the output for all the files is merged into one stream, you can simply list them all on the command line:

% perl my_script.pl file1 file2 ...

That is inconvenient when you want more control over each file, for example to write a separate output for each input. In that case you can go through the files yourself, open each one in turn, and read from that filehandle:

foreach my $file (@ARGV) {
    # Warn and move on to the next file if this one cannot be opened
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
    while (<$fh>) {
        # ... stuff you already have ...
    }
    close $fh;
}

A possible solution using a shell loop:

for i in *.fasta
do
  if [ -f "$i" ]
  then
    perl removesmalls.pl 500 "$i" > "${i%.fasta}_500.fasta"
  fi
done

Or as a one-liner:

 for i in *.fasta; do if [ -f "$i" ]; then perl removesmalls.pl 500 "$i" > "${i%.fasta}_500.fasta"; fi; done
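The output name in the loop above is built with the shell's suffix-removal expansion: ${i%.fasta} is the value of i with a trailing .fasta stripped, so appending _500.fasta turns 1.fasta into 1_500.fasta. A minimal sketch that isolates just that step; out_name is a hypothetical helper name used for illustration, not something from the question:

```shell
# out_name FILE CUTOFF: print the output file name for one input file.
# (out_name is a hypothetical helper used only for illustration.)
out_name() {
  # ${1%.fasta} removes a trailing ".fasta" from the first argument
  printf '%s_%s.fasta\n' "${1%.fasta}" "$2"
}

out_name 1.fasta 500
out_name test_A/2.fasta 500
```

Because the expansion only strips the suffix, any directory part of the path is kept, so each output lands next to its input.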

The -f test covers the case where no file matches and the shell leaves the pattern *.fasta unexpanded. If you don't need to handle that case, you can omit the if:

for i in *.fasta
do
  perl removesmalls.pl 500 "$i" > "${i%.fasta}_500.fasta"
done
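One thing to watch with either loop: the outputs also end in .fasta, so running the loop a second time in the same directory would pick up 1_500.fasta as an input and produce 1_500_500.fasta. The following is a sketch of one way to guard against that, assuming no genuine input file has a name ending in _<cutoff>.fasta. The function name filter_all is hypothetical, and the optional trailing command argument exists only so the filter can be swapped out; it defaults to the script from the question.

```shell
# filter_all DIR CUTOFF [COMMAND...]
# Runs "COMMAND CUTOFF FILE" for each FASTA file in DIR and writes the
# result next to the input as <name>_<CUTOFF>.fasta.
filter_all() {
  dir=$1
  cutoff=$2
  shift 2
  # Default to the question's script when no command is given.
  [ "$#" -gt 0 ] || set -- perl removesmalls.pl
  for f in "$dir"/*.fasta; do
    [ -f "$f" ] || continue          # pattern matched nothing: skip the literal
    case $f in
      *_"$cutoff".fasta) continue ;; # output of an earlier run: skip it
    esac
    "$@" "$cutoff" "$f" > "${f%.fasta}_${cutoff}.fasta"
  done
}

# Example call (assumes removesmalls.pl is in the current directory):
# filter_all test_A 500
```

The pattern is expanded once when the loop starts, so files created during a run are never picked up by that same run; the case check only matters for later runs.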

To avoid typing the number twice, you could store it in a variable:

n=500
for i in *.fasta
do
  if [ -f "$i" ]
  then
    perl removesmalls.pl "$n" "$i" > "${i%.fasta}_${n}.fasta"
  fi
done
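Once the cutoff lives in a variable, it can be worth checking it before the loop starts. The script in the question reads it with "shift or die", and Perl treats 0 as false, so a cutoff of 0 is reported as missing, while a non-numeric value is not rejected up front. A small sketch of such a check; is_cutoff is a hypothetical helper name:

```shell
# is_cutoff VALUE: succeed only if VALUE is a positive whole number.
# (is_cutoff is a hypothetical helper, not part of the original answer.)
is_cutoff() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *) [ "$1" -gt 0 ] ;;       # all digits: additionally require > 0
  esac
}

n=500
if is_cutoff "$n"; then
  echo "cutoff $n accepted"
else
  echo "cutoff must be a positive integer" >&2
fi
```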

Of course, you could also adapt your Perl script to handle multiple files, as mentioned in the other answer.

You could use GNU Parallel, which is rather fittingly a Perl script itself. It will not only run your commands succinctly, but also in parallel:

parallel --dry-run 'perl removesmalls.pl 500 {} > {.}_500.fasta' ::: *fasta

Sample Output

perl removesmalls.pl 500 1.fasta > 1_500.fasta
perl removesmalls.pl 500 2.fasta > 2_500.fasta
perl removesmalls.pl 500 3.fasta > 3_500.fasta

If that looks correct, remove --dry-run and run it again.

Note that {.} is GNU Parallel syntax meaning "the current file name without its extension".
