I have a Perl script.
I can run it from the Linux command line on a single file by specifying the input file and the name of the output file:
perl removesmalls.pl 500 1.fasta > 1_500.fasta
In this example, 500 is the cutoff length.
How can I use it on multiple FASTA files in one folder?
I would still like to be able to specify the cutoff.
The script:
## removesmalls.pl
#!/usr/bin/perl
use strict;
use warnings;
# First argument: minimum sequence length to keep.
my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{
    # Read one FASTA record at a time by splitting the input on ">".
    local $/=">";
    while(<>) {
        chomp;
        next unless /\w/;
        s/>$//gs;
        my @chunk = split /\n/;
        my $header = shift @chunk;
        my $seqlen = length join "", @chunk;
        print ">$_" if($seqlen >= $minlen);
    }
    local $/="\n";
}
So let's say I have a folder "test_A" full of FASTA files:
1.fasta 2.fasta 3.fasta etc.
I would like to set the cutoff to 500.
I would like the output to be written to the same test_A folder, named:
1_500.fasta 2_500.fasta 3_500.fasta etc.
3 Answers
Perl gets its command-line arguments in @ARGV. This is the same array where <>, the default line-read operator (shorthand for <ARGV>), looks for the source of its lines. If you pass multiple files on the command line, <> reads them all. If you don't mind the output for all the files being merged into one stream, you can pass them all in a single call:
% perl my_script.pl file1 file2 ...
That is not always convenient if you want more control over each file. You can go through the files yourself, open each one in turn, then read from that filehandle:
foreach my $file (@ARGV) {
    open my $fh, '<', $file or do { warn ...; next };
    while( <$fh> ) { ... stuff you already have ... }
}
A possible solution using a shell loop:
for i in *.fasta
do
    if [ -f "$i" ]
    then
        perl removesmalls.pl 500 "$i" > "${i%.fasta}_500.fasta"
    fi
done
Or on one line:
for i in *.fasta; do if [ -f "$i" ]; then perl removesmalls.pl 500 "$i" > "${i%.fasta}_500.fasta"; fi; done
If you don't need to handle the case where no matching file is found, you can omit the if:
for i in *.fasta
do
    perl removesmalls.pl 500 "$i" > "${i%.fasta}_500.fasta"
done
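To see what the if guards against, here is a small sketch that prints the values the loop variable takes in a given directory. The function name loop_values and its directory argument are made up for illustration, and default shell globbing (no nullglob or similar option) is assumed.

```shell
# Hypothetical helper: show each value the loop variable takes in directory $1.
loop_values() {
    (
        cd "$1" || exit 1
        for i in *.fasta
        do
            if [ -f "$i" ]
            then
                # ${i%.fasta} strips the trailing ".fasta" before the suffix is added.
                printf 'file: %s -> %s\n' "$i" "${i%.fasta}_500.fasta"
            else
                # With no match, the pattern itself is passed through unexpanded.
                printf 'not a file: %s\n' "$i"
            fi
        done
    )
}
```

In a directory without FASTA files, the loop body runs once with the literal string *.fasta. Without the if, perl would then be asked to open a file of that name, and the redirection would leave behind an empty file literally called *_500.fasta.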
If you don't want to type the number twice, you could put it in a variable, for example:
n=500
for i in *.fasta
do
    if [ -f "$i" ]
    then
        perl removesmalls.pl "$n" "$i" > "${i%.fasta}_${n}.fasta"
    fi
done
Of course you could also adapt your perl script to handle multiple files as mentioned in the other answer.
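If you want to reuse the loop for different folders and cutoffs, one option is to wrap it in a shell function. This is only a sketch: the name filter_dir and its argument order are my own choice, and it assumes removesmalls.pl sits in the directory you call it from. It also skips files that already end in _CUTOFF.fasta, since outputs from an earlier run match *.fasta too and would otherwise be filtered again.

```shell
# Hypothetical wrapper: filter every FASTA file in a directory with one cutoff.
# Usage: filter_dir CUTOFF [DIR]   (DIR defaults to the current directory)
filter_dir() {
    cutoff=$1
    dir=${2:-.}
    for f in "$dir"/*.fasta
    do
        # Skip the unexpanded pattern when the directory has no FASTA files.
        [ -f "$f" ] || continue
        # Skip outputs from an earlier run, which also match *.fasta.
        case $f in
            *_"$cutoff".fasta) continue ;;
        esac
        # Assumes removesmalls.pl is in the current working directory.
        perl removesmalls.pl "$cutoff" "$f" > "${f%.fasta}_${cutoff}.fasta"
    done
}
```

For the folder in the question this would be called as filter_dir 500 test_A, and the results land next to the inputs in test_A.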
You could use GNU Parallel, which is rather fittingly a Perl script itself. It will not only run your commands succinctly, but also in parallel:
parallel --dry-run 'perl removesmalls.pl 500 {} > {.}_500.fasta' ::: *.fasta
Sample Output
perl removesmalls.pl 500 1.fasta > 1_500.fasta
perl removesmalls.pl 500 2.fasta > 2_500.fasta
perl removesmalls.pl 500 3.fasta > 3_500.fasta
If that looks correct, remove --dry-run and run it again.
Note that {.} is GNU Parallel syntax meaning "the current file without its extension".
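As a rough illustration of what {.} and --dry-run do, the following plain-shell sketch prints the same kind of command lines without running anything. The name dry_run is made up, and ${f%.*} is only a close shell counterpart of {.} for simple file names; file names containing spaces are printed unquoted here.

```shell
# Hypothetical helper: print the command for each file instead of running it.
# Usage: dry_run CUTOFF FILE...
dry_run() {
    cutoff=$1
    shift
    for f in "$@"
    do
        # ${f%.*} drops the last extension, comparable to {.} in GNU Parallel.
        printf 'perl removesmalls.pl %s %s > %s_%s.fasta\n' \
            "$cutoff" "$f" "${f%.*}" "$cutoff"
    done
}
```

Calling dry_run 500 *.fasta in the folder lets you check the generated output names before running the real loop.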