I'm using a python program in a bash script. The python program directly spits out an output.csv file (can't change this). I wanted to pipe to an awk command and extract only one column, but that doesn't work so I'm trying to figure out how to awk after running the python command.
#!/bin/bash
echo "A" > data.log
n=1
maxn=5000
while [ $n -le $maxn ]
do
    python python.py
    c1=`awk '{print $4}' < output.csv`
    echo $n", "$c1 >> data.log
    ((n++))
done
This writes the data.log file with the title and line numbers but nothing else. I've confirmed that the output.csv is populated and I'm indeed running the python script inside the bash script (python python.py is just used for simplicity). What am I doing wrong?
output.csv looks like this:
1,T,T,0.925
2,T,T,0.875
3,G,G,0.968
...
I would like data.log to look like this:
A
1, 0.925
1, 0.875
1, 0.968
...
asked Mar 14 at 19:05 by Ga3258, edited Mar 14 at 19:17
1 Answer
Assumptions/understandings:
- for each pass through the loop we need to ...
  - run a python program which generates a new output.csv file
  - strip the 4th comma-delimited field from output.csv and ...
  - prepend with the loop counter and then ...
  - append this loop counter + 4th field to data.log
One idea:
echo "A" > data.log
n=1
maxn=5000
for ((i=n; i<=maxn; i++))
do
    python python.py
    awk -F, -v pfx="$i" '{print pfx ", " $4}' output.csv
done >> data.log
Where:
- for ((i=n; i<=maxn; i++)): replace the while / ((n++)) looping construct
- -F, : tell awk to parse input on a comma delimiter
- -v pfx="$i" : pass the loop counter ($i) to awk as the (awk) variable pfx
- done >> data.log : limit our opening/writing/closing of the data.log file to just the one instance for the entire loop
- Notice there's no need for the intermediate variable c1 since awk can read directly from output.csv and write directly to stdout (with stdout directed to data.log via the >> data.log)
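To see the -F, and -v pieces in isolation, here is a minimal sketch that feeds a single sample row to awk. The counter value 7 and the variable names row and line are arbitrary placeholders for illustration, not something taken from the question:

```shell
#!/bin/bash
row='1,T,T,0.925'   # one sample row in the output.csv format
i=7                 # arbitrary placeholder for the loop counter

# -F, splits the row on commas; -v pfx="$i" copies the shell value into the awk variable pfx
line=$(printf '%s\n' "$row" | awk -F, -v pfx="$i" '{print pfx ", " $4}')
echo "$line"
```

With these values the sketch prints 7, 0.925, which is the same shape as the lines expected in data.log.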
For demo purposes we'll use the following:
$ cat output.csv
1,T,T,0.925
2,T,T,0.875
3,G,G,0.968
$ python() { return; } # do nothing function to simulate the 'python python.py' call in my environment
Taking this for a test drive with maxn=2 generates:
$ cat data.log
A
1, 0.925
1, 0.875
1, 0.968
2, 0.925
2, 0.875
2, 0.968
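The test drive can be reproduced end to end with a script along these lines. This is only a sketch under assumptions: it works in a scratch directory created with mktemp (the workdir name is made up for this example), writes the sample output.csv by hand, and stubs out python with a do-nothing shell function, so the real python.py is never run:

```shell
#!/bin/bash
# Work in a scratch directory so no existing output.csv or data.log is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hand-written sample data standing in for what python.py would produce
printf '1,T,T,0.925\n2,T,T,0.875\n3,G,G,0.968\n' > output.csv

# Do-nothing stub that simulates the 'python python.py' call
python() { return; }

echo "A" > data.log
n=1
maxn=2
for ((i=n; i<=maxn; i++))
do
    python python.py
    # Prefix every 4th field with the current loop counter
    awk -F, -v pfx="$i" '{print pfx ", " $4}' output.csv
done >> data.log

cat data.log
```

The scratch directory is left in place so data.log can be inspected afterwards; it can be deleted by hand once it is no longer needed.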
If this doesn't work for the OP then we'll need to have the question updated with more details.
... output.csv, the (wrong) output generated by your script and the (correct) expected output; fwiw, awk '{print $4}' output.csv would be sufficient (ie, no need, in this case, for the <) – markp-fuso Commented Mar 14 at 19:09
... python python.py 5000 times; is this going to generate 5000 completely different sets of data in output.csv? if python python.py is going to generate the same output on each call then why call it 5000 times? – markp-fuso Commented Mar 14 at 19:13
... awk to parse the input with a comma delimiter, so awk -F, '{print $4}' output.csv; your next problem is that c1 is going to have a linefeed delimited list of values assigned to it, eg, c1=0.925\n0.875\n0.968\n... which likely isn't what you're expecting – markp-fuso Commented Mar 14 at 19:20
... awk the field delimiter (a comma in this case), awk will use white space as the default field delimiter, so in this case awk sees one field (eg, 1,T,T,0.925) with the net result being that $4 is undefined/empty, hence nothing showing up in data.log – markp-fuso Commented Mar 14 at 19:28
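The behaviour described in these comments can be checked without the Python step. A minimal sketch, assuming a hand-written sample.csv (a hypothetical stand-in for output.csv) that holds the three rows from the question; the variable names other than c1 are made up for this example:

```shell
#!/bin/bash
# sample.csv is a hypothetical stand-in for output.csv
tmpdir=$(mktemp -d)
printf '1,T,T,0.925\n2,T,T,0.875\n3,G,G,0.968\n' > "$tmpdir/sample.csv"

# Default (whitespace) splitting: each row is a single field, so $4 is empty
nf_default=$(awk '{print NF}' "$tmpdir/sample.csv" | head -n 1)
col_default=$(awk '{print $4}' "$tmpdir/sample.csv" | tr -d '\n')

# Comma splitting: four fields per row, $4 holds the last column
nf_comma=$(awk -F, '{print NF}' "$tmpdir/sample.csv" | head -n 1)
c1=$(awk -F, '{print $4}' "$tmpdir/sample.csv")

echo "fields per row (default): $nf_default, \$4 is '$col_default'"
echo "fields per row (comma):   $nf_comma"

# c1 holds all three values separated by linefeeds; the unquoted expansion
# is word-split, so echo joins the values onto a single line
flat=$(echo 1", "$c1)
echo "$flat"

rm -rf "$tmpdir"
```

The last echo shows why keeping c1 would still not give one value per line in data.log even after adding -F, : the three values end up on a single line behind one counter.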