
Awk in a bash script as a variable - Stack Overflow


I'm using a python program in a bash script. The python program directly spits out an output.csv file (can't change this). I wanted to pipe to an awk command and extract only one column, but that doesn't work so I'm trying to figure out how to awk after running the python command.

#!/bin/bash

echo "A" > data.log
n=1
maxn=5000

while [ $n -le $maxn ]
do
     python python.py
     c1=`awk '{print $4}' < output.csv`
     echo $n", "$c1 >> data.log
((n++))
done

This writes the data.log file with the title and line numbers but nothing else. I've confirmed that the output.csv is populated and I'm indeed running the python script inside the bash script (python python.py is just used for simplicity). What am I doing wrong?

output.csv looks like this:

1,T,T,0.925
2,T,T,0.875
3,G,G,0.968
...

I would like data.log to look like this:

A
1, 0.925
1, 0.875
1, 0.968
...

asked Mar 14 at 19:05 by Ga3258, edited Mar 14 at 19:17
  • Consider updating the question with the first 10 lines of output.csv, the (wrong) output generated by your script, and the (correct) expected output. FWIW, awk '{print $4}' output.csv would be sufficient (i.e., there is no need for the < in this case). – markp-fuso Commented Mar 14 at 19:09
  • Python is a red herring in your question. All of the processing happens on the CSV file and no longer has anything to do with the Python code that ran earlier. Please read How to Ask, because "that doesn't work" is useless as an error description. Also, you're doing things in a loop, but only the one instance that fails is relevant; see also minimal reproducible example. – Ulrich Eckhardt Commented Mar 14 at 19:10
  • You're going to run python python.py 5000 times. Is this going to generate 5000 completely different sets of data in output.csv? If python python.py generates the same output on each call, why call it 5000 times? – markp-fuso Commented Mar 14 at 19:13
  • You have to tell awk to parse the input with a comma delimiter: awk -F, '{print $4}' output.csv. Your next problem is that c1 will be assigned a linefeed-delimited list of values, e.g. c1=0.925\n0.875\n0.968\n..., which likely isn't what you're expecting. – markp-fuso Commented Mar 14 at 19:20
  • Clarification: unless awk is told the field delimiter (a comma in this case), it uses white space as the default field delimiter. Here awk sees a single field (e.g. 1,T,T,0.925), so $4 is empty and nothing shows up in data.log. – markp-fuso Commented Mar 14 at 19:28
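
The delimiter behaviour described in these comments can be checked with a quick sketch. The sample line is taken from the question; the variable names are illustrative only and do not appear in the original thread.

```shell
#!/bin/bash
# One sample line in the same shape as output.csv from the question.
line='1,T,T,0.925'

# Default splitting is on white space, so the whole line is a single field
# and $4 is empty.
fields_default=$(printf '%s\n' "$line" | awk '{print NF}')
fourth_default=$(printf '%s\n' "$line" | awk '{print $4}')

# With -F, the line is split on commas into four fields.
fields_comma=$(printf '%s\n' "$line" | awk -F, '{print NF}')
fourth_comma=$(printf '%s\n' "$line" | awk -F, '{print $4}')

echo "default: NF=$fields_default \$4='$fourth_default'"
echo "comma:   NF=$fields_comma \$4='$fourth_comma'"
```

Printing NF (the number of fields awk found) is a handy first check whenever a column reference such as $4 comes back empty.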

1 Answer

Assumptions/understandings:

  • for each pass through the loop we need to:
  • run a python program which generates a new output.csv file
  • extract the 4th comma-delimited field from each line of output.csv
  • prepend the loop counter to that field
  • append the result (loop counter + 4th field) to data.log

One idea:

echo "A" > data.log
n=1
maxn=5000

for ((i=n; i<=maxn; i++))
do
     python python.py
     awk -F, -v pfx="$i" '{print pfx ", " $4}' output.csv
done >> data.log

Where:

  • for ((i=n; i<=maxn; i++)) - replaces the while / ((n++)) looping construct
  • -F, - tells awk to split each input line on commas
  • -v pfx="$i" - passes the loop counter ($i) to awk as the (awk) variable pfx
  • done >> data.log - opens data.log once for the entire loop instead of opening, writing and closing it on every iteration
  • there's no need for the intermediate variable c1, since awk can read directly from output.csv and write directly to stdout (with stdout directed to data.log via the >> data.log)
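
To illustrate why dropping c1 helps, here is a minimal sketch comparing the two approaches. The inline sample data and the names csv, joined and per_row are assumptions made for the illustration; they are not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical sample data, same shape as output.csv in the question.
csv=$(printf '%s\n' '1,T,T,0.925' '2,T,T,0.875' '3,G,G,0.968')
n=1

# Capturing the column in a variable yields one newline-separated string.
c1=$(printf '%s\n' "$csv" | awk -F, '{print $4}')

# Unquoted $c1 is word-split by the shell, so echo joins every value
# onto a single line behind one counter.
joined=$(echo $n", "$c1)

# Letting awk add the prefix emits one "counter, value" line per input row.
per_row=$(printf '%s\n' "$csv" | awk -F, -v pfx="$n" '{print pfx ", " $4}')

echo "$joined"
echo "$per_row"
```

The first echo prints all three values on one line after a single "1, ", while the second prints three separate lines, which is the layout the question asks for.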

For demo purposes we'll use the following:

$ cat output.csv
1,T,T,0.925
2,T,T,0.875
3,G,G,0.968

$ python() { return; }    # do nothing function to simulate the 'python python.py' call in my environment

A test run with maxn=2 generates:

$ cat data.log
A
1, 0.925
1, 0.875
1, 0.968
2, 0.925
2, 0.875
2, 0.968
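
The demo can also be reproduced as a single script. This is only a sketch under stated assumptions: generate_csv is a hypothetical stand-in for python python.py, it writes the same three sample rows on every pass, and everything runs in a temporary directory. The loop uses a plain while with $(( )) arithmetic rather than the for ((...)) form; that swap is made here for portability and should behave the same in bash.

```shell
#!/bin/bash
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for 'python python.py': rewrites output.csv on each call.
generate_csv() {
    printf '%s\n' '1,T,T,0.925' '2,T,T,0.875' '3,G,G,0.968' > output.csv
}

echo "A" > data.log
i=1
maxn=2

while [ "$i" -le "$maxn" ]
do
    generate_csv
    # Split on commas and prefix each 4th field with the loop counter.
    awk -F, -v pfx="$i" '{print pfx ", " $4}' output.csv
    i=$((i + 1))
done >> data.log

cat data.log
```

With maxn=2 the resulting data.log has the header line followed by six data lines, matching the listing shown above.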

If this doesn't work for the OP then we'll need to have the question updated with more details.
