Let's say I have the following CSV file (sample.csv)
filepath1
filepath1,filepath2
filepath1,filepath2,filepath3
filepath1,filepath2,filepath3,filepath4
filepath1,filepath2
And the program I need to use uses the following syntax:
program_im_using --file filepath1
But if multiple files are present, the syntax for input is:
program_im_using --file filepath1 --file filepath2
or
program_im_using --file filepath1 --file filepath2 --file filepath3
What I would do to paste the values from the CSV file would be something like this:
filepath_first=( $(awk -F "\"*,\"*" '{print $1}' $sample_csv) )
filepath_second=( $(awk -F "\"*,\"*" '{print $2}' $sample_csv) )
for i in "${!filepath_first[@]}"; do
program_im_using --file "${filepath_first[i]}" --file "${filepath_second[i]}"
done
My example of what I'm doing only works if I have exactly 2 columns of values in every row. How can I generalize in a way such that if I have one column, I get:
program_im_using --file filepath1
But if I have two, I get:
program_im_using --file filepath1 --file filepath2
etc?
Asked Feb 14 at 19:02 by Gabriel G.
Comments:
- can the fields (in the csv) contain white space? – markp-fuso, Feb 14 at 19:24
- no special characters like spaces and symbols besides common path ones (like /) – Gabriel G., Feb 14 at 19:26
4 Answers
You can use xargs:
sed 's/,/ --file /g' file.csv | xargs -L1 program_im_using --file
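To check what this pipeline would run before pointing it at the real program, a minimal sketch can put echo in front of the command name. The temporary file and the use of echo are assumptions made for the demo; -L1 makes xargs run the command once per input line.

```shell
#!/usr/bin/env bash
# Demo only: write a throwaway CSV whose rows mirror the question's sample.
csv=$(mktemp)
printf '%s\n' 'filepath1' 'filepath1,filepath2' 'filepath1,filepath2,filepath3' > "$csv"

# echo stands in for the real program, so each output line shows the
# command that xargs would have executed for the matching CSV row.
preview=$(sed 's/,/ --file /g' "$csv" | xargs -L1 echo program_im_using --file)
printf '%s\n' "$preview"

rm -f "$csv"
```

Once the preview looks right, dropping the echo runs the program itself.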
For a variable number of args, one common approach is to store the args in an array, e.g.:
$ args=(--file filepath1 --file filepath2)
$ typeset -p args
declare -a args=([0]="--file" [1]="filepath1" [2]="--file" [3]="filepath2")
A sample script to list input args:
$ cat program_im_using
#!/bin/bash
n=1
for i in "$@"; do echo "arg #$n :$i:"; ((n++)); done
We can then feed these args to the program like so:
$ ./program_im_using "${args[@]}"
arg #1 :--file:
arg #2 :filepath1:
arg #3 :--file:
arg #4 :filepath2:
Sample input file:
$ cat sample.csv
filepath1
filepath1,filepath2
filepath1,filepath2,filepath3
/a/b/c.txt,/d/e/f/g.txt,/h.txt
Using sed to reformat our csv lines:
$ sed 's/^/--file /; s/,/ --file /g' sample.csv
--file filepath1
--file filepath1 --file filepath2
--file filepath1 --file filepath2 --file filepath3
--file /a/b/c.txt --file /d/e/f/g.txt --file /h.txt
Feeding the sed results to a bash/while-read loop:
while read -ra args
do
    # Pass the text as an argument so a % in a path is not treated as a format
    printf '\n####### %s\n' "${args[*]}"
    ./program_im_using "${args[@]}"
done < <(sed 's/^/--file /; s/,/ --file /g' sample.csv)
NOTES:
- OP has stated in comments that we do not need to worry about spaces or special characters in the csv fields
- otherwise, this approach will fail if the csv fields contain (white)space or characters that have a special meaning to bash
This generates:
####### --file filepath1
arg #1 :--file:
arg #2 :filepath1:
####### --file filepath1 --file filepath2
arg #1 :--file:
arg #2 :filepath1:
arg #3 :--file:
arg #4 :filepath2:
####### --file filepath1 --file filepath2 --file filepath3
arg #1 :--file:
arg #2 :filepath1:
arg #3 :--file:
arg #4 :filepath2:
arg #5 :--file:
arg #6 :filepath3:
####### --file /a/b/c.txt --file /d/e/f/g.txt --file /h.txt
arg #1 :--file:
arg #2 :/a/b/c.txt:
arg #3 :--file:
arg #4 :/d/e/f/g.txt:
arg #5 :--file:
arg #6 :/h.txt:
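The whitespace caveat in the notes above can be seen with a small sketch. The file name "my file.txt" is a made-up example; the sed expression and read -ra are the same ones used in the loop.

```shell
#!/usr/bin/env bash
# Hypothetical CSV row whose first field contains a space.
line=$(printf '%s\n' 'my file.txt,other.txt' | sed 's/^/--file /; s/,/ --file /g')

# read -ra splits on whitespace, so "my file.txt" is broken into two words.
read -ra args <<< "$line"
printf 'word count: %d\n' "${#args[@]}"
printf '<%s>\n' "${args[@]}"
```

The program would receive five arguments instead of four, with "my" and "file.txt" arriving as separate words.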
Using GNU awk:
awk '
BEGIN{OFS=","; FPAT="([^,]*)|(\"[^\"]+\")"}
{printf "program_im_using "}
{for (i=1; i<=NF; i++) printf "--file %s ", $i}
{print ""}
' file
Yields:
program_im_using --file filepath1
program_im_using --file filepath1 --file filepath2
program_im_using --file filepath1 --file filepath2 --file filepath3
program_im_using --file filepath1 --file filepath2 --file filepath3 --file filepath4
program_im_using --file filepath1 --file filepath2
Then, you can pipe awk to bash to execute:
awk .... | bash
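If GNU awk is not available, a rough equivalent can be sketched with any POSIX awk by splitting on commas with -F, instead of FPAT. This swaps in a simpler technique and assumes that no field contains a quoted comma; generate_commands is a hypothetical helper name used only here.

```shell
#!/usr/bin/env bash
# Assumption: fields never contain quoted commas, so -F, is enough.
generate_commands() {
    awk -F, '{
        printf "program_im_using"
        for (i = 1; i <= NF; i++) printf " --file %s", $i
        print ""
    }' "$@"
}

# Preview the generated command lines for two sample rows read from stdin.
commands=$(printf '%s\n' 'filepath1' 'filepath1,filepath2' | generate_commands)
printf '%s\n' "$commands"
```

Printing the commands first and piping them to bash only after reviewing them keeps a malformed row from being executed by accident.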
Parsing CSV with awk:
https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
More advanced: https://www.gnu.org/software/gawk/manual/gawk.html#More-CSV
Bash alone offers you all the tools to proceed:
#!/usr/bin/env bash
# Dummy command to show its name and arguments for demo/testing purposes
dummyCommand() { printf '%s %s\n' "${FUNCNAME[0]}" "${*@Q}";}
shopt -s extglob # Enable extended globbing
# Reads CSV rows from standard input; the || test keeps a last row that lacks a trailing newline
while IFS=, read -ra csvFiles || [ "${#csvFiles[@]}" -gt 0 ]; do
    lssFiles=("${csvFiles[@]##*([[:space:]])}") # Strip leading spaces
    tssFiles=("${lssFiles[@]%%*([[:space:]])}") # Strip trailing spaces
    # Prepares the arguments array for the command call
    args=()
    for fileName in "${tssFiles[@]}"; do
        args+=(--file "$fileName")
    done
    dummyCommand "${args[@]}"
done
Sample input (cat input.csv), passed to the script on standard input:
filepath 1
filepath 1, filepath2
filepath 1, filepath 2,filepath3
filepath 1, filepath 2 , filepath3,filepath 4
filepath1,filepath 2
Sample output:
dummyCommand '--file' 'filepath 1'
dummyCommand '--file' 'filepath 1' '--file' 'filepath2'
dummyCommand '--file' 'filepath 1' '--file' 'filepath 2' '--file' 'filepath3'
dummyCommand '--file' 'filepath 1' '--file' 'filepath 2' '--file' 'filepath3' '--file' 'filepath 4'
dummyCommand '--file' 'filepath1' '--file' 'filepath 2'
Alternative with Bash's Regex:
#!/usr/bin/env bash
# Dummy command to show its name and arguments for demo/testing purposes
dummyCommand() { printf '%s %s\n' "${FUNCNAME[0]}" "${*@Q}";}
while IFS=, read -ra csvFiles || [ "${#csvFiles[@]}" -gt 0 ]; do
    # Prepares the arguments array for the command call
    args=()
    for csvEntry in "${csvFiles[@]}"; do
        # Strips out leading and trailing spaces
        [[ "$csvEntry" =~ [[:space:]]*(.*[^[:space:]])[[:space:]]* ]]
        args+=(--file "${BASH_REMATCH[1]}")
    done
    dummyCommand "${args[@]}"
done
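To look at the trimming regex on its own, here is a minimal sketch that wraps it in a function. The name trim is hypothetical and not part of the answer's script; the pattern is the one used above.

```shell
#!/usr/bin/env bash
# trim is a hypothetical helper name used only for this demo.
trim() {
    # The capture group starts after any leading whitespace and ends at the
    # last non-space character, so inner spaces are preserved.
    if [[ "$1" =~ [[:space:]]*(.*[^[:space:]])[[:space:]]* ]]; then
        printf '%s' "${BASH_REMATCH[1]}"
    fi
}

trimmed=$(trim '  filepath 2  ')
printf '[%s]\n' "$trimmed"
```

A field made up only of whitespace does not match the pattern (it requires at least one non-space character), so the helper prints nothing for it.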
Implementation not iterating file names but directly processing array entries:
#!/usr/bin/env bash
# Dummy command to show its name and arguments for demo/testing purposes
dummyCommand() { printf '%s %s\n' "${FUNCNAME[0]}" "${*@Q}";}
shopt -s extglob # Enable extended globbing
while IFS=, read -ra csvFiles || [ "${#csvFiles[@]}" -gt 0 ]; do
    lssFiles=("${csvFiles[@]##*([[:space:]])}") # Strip leading spaces
    tssFiles=("${lssFiles[@]%%*([[:space:]])}") # Strip trailing spaces
    qFiles=("${tssFiles[@]@Q}") # Quote each file entry
    # Prepares the arguments array for the command call
    args=() && declare -a "args=(${qFiles[*]/#/--file })"
    dummyCommand "${args[@]}"
done
Step 1: args=() (only for ShellCheck)
Purpose: ensures args is initialized as an array before being dynamically declared.
Why? ShellCheck cannot see dynamic array declarations inside declare -a, so without args=() it might report a warning like "args appears uninitialized".
Does this actually affect execution? No, it's just a safeguard for static analysis.
Step 2: declare -a "args=(${qFiles[*]/#/--file })"
This is the actual logic that fills args dynamically. Breaking it down:
- ${qFiles[*]} expands the array qFiles into a single space-separated string.
- /#/--file applies Bash pattern substitution: # anchors the match at the beginning of each array element, so "--file " is prepended to each element.
- "args=(${qFiles[*]/#/--file })" is a dynamic array declaration in which each already-quoted entry of qFiles is prefixed with --file to form the element list for the args array.
- declare -a explicitly marks args as an array, so Bash treats the result as multiple elements rather than a single space-separated string.
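The two expansions can be tried in isolation with a short sketch. The files array and the joined variable are made-up names for this demo, and the @Q operator requires Bash 4.4 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical sample entries; the first one contains a space.
files=('filepath 1' 'filepath2')

qFiles=("${files[@]@Q}")          # each entry becomes shell-quoted text
joined="${qFiles[*]/#/--file }"   # prefix every entry, then join with spaces
printf '%s\n' "$joined"

# declare re-parses the quoted text, so the space inside 'filepath 1' survives.
args=() && declare -a "args=($joined)"
printf 'count=%d\n' "${#args[@]}"
printf '[%s]\n' "${args[@]}"
```

Because the entries were quoted with @Q before being joined, the re-parsed array ends up with four elements, and "filepath 1" stays a single argument.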