
arrays - Including more commands if CSV column is not empty - Stack Overflow


Let's say I have the following CSV file (sample.csv)

filepath1
filepath1,filepath2
filepath1,filepath2,filepath3
filepath1,filepath2,filepath3,filepath4
filepath1,filepath2

And the program I need to use uses the following syntax:

program_im_using --file filepath1

But if multiple files are present, the syntax for input is:

program_im_using --file filepath1 --file filepath2
or
program_im_using --file filepath1 --file filepath2 --file filepath3

What I would do to pull the values out of the CSV file is something like this:

filepath_first=( $(awk -F "\"*,\"*" '{print $1}' "$sample_csv") )
filepath_second=( $(awk -F "\"*,\"*" '{print $2}' "$sample_csv") )

for i in "${!filepath_first[@]}"; do
    program_im_using --file "${filepath_first[i]}" --file "${filepath_second[i]}"
done

My example only works if every row has exactly 2 columns of values. How can I generalize it so that if I have one column, I get:

program_im_using --file filepath1

But if I have two

program_im_using --file filepath1 --file filepath2

etc?

asked Feb 14 at 19:02 by Gabriel G.
  • can the fields (in the csv) contain white space? – markp-fuso Commented Feb 14 at 19:24
  • no special characters like spaces and symbols besides common path ones (like /) – Gabriel G. Commented Feb 14 at 19:26

4 Answers


You can use xargs:

sed 's/,/ --file /g' file.csv | xargs -L1 program_im_using --file
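
If you want to check what will be executed first, one option is a dry run that prints each generated command instead of running it (assuming the same file.csv):

sed 's/,/ --file /g' file.csv | xargs -L1 echo program_im_using --file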

For a variable number of args one common approach is to store the args in an array, eg:

$ args=(--file filepath1 --file filepath2)
$ typeset -p args
declare -a args=([0]="--file" [1]="filepath1" [2]="--file" [3]="filepath2")

A sample script to list input args:

$ cat program_im_using
#!/bin/bash
n=1
for i in "$@"; do echo "arg #$n:$i:";((n++));  done

We can then feed these args to the program like so:

$ ./program_im_using "${args[@]}"
arg #1 :--file:
arg #2 :filepath1:
arg #3 :--file:
arg #4 :filepath2:

Sample input file:

$ cat sample.csv
filepath1
filepath1,filepath2
filepath1,filepath2,filepath3
/a/b/c.txt,/d/e/f/g.txt,/h.txt

Using sed to reformat our csv lines:

$ sed 's/^/--file /; s/,/ --file /g' sample.csv
--file filepath1
--file filepath1 --file filepath2
--file filepath1 --file filepath2 --file filepath3
--file /a/b/c.txt --file /d/e/f/g.txt --file /h.txt

Feeding the sed results to a bash/while-read loop:

while read -ra args
do
    printf "\n####### ${args[*]}\n"
    ./program_im_using "${args[@]}"
done < <(sed 's/^/--file /; s/,/ --file /g' sample.csv)

NOTES:

  • OP has stated in comments that we do not need to worry about spaces or special characters in the csv fields otherwise ...
  • this approach will fail if the csv fields contain (white)space or characters that have a special meaning to bash

This generates:

####### --file filepath1
arg #1 :--file:
arg #2 :filepath1:

####### --file filepath1 --file filepath2
arg #1 :--file:
arg #2 :filepath1:
arg #3 :--file:
arg #4 :filepath2:

####### --file filepath1 --file filepath2 --file filepath3
arg #1 :--file:
arg #2 :filepath1:
arg #3 :--file:
arg #4 :filepath2:
arg #5 :--file:
arg #6 :filepath3:

####### --file /a/b/c.txt --file /d/e/f/g.txt --file /h.txt
arg #1 :--file:
arg #2 :/a/b/c.txt:
arg #3 :--file:
arg #4 :/d/e/f/g.txt:
arg #5 :--file:
arg #6 :/h.txt:

Using GNU awk:

awk '
    BEGIN{OFS=","; FPAT="([^,]*)|(\"[^\"]+\")"}
    {printf "program_im_using "}
    {for (i=1; i<=NF; i++) printf "--file %s ", $i}
    {print ""}
' file

Yields:

program_im_using --file filepath1 
program_im_using --file filepath1 --file filepath2 
program_im_using --file filepath1 --file filepath2 --file filepath3 
program_im_using --file filepath1 --file filepath2 --file filepath3 --file filepath4 
program_im_using --file filepath1 --file filepath2

Then, you can pipe the awk output to bash to execute the commands:

awk .... | bash
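
For example, the same awk program as above combined into a single command and piped into bash (a sketch; the pipeline runs the generated commands immediately, so preview the awk output first):

awk 'BEGIN{OFS=","; FPAT="([^,]*)|(\"[^\"]+\")"} {printf "program_im_using "; for (i=1; i<=NF; i++) printf "--file %s ", $i; print ""}' file | bash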

Parsing CSV with awk:

https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

More advanced: https://www.gnu.org/software/gawk/manual/gawk.html#More-CSV
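
As a quick illustration of why FPAT is used rather than a plain -F',' split: a quoted field that itself contains a comma stays together (the surrounding quotes remain part of the field). The input below is just a made-up example:

$ echo '"a,b.txt",c.txt' | gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")"} {for (i=1; i<=NF; i++) print i": "$i}'
1: "a,b.txt"
2: c.txt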

Bash alone offers all the tools you need for this:

#!/usr/bin/env bash

# Dummy command to show its name and arguments for demo/testing purposes
dummyCommand() { printf '%s %s\n' "${FUNCNAME[0]}" "${*@Q}";}

shopt -s extglob # Enable extended globbing

while IFS=, read -ra csvFiles || [ "${#csvFiles[@]}" -gt 0 ]; do
    lssFiles=("${csvFiles[@]##*([[:space:]])}") # Strip leading spaces
    tssFiles=("${lssFiles[@]%%*([[:space:]])}") # Strip trailing spaces

    # Prepares the arguments array for the command call
    args=()
    for fileName in "${tssFiles[@]}"; do
        args+=(--file "$fileName")
    done
    dummyCommand "${args[@]}"
done
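
Save the loop above as a script and feed it the CSV on standard input (the script name here is just an example):

bash process_csv.sh < input.csv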

cat input.csv:

filepath 1
    filepath 1, filepath2
filepath 1, filepath 2,filepath3
filepath 1,  filepath 2  , filepath3,filepath 4
filepath1,filepath 2

Sample output:

dummyCommand '--file' 'filepath 1'
dummyCommand '--file' 'filepath 1' '--file' 'filepath2'
dummyCommand '--file' 'filepath 1' '--file' 'filepath 2' '--file' 'filepath3'
dummyCommand '--file' 'filepath 1' '--file' 'filepath 2' '--file' 'filepath3' '--file' 'filepath 4'
dummyCommand '--file' 'filepath1' '--file' 'filepath 2'

Alternative with Bash's Regex:

#!/usr/bin/env bash

# Dummy command to show its name and arguments for demo/testing purposes
dummyCommand() { printf '%s %s\n' "${FUNCNAME[0]}" "${*@Q}";}

while IFS=, read -ra csvFiles || [ "${#csvFiles[@]}" -gt 0 ]; do
    # Prepares the arguments array for the command call
    args=()
    for csvEntry in "${csvFiles[@]}"; do
        # Strips out leading and trailing spaces
        [[ "$csvEntry" =~ [[:space:]]*(.*[^[:space:]])[[:space:]]* ]]
        args+=(--file "${BASH_REMATCH[1]}")
    done
    dummyCommand "${args[@]}"
done

An implementation that does not iterate over the file names but processes the array entries directly:

#!/usr/bin/env bash

# Dummy command to show its name and arguments for demo/testing purposes
dummyCommand() { printf '%s %s\n' "${FUNCNAME[0]}" "${*@Q}";}

shopt -s extglob # Enable extended globbing

while IFS=, read -ra csvFiles || [ "${#csvFiles[@]}" -gt 0 ]; do
    lssFiles=("${csvFiles[@]##*([[:space:]])}") # Strip leading spaces
    tssFiles=("${lssFiles[@]%%*([[:space:]])}") # Strip trailing spaces
    qFiles=("${tssFiles[@]@Q}") # Quote each file entry

    # Prepares the arguments array for the command call
    args=() && declare -a "args=(${qFiles[*]/#/--file })"

    dummyCommand "${args[@]}"
done

Step 1: args=() (Only for ShellCheck)

Purpose: Ensures args is initialized as an array before being dynamically declared. Why? ShellCheck cannot see dynamic array declarations inside declare -a, so without args=(), it might report a warning like "args appears uninitialized". Does this actually affect execution? No, it's just a safeguard for static analysis.

Step 2: declare -a "args=(${qFiles[*]/#/--file })"

This is the actual logic that fills args dynamically. Breaking it down:

${qFiles[*]/#/--file }

  • ${qFiles[*]} expands the array qFiles into a single space-separated string.
  • /#/--file  is a Bash pattern substitution: # anchors the (empty) pattern at the beginning of each array element, so --file (followed by a space) is prepended to every element.

"args=(${qFiles[*]/#/--file })" is therefore a dynamic array declaration: each already-quoted entry of qFiles, prefixed with --file, becomes one entry declaration for the args array.
declare -a explicitly marks args as an array.
This ensures Bash correctly treats the result as multiple elements rather than a single space-separated string.
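
A minimal interactive sketch of the mechanism, with invented file names, assuming a Bash version where this dynamic declare form behaves as the script above relies on:

$ files=("file a" "file b")
$ qFiles=("${files[@]@Q}")   # each entry quoted: 'file a' 'file b'
$ declare -a "args=(${qFiles[*]/#/--file })"
$ typeset -p args
declare -a args=([0]="--file" [1]="file a" [2]="--file" [3]="file b")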
