awk - Extract lines after string and print multiple values in single line

I have several files with a format like this

some text
some text
This section is for WXYZ
some text
some text
some text
some text
some text
some text (ABC) some text (CDF)
901 98
some text FFG
some text (FFG)
1 99
some text
some text

I'm trying to print for each file

the filename
the string after "This section is for " in same line
the line below the string containing (ABC)
the line below the string containing (FFG)

This is my current script (based on the answer in this thread)

awk '/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0} /\(ABC\)/{a=1}
     b{print;b=0} /\(FFG\)/{b=1}
' "testfile.txt"

I'm getting this output

testfile.txt|WXYZ
901 98
1 99

And my desired output for each file would be a single line like this

testfile.txt|WXYZ|901 98|1 99

How can I modify the script to achieve this? Thanks

Asked by Rasec Malkic

3 Answers

Like this, using printf "%s" to avoid newlines:

$ awk '/This section is for/{sub(/This section is for /,""); printf "%s", FILENAME "|" $0}
     a{printf "|%s", $0;a=0} /\(ABC\)/{a=1}
     b{printf "|%s\n", $0;b=0} /\(FFG\)/{b=1}
' testfile.txt
testfile.txt|WXYZ|901 98|1 99

GNU AWK appends the output record separator (ORS) to everything written by print. By default ORS is a newline (\n), but you can change that by assigning another value to ORS. In this particular case, let the content of testfile.txt be

some text
some text
This section is for WXYZ
some text
some text
some text
some text
some text
some text (ABC) some text (CDF)
901 98
some text FFG
some text (FFG)
1 99
some text
some text

then

awk 'BEGIN{ORS="|"}/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0} /\(ABC\)/{a=1}
     b{print;b=0} /\(FFG\)/{b=1}
' "testfile.txt"

will give

testfile.txt|WXYZ|901 98|1 99|

Observe that there is a trailing | and no newline at the end. This can be fixed as follows

awk 'BEGIN{ORS="|"}/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0;ORS="\n"} /\(ABC\)/{a=1}
     b{print;b=0;ORS="\n"} /\(FFG\)/{b=1}
' "testfile.txt"

which gives output

testfile.txt|WXYZ|901 98|1 99

Explanation: I change ORS to a newline after printing the first of the two lines, regardless of which one (a or b) comes first. If you want to know more about ORS, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

(tested in GNU Awk 5.3.1)
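To see the effect of ORS in isolation, here is a minimal sketch that does not depend on testfile.txt. The three-line input produced by printf is made up purely for illustration.

```shell
# Hypothetical three-line input, generated inline for illustration.
# With ORS="|" every print ends in "|" instead of a newline.
out=$(printf 'one\ntwo\nthree\n' | awk 'BEGIN { ORS = "|" } { print }')
printf '%s\n' "$out"
# prints: one|two|three|
```

The trailing | is the same one seen in the first ORS attempt above: print always appends ORS, including after the last record.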

Whenever I have input that contains tag-value pairs, I find it useful to first construct an array of those mappings (f[] below). That separates detecting a value from using it, so I can then print, compare, or modify the values in any order and any combination I like just by indexing the array with their tag (name).

For example, using any awk:

awk -v OFS='|' '
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0,RSTART+1,RLENGTH-2) }
    END { print FILENAME, f["sect"], f["ABC"], f["FFG"] }
' "testfile.txt"
testfile.txt|WXYZ|901 98|1 99

Note that this would consistently give you 4 |-separated output fields even if some of the tags were missing from an input file.
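As a quick check of that claim, here is a sketch that runs the same array-based script on a made-up file with no (FFG) line. The file name missing.txt and its contents are assumptions for illustration, not part of the question.

```shell
# Hypothetical input: has the section line and (ABC), but no (FFG).
tmpdir=$(mktemp -d)
cat > "$tmpdir/missing.txt" <<'EOF'
some text
This section is for WXYZ
some text (ABC) some text (CDF)
901 98
some text
EOF

# Same array-based approach; cd first so FILENAME is just the base name.
out=$(cd "$tmpdir" && awk -v OFS='|' '
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0, RSTART+1, RLENGTH-2) }
    END { print FILENAME, f["sect"], f["ABC"], f["FFG"] }
' missing.txt)
printf '%s\n' "$out"
# prints: missing.txt|WXYZ|901 98|
rm -rf "$tmpdir"
```

The last field is empty but still present, so downstream tools that split on | always see the same number of columns.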

Since you said:

I have several files ...

if you wanted to process all input files at once you could do this with GNU awk:

awk -v OFS='|' '
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\(([^()]+)\)/, a) { tag = a[1] }
    ENDFILE {
        print FILENAME, f["sect"], f["ABC"], f["FFG"]
        delete f
    }
' *.txt

or this with any awk:

awk -v OFS='|' '
    FNR == 1 { prt() }
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0,RSTART+1,RLENGTH-2) }
    END { prt() }

    function prt() {
        if ( prevFname != "" ) {
            print prevFname, f["sect"], f["ABC"], f["FFG"]
            delete f
        }
        prevFname = FILENAME
    }
' *.txt
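For a concrete run of the portable multi-file version, here is a sketch using two small made-up files. The names a.txt and b.txt and their contents are hypothetical, and b.txt deliberately has no (FFG) line.

```shell
# Two hypothetical input files, created in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'This section is for AAAA' 'x (ABC) y' '1 2' 'z (FFG)' '3 4' > "$tmpdir/a.txt"
printf '%s\n' 'This section is for BBBB' 'x (ABC) y' '5 6' > "$tmpdir/b.txt"

# prt() prints the previous file's values whenever a new file starts
# (FNR == 1) and once more in END for the last file.
out=$(cd "$tmpdir" && awk -v OFS='|' '
    FNR == 1 { prt() }
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0, RSTART+1, RLENGTH-2) }
    END { prt() }

    function prt() {
        if ( prevFname != "" ) {
            print prevFname, f["sect"], f["ABC"], f["FFG"]
            delete f
        }
        prevFname = FILENAME
    }
' a.txt b.txt)
printf '%s\n' "$out"
# prints:
# a.txt|AAAA|1 2|3 4
# b.txt|BBBB|5 6|
rm -rf "$tmpdir"
```

One line is produced per input file, and the delete f in prt() keeps values from one file from leaking into the next.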
