awk - Extract lines after string and print multiple values in single line - Stack Overflow

I have several files with a format like this

some text
some text
This section is for WXYZ
some text
some text
some text
some text
some text
some text (ABC) some text (CDF)
901 98
some text FFG
some text (FFG)
1 99
some text
some text

I'm trying to print for each file

  1. the filename
  2. the string after "This section is for " in same line
  3. the line below the string containing (ABC)
  4. the line below the string containing (FFG)

This is my current script (based on the answer in this thread)

awk '/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0} /\(ABC\)/{a=1}
     b{print;b=0} /\(FFG\)/{b=1}
' "testfile.txt"

I'm getting this output

testfile.txt|WXYZ
901 98
1 99

And my desired output for each file would be a single line like this

testfile.txt|WXYZ|901 98|1 99

How can I modify the script to get this output? Thanks

asked 8 hours ago by Rasec Malkic

3 Answers

Like this, using printf "%s" to avoid newlines:

$ awk '/This section is for/{sub(/This section is for /,""); printf "%s", FILENAME "|" $0}
     a{printf "|%s", $0;a=0} /\(ABC\)/{a=1}
     b{printf "|%s\n", $0;b=0} /\(FFG\)/{b=1}
' testfile.txt
testfile.txt|WXYZ|901 98|1 99

GNU AWK appends the output record separator (ORS) to whatever print outputs. By default ORS is a newline (\n), but you can change that by setting another ORS value. In this particular case, let the content of testfile.txt be

some text
some text
This section is for WXYZ
some text
some text
some text
some text
some text
some text (ABC) some text (CDF)
901 98
some text FFG
some text (FFG)
1 99
some text
some text

then

awk 'BEGIN{ORS="|"}/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0} /\(ABC\)/{a=1}
     b{print;b=0} /\(FFG\)/{b=1}
' "testfile.txt"

will give

testfile.txt|WXYZ|901 98|1 99|

Observe that there is a trailing | and no newline at the end. This can be fixed in the following way

awk 'BEGIN{ORS="|"}/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0;ORS="\n"} /\(ABC\)/{a=1}
     b{print;b=0;ORS="\n"} /\(FFG\)/{b=1}
' "testfile.txt"

which gives output

testfile.txt|WXYZ|901 98|1 99

Explanation: I change ORS to a newline after printing the first of the two lines, regardless of which line (a or b) comes first. If you want to know more about ORS, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

(tested in GNU Awk 5.3.1)
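
One caveat if the command above is pointed at several files: once ORS has been reset to "\n" it stays that way, so the fields of the second file would no longer be joined with |. A minimal sketch of one way around that, assuming each file contains the section line followed by both tagged lines, is to restore ORS on the first line of every file. The file names a.txt and b.txt, their contents, and the scratch directory are hypothetical.

```shell
# Build two small sample files in a scratch directory (hypothetical names and contents).
tmp=$(mktemp -d)
printf '%s\n' 'This section is for WXYZ' 'x (ABC) y (CDF)' '901 98' 'z (FFG)' '1 99' > "$tmp/a.txt"
printf '%s\n' 'This section is for QRST' 'x (ABC) y' '7 8' 'z (FFG)' '5 6' > "$tmp/b.txt"

# FNR==1 is true on the first line of every input file, so ORS goes back to "|" there.
out=$(cd "$tmp" && awk 'FNR==1{ORS="|"}
     /This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
     a{print;a=0;ORS="\n"} /\(ABC\)/{a=1}
     b{print;b=0;ORS="\n"} /\(FFG\)/{b=1}
' a.txt b.txt)
printf '%s\n' "$out"
```

This should print one |-separated line per file. It still relies on both tagged lines being present in every file; if one is missing, the newline is emitted in the wrong place.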

Whenever I have input that contains tag-value pairs, I find it useful to first construct an array of those mappings (f[] below). That separates detection of the value from use of the value, so I can then print, compare, or modify them in any order and any combination I like just by indexing the array with their tag (name).

For example, using any awk:

awk -v OFS='|' '
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0,RSTART+1,RLENGTH-2) }
    END { print FILENAME, f["sect"], f["ABC"], f["FFG"] }
' "testfile.txt"
testfile.txt|WXYZ|901 98|1 99

Note that that would consistently give you 4 |-separated output fields even if any of the tags were missing from an input file.
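
To illustrate the note above, here is a small sketch run against a file that has an (ABC) tag but no (FFG) tag. The file name partial.txt and its contents are made up for the example; the awk program is the one shown above, with the closing parenthesis in the regex escaped.

```shell
# Sample file with an (ABC) tag but no (FFG) tag (hypothetical name and contents).
tmp=$(mktemp -d)
printf '%s\n' 'This section is for WXYZ' 'some text (ABC) some text' '901 98' 'some text' > "$tmp/partial.txt"

# f["FFG"] is never assigned, so it prints as an empty fourth field.
out=$(cd "$tmp" && awk -v OFS='|' '
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0, RSTART+1, RLENGTH-2) }
    END { print FILENAME, f["sect"], f["ABC"], f["FFG"] }
' partial.txt)
printf '%s\n' "$out"
```

The output line ends with a bare |, which marks the empty FFG field while keeping the field count at 4.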

Since you said:

I have several files ...

if you wanted to process all input files at once you could do this with GNU awk:

awk -v OFS='|' '
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\(([^()]+)\)/, a) { tag = a[1] }
    ENDFILE {
        print FILENAME, f["sect"], f["ABC"], f["FFG"]
        delete f
    }
' *.txt

or this with any awk:

awk -v OFS='|' '
    FNR == 1 { prt() }
    /^This section is for/ { f["sect"] = $NF }
    tag != "" { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0,RSTART+1,RLENGTH-2) }
    END { prt() }

    function prt() {
        if ( prevFname != "" ) {
            print prevFname, f["sect"], f["ABC"], f["FFG"]
            delete f
        }
        prevFname = FILENAME
    }
' *.txt
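
The reason prt() is called both at FNR == 1 and in END may be easier to see in a stripped-down sketch that keeps only the flush-per-file pattern. Here the per-file state is just a line count n, which is an illustrative stand-in for f[]; the file names and contents are hypothetical.

```shell
# Two throwaway input files (hypothetical names and contents).
tmp=$(mktemp -d)
printf '%s\n' one two three > "$tmp/a.txt"
printf '%s\n' one two > "$tmp/b.txt"

out=$(cd "$tmp" && awk -v OFS='|' '
    FNR == 1 { prt() }          # a new file starts: flush the previous file, if any
    { n++ }                     # per-file state (here just a line count)
    END { prt() }               # nothing follows the last file, so flush it here
    function prt() {
        if ( prevFname != "" ) { print prevFname, n; n = 0 }
        prevFname = FILENAME
    }
' a.txt b.txt)
printf '%s\n' "$out"
```

Each file is reported when the next one begins, and END covers the final file. Note that a completely empty file never produces a line with FNR == 1, so this pattern skips it.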