I have several files with a format like this
some text
some text
This section is for WXYZ
some text
some text
some text
some text
some text
some text (ABC) some text (CDF)
901 98
some text FFG
some text (FFG)
1 99
some text
some text
I'm trying to print for each file
- the filename
- the string after "This section is for " on the same line
- the line below the string containing (ABC)
- the line below the string containing (FFG)
This is my current script (based on the answer in this thread)
awk '/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
a{print;a=0} /\(ABC\)/{a=1}
b{print;b=0} /\(FFG\)/{b=1}
' "testfile.txt"
I'm getting this output
testfile.txt|WXYZ
901 98
1 99
And my desired output for each file would be a single line like this
testfile.txt|WXYZ|901 98|1 99
How can I modify the script to get this output? Thanks
Asked by Rasec Malkic
3 Answers
Like this, using printf "%s" to avoid newlines:
$ awk '/This section is for/{sub(/This section is for /,""); printf "%s", FILENAME "|" $0}
a{printf "|%s", $0;a=0} /\(ABC\)/{a=1}
b{printf "|%s\n", $0;b=0} /\(FFG\)/{b=1}
' testfile.txt
testfile.txt|WXYZ|901 98|1 99
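Since the question mentions several files, here is a minimal sketch of running the same program over more than one file at once. The file names and contents are made up, and it assumes every file has the section line, then (ABC), then (FFG) in that order, as in the sample. The wrapper function summarize is only a hypothetical helper for the demo.

```shell
# Build two small sample files in a temporary directory (made-up data).
tmp=$(mktemp -d)
printf '%s\n' 'This section is for WXYZ' 'x (ABC) y (CDF)' '901 98' 'z (FFG)' '1 99' > "$tmp/one.txt"
printf '%s\n' 'This section is for QRST' 'x (ABC)' '5 6' 'z (FFG)' '7 8' > "$tmp/two.txt"

# Same printf-based program as above, wrapped so it accepts any number of files.
summarize() {
  awk '/This section is for/{sub(/This section is for /,""); printf "%s", FILENAME "|" $0}
       a{printf "|%s", $0; a=0} /\(ABC\)/{a=1}
       b{printf "|%s\n", $0; b=0} /\(FFG\)/{b=1}' "$@"
}

(cd "$tmp" && summarize one.txt two.txt)
# one.txt|WXYZ|901 98|1 99
# two.txt|QRST|5 6|7 8
```

The newline comes only from the "\n" in the (FFG) rule, so a file without an (FFG) line would run into the next file's output.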
GNU AWK appends the output record separator (ORS) to everything print outputs. By default ORS is a newline (\n), but you can change that by setting ORS to another value. In this particular case, let the content of testfile.txt be
some text
some text
This section is for WXYZ
some text
some text
some text
some text
some text
some text (ABC) some text (CDF)
901 98
some text FFG
some text (FFG)
1 99
some text
some text
then
awk 'BEGIN{ORS="|"}/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
a{print;a=0} /\(ABC\)/{a=1}
b{print;b=0} /\(FFG\)/{b=1}
' "testfile.txt"
will give
testfile.txt|WXYZ|901 98|1 99|
Observe that there is a trailing | and no newline at the end. This can be fixed the following way
awk 'BEGIN{ORS="|"}/This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
a{print;a=0;ORS="\n"} /\(ABC\)/{a=1}
b{print;b=0;ORS="\n"} /\(FFG\)/{b=1}
' "testfile.txt"
which gives output
testfile.txt|WXYZ|901 98|1 99
Explanation: I change ORS to newline after printing the first of the two lines, regardless of which of them (a or b) comes first. If you want to know more about ORS, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in GNU Awk 5.3.1)
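One caveat if you pass several files: ORS is set to "|" only once in BEGIN, so after the first file it stays at newline and later files come out on three lines again. A minimal sketch of one possible workaround is to reset ORS on the first record of every file. It assumes each file contains both the (ABC) and (FFG) lines. The file names, contents and the summarize_ors wrapper are made up for the demo.

```shell
# Made-up sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'This section is for WXYZ' 'x (ABC) y (CDF)' '901 98' 'z (FFG)' '1 99' > "$tmp/one.txt"
printf '%s\n' 'This section is for QRST' 'x (ABC)' '5 6' 'z (FFG)' '7 8' > "$tmp/two.txt"

summarize_ors() {
  # FNR==1 is true for the first record of each input file,
  # so ORS goes back to "|" before every file is processed.
  awk 'FNR==1{ORS="|"}
       /This section is for/{sub(/This section is for /,""); print FILENAME "|" $0}
       a{print; a=0; ORS="\n"} /\(ABC\)/{a=1}
       b{print; b=0; ORS="\n"} /\(FFG\)/{b=1}' "$@"
}

(cd "$tmp" && summarize_ors one.txt two.txt)
# one.txt|WXYZ|901 98|1 99
# two.txt|QRST|5 6|7 8
```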
Whenever I have input that contains tag-value pairs, I find it useful to first construct an array of those mappings (f[] below) to separate detection of the values from their use. I can then print, compare, or modify them in any order and any combination I like just by indexing the array with their tag (name).
For example, using any awk:
awk -v OFS='|' '
/^This section is for/ { f["sect"] = $NF }
tag != "" { f[tag] = $0; tag = "" }
match($0, /\([^()]+\)/) { tag = substr($0,RSTART+1,RLENGTH-2) }
END { print FILENAME, f["sect"], f["ABC"], f["FFG"] }
' "testfile.txt"
testfile.txt|WXYZ|901 98|1 99
Note that this would consistently give you 4 |-separated output fields even if any of the tags were missing from an input file.
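To illustrate the missing-tag case, here is a small sketch with a made-up file that has no (FFG) line. The file name, its contents and the summarize_tags wrapper are hypothetical. The awk program is the single-file one from this answer, with the closing parenthesis in the regexp escaped.

```shell
# Made-up input that contains (ABC) but not (FFG).
tmp=$(mktemp -d)
printf '%s\n' 'This section is for WXYZ' 'x (ABC)' '901 98' > "$tmp/partial.txt"

summarize_tags() {
  awk -v OFS='|' '
    /^This section is for/  { f["sect"] = $NF }
    tag != ""               { f[tag] = $0; tag = "" }   # line after a tag holds its value
    match($0, /\([^()]+\)/) { tag = substr($0, RSTART+1, RLENGTH-2) }
    END { print FILENAME, f["sect"], f["ABC"], f["FFG"] }
  ' "$@"
}

(cd "$tmp" && summarize_tags partial.txt)
# partial.txt|WXYZ|901 98|
```

The last field is simply empty, so downstream tools that split on | still see four columns.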
Since you said:
I have several files ...
if you wanted to process all input files at once you could do this with GNU awk:
awk -v OFS='|' '
/^This section is for/ { f["sect"] = $NF }
tag != "" { f[tag] = $0; tag = "" }
match($0, /\(([^()]+)\)/, a) { tag = a[1] }
ENDFILE {
    print FILENAME, f["sect"], f["ABC"], f["FFG"]
    delete f
}
' *.txt
or this with any awk:
awk -v OFS='|' '
FNR == 1 { prt() }
/^This section is for/ { f["sect"] = $NF }
tag != "" { f[tag] = $0; tag = "" }
match($0, /\([^()]+\)/) { tag = substr($0,RSTART+1,RLENGTH-2) }
END { prt() }
function prt() {
    if ( prevFname != "" ) {
        print prevFname, f["sect"], f["ABC"], f["FFG"]
        delete f
    }
    prevFname = FILENAME
}
' *.txt
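A quick way to try the portable variant, as a sketch with made-up files. The file names, contents and the summarize_each wrapper are hypothetical, and the second file deliberately lacks (FFG) to show that delete f keeps values from leaking between files.

```shell
# Made-up sample files; two.txt has no (FFG) line.
tmp=$(mktemp -d)
printf '%s\n' 'This section is for WXYZ' 'x (ABC) y (CDF)' '901 98' 'z (FFG)' '1 99' > "$tmp/one.txt"
printf '%s\n' 'This section is for QRST' 'x (ABC)' '5 6' > "$tmp/two.txt"

summarize_each() {
  awk -v OFS='|' '
    FNR == 1                { prt() }                    # new file: flush the previous one
    /^This section is for/  { f["sect"] = $NF }
    tag != ""               { f[tag] = $0; tag = "" }
    match($0, /\([^()]+\)/) { tag = substr($0, RSTART+1, RLENGTH-2) }
    END                     { prt() }                    # flush the last file
    function prt() {
      if ( prevFname != "" ) {
        print prevFname, f["sect"], f["ABC"], f["FFG"]
        delete f
      }
      prevFname = FILENAME
    }
  ' "$@"
}

(cd "$tmp" && summarize_each one.txt two.txt)
# one.txt|WXYZ|901 98|1 99
# two.txt|QRST|5 6|
```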