I am trying to remove a chunk of text from hundreds of files. The number of lines to remove is not the same for each file, but the chunks all start with the same string. I am looking to remove the entire chunk of text starting at $PATTERN (## Table), then ALL of the text until a line break (an empty line) occurs. I am pretty "ok" with sed but not getting anywhere with Googling.
text.txt contents:
Random text.
## Table of Contents
* [Design System](#<b>Design-System</b>)
* [Multisite Designs](#Multisite)
* [Recommended Changes Per Site](#mcetoc_123)
* [Recommended Consistent Elements Across Sites](#mcetoc_456)
* [Templates v. Pages](#Templates-v-Pages)
* [Layouts](#mcetoc_dfghj)
* [Features](#Features)
* [Something](#mcetoc_khjvfsdfsd)
* [Feature](#mcetoc_dfwyduegwu)
* [See More Logic](#mcetoc_fdsfsdugfs)
* [Advertising, and Print](#Advertising-and-Print)
* [Images](#Images)
* [Featured Image)](#mcetoc_fghjk)
* [Videos](#Videos)
* [Video Art](#mcetoc_4567890)
* [Author and Search Pages](#Author-Tag-and-Search-Pages)
* [Accessibility](#Accessibility)
Then here is some text.
Here is more text!
## Prerequisites
text.txt desired outcome:
Random text.
Then here is some text.
Here is more text!
## Prerequisites
I don't care about cleaning up extra line breaks, just deleting the chunk starting with ## Table and ending with the first line break.
I had no idea where to even begin, so I posted here. The first answer I got resulted in:
sed -e '/^\s*[*#]/d'
Random text.
* [Recommended Changes Per Site](#mcetoc_123)
* [Recommended Consistent Elements Across Sites](#mcetoc_456)
* [Something](#mcetoc_khjvfsdfsd)
* [Feature](#mcetoc_dfwyduegwu)
* [See More Logic](#mcetoc_fdsfsdugfs)
* [Featured Image)](#mcetoc_fghjk)
* [Video Art](#mcetoc_4567890)
Then here is some text.
Here is more text!
awk '/^## Table/ {skip=1} NF==0 {skip=0} !skip' worked, as did sed -i '' '/## Table/,/^$/d', and both resulted in:
Random text.
Then here is some text.
Here is more text!
## Prerequisites
asked Nov 19, 2024 at 21:19 by khaos119; edited Nov 20, 2024 at 16:46
8 Answers
sed is great for doing s/old/new/ on individual lines, but for anything else just use awk, e.g. using any awk:
$ awk -v RS= -v ORS='\n\n' '!/^## Table/' test.txt
Random text.
Then here is some text.
Here is more text!
## Prerequisites
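To see why this works: with RS set to the empty string, awk reads in paragraph mode, so each run of non-blank lines separated by blank lines becomes one record, and the pattern only has to reject the record that starts with ## Table. A minimal sketch on made-up input (the sample text and the variable names input and out are placeholders, not from the question):

```shell
# Hypothetical three-paragraph input; the middle paragraph is the block to drop.
input=$(printf 'Intro line.\n\n## Table of Contents\n* [A](#a)\n* [B](#b)\n\nBody line.\n')

# RS= switches awk to paragraph mode: each blank-line-separated block is one record.
# FS='\n' makes every line of the record a field, so $1 is the record's first line.
out=$(printf '%s\n' "$input" | awk -v RS= -v FS='\n' '{ print NR ": " $1 }')
printf '%s\n' "$out"
```

This prints one numbered line per paragraph, which shows that the whole table of contents arrives as a single record.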
Since the patterns are mutually exclusive, you can use an address range:
sed '/## Table/,/^$/d' infile >outfile
Or, if your definition of "linebreak" includes whitespace:
sed '/## Table/,/^[[:space:]]*$/d' infile >outfile
Many versions of sed support non-standard pseudo-in-place editing with a -i option.
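Since -i is non-standard and its syntax differs between implementations (GNU sed takes an optional suffix attached to -i, BSD/macOS sed requires a separate suffix argument such as -i ''), one portable way to run the range delete over many files is to write to a temporary file and copy it back. A rough sketch, assuming mktemp is available and that the files are the *.md files of one directory; the function name strip_toc and that layout are made up for illustration:

```shell
# strip_toc DIR: hypothetical helper that removes the "## Table" block
# (through the first empty line) from every *.md file in DIR.
strip_toc() {
    for f in "$1"/*.md; do
        [ -f "$f" ] || continue        # skip if the glob matched nothing
        tmp=$(mktemp) || return 1
        # Overwrite the original only if sed succeeded; writing with cat
        # keeps the original file's permissions.
        if sed '/## Table/,/^$/d' "$f" >"$tmp"; then
            cat "$tmp" >"$f"
        fi
        rm -f "$tmp"
    done
}
```

Calling strip_toc with a directory path would then process each matching file in turn; trying it on a copy of the data first is the safer route.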
One awk idea:
awk '
/^## Table/ { skip = 1 }   # if line starts with "## Table" then enable/set skip flag
NF==0       { skip = 0 }   # if empty/blank line then disable/clear skip flag
!skip                      # if skip flag not set (ie, skip == 0) then print current line
' test.txt
#######
# or as a one-liner
awk '/^## Table/ {skip=1} NF==0 {skip=0} !skip' test.txt
This generates:
Random text.
Then here is some text.
Here is more text!
## Prerequisites
For awk flavors that support ranges:
awk '
/^## Table/,NF==0 {next} # if current line falls within the range of lines defined
# by "^## Table" and a blank line then skip it else ...
1 # print current line
' test.txt
#######
# or as a one-liner
awk '/^## Table/,NF==0 {next} 1' test.txt
This generates:
Random text.
Then here is some text.
Here is more text!
## Prerequisites
NOTES:
- this removes the matching blank line whereas OP's expected output shows the blank line is to be maintained
- if OP wishes to keep the blank line then one option would be to replace {next} with {if (NF!=0) next}
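The difference between the two range variants can be tried on a small made-up input. This is only a sketch of the substitution suggested in the note; the sample text and the variable names are placeholders:

```shell
input=$(printf 'Random text.\n\n## Table of Contents\n* [A](#a)\n* [B](#b)\n\nThen here is some text.\n')

# Range version: the blank line that closes the range is dropped with the block.
dropped=$(printf '%s\n' "$input" | awk '/^## Table/,NF==0 {next} 1')

# Variant from the note: only non-blank lines inside the range are skipped,
# so the blank line that ends the range falls through to the print rule.
kept=$(printf '%s\n' "$input" | awk '/^## Table/,NF==0 {if (NF!=0) next} 1')

printf '%s\n---\n%s\n' "$dropped" "$kept"
```

The first result has one blank line between the two text lines, the second has two.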
This might work for you (GNU sed):
sed '/## Table/{:a;N;/^$/M!ba;d}' file
Which will remove all lines in the range, or:
sed '/## Table/{:a;N;/^$/M!ba;z}' file
Which will remove all lines except the last newline.
Another option, perhaps easier though potentially wrong (it will delete every line following the start of the range, through the end of the file, if the end of the range does not exist):
sed '/## Table/,/^$/d' file
Or
sed '/## Table/,/^$/!b;/./d' file
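The caveat about a missing end of range can be checked directly. In this sketch (made-up input with no blank line after the block), the address range never finds its end, so everything after the heading is deleted. The second command is one possible guard, not part of the answer above: it buffers the block in awk and only discards it once a blank line is actually seen.

```shell
# Hypothetical input: no blank line follows the "## Table" block.
input=$(printf 'Random text.\n## Table of Contents\n* [A](#a)\nLast line.\n')

# Range delete: the end address /^$/ never matches, so deletion runs to end of input.
range_out=$(printf '%s\n' "$input" | sed '/## Table/,/^$/d')

# Guarded variant: hold the block in buf; drop it on a blank line,
# but print it back if the input ends before a blank line shows up.
guard_out=$(printf '%s\n' "$input" | awk '
    !skip && /^## Table/ { skip = 1; buf = $0; next }
    skip && NF == 0      { skip = 0; buf = ""; next }   # block ended: discard it
    skip                 { buf = buf "\n" $0; next }    # still inside the block
    { print }
    END { if (skip) print buf }                         # no terminator: restore
')

printf '%s\n---\n%s\n' "$range_out" "$guard_out"
```

Here the range delete leaves only the first line, while the guarded variant returns the input unchanged.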
I have no idea which of the many sed commands you found you have actually tried, but...
What is the actual pattern for your matching strings?
For simplicity, I assume it's the start of the line, followed by an arbitrary amount of whitespace, followed by an asterisk or a hash.
This would be the regular expression ^\s*[*#]
So we build a sed expression that deletes all lines matching this pattern:
sed -e '/^\s*[*#]/d' test.txt
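One portability note on this expression: \s is a GNU extension rather than part of POSIX regular expressions, so other sed implementations may not treat it as whitespace. A sketch of the same command with the POSIX character class, on made-up input; note that, like the command above, it deletes every bullet and heading line, not only the table of contents:

```shell
# Hypothetical input with one indented bullet and a later heading.
input=$(printf 'Random text.\n## Table of Contents\n* [A](#a)\n  * [B](#b)\nMore text.\n## Prerequisites\n')

# [[:space:]] is the POSIX spelling of "any whitespace character".
out=$(printf '%s\n' "$input" | sed -e '/^[[:space:]]*[*#]/d')
printf '%s\n' "$out"
```

Only the two plain text lines survive, so the later ## Prerequisites heading is lost as well.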
If your files do not contain any NUL bytes, then you might exploit the -z option of GNU sed in the following way. Let the content of text.txt be
Random text.
## Table of Contents
* [Design System](#<b>Design-System</b>)
* [Multisite Designs](#Multisite)
* [Recommended Changes Per Site](#mcetoc_123)
* [Recommended Consistent Elements Across Sites](#mcetoc_456)
* [Templates v. Pages](#Templates-v-Pages)
* [Layouts](#mcetoc_dfghj)
* [Features](#Features)
* [Something](#mcetoc_khjvfsdfsd)
* [Feature](#mcetoc_dfwyduegwu)
* [See More Logic](#mcetoc_fdsfsdugfs)
* [Advertising, and Print](#Advertising-and-Print)
* [Images](#Images)
* [Featured Image)](#mcetoc_fghjk)
* [Videos](#Videos)
* [Video Art](#mcetoc_4567890)
* [Author and Search Pages](#Author-Tag-and-Search-Pages)
* [Accessibility](#Accessibility)
Then here is some text.
Here is more text!
## Prerequisites
then
sed -z 's/## Table[^\n]*\n\([^\n][^\n]*\n\)*//' text.txt
gives output
Random text.
Then here is some text.
Here is more text!
## Prerequisites
Explanation: I use -z, which prompts GNU sed to treat the NUL byte as the line separator; if there is no such byte in the file, the whole file is treated as one giant line. Then I remove ## Table followed by zero or more non-newline characters and a newline, followed by zero or more repetitions of (at least one non-newline character followed by a newline), that is, the non-empty lines that follow (a line understood as a newline-separated entity).
(tested in GNU sed 4.8)
Perl is unmatched with multiline replacements:
perl -0777 -lpe 's/^## Table [\s\S]*?(?=^\s*$)//gm' file.txt
And Ruby supports that same approach:
ruby -0777 -lpe 'gsub(/^## Table [\s\S]*?(?=^\s*$)/,"")' file.txt
With the OP example input, both print:
Random text.
Then here is some text.
Here is more text!
## Prerequisites
Just use 1 action-implicit pattern with awk:
awk '_ = /^## Table/ < (_^NF || NR == !_)'
# gawk profile, created Tue Nov 26 20:05:51 2024
# Rule(s)
26 _ = /^## Table/ < (_^NF || NR == !_) { # 8
8 print
}
1 Random text.
2
3
4 Then here is some text.
5
6 Here is more text!
7
8 ## Prerequisites
_^NF is a combo short-hand for "either skip flag is currently false, or line is blank".
Instead of a skip flag, the state is tracked by a "keep" flag that leverages these exponentiation properties:
1^anything := 1
+0^0 := 1
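Those two properties are easy to confirm in awk itself. A quick check (the third value, 0 raised to a positive power, is added here for contrast and is not from the answer):

```shell
# 1^5 and 0^0 both evaluate to 1, while 0^3 evaluates to 0.
powers=$(awk 'BEGIN { print 1^5, 0^0, 0^3 }')
printf '%s\n' "$powers"
```

So when the keep flag is 1 the expression _^NF is true for any NF, and when it is 0 it is true only for a blank line (NF == 0).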
sed attempts and the (wrong) results – markp-fuso Commented Nov 19, 2024 at 23:59