awk - How to find a chunk of string that starts with $PATTERN and ends with $LINEBREAK

I am trying to remove a chunk of text from hundreds of files. The number of lines to remove is not the same in each file, but every chunk starts with the same string. I want to remove the entire chunk, starting at the pattern ## Table and continuing until the first blank line (what I am calling a line break). I am pretty "ok" with sed but am not getting anywhere with Googling.

text.txt contents:

Random text.

## Table of Contents
*   [Design System](#<b>Design-System</b>)
*   [Multisite Designs](#Multisite)
    *   [Recommended Changes Per Site](#mcetoc_123)
    *   [Recommended Consistent Elements Across Sites](#mcetoc_456)
*   [Templates v. Pages](#Templates-v-Pages)
*   [Layouts](#mcetoc_dfghj)
*   [Features](#Features)
    *   [Something](#mcetoc_khjvfsdfsd)
    *   [Feature](#mcetoc_dfwyduegwu)
    *   [See More Logic](#mcetoc_fdsfsdugfs)
*   [Advertising, and Print](#Advertising-and-Print)
*   [Images](#Images)
    *   [Featured Image)](#mcetoc_fghjk) 
*   [Videos](#Videos)
    *   [Video Art](#mcetoc_4567890)
*   [Author and Search Pages](#Author-Tag-and-Search-Pages)
*   [Accessibility](#Accessibility)

Then here is some text.

Here is more text!

## Prerequisites

text.txt desired outcome:

Random text.


Then here is some text.

Here is more text!

## Prerequisites

I don't care about cleaning up extra line breaks, just deleting the chunk starting with ## Table and ending with the first line break.

I had no idea where to even begin, so I posted here. The first answer I got resulted in:

sed -e '/^\s*[*#]/d'

Random text.

    *   [Recommended Changes Per Site](#mcetoc_123)
    *   [Recommended Consistent Elements Across Sites](#mcetoc_456)
    *   [Something](#mcetoc_khjvfsdfsd)
    *   [Feature](#mcetoc_dfwyduegwu)
    *   [See More Logic](#mcetoc_fdsfsdugfs)
    *   [Featured Image)](#mcetoc_fghjk)
    *   [Video Art](#mcetoc_4567890)

Then here is some text.

Here is more text!

Both awk '/^## Table/ {skip=1} NF==0 {skip=0} !skip' and sed -i '' '/## Table/,/^$/d' worked, and resulted in:

Random text.


Then here is some text.

Here is more text!

## Prerequisites

Asked Nov 19, 2024 at 21:19 by khaos119 (edited Nov 20, 2024 at 16:46)

Comment: Consider updating the question with some of your sed attempts and the (wrong) results. – markp-fuso, Nov 19, 2024 at 23:59

8 Answers

sed is great for doing s/old/new/ on individual lines, but for anything else just use awk. For example, using any awk:

$ awk -v RS= -v ORS='\n\n' '!/^## Table/' test.txt
Random text.

Then here is some text.

Here is more text!

## Prerequisites
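
Why this works: setting RS to the empty string puts awk in paragraph mode, where each block of text separated by blank lines is one record, so the condition is tested once per block rather than once per line. A minimal sketch on a made-up three-paragraph input (the sample text and the variable names `sample` and `out` are only for illustration):

```shell
# Build a small sample: three paragraphs separated by blank lines.
sample=$(printf 'Intro.\n\n## Table of Contents\n* one\n* two\n\nOutro.\n')

# RS= (empty) enables paragraph mode: each blank-line-separated block is one record.
# ORS='\n\n' writes a blank line after every record that is printed.
out=$(printf '%s\n' "$sample" | awk -v RS= -v ORS='\n\n' '!/^## Table/')

printf '%s\n' "$out"
```

Note that paragraph mode treats any run of blank lines as a single separator, so consecutive blank lines in the input come out as one.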

Since the patterns are mutually exclusive, you can use an address range:

sed '/## Table/,/^$/d' infile >outfile

Or, if your definition of "linebreak" includes whitespace:

sed '/## Table/,/^[[:space:]]*$/d' infile >outfile

Many versions of sed support non-standard pseudo-in-place editing with a -i option.
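
Because -i is non-standard (GNU sed accepts -i on its own, while BSD/macOS sed expects a suffix argument, as in the sed -i '' used in the question), one portable way to handle many files is to write to a temporary file and move it over the original. A rough sketch, assuming the files are the *.txt files in one directory; the scratch directory and file names below are invented for the demonstration:

```shell
# Create a scratch directory with two hypothetical sample files.
dir=$(mktemp -d)
printf 'A\n\n## Table of Contents\n* x\n\nB\n' > "$dir/one.txt"
printf 'C\n\n## Table of Contents\n* y\n* z\n\nD\n' > "$dir/two.txt"

# Portable "in-place" edit: write to a temp file, then replace the
# original only if sed exited successfully.
for f in "$dir"/*.txt; do
    sed '/## Table/,/^$/d' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$dir/one.txt" "$dir/two.txt"
```

Running this on copies of the real files first, and comparing with diff, is a cheap safeguard before touching hundreds of originals.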

One awk idea:

awk '
/^## Table/ { skip = 1 }             # if line starts with '## Table' then enable/set skip flag
NF==0       { skip = 0 }             # if empty/blank line then disable/clear skip flag
!skip                                # if skip flag not set (ie, skip == 0) then print current line
' test.txt

#######
# or as a one-liner

awk '/^## Table/ {skip=1} NF==0 {skip=0} !skip' test.txt

This generates:

Random text.


Then here is some text.

Here is more text!

## Prerequisites

For awk flavors that support ranges:

awk '
/^## Table/,NF==0 {next}     # if current line falls within the range of lines defined
                             # by "^## Table" and a blank line then skip it else ...

1                            # print current line
' test.txt

#######
# or as a one-liner

awk '/^## Table/,NF==0 {next} 1' test.txt

This generates:

Random text.

Then here is some text.

Here is more text!

## Prerequisites

NOTES:

this removes the matching blank line whereas OP's expected output shows the blank line is to be maintained
if OP wishes to keep the blank line then one option would be to replace {next} with {if (NF!=0) next}
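
Putting that second note into runnable form, here is a sketch of the range version that skips only the non-blank lines inside the range, so the blank line that closes the range is kept (the sample input is made up for illustration):

```shell
sample=$(printf 'Intro.\n\n## Table of Contents\n* one\n* two\n\nOutro.\n')

# Inside the range, skip only non-blank lines; the blank line that ends
# the range falls through to the final "1" and is printed.
out=$(printf '%s\n' "$sample" | awk '/^## Table/,NF==0 { if (NF != 0) next } 1')

printf '%s\n' "$out"
```

The result has two blank lines between the surrounding paragraphs, matching the desired outcome shown in the question.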

This might work for you (GNU sed):

sed '/## Table/{:a;N;/^$/M!ba;d}' file

Which will remove all lines in the range, or:

sed '/## Table/{:a;N;/^$/M!ba;z}' file

Which will remove all lines except the last newline.

Other, perhaps easier but potentially wrong, alternatives (these delete every line from the start of the range to the end of the file if the end of the range never occurs):

sed '/## Table/,/^$/d' file

sed '/## Table/,/^$/!b;/./d' file
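
The caveat about a missing range end can be seen on a small made-up input in which the table is not followed by a blank line. This is only a demonstration of the failure mode, not a fix:

```shell
# Hypothetical input where the table is NOT followed by a blank line.
sample=$(printf 'Intro.\n## Table of Contents\n* one\nImportant text right after the list.\n')

# The range starts at "## Table" but /^$/ never matches, so the range
# stays open until end of input and everything after the heading is deleted.
out=$(printf '%s\n' "$sample" | sed '/## Table/,/^$/d')

printf '%s\n' "$out"
```

Only the line before the heading survives here, so it may be worth confirming that every file has a blank line after its table before overwriting anything.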

No idea which of the many sed commands you found you have tried, but...

What is the actual pattern for your matching strings? For simplicity, I assume it's the start of the line, followed by an arbitrary amount of whitespace, followed by an asterisk or a hash. This would be the regular expression ^\s*[*#]

So we build a sed-expression that deletes all lines matching this pattern:

sed -e '/^\s*[*#]/d' test.txt

If your files contain no NUL bytes, you can exploit the -z option of GNU sed in the following way. Let the content of text.txt be

Random text.

## Table of Contents
*   [Design System](#<b>Design-System</b>)
*   [Multisite Designs](#Multisite)
    *   [Recommended Changes Per Site](#mcetoc_123)
    *   [Recommended Consistent Elements Across Sites](#mcetoc_456)
*   [Templates v. Pages](#Templates-v-Pages)
*   [Layouts](#mcetoc_dfghj)
*   [Features](#Features)
    *   [Something](#mcetoc_khjvfsdfsd)
    *   [Feature](#mcetoc_dfwyduegwu)
    *   [See More Logic](#mcetoc_fdsfsdugfs)
*   [Advertising, and Print](#Advertising-and-Print)
*   [Images](#Images)
    *   [Featured Image)](#mcetoc_fghjk) 
*   [Videos](#Videos)
    *   [Video Art](#mcetoc_4567890)
*   [Author and Search Pages](#Author-Tag-and-Search-Pages)
*   [Accessibility](#Accessibility)

Then here is some text.

Here is more text!

## Prerequisites

then

sed -z 's/## Table[^\n]*\n\([^\n][^\n]*\n\)*//' text.txt

gives output

Random text.


Then here is some text.

Here is more text!

## Prerequisites

Explanation: I use -z, which makes GNU sed treat the NUL byte as the line separator; if the file contains no such byte, the whole file is treated as one giant line. I then remove ## Table followed by zero or more non-newline characters and a newline, followed by zero or more repetitions of (at least one non-newline character followed by a newline), that is, zero or more non-empty lines (a line being understood as a newline-terminated entity).

(tested in GNU sed 4.8)
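
The same substitution may be a little easier to read with extended regular expressions. A sketch, assuming GNU sed (both -z and \n inside a bracket expression are GNU behaviours, so this is not expected to work with BSD/macOS sed) and a made-up sample input:

```shell
sample=$(printf 'Intro.\n\n## Table of Contents\n* one\n* two\n\nOutro.\n')

# -z: the whole input is one "line", so \n can be matched explicitly.
# -E: extended syntax, so ( ) and + need no backslashes.
# Heading line, then zero or more non-empty lines; the blank line is left alone.
out=$(printf '%s\n' "$sample" | sed -E -z 's/## Table[^\n]*\n([^\n]+\n)*//')

printf '%s\n' "$out"
```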

Perl is unmatched with multiline replacements:

perl -0777 -lpe 's/^## Table [\s\S]*?(?=^\s*$)//gm' file.txt

And Ruby supports that same approach:

ruby -0777 -lpe 'gsub(/^## Table [\s\S]*?(?=^\s*$)/,"")' file.txt

With the OP example input, both print:

Random text.


Then here is some text.

Here is more text!

## Prerequisites

Just use a single pattern with an implicit action (print) in awk:

awk '_ = /^## Table/ < (_^NF || NR == !_)'

# gawk profile, created Tue Nov 26 20:05:51 2024

# Rule(s)

  26  _ = /^## Table/ < (_^NF || NR == !_) { # 8
   8        print
      }

 1  Random text.
 2  
 3  
 4  Then here is some text.
 5  
 6  Here is more text!
 7  
 8  ## Prerequisites

_^NF is a combo short-hand for "either skip flag is currently false, or line is blank".

Instead of a skip flag, the state is tracked by a "keep" flag that leverages these exponentiation properties:

1^anything := 1
0^0 := 1
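
These properties are easy to check directly in awk, together with the complementary case 0^positive = 0, which is what keeps the flag off on non-blank lines inside the table. A small sketch (the exponents 5 and 3 are arbitrary choices for illustration):

```shell
# Print the three cases the keep-flag expression _^NF relies on:
#   keep=1, any NF        -> 1
#   keep=0, blank line    -> 0^0 = 1
#   keep=0, non-blank     -> 0^NF = 0
out=$(awk 'BEGIN { print 1^5, 0^0, 0^3 }')

printf '%s\n' "$out"
```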
