sed - Filter a line through an external program

I have a source code management system where I write annotated source and use sed to convert that into pure source, markdown documentation, and test cases.

I would like to have an annotation that lets me write

other text...
PR(
eval (internal);e expr env env -> val
PR)
other text...

and end up having the string inside PR tags converted into a table:

other text...
<table>
  <thead>
    <tr>
      <th colspan="2">eval (internal)</th>
    </tr>
  </thead>
  <tr>
    <td>e</td>
    <td>a Lisp expression</td>
  </tr>
  <tr>
    <td>env</td>
    <td>an Environment</td>
  </tr>
  <tr>
    <td><i>Returns:</i></td>
    <td>val</td>
  </tr>
</table>
other text...

(Editor's note by @Fravadona: the indentation doesn't matter in the expected output.)

The basic algorithm takes the text before the ; as the header and then reads the rest of the line two tokens at a time. If the first token of a pair is a name, it is put inside a td as is. If it is "->", the "Returns:" text goes in the td instead. The second token of the pair is a key into a dictionary that goes something like this:

env   -> an Environment
val   -> a Lisp value
vals  -> some Lisp values
lvals -> a Lisp list of Lisp values
num   -> a number
nums  -> some numbers
...

The dictionary lookup is done by keeping an array of key/value pairs of C strings and traversing it with strcmp().

I may have reached the end of my sed skills here; I don't even know if this is possible. I have written the conversion program myself in C, but I don't know how to plug it into sed.

I'm experimenting with the e command of sed. This works:

cat constcl.md | sed 's/\(eval (.*);.*\)/printf "%s" "$(echo "\1" | tr e i)"/e' |less

But if I try to simplify the regex or substitute my own command, it all goes bonkers.
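One likely cause, sketched here under assumptions: with GNU sed's e flag, the whole pattern space is handed to the shell after the substitution, not just the replacement text. Any part of the line that the regex did not match therefore becomes part of the command, and characters such as (, ; and > get interpreted by the shell. A safer pattern is to match the entire line, wrap it in single quotes, and pipe it into the external program. In the sketch below, pr_filter is a hypothetical name and tr a-z A-Z merely stands in for the C converter, which is not shown in the question.

```shell
# Minimal sketch; requires GNU sed (the e flag is a GNU extension).
# pr_filter is a hypothetical name, and "tr a-z A-Z" is a stand-in
# for the real converter program.
pr_filter() {
    script=$(mktemp) || return 1
    cat >"$script" <<'EOF'
/^PR($/,/^PR)$/{
  # drop the PR( and PR) marker lines themselves
  /^PR[()]$/d
  # make embedded single quotes safe inside a single-quoted shell word
  s/'/'\\''/g
  # turn the WHOLE line into a command, run it, keep its output
  s/.*/printf '%s\\n' '&' | tr a-z A-Z/e
}
EOF
    sed -f "$script"
    status=$?
    rm -f "$script"
    return "$status"
}

printf '%s\n' 'other text...' 'PR(' 'eval (internal);e expr env env -> val' 'PR)' 'other text...' |
    pr_filter
```

Because the line travels to the program on standard input inside single quotes, the shell never sees its parentheses, semicolons, or arrows as syntax. Replacing tr a-z A-Z with the path to a real converter that reads one line from standard input should work the same way, though that is untested here.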

asked Jan 31 at 11:39, edited Feb 1 at 3:07, by Peter Lewerin

What would be the expected "table" (in raw format) for that eval ... line? – Fravadona Commented Jan 31 at 11:44
An HTML table; the words in the string would expand into text, like expr => "<td>a Lisp expression</td>" – Peter Lewerin Commented Jan 31 at 11:49
That isn't clear enough. Do you mean <table><tr><td>eval (internal)</td><td>e</td><td>expr</td><td>env</td><td>env</td><td>->val</td></tr></table>? – Fravadona Commented Jan 31 at 11:54
Yes exactly, except that the first td:s would be th:s, and that expr, env (the second one) and val would expand to more text and that the -> would be replaced with a <td><i>Returns:</i></td>. There would be more than one row, also. – Peter Lewerin Commented Jan 31 at 11:58
Please edit your question to add exactly what the expected output is, with some explanations when required (expansion of variables, etc...) – Fravadona Commented Jan 31 at 12:01

2 Answers

I have to say, sed isn't ideal for this task. A solution in Awk, Python, Perl, or a similar language is probably needed.

Let's assume that your dictionary is stored in a dict.txt file with this format:

env   -> an Environment
val   -> a Lisp value
vals  -> some Lisp values
lvals -> a Lisp list of Lisp values
num   -> a number
nums  -> some numbers
expr  -> an Expression

And that your "template" is in the following template.txt file:

other text...
PR(
eval (internal);e expr env env -> val
PR)
other text...

Then here's how you could expand the PR blocks using Awk.
The main idea is to load the key/value pairs from dict.txt first, and then process template.txt to generate the HTML tables. But don't forget to escape your strings for HTML text! I added a function for it.

awk '
    # remove the potential CR characters in the input line
    { gsub(/\r/, ""); }

    # load the key/values pairs from dict.txt
    # NOTE: NR is equal to FNR only while processing the first file
    NR == FNR {
        if (match($0, /[[:space:]]*->[[:space:]]*/))
            dict[substr($0, 1, RSTART-1)] = substr($0, RSTART+RLENGTH);
        next;
    }

    # expand the PR blocks as HTML tables in the remainder file(s)
    $1 == "PR(" { inside_pr_block = 1; next; }
    $1 == "PR)" { inside_pr_block = 0; next; }
    inside_pr_block {
        if (match($0, /;/)) {
            printf "<table>";
            th = substr($0, 1, RSTART-1);
            printf "<thead><tr><th colspan=2>%s</th></tr></thead>", \
                html_textify(th);
            $0 = substr($0, RSTART+RLENGTH);
            for (i = 1; i <= NF; i += 2) {
                td1 = ($i == "->" ? "Returns:" : $i);
                td2 = dict[$(i+1)];
                printf "<tr><td>%s</td><td>%s</td></tr>", \
                    html_textify(td1), html_textify(td2);
            }
            print "</table>";
        }
        next;
    }

    # output non PR lines
    { print; }

    # minimalist function that encodes a string as HTML text
    function html_textify(str) {
        gsub(/&/, "\\&amp;", str);
        gsub(/</, "\\&lt;", str);
        gsub(/>/, "\\&gt;", str);
        return str;
    }
' dict.txt template.txt

With the given input files, Awk outputs (the indentation is added by me):

other text...
<table>
  <thead>
    <tr>
      <th colspan=2>eval (internal)</th>
    </tr>
  </thead>
  <tr>
    <td>e</td>
    <td>an Expression</td>
  </tr>
  <tr>
    <td>env</td>
    <td>an Environment</td>
  </tr>
  <tr>
    <td>Returns:</td>
    <td>a Lisp value</td>
  </tr>
</table>
other text...
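The escaping function has two details that are easy to get wrong, so here is a small standalone sketch of the same idea. In an Awk gsub replacement, a bare & stands for the matched text, so it has to be written as "\\&" to produce a literal ampersand; and & must be escaped before < and >, otherwise the ampersands of the freshly inserted entities would be escaped a second time. The wrapper name html_escape is hypothetical and only exists for this illustration.

```shell
# Minimal sketch: HTML-escape each input line with awk.
# html_escape is a hypothetical wrapper name used only here.
html_escape() {
    awk '{
        gsub(/&/, "\\&amp;")   # first, so later entities are not re-escaped
        gsub(/</, "\\&lt;")    # "\\&" yields a literal "&" in the output
        gsub(/>/, "\\&gt;")
        print
    }'
}

printf '%s\n' 'a < b && c -> d' | html_escape
```

Writing the replacement as "&lt;" without the backslashes would insert the matched character itself in place of the ampersand, which is why the answer spells it "\\&lt;".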

This is my solution, based on @Fravadona's answer.

I put the Awk source, plus my own tweaks, in a file called prototype.awk:

# due to Fravadona at stackoverflow
# 
# remove the potential CR characters in the input line
{ gsub(/\r/, ""); }

# load the key/values pairs from dict.txt
# NOTE: NR is equal to FNR only while processing the first file
NR == FNR {
    if (match($0, /[[:space:]]*->[[:space:]]*/))
        dict[substr($0, 1, RSTART-1)] = substr($0, RSTART+RLENGTH);
    next;
}

# expand the PR blocks as HTML tables in the remainder file(s)
$1 == "PR(" { inside_pr_block = 1; next; }
$1 == "PR)" { inside_pr_block = 0; next; }
inside_pr_block {
    if (match($0, /;/)) {
        printf "<table border=1>";
        th = substr($0, 1, RSTART-1);
        printf "<thead><tr><th colspan=2 align=\"left\">%s</th></tr></thead>", html_textify(th);
        $0 = substr($0, RSTART+RLENGTH);
        for (i = 1; i <= NF; i += 2) {
            td1 = ($i == "->" ? "Returns:" : $i);
            td2 = dict[$(i+1)];
            if (td1 == "Returns:")
                printf "<tr><td><i>%s</i></td><td>%s</td></tr>", html_textify(td1), html_textify(td2);
            else
                printf "<tr><td>%s</td><td>%s</td></tr>", html_textify(td1), html_textify(td2);
        }
        print "</table>";
    }
    next;
}

# output non PR lines
{ print; }

# minimalist function that encodes a string as HTML text
function html_textify(str) {
    gsub(/&/, "\\&amp;", str);
    gsub(/</, "\\&lt;", str);
    gsub(/>/, "\\&gt;", str);
    return str;
}

Then I used it in a make rule like this:

README.md: top.md constcl.md
    awk -f prototype.awk dict.txt $^ >$@

constcl.md itself is built like this:

constcl.md: $(source_files)
    cat $^ |sed -e s/^CB/\`\`\`/g -e /^MD/d -e /^TT/,/^TT/d >$@

with source_files being a bunch of annotated source files.
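For readers unfamiliar with the three sed expressions in that rule, here is a sketch of what they do when run directly from a shell, where the backticks are protected by single quotes instead of backslashes. The sample input is made up for illustration, since the real annotation layout is not shown, and strip_annotations is a hypothetical name.

```shell
# Minimal sketch of the constcl.md rule as a shell function.
# strip_annotations is a hypothetical name; the sample input below is
# invented for illustration.
strip_annotations() {
    # 1. a leading CB becomes a Markdown code fence
    # 2. lines starting with MD are deleted
    # 3. everything from a TT line through the next TT line is deleted
    sed -e 's/^CB/```/g' -e '/^MD/d' -e '/^TT/,/^TT/d'
}

printf '%s\n' 'MD(' 'Some documentation.' 'MD)' \
    'CB' '(define x 1)' 'CB' \
    'TT(' '(test x)' 'TT)' |
    strip_annotations
```

With this input, the documentation line survives, the code line ends up between two fence lines, and the test block disappears entirely.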
