I want to modify Markdown files programmatically.
I have been looking into Markdown parsers and tried a few of them; namely Marked, Markdown-it and Commonmark. They give access to an AST, which allows me to modify the content easily.
The problem is that they render to HTML only. I couldn't find any info on rendering back to Markdown.
I see two options right now, either write a custom renderer for one of these libraries (which would be quite time consuming) or use a separate tool that transforms HTML back to Markdown.
Is there an easier alternative? And why would a Markdown parser only render to HTML?
asked Jun 1, 2021 at 23:55 by user16100351; edited Jun 23, 2021 at 12:54 by Inigo
- 1 You sound confused. Markdown is plain text. You should have plenty of options for editing that. Markdown parsers render HTML because that's exactly what they do; they take plain text and translate it to HTML. – jmargolisvt Commented Jun 2, 2021 at 0:35
- 3 @jmargolisvt there are real reasons someone might want to operate on an AST instead of the source file, that said... OP should include some more information about their use-case, because converting to html and back does sound insane. – Evert Commented Jun 2, 2021 at 0:42
- 3 @jmargolisvt He did not sound confused at all. I edited the question to improve the English. – Inigo Commented Jun 23, 2021 at 12:55
3 Answers
The best alternative is what you wanted to do in the first place!
There are many Markdown parsers that produce ASTs, and a good number of those can render it back to Markdown!
And why would a Markdown parser only render to HTML?
Many of them do because the number one use of Markdown is as source code for HTML. Markdown was designed for that in the first place. So the most common use of a Markdown parser, including cases where people want to first manipulate the AST, is to output HTML.
That said, the really good libraries include the option to render to other formats, including back to Markdown.
Here are the libraries that I already know can do this:
Pandoc
Probably the number one Markdown toolkit in the world. Pandoc's native language is Haskell, but there are JavaScript wrappers (just search npm). If you're going to do a lot of Markdown work down the road, it probably makes sense to become knowledgeable in Pandoc anyway.
Its support for "filters" is all about AST manipulation. It has special support for Lua and Lua filters, which might be the easiest to code, but you can also write filters in other languages: Python, PHP, Perl, JavaScript/TypeScript, Groovy, Ruby.
It supports rendering to Markdown, amongst a huge number of other formats.
Its parser and renderer have many other options that might make your job even easier, or maybe already do exactly what you want. There are also many filters people have written that may already do what you want.
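To give a feel for what a non-Lua filter does, here is a minimal sketch of the transformation step of a JSON filter in plain JavaScript. It assumes the usual shape of Pandoc's JSON AST, where each element carries a type tag `t` and contents `c`. The sample document is hand-written rather than real Pandoc output, its API version number is only illustrative, and the names `walk` and `upperCaseStr` are made up for this example.

```javascript
// Recursively walk a Pandoc-style JSON AST. For every tagged element
// (an object with a string "t"), call fn; if fn returns a value, that
// value replaces the element, otherwise the element is copied and its
// contents are walked. The input is never mutated.
function walk(node, fn) {
  if (Array.isArray(node)) {
    return node.map((child) => walk(child, fn));
  }
  if (node !== null && typeof node === 'object') {
    const replaced = typeof node.t === 'string' ? fn(node) : undefined;
    if (replaced !== undefined) return replaced;
    const copy = {};
    for (const [key, value] of Object.entries(node)) {
      copy[key] = walk(value, fn);
    }
    return copy;
  }
  return node; // strings, numbers, booleans, null
}

// Example transformation: upper-case every Str inline.
function upperCaseStr(element) {
  if (element.t === 'Str') {
    return { t: 'Str', c: element.c.toUpperCase() };
  }
  return undefined; // leave other elements to the default traversal
}

// Hand-written sample in the shape of a Pandoc JSON document for
// "Hello *world*" (assumption: the version number is illustrative).
const sampleDoc = {
  'pandoc-api-version': [1, 23],
  meta: {},
  blocks: [
    {
      t: 'Para',
      c: [
        { t: 'Str', c: 'Hello' },
        { t: 'Space' },
        { t: 'Emph', c: [{ t: 'Str', c: 'world' }] },
      ],
    },
  ],
};

const filtered = walk(sampleDoc, upperCaseStr);
console.log(JSON.stringify(filtered.blocks));
```

A real filter would read the JSON document from stdin and write the result to stdout, so that it can sit in a pipeline such as `pandoc input.md -t json | node filter.js | pandoc -f json -t markdown` (here `filter.js` is a hypothetical script name). That plumbing is left out of the sketch.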
CMark
Though this reference implementation of CommonMark is written in C, there are many Node wrappers. There is even a port to JavaScript using Emscripten. That port includes the GitHub extensions, so tables and other GFM constructs can also be manipulated in the AST.
It can output CommonMark, as well as HTML and LaTeX, or even an XML representation of the AST.
remark
A JavaScript-based framework specifically designed around AST manipulation. I've never used it, but it has tools to make a variety of AST manipulations easier, and many plugins, e.g. to support GFM, MDX, front matter, etc. See its README for more info on it and the entire remark/mdast/unified ecosystem.
See the answer that gives example usage: https://stackoverflow.com/a/78969216/8910547
I just found mdast-util-from-markdown which seems to do the trick. Then you can convert it back to a string with mdast-util-to-markdown. mdast is basically a markdown syntax tree specification.
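To make the "tree in, Markdown string out" idea concrete without installing anything, here is a toy sketch. It is not the mdast-util-to-markdown API: `toMarkdownToy` is a made-up function that handles only a handful of mdast node types and does no escaping. The tree is hand-written in the shape the mdast specification describes (`type`, `children`, `value`, `depth`, `url`).

```javascript
// Toy serializer for a small subset of mdast node types.
// Illustration only: no escaping, no lists, no code blocks.
function toMarkdownToy(node) {
  // Serialize all children of the current node and concatenate them.
  const inner = () => (node.children || []).map(toMarkdownToy).join('');
  switch (node.type) {
    case 'root':
      // Block-level children are separated by a blank line.
      return node.children.map(toMarkdownToy).join('\n\n') + '\n';
    case 'heading':
      return '#'.repeat(node.depth) + ' ' + inner();
    case 'paragraph':
      return inner();
    case 'text':
      return node.value;
    case 'emphasis':
      return '*' + inner() + '*';
    case 'strong':
      return '**' + inner() + '**';
    case 'inlineCode':
      return '`' + node.value + '`';
    case 'link':
      return '[' + inner() + '](' + node.url + ')';
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Hand-written tree for "# Title" followed by "Some *old* text."
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Title' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'old' }] },
        { type: 'text', value: ' text.' },
      ],
    },
  ],
};

// Programmatic edit: replace the emphasised word.
tree.children[1].children[1].children[0].value = 'new';

const output = toMarkdownToy(tree);
console.log(output);
```

In real code the two packages named above take over both ends: `fromMarkdown` produces the tree and `toMarkdown` serializes it, including the escaping and the node types this toy ignores.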
With remark
Here is an example of generating the Markdown Abstract Syntax Tree (MDAST) for a Markdown file example.md using remark:
import fs from 'node:fs';
import { remark } from 'remark';
import remarkGfm from 'remark-gfm';

// Path to the Markdown file; defaults to ./example.md
const filePath = process.argv[2] || './example.md';
const markdownContent = fs.readFileSync(filePath, 'utf-8');

// remark() already bundles remark-parse. Current remark-parse versions
// have no gfm option, so GitHub Flavored Markdown comes from remark-gfm.
const ast = remark()
  .use(remarkGfm)
  .parse(markdownContent);

console.log(JSON.stringify(ast, null, 2));
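Once you have a tree like the one printed above, modifying it is an ordinary tree walk. The sketch below is a hand-rolled stand-in for what the unist-util-visit package does in the unified ecosystem. `visitToy` is a made-up helper, and the tree is written by hand in the mdast shape rather than produced by remark.

```javascript
// Depth-first walk: call visitor on every node whose type matches.
function visitToy(node, type, visitor) {
  if (node.type === type) visitor(node);
  for (const child of node.children || []) {
    visitToy(child, type, visitor);
  }
}

// Hand-written mdast-like tree with two headings and a paragraph.
const ast = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Intro' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'Body text.' }] },
    { type: 'heading', depth: 6, children: [{ type: 'text', value: 'Deep' }] },
  ],
};

// Demote every heading by one level, capped at the maximum depth of 6.
visitToy(ast, 'heading', (heading) => {
  heading.depth = Math.min(heading.depth + 1, 6);
});

const depths = ast.children
  .filter((node) => node.type === 'heading')
  .map((node) => node.depth);
console.log(depths);
```

The walk mutates the tree in place, which is fine when the tree is about to be serialized back to Markdown and the original is no longer needed.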
With mdast-util-from-markdown
Here is an example of how to produce the MDAST using mdast-util-from-markdown for this input file frontmatterplusmath.md:
import fs from 'node:fs/promises'
import {frontmatter} from 'micromark-extension-frontmatter'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {frontmatterFromMarkdown, frontmatterToMarkdown} from 'mdast-util-frontmatter'
import {toMarkdown} from 'mdast-util-to-markdown'
import {math} from 'micromark-extension-math'
import {mathFromMarkdown, mathToMarkdown} from 'mdast-util-math'
const doc = await fs.readFile('frontmatterplusmath.md')

const tree = fromMarkdown(doc, {
  extensions: [
    math(),
    frontmatter(['yaml', 'toml']),
  ],
  mdastExtensions: [
    frontmatterFromMarkdown(['yaml', 'toml']),
    mathFromMarkdown()
  ]
})

console.log(JSON.stringify(tree, null, 2))