As you all know, external resources such as images can be embedded into an HTML file using base64 encoding:
<img src="data:image/png;base64,iVBORw0KGgoAAAANS..." />
I'm looking for a pure browser-based JavaScript way to traverse an HTML page and embed all of its external resources into the markup, so that when I call $("html").html(), it returns all of the page's contents, including its external resources.
Just so it makes sense, I'm trying to download web pages into single files using a headless browser on my server.
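For images, at least, a browser-only approach is feasible. Below is a minimal sketch, assuming the images are same-origin or served with CORS headers (otherwise `fetch()` rejects and the original URL is kept). It walks the `<img>` elements, downloads each one, and rewrites its `src` as a base64 data URI in the same format as the example above. The function names `toDataUri` and `inlineImages` are hypothetical. Stylesheets, fonts, `srcset` and CSS `url(...)` references are not handled and would need similar treatment.

```javascript
// Hypothetical helper: encode raw bytes as a data: URI.
// btoa() expects a "binary string" (one character per byte), so the bytes
// are converted in chunks to stay under the argument limit of
// String.fromCharCode.apply on large files.
function toDataUri(mimeType, bytes) {
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Hypothetical helper: replace each <img src> under `root` with a data: URI.
// Assumes same-origin or CORS-enabled images. `fetchImpl` defaults to the
// browser's fetch() and is a parameter only so the function can be tested.
async function inlineImages(root, fetchImpl = fetch) {
  const imgs = Array.from(root.querySelectorAll("img[src]"));
  await Promise.all(
    imgs.map(async (img) => {
      const src = img.getAttribute("src");
      if (src.startsWith("data:")) return; // already inlined
      try {
        const res = await fetchImpl(src);
        if (!res.ok) return; // HTTP error: keep the original URL
        const mime = res.headers.get("Content-Type") || "application/octet-stream";
        const bytes = new Uint8Array(await res.arrayBuffer());
        img.setAttribute("src", toDataUri(mime, bytes));
      } catch (err) {
        // Network or CORS failure: keep the original URL.
      }
    })
  );
}
```

In the headless browser you would run `await inlineImages(document)` and then read `document.documentElement.outerHTML`. Unlike `$("html").html()`, which returns only the inner HTML, this also includes the `<html>` tag itself.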
Asked Oct 27, 2014 by Mehran; edited Oct 28, 2014.
- If you're using JS, why encode the images? – Mooseman, Oct 27, 2014
- Because JS can easily traverse all the HTML elements. Otherwise I'll need a parser to read the tags and turn them into DOM objects before I can query them for external resources. – Mehran, Oct 27, 2014
2 Answers
There are tools out there to do that. Examples:
- https://github.com/remy/inliner
- https://github.com/jgallen23/grunt-inline-css
- https://github.com/ceee/grunt-datauri
While this approach has benefits, remember that a page visited more than once, or a site whose pages share the same JS/CSS files, benefits from client-side (browser) caching, which inlining defeats.
Browser extensions
There is a Save Page WE extension for both Firefox and Chrome:
- Firefox: https://addons.mozilla.org/en-US/firefox/addon/save-page-we/
- Chrome: https://chrome.google.com/webstore/detail/save-page-we/dhhpefjklgkmgeafimnjhojgjamoafof/related
The extension can scroll through or zoom out the page so that lazy-loaded resources are fetched before saving.
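A similar scrolling step can be approximated in a headless browser before serializing or inlining a page. This is a minimal sketch, assuming lazy loading is triggered when elements enter the viewport (as with `loading="lazy"` or IntersectionObserver-based scripts). `scrollThroughPage` is a hypothetical helper, not part of Save Page WE, and the step and delay values are guesses to tune per site.

```javascript
// Hypothetical helper: step down the page one viewport at a time so that
// viewport-triggered lazy loading fires, then scroll back to the top.
// `win` is normally `window`; it is a parameter so the function can be tested.
async function scrollThroughPage(win, stepPx = win.innerHeight, delayMs = 250) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const step = Math.max(1, stepPx); // guard against a zero-height viewport
  // scrollHeight is re-read on every iteration because lazily loaded
  // content can make the page taller while we scroll.
  for (let y = 0; y < win.document.documentElement.scrollHeight; y += step) {
    win.scrollTo(0, y);
    await sleep(delayMs); // give the page time to start its requests
  }
  win.scrollTo(0, 0);
}
```

In a page you would call `await scrollThroughPage(window)` before saving. The zoom-out alternative the extension offers is not covered here.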
Command line tools
monolith (Rust)
CLI tool for saving complete web pages as a single HTML file
Install
# any platform with rustc installed
cargo install monolith
# on macOS
brew install monolith
# on Windows
choco install monolith
obelisk (Go)
Go package and CLI tool for saving a web page as a single HTML file
# any platform with the Go SDK installed
go install -v github.com/go-shiori/obelisk/cmd/obelisk@latest
Prebuilt binaries: https://github.com/go-shiori/obelisk/releases
inliner
inliner is an npm module that exposes the inliner CLI utility; it works with some URLs but throws errors with others. It writes its output to stdout, so it needs to be used like this, e.g. inliner https://http.cat > cats.html.
It can be installed with (assuming you have Node.js and npm):
npm install -g inliner