javascript - Replace parts of HTML strings in multiple documents - Stack Overflow

I am saving parts of an existing Microsoft Word document as HTML and embedding this HTML dynamically in panels to give instructions to the users.

This is working fine except for the images, which are not appearing. Inspecting the generated HTML, I see that the markup for the image is:

<img src="home_files/image001.png" />

In Visual Studio the HTML help pages are stored in a folder called Help, so I changed this line to include the help folder:

<img src="help/home_files/image001.png" />

With this change the image is displayed correctly.


I have to generate over 50 help pages from Word documents, so I do not want to change all of the image locations manually, especially as some pages will be regenerated whenever the documents change.

Is there a way for the images to be displayed correctly without editing the messy documents generated by Word?

Or is there a better way to generate HTML versions of Word documents?

I didn't use PDFs, as not everyone's browser will display PDFs embedded in a web page.

Asked May 30, 2015 at 8:32 by Nick Le Page; edited Jun 9, 2015 at 5:57 by Vidya Sagar.
2 comments:
  • Nick, do you just want to change <img src="home_files/IMG_NAME.png" /> to <img src="help/home_files/IMG_NAME.png" /> in the already generated .html files by running a PHP script? Are the generated .html files stored in one folder? Maybe you can show us one or two of them? – Mykola Vasilaki Commented Jun 3, 2015 at 22:57
  • This has now been edited into a question I don't want to know the answer to! – Nick Le Page Commented Jul 14, 2015 at 8:26
8 Answers

Is there a way for the images to be displayed correctly without editing the messy documents generated by Word?

I guess you could just run some simple client side code to change the src attribute of those <img> tags. You would get something like

// Assumes the help HTML is injected into an element with id="container".
var imgs = document.querySelector("#container").querySelectorAll("img");
for (var i = 0; i < imgs.length; i++) {
  var oldSrc = imgs[i].getAttribute("src");
  imgs[i].setAttribute("src", "help/" + oldSrc);
}

The same can of course be done with any server-side DOM implementation. Note, however, that these may lack some of the features used in the snippet above (such as querySelectorAll), so the code might need rewriting.
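The loop above adds the prefix every time it runs, so calling it twice on the same panel would produce help/help/... paths, and it would also break any image whose src is already absolute. A minimal sketch of a guard against both cases follows; the helper name withHelpPrefix, its default prefix, and the id="container" in the usage comment are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: returns the src with the prefix added at most once.
function withHelpPrefix(src, prefix = "help/") {
  // Leave URLs with a scheme (http:, data:, ...) and root-relative paths alone.
  if (/^([a-z][a-z0-9+.-]*:|\/)/i.test(src)) return src;
  // Already prefixed: return unchanged, so repeated calls are harmless.
  if (src.startsWith(prefix)) return src;
  return prefix + src;
}

// Browser usage (assumes the help HTML sits inside an element with id="container"):
// document.querySelectorAll("#container img").forEach(function (img) {
//   img.setAttribute("src", withHelpPrefix(img.getAttribute("src") || ""));
// });
```

Keeping the path logic in a plain string function means it can be reused unchanged with a server-side DOM library.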

Or is there a better way to generate HTML versions of word documents?

To be honest, it's a pretty bad idea in general (or at least it was in the past). Word isn't meant for this kind of thing, so you might run into a lot of trouble. Years ago I worked for a company that had a special tool just to clean up HTML content copied from Word. Although I never did any maintenance on it, I do remember the code being quite complex, so I wouldn't be surprised if you ran into unexpected issues. It is far more logical to have the content written in an editor that is meant for the web in the first place. Even copying and pasting into an editor meant for the web might do wonders (if the editor is a fairly strict one).

<?php
// Recursively rewrites the image folder path in every .html file under $root.
// Run it only once per generated file: a second run would turn
// "help/home_files/" into "help/help/home_files/".
function processFiles($root)
{
    $root = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if($hDir = opendir($root))
    {
        while(false !== ($filename = readdir($hDir)))
        {
            // Skip the current and parent directory entries.
            if($filename == '.' || $filename == '..')
                continue;

            $file = $root . $filename;
            if(is_dir($file))
                call_user_func(__FUNCTION__, $file); // recurse into subfolders
            elseif(pathinfo($file, PATHINFO_EXTENSION) == 'html')
            {
                $old = file_get_contents($file);
                $new = str_replace('home_files/', 'help/home_files/', $old);
                file_put_contents($file, $new);
            }
        }
        closedir($hDir);
    }
}

processFiles('folder/with/html-files/');

This will process all of your *.html files and do a str_replace() on them to fix the wrong path.

How about something like this:

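// The files to fix are the HTML pages generated by Word, not the .doc sources.
foreach (glob("path/to/files/*.html") as $filename)
{
    $file = file_get_contents($filename);
    // str_replace() is enough for a literal match; preg_replace() would need pattern delimiters.
    file_put_contents($filename, str_replace("home_files/", "help/home_files/", $file));
}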

Add this code to .htaccess and you will not need to touch the documents at all :)

RewriteEngine on 
RewriteRule ^home_files/([^\.]+\.(png|jpg))$  /help/home_files/$1 [L] 

Note: to write the paths exactly, it is necessary to know your folder structure.
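To see which requests the rule above would catch, the same pattern can be tried in JavaScript. This is only an approximation of Apache's matching, assuming the .htaccess file sits in the folder from which home_files/ is requested (in per-directory context the leading slash is stripped before the pattern is applied); the function name rewrite is made up for the illustration.

```javascript
// Same pattern as in the RewriteRule, written as a JavaScript regex.
const rule = /^home_files\/([^\.]+\.(png|jpg))$/;

// Hypothetical stand-in for what mod_rewrite would do with a request path.
function rewrite(path) {
  const m = rule.exec(path);
  // m[1] corresponds to $1: the file name captured by the outer group.
  return m ? "/help/home_files/" + m[1] : null;
}

console.log(rewrite("home_files/image001.png"));  // "/help/home_files/image001.png"
console.log(rewrite("home_files/image002.gif"));  // null: .gif is not in (png|jpg)
console.log(rewrite("home_files/image.001.png")); // null: [^\.]+ cannot cross the first dot
```

If Word also emits .gif or other image types, the extension list in the rule would need extending accordingly.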

Or is there a better way to generate HTML versions of word documents?

If the location is the only problem, you could also just move the images in a console window with a simple

move home_files\*.* help\home_files

You could also put that command in a batch file and run it from the desktop or start menu, or even assign it to a Word macro.

From what I read you are not looking for code, but just a solution to your one-time conversion woes.

This change is actually very easy: do a search-and-replace across files. Download and install Notepad++, run it, press Ctrl-F and go to the "Find in Files" tab. In the "Find what" field, enter "home_files/ (including the leading double quote); in the "Replace with" field, enter "help/home_files/ (again with the leading quote). Set "Filters" to *.* and select the folder where you store your HTML files. Click "Replace in Files" and voilà, all your files are changed. No coding needed.

Note that by adding the quote (") in the search, you can re-run it and it won't break files that were already fixed.
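The effect of including the quote can be checked with a small script. This is a sketch of the same search-and-replace idea in JavaScript, not something Notepad++ runs; the function name fixImagePaths is hypothetical.

```javascript
// Hypothetical helper: rewrites quoted references to Word's image folder.
function fixImagePaths(html) {
  // The opening quote is part of the match, so text that already reads
  // "help/home_files/ is not matched on a later run.
  return html.replace(/(["'])home_files\//g, "$1help/home_files/");
}

const once = fixImagePaths('<img src="home_files/image001.png" />');
const twice = fixImagePaths(once);

console.log(once);           // <img src="help/home_files/image001.png" />
console.log(once === twice); // true
```

A replacement without the quote would match the home_files/ inside help/home_files/ again and add a second help/ on every re-run.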

Why not simply change the base path of your documents with a <base> tag?

This is an easy change (just add a single tag to the head of each document).

Parsing the whole document to replace all matching paths is much more expensive and error-prone.
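As a rough sketch of that change, the tag can be added to each generated page with one string replacement. The helper name addBaseTag and the href value "help/" are assumptions for illustration. Keep in mind that <base> applies to every relative URL in the document, links included, and that it only helps if each help page is loaded as its own document (for example in an iframe) rather than injected into the host page's markup.

```javascript
// Hypothetical helper: adds <base href="..."> right after the opening <head> tag.
function addBaseTag(html, href = "help/") {
  // Skip documents that already declare a base URL.
  if (/<base\b/i.test(html)) return html;
  // \b keeps the pattern from matching <header>.
  return html.replace(/<head\b[^>]*>/i, function (headTag) {
    return headTag + '\n<base href="' + href + '">';
  });
}

const page =
  '<html><head><title>Home</title></head>' +
  '<body><img src="home_files/image001.png" /></body></html>';

console.log(addBaseTag(page));
```

With the base in place, the browser resolves home_files/image001.png relative to help/, so the img tags written by Word stay untouched.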

You can also do this with Adobe Dreamweaver: select your folder and use Replace All.
