I have a site with roughly 1200 posts. Each post has a featured image and a few placed images in the content. Over the years we have had a bad habit of using the WordPress image editor to crop and resize images. This has left quite a bit of trash/unused files in the Media Library.
I have an idea for cleaning up the Media Library, but I don't know if it is possible.
My idea is to run a crawler on the site to pull down a local copy of only the image assets it encounters on page render or in the style sheet. The local copy would maintain the folder structure of the live site's uploads folder.
Once I have finished the crawl, I would replace the live site's uploads folder with the local version. At that point, only the images actually used on the site would remain in the uploads folder.
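To make the crawl step concrete, here is a minimal sketch of how the image URLs could be collected from one rendered page. It assumes the uploads live under the default `/wp-content/uploads/` path, and the names `ImageCollector` and `image_urls` are only illustrative. It looks at `<img src>` only, so images referenced from the style sheet or from `srcset` would need separate handling.

```python
from html.parser import HTMLParser

# Assumption: the site uses the default WordPress uploads location.
UPLOADS_MARKER = "/wp-content/uploads/"


class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag that points into the uploads folder."""

    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and UPLOADS_MARKER in value:
                self.urls.add(value)


def image_urls(html: str) -> set[str]:
    """Return the set of uploads image URLs found in one page of HTML."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.urls
```

Running `image_urls` over the HTML of every post and merging the sets would give the list of files to keep.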
I see two issues with this:
- The image database will be out of sync.
- The original source image file might not be pulled down.
Issue 1 can be solved by regenerating thumbnails, but that needs the original source files, so issue 2 might make it problematic.
The new local uploads folder won't have the original source images unless they are used at full size on a page or post. It's possible that srcset will provide the original source image path to the crawler, but I don't know if that will work for featured images and other programmatically placed images.
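For the srcset idea, a rough sketch of picking the widest candidate out of a `srcset` attribute value could look like this. The function name `widest_srcset_url` is hypothetical, and the parsing is deliberately naive: it splits on commas and only understands width descriptors such as `1024w`. Whether the widest candidate really is the original upload depends on the theme and the site's settings, so this is not a guarantee.

```python
from typing import Optional


def widest_srcset_url(srcset: str) -> Optional[str]:
    """Return the URL with the largest width descriptor, or None if there is none."""
    best_url, best_width = None, -1
    for candidate in srcset.split(","):
        fields = candidate.split()
        # Expect "<url> <width>w"; skip anything else (e.g. "2x" density descriptors).
        if len(fields) == 2 and fields[1].endswith("w") and fields[1][:-1].isdigit():
            width = int(fields[1][:-1])
            if width > best_width:
                best_url, best_width = fields[0], width
    return best_url
```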
So the real question is:
Is it possible to use the child image with dimension in the file name (uploads/2020/11/2Y0A1576-300x300.jpg) to locate and retrieve the original source image file (uploads/2020/11/2Y0A1576.jpg) from the live site?
And is it scriptable?
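Here is a minimal sketch of what I mean, assuming the child images always carry a `-<width>x<height>` suffix directly before the file extension, as in the example above. The names `original_image_url` and `download` are only illustrative, and a file whose real name happens to end in something like `-300x300` would be mis-handled. Files produced by the image editor may carry other suffixes that this does not cover.

```python
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

# Matches a trailing "-<width>x<height>" right before the file extension.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def original_image_url(url: str) -> str:
    """Strip the size suffix from the path; leave query string and host untouched."""
    parts = urlsplit(url)
    path = SIZE_SUFFIX.sub("", parts.path, count=1)
    return urlunsplit(parts._replace(path=path))


def download(url: str, dest_root: Path) -> Path:
    """Save url under dest_root, mirroring the URL path as the folder structure."""
    target = dest_root / urlsplit(url).path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, target)
    return target
```

For each sized URL the crawler finds, the script would call `download(original_image_url(url), Path("local-copy"))` and treat an HTTP error as "no original at that name".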