In my project, I want to calculate the hash of folders. For example, there are 10 folders and these folders have many sub-files. I know many ways to get the hash of a file, but is there any way to get a hash of a whole folder?
My purpose in doing this is to understand whether the files in the folder have changed.
I am open to suggestions and different ideas, I need your help. Thanks in advance.
- Are you asking how to iterate over those folders and get the hash of each file? – Andy Commented Jun 21, 2021 at 21:36
- No, actually I want only one hash from all the files in a folder. I have only one folder, you can think of it as a react project. There should be a single hash for this folder and I need to understand whether any file in it has changed or not. Isn't there a way to do this without hashing each file individually? – rider Commented Jun 21, 2021 at 21:42
- Does this help? – Andy Commented Jun 21, 2021 at 21:44
- If you want to hash the actual content of the files, there is no way to do that without reading every byte of every file and computing a hash value for each file or a combined hash for the folder. You could, I guess, hash only the filenames and file sizes and hope that was enough to detect a change, but obviously a file of the same length with different content (such as changing one character in the file) would not be detected that way. You could then add in the modification date for each file and maybe catch a few more changes. – jfriend00 Commented Jun 21, 2021 at 22:00
- A simple exec
find -type f -exec md5sum "{}" +
will yield each hash with its file path, and
find -type f -exec md5sum "{}" + | md5sum | cut -c 1-32
will hash the entire listing (one hash for the whole tree). Additionally, if you're after a hash of the committed state of a project, grab the most recent commit hash with git rev-parse HEAD and/or call the remote with git ls-remote origin -h refs/heads/master and compare it with the local git rev-parse refs/heads/master.
– Lawrence Cherone Commented Jun 21, 2021 at 22:54
2 Answers
It really depends upon how reliable you want your modification detection to be. The most reliable method would iterate through every file in every folder and calculate a hash of the actual file contents by reading every byte of every file.
Other than that, you can examine file metadata such as filenames, modification date and file size. A change in any of those DOES indicate a change in the contents. But, the lack of a change in any of those does not conclusively indicate that the file contents have not changed. It is possible to modify the file contents, keep the same filename, keep the same file size and manually set the modification date back to what it was - thus fooling an examination of only the metadata.
But, if you're willing to accept that it could be fooled via manipulation, but would normally detect changes, then you could iterate all the files of a folder and compute a combined hash that uses the metadata: the filenames, the file sizes and the file modification dates, and come up with a single hash for the folder. Depending upon your purpose that may or may not be sufficient - you would have to make that call.
Other than that, you're going to have to read every byte of every file and compute a hash of the actual file contents.
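If you do need that stronger guarantee, a content-hashing variant could look something like this sketch (illustrative only; computeContentHash is just a name used here, and it relies solely on Node's built-in fs/promises and crypto modules):
const fsp = require("fs/promises");
const { createHash } = require("crypto");
const path = require("path");

// Recursively hashes the actual contents of every file under `folder`.
// Slower than the metadata approach below, but it catches edits that
// preserve the file size and modification time.
async function computeContentHash(folder, inputHash = null) {
    const hash = inputHash ? inputHash : createHash("sha256");
    const entries = await fsp.readdir(folder, { withFileTypes: true });
    // sort so the resulting digest doesn't depend on directory-listing order
    entries.sort((a, b) => a.name.localeCompare(b.name));
    for (const entry of entries) {
        const fullPath = path.join(folder, entry.name);
        if (entry.isFile()) {
            // fold the path plus the raw file bytes into the running hash
            hash.update(fullPath);
            hash.update(await fsp.readFile(fullPath));
        } else if (entry.isDirectory()) {
            // recursively walk sub-folders
            await computeContentHash(fullPath, hash);
        }
    }
    // only the outermost call produces the digest
    if (!inputHash) {
        return hash.digest("hex");
    }
}
Since it reads every byte, expect it to be much slower on large trees than the metadata version below.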
Here's some demonstration code of the metadata hashing algorithm:
const fsp = require("fs/promises");
const { createHash } = require("crypto");
const path = require('path');

// -----------------------------------------------------
// Returns a buffer with a computed hash of every file's metadata:
// full path, modification time and file size.
// If you pass inputHash, it must be a Hash object from the crypto library
// and you must then call .digest() on it yourself when you're done.
// If you don't pass inputHash, one will be created automatically
// and the digest will be returned to you in a Buffer object.
// -----------------------------------------------------
async function computeMetaHash(folder, inputHash = null) {
    const hash = inputHash ? inputHash : createHash('sha256');
    const info = await fsp.readdir(folder, { withFileTypes: true });

    // construct a string from the modification date, the filename and the file size
    for (let item of info) {
        const fullPath = path.join(folder, item.name);
        if (item.isFile()) {
            const statInfo = await fsp.stat(fullPath);
            // compute hash string name:size:mtime
            const fileInfo = `${fullPath}:${statInfo.size}:${statInfo.mtimeMs}`;
            hash.update(fileInfo);
        } else if (item.isDirectory()) {
            // recursively walk sub-folders
            await computeMetaHash(fullPath, hash);
        }
    }

    // if not being called recursively, get the digest and return it as the hash result
    if (!inputHash) {
        return hash.digest();
    }
}

computeMetaHash(__dirname).then(result => {
    console.log(result);
}).catch(err => {
    console.log(err);
});
Built off @jfriend00's implementation (thank you!), this solution accepts multiple paths and is TypeScript based:
import { Hash, createHash } from "node:crypto";
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

/**
 * Creates a hash of the given files/folders. Used to conditionally deploy custom
 * resources depending on whether source files have changed.
 */
export function computeMetaHash(paths: string[], inputHash?: Hash) {
    const hash = inputHash ? inputHash : createHash("sha1");

    for (const path of paths) {
        const statInfo = statSync(path);
        if (statInfo.isDirectory()) {
            const directoryEntries = readdirSync(path, { withFileTypes: true });
            const fullPaths = directoryEntries.map((e) => join(path, e.name));
            // recursively walk sub-folders
            computeMetaHash(fullPaths, hash);
        } else {
            // compute hash string name:size:mtime
            const fileInfo = `${path}:${statInfo.size}:${statInfo.mtimeMs}`;
            hash.update(fileInfo);
        }
    }

    // if not being called recursively, get the digest and return it as the hash result
    if (!inputHash) {
        return hash.digest().toString("base64");
    }
    return;
}