Moving files with spaces in the name from Unix to a Hadoop folder

I have a large number of files (tens of thousands) in a Unix directory that I need to copy to Hadoop using the command:
hdfs dfs -put * /hdfs_folder/
However, some of these files have spaces in their filenames, such as "hello world.csv" or "this file has spaces.csv", and those files fail to transfer when I use the wildcard approach.
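A minimal way to reproduce it (the two example filenames are the ones above; the test directory name and the control file plain.csv are just placeholders I made up):

mkdir spaces_test && cd spaces_test
touch "hello world.csv" "this file has spaces.csv" plain.csv   # control file without spaces
hdfs dfs -put * /hdfs_folder/   # same command as above

The file without spaces goes through; the two containing spaces are the ones that fail for me.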
Could you recommend a reliable method to copy all the files from Unix to HDFS that does not require renaming the files or using shell loops?
I’ve tried the following approaches, but none of them worked:
find . -type f -print0 | xargs -0 -I {} hdfs dfs -put "{}" /hdfs_folder/
find . -type f -exec hdfs dfs -put -f "{}" /hdfs_folder/ \;
printf '%s\0' "$folder_unix"/* | xargs -0 stat --format='%n' | awk -F/ -v basepath="$folder_unix" '{ printf "%s%c", basepath "/" $NF, 0 }' | xargs -0 hdfs dfs -put -f "${hdfs_folder}"
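In the last command, $folder_unix and ${hdfs_folder} are shell variables I set beforehand, along these lines (the local path shown is only an example, not my real directory):

folder_unix=/path/to/local_files   # local source directory (example value)
hdfs_folder=/hdfs_folder           # HDFS destination, same as above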
Any suggestions would be greatly appreciated.