I am trying to recursively upload parquet files to an AWS S3 bucket using AWS CLI. I want to drop the .parquet and use the file name as the target table name.
So in a directory containing table1.parquet and table2.parquet, I run something like this:
aws s3 cp ./MyDir s3://mybucket/ --recursive
I get the error below, which makes sense because the expected table is table1, not table1.parquet:
s3://mybucket/table1.parquet is not found
Ideally I would be able to specify something like the following in my CLI statement, where filename changes to table1, table2, etc.:
aws s3 cp ./MyDir s3://mybucket/{filename} --recursive
edited Nov 19, 2024 at 20:51
John Rotenstein
asked Nov 19, 2024 at 15:23
cluelessclueless
1 Answer
The AWS CLI has no built-in way to rename files during a recursive upload: with --recursive, each file keeps its own name as the object key. However, you can achieve your goal with a script. Here’s a simple Bash script that uploads Parquet files to S3 and renames them by dropping the .parquet extension:
#!/bin/bash
# Directory containing the Parquet files
SOURCE_DIR="./MyDir"
# Target S3 bucket (keep the trailing slash)
S3_BUCKET="s3://mybucket/"
# Loop through all .parquet files in the directory
for filepath in "$SOURCE_DIR"/*.parquet; do
    # Skip the unexpanded pattern when no .parquet files exist
    [ -e "$filepath" ] || continue
    # Extract the filename without the path
    filename=$(basename "$filepath")
    # Remove the .parquet extension
    target_name="${filename%.parquet}"
    # Upload the file to S3 with the new name
    if aws s3 cp "$filepath" "$S3_BUCKET$target_name"; then
        echo "Uploaded $filepath as $target_name"
    else
        echo "Failed to upload $filepath" >&2
    fi
done
This uploads every .parquet file directly inside ./MyDir (subdirectories are not traversed) to your S3 bucket, using the filename without .parquet as the object key.
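Since the question asks for a recursive upload, here is a rough sketch of how the same idea could be extended with find. The function names s3_key_for and upload_all and the DRY_RUN switch are made up for illustration, and keeping the subdirectory path as part of the key is an assumption about what you want, not something the AWS CLI requires.

```shell
#!/usr/bin/env bash
# Sketch only: recursive variant of the loop above.
# SOURCE_DIR and S3_BUCKET reuse the example values from the question.
SOURCE_DIR="${SOURCE_DIR:-./MyDir}"
S3_BUCKET="${S3_BUCKET:-s3://mybucket/}"

# Hypothetical helper: turn a local path into an S3 key by removing
# the source directory prefix and the .parquet extension.
s3_key_for() {
    local root="${SOURCE_DIR%/}"   # tolerate a trailing slash
    local rel="${1#"$root"/}"      # path relative to the source directory
    printf '%s\n' "${rel%.parquet}"
}

# Hypothetical helper: walk SOURCE_DIR and upload each .parquet file.
# With DRY_RUN=1 it only prints what it would do and never calls aws.
upload_all() {
    local filepath key
    find "${SOURCE_DIR%/}" -type f -name '*.parquet' -print0 |
    while IFS= read -r -d '' filepath; do
        key=$(s3_key_for "$filepath")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would upload $filepath -> $S3_BUCKET$key"
        elif aws s3 cp "$filepath" "$S3_BUCKET$key"; then
            echo "Uploaded $filepath as $key"
        else
            echo "Failed to upload $filepath" >&2
        fi
    done
}
```

To try it, source the script and run DRY_RUN=1 upload_all first to check the key names, then run upload_all without DRY_RUN. If you want flat keys (table2 instead of sub/table2), you could swap the prefix stripping in s3_key_for for basename, at the risk of name collisions between subdirectories.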
aws s3 cp command? That's very strange, since it shouldn't be looking for any particular table names in the destination. What do you mean by "which makes sense"? Can you explain more? (Oh, and perhaps try changing ./MyDir into ./MyDir/ ?) – John Rotenstein Commented Nov 19, 2024 at 20:54