What needs to be changed in the code below in order to successfully decode the tar file so that it can be processed and saved to an S3 bucket without throwing errors?
The relevant part of the lambda function that processes the API request in the AWS backend is as follows:
import base64
import tarfile
import boto3
bodyTar = event['body']
# Get the base64 encoded tar file from the body
nameStr = 'filename="' + tarBallName + '"'
print("nameStr is: ", nameStr)
tar_file = bodyTar.split(nameStr)[1].split('\r\n\r\n')[1]
# Decode the base64 encoded tar file
print("About to decode the tar file.")
tar_file_decoded = base64.b64decode(tar_file).decode('utf-8') # This throws: ValueError: string argument should contain only ASCII characters
# The following code should run next, but the ERROR in the preceding line prevents the following code from running:
print("Done decoding the tar file.")
# Save the tar file to a temporary file
tmpTarBallName = '/tmp/' + tarBallName
print("tmpTarBallName is: ", tmpTarBallName)
with open(tmpTarBallName, 'wb') as f:
    f.write(tar_file_decoded)
print("Done writing the tar file.")
# Extract the tar file
with tarfile.open(tmpTarBallName, 'r:gz') as tar:
    tar.extractall('/tmp')
# Upload the extracted files to an S3 bucket
s3 = boto3.client('s3')
print("About to upload the extracted files to S3.")
s3.upload_file(tmpTarBallName, 'my-s3-bucket', tarBallName)
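For the steps that follow the decoding, here is a minimal sketch, assuming the tarball is already available as raw bytes. The names `save_and_extract` and `upload_extracted` are hypothetical, and `s3_client` stands for the `boto3.client('s3')` object used above. The sketch writes bytes rather than a decoded string (the file is opened in 'wb' mode), and it uploads each extracted file, since the comment above says the extracted files should be uploaded while `s3.upload_file(tmpTarBallName, ...)` uploads the tarball itself:

```python
import os
import tarfile


def save_and_extract(tar_bytes: bytes, tar_name: str, work_dir: str = "/tmp") -> tuple:
    """Write the gzip-compressed tarball to disk, extract it, and list the extracted files."""
    tar_path = os.path.join(work_dir, tar_name)
    with open(tar_path, "wb") as f:
        f.write(tar_bytes)  # bytes, not str: the file is opened in binary mode

    # Extract into a separate directory so the tarball itself is not mixed in.
    extract_dir = os.path.join(work_dir, "extracted")
    os.makedirs(extract_dir, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(extract_dir)

    paths = []
    for root, _dirs, files in os.walk(extract_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return extract_dir, sorted(paths)


def upload_extracted(s3_client, bucket: str, extract_dir: str, paths: list) -> list:
    """Upload every extracted file, using its path relative to extract_dir as the S3 key."""
    keys = []
    for path in paths:
        key = os.path.relpath(path, extract_dir).replace(os.sep, "/")
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys
```

Since the archive comes from an external request, it may be worth validating member names before extracting; that is left out of the sketch.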
The Lambda code runs as expected until the line tar_file_decoded = base64.b64decode(tar_file).decode('utf-8') throws the error ValueError: string argument should contain only ASCII characters.
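As a point of reference for this error: `base64.b64decode` raises this `ValueError` when it is handed a `str` containing non-ASCII characters, which suggests the extracted part is raw binary rather than base64 text. `requests` sends `files=` uploads as `multipart/form-data` with the file content unencoded. The sketch below shows one way the tarball bytes might be recovered on the Lambda side. It assumes an API Gateway proxy event whose `isBase64Encoded` flag is set when the gateway has base64-encoded the whole body (this depends on the gateway's binary media type settings, which are not shown here). `body_bytes` and `extract_file_part` are hypothetical helpers. Because the client overrides `Content-Type`, the multipart boundary is not available from that header, so the sketch reads it from the first line of the body:

```python
import base64


def body_bytes(event: dict) -> bytes:
    """Return the request body as bytes."""
    body = event["body"]
    if event.get("isBase64Encoded"):
        # The gateway encoded the *entire* multipart body, not just the file part.
        return base64.b64decode(body)
    # Plain-text body: fine for text payloads, but gzip data may not survive
    # this path intact (assumption: the gateway passed the body through as text).
    return body.encode("utf-8")


def extract_file_part(body: bytes, filename: str) -> bytes:
    """Return the raw content of the multipart part whose headers name `filename`."""
    # The first line of a multipart body is the boundary delimiter ("--<boundary>").
    delimiter = body.split(b"\r\n", 1)[0]
    marker = ('filename="%s"' % filename).encode("utf-8")
    for part in body.split(delimiter):
        # Split only at the first blank line, so "\r\n\r\n" inside the file is kept.
        headers, sep, content = part.partition(b"\r\n\r\n")
        if sep and marker in headers:
            # The CRLF before the next delimiter belongs to the delimiter, not the file.
            return content[:-2] if content.endswith(b"\r\n") else content
    raise ValueError("no multipart part found for " + filename)
```

The value returned by `extract_file_part` is already the gzip-compressed tarball as bytes, so under these assumptions it could be written to the temporary file directly, without a further `b64decode` or `.decode('utf-8')` step.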
The code that makes the API request from the remote Python 3 application starts with a tarball and sends the tarball as follows:
import requests
headers = {
    'Content-Type': 'application/x-tar',
    'tar-ball-name': outputFile
}
files = {'file': open(outputFile, 'rb')}
response = requests.post(completeURL, headers=headers, files=files)
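One possible alternative on the client side, sketched under the assumption that the Lambda keeps calling `base64.b64decode` on the request body: base64-encode the tarball before sending it and pass it with `data=` instead of `files=`, so the body is plain ASCII with no multipart wrapper. `build_base64_payload` is a hypothetical helper, and the `text/plain` content type is an assumption; the `tar-ball-name` header mirrors the request above.

```python
import base64


def build_base64_payload(tar_path: str) -> tuple:
    """Read the tarball and return (headers, body), with the body base64-encoded."""
    with open(tar_path, "rb") as f:
        encoded = base64.b64encode(f.read())  # ASCII-only bytes
    headers = {
        "Content-Type": "text/plain",  # assumption: the body is now base64 text
        "tar-ball-name": tar_path,
    }
    return headers, encoded


# Hypothetical usage with the request shown above:
# headers, body = build_base64_payload(outputFile)
# response = requests.post(completeURL, headers=headers, data=body)
```

With a body like this, the server would decode `event['body']` as a whole instead of splitting it on the filename, and the decoded result would still be binary gzip data rather than UTF-8 text.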
The tarball is packaged as follows:
def package_tar_recursive_without_root_folder(self, input_dir: str, output_file: str):
    with tarfile.open(output_file, mode='w:gz') as archive:
        for root, dirs, files in os.walk(input_dir):
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, input_dir)
                archive.add(file_path, arcname=relative_path, recursive=False)
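To rule out the packaging step as the source of the problem, the archive can be checked locally before it is sent. A small sketch, using a standalone version of the function above (without `self`) together with a hypothetical `list_archive` helper that reopens the file in the same 'r:gz' mode the Lambda uses:

```python
import os
import tarfile


def package_tar_recursive_without_root_folder(input_dir: str, output_file: str) -> None:
    """Add every file under input_dir to a gzip-compressed tar, with paths relative to input_dir."""
    with tarfile.open(output_file, mode="w:gz") as archive:
        for root, _dirs, files in os.walk(input_dir):
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, input_dir)
                archive.add(file_path, arcname=relative_path, recursive=False)


def list_archive(tar_path: str) -> list:
    """Reopen the archive as gzip-compressed tar and return its member names."""
    with tarfile.open(tar_path, mode="r:gz") as archive:
        return sorted(archive.getnames())
```

If `list_archive` returns the expected relative paths, the tarball itself is readable, which would point to the transport or decoding steps instead. The sketch assumes `output_file` is written outside `input_dir`, so the archive does not try to add itself.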
The contents of the tarball are a few private Git repositories that are entirely composed of code files in a few languages including Python, YAML, and Markdown.