python - Lambda function fails to process tarball - Stack Overflow

What needs to be changed in the code below in order to successfully decode the tar file so that it can be processed and saved to an S3 bucket without throwing errors?

The relevant part of the lambda function that processes the API request in the AWS backend is as follows:

import base64
import tarfile
import boto3
bodyTar = event['body']
# Get the base64 encoded tar file from the body
nameStr = 'filename="' + tarBallName + '"'
print("nameStr is: ", nameStr)
tar_file = bodyTar.split(nameStr)[1].split('\r\n\r\n')[1]    
# Decode the base64 encoded tar file
print("About to decode the tar file.")
tar_file_decoded = base64.b64decode(tar_file).decode('utf-8')  # This throws: ValueError: string argument should contain only ASCII characters

# The following code should run next, but the ERROR in the preceding line prevents the following code from running:
print("Done decoding the tar file.")    
# Save the tar file to a temporary file
tmpTarBallName = '/tmp/' + tarBallName
print("tmpTarBallName is: ", tmpTarBallName)
with open(tmpTarBallName, 'wb') as f:
    f.write(tar_file_decoded)
print("Done writing the tar file.")    
# Extract the tar file
with tarfile.open(tmpTarBallName, 'r:gz') as tar:
    tar.extractall('/tmp')

# Upload the extracted files to an S3 bucket
s3 = boto3.client('s3')
print("About to upload the extracted files to S3.")
s3.upload_file(tmpTarBallName, 'my-s3-bucket', tarBallName)

The code runs as expected until the line tar_file_decoded = base64.b64decode(tar_file).decode('utf-8') raises ValueError: string argument should contain only ASCII characters. None of the later steps (writing the temporary file, extracting it, uploading to S3) are reached.
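
The error message itself can be reproduced in isolation. The following is a minimal sketch, assuming (the question does not confirm this) that the part extracted from the body is raw gzip data that arrived as text rather than base64 text. The names `raw`, `as_text`, and `try_b64decode` are made up for illustration. It also shows that a successful `base64.b64decode` already returns bytes, which can be written to a file opened in 'wb' mode without a further `.decode('utf-8')`.

```python
import base64
import gzip

# Hypothetical stand-in for the tarball: gzip output is arbitrary binary data.
raw = gzip.compress(b"print('hello')\n")

# Assumption: the body reached the function as text, so the raw bytes
# appear as a str that contains non-ASCII characters.
as_text = raw.decode("latin-1")


def try_b64decode(value):
    """Return the decoded bytes, or the error message if decoding fails."""
    try:
        return base64.b64decode(value)
    except ValueError as exc:
        return str(exc)


# A str with non-ASCII characters is rejected before any decoding happens.
error_message = try_b64decode(as_text)

# When the payload really is base64 text, b64decode returns bytes.
# Those bytes are the gzip data itself and are not UTF-8 text.
encoded = base64.b64encode(raw).decode("ascii")
round_tripped = base64.b64decode(encoded)
```

If this assumption holds, the failure happens inside `b64decode` because its input is not base64 at all, which would point at how the body is sent or delivered rather than at the decoding call.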

The remote Python 3 application that makes the API request starts with a tarball and sends it as follows:

import requests

headers={
    'Content-Type': 'application/x-tar',
    'tar-ball-name': outputFile
}
files = {'file': open(outputFile, 'rb')}
response = requests.post(completeURL, headers=headers, files=files)
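
For context, passing `files=` makes requests build a multipart/form-data body, and an explicitly supplied Content-Type header is kept in place of the multipart header (with its boundary) that requests would otherwise generate. One possible alternative, sketched here under assumptions, is to send the tarball as the entire request body encoded as base64 text. The helper name `build_upload`, the `text/plain` content type, and the use of the file's base name in the header are illustrative choices, not part of the original code.

```python
import base64
import gzip
import os
import tempfile


def build_upload(tar_path):
    """Return (headers, body) for sending a tarball as base64 text."""
    with open(tar_path, "rb") as f:
        raw = f.read()
    # base64 output contains only ASCII characters, so it survives
    # being handled as text on the receiving side.
    body = base64.b64encode(raw)
    headers = {
        # Assumption: the receiver expects base64 text, not multipart data.
        "Content-Type": "text/plain",
        # Base name only, so the receiver can safely append it to '/tmp/'.
        "tar-ball-name": os.path.basename(tar_path),
    }
    return headers, body


# Hypothetical binary file standing in for the real tarball.
work_dir = tempfile.mkdtemp()
output_file = os.path.join(work_dir, "demo.tar.gz")
with open(output_file, "wb") as f:
    f.write(gzip.compress(b"placeholder contents"))

headers, body = build_upload(output_file)
# The pair could then be sent with:
#   requests.post(completeURL, headers=headers, data=body)
```

With `data=` the body is sent as given, so the receiver would decode `event['body']` directly instead of splitting it on multipart markers.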

The tarball is packaged as follows:

def package_tar_recursive_without_root_folder(self, input_dir: str, output_file: str):
    with tarfile.open(output_file, mode='w:gz') as archive:
        for root, dirs, files in os.walk(input_dir):
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, input_dir)
                archive.add(file_path, arcname=relative_path, recursive=False)
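
To check what this packaging step produces, the function can be run against a small sample tree. The sketch below restates it as a standalone function (without `self`) with the imports it needs; the directory layout and file names are hypothetical and exist only for the demonstration.

```python
import os
import tarfile
import tempfile


def package_tar_recursive_without_root_folder(input_dir: str, output_file: str) -> None:
    """Package every file under input_dir, storing paths relative to it."""
    with tarfile.open(output_file, mode="w:gz") as archive:
        for root, _dirs, files in os.walk(input_dir):
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, input_dir)
                archive.add(file_path, arcname=relative_path, recursive=False)


# Hypothetical sample tree, created only for this demonstration.
work_dir = tempfile.mkdtemp()
input_dir = os.path.join(work_dir, "repos")
os.makedirs(os.path.join(input_dir, "pkg"))
samples = [("README.md", "# demo\n"), (os.path.join("pkg", "mod.py"), "x = 1\n")]
for rel_path, text in samples:
    with open(os.path.join(input_dir, rel_path), "w", encoding="utf-8") as f:
        f.write(text)

# Keep the output outside input_dir so os.walk does not pick up the archive itself.
output_file = os.path.join(work_dir, "repos.tar.gz")
package_tar_recursive_without_root_folder(input_dir, output_file)

# Read the archive back and list the stored member names.
with tarfile.open(output_file, mode="r:gz") as archive:
    member_names = sorted(archive.getnames())
```

The member names come out relative to `input_dir`, with no root folder prefix. Because only files are added, empty directories are not included in the archive.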

The contents of the tarball are a few private Git repositories that are entirely composed of code files in a few languages including Python, YAML, and Markdown.
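
The decode-and-extract steps the Lambda function is meant to perform could be sketched as follows. This assumes API Gateway delivers the request body base64-encoded with `isBase64Encoded` set to true in the proxy event, which depends on how the API is configured and is not confirmed by the question. The helper names `tarball_bytes_from_event` and `extract_tarball` and the sample event are hypothetical, and the S3 upload is left as a comment so the sketch runs without AWS access.

```python
import base64
import io
import os
import tarfile
import tempfile


def tarball_bytes_from_event(event):
    """Return the raw tarball bytes carried in an API Gateway proxy event."""
    if not event.get("isBase64Encoded"):
        # Assumption of this sketch: binary bodies always arrive base64-encoded.
        raise ValueError("expected a base64-encoded request body")
    # b64decode returns bytes; gzip data is not UTF-8 text, so no .decode() here.
    return base64.b64decode(event["body"])


def extract_tarball(data, dest_dir):
    """Extract an in-memory gzip tarball and return sorted (path, key) pairs."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        tar.extractall(dest_dir)
    extracted = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            path = os.path.join(root, name)
            # Key relative to dest_dir, using "/" as the separator.
            key = os.path.relpath(path, dest_dir).replace(os.sep, "/")
            extracted.append((path, key))
    return sorted(extracted, key=lambda pair: pair[1])


# Build a small hypothetical event for demonstration.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    content = b"name: demo\n"
    info = tarfile.TarInfo("repo/config.yaml")
    info.size = len(content)
    archive.addfile(info, io.BytesIO(content))
event = {
    "body": base64.b64encode(buffer.getvalue()).decode("ascii"),
    "isBase64Encoded": True,
}

dest = tempfile.mkdtemp()
files_to_upload = extract_tarball(tarball_bytes_from_event(event), dest)
keys = [key for _path, key in files_to_upload]
# Each (path, key) pair could then be passed to s3.upload_file(path, bucket, key).
```

Since `extractall` writes whatever paths the archive contains, it is best used only with archives from a trusted sender, as is the case for the private repositories described here.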
