
java - Unable to authenticate to S3 with the S3A PySpark config. I want to get the code to work in EMR, hence avoiding temporary credentials.

Error: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))

        spark = (SparkSession.builder
                 .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                         "com.amazonaws.auth.profile.DefaultAWSCredentialsProviderChain")
                 .config("spark.hadoop.fs.s3a.access.key",
                         AWSHandler.get_session(Constant.aws_sso_profile).get_credentials().access_key)
                 .config("spark.hadoop.fs.s3a.secret.key",
                         AWSHandler.get_session(Constant.aws_sso_profile).get_credentials().secret_key)
                 .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
                 .config("spark.executor.instances", 4)
                 .getOrCreate())
        return spark

In production, hard-coding the access and secret key is not allowed, which leaves me with this approach of getting the credentials from the .aws profile.

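For context, here is what the profile-based approach described above might look like end to end. This is a minimal sketch, not the asker's actual code: boto3 and the profile name "my-sso-profile" are assumptions, and the AWSHandler/Constant helpers from the snippet above are replaced by a plain boto3 session. Because SSO credentials are temporary, the session token has to be passed as well:

import boto3
from pyspark.sql import SparkSession

# Assumption: an SSO profile named "my-sso-profile" exists in ~/.aws/config.
# SSO credentials are temporary, so S3A must use the
# TemporaryAWSCredentialsProvider and be given the session token too.
creds = boto3.Session(profile_name="my-sso-profile").get_credentials()

spark = (SparkSession.builder
         .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                 "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
         .config("spark.hadoop.fs.s3a.access.key", creds.access_key)
         .config("spark.hadoop.fs.s3a.secret.key", creds.secret_key)
         .config("spark.hadoop.fs.s3a.session.token", creds.token)
         .getOrCreate())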

asked Jan 19 at 19:54 by shishi; edited Jan 20 at 1:26 by OneCricketeer
  • idownvotedbecau.se/noresearch – OneCricketeer, Jan 20 at 1:30

2 Answers


In production, hard coding the access and secret key is not allowed

Exactly

This is why you'd use the ENVIRONMENT VARIABLES mentioned in the error; read it and actually understand it before posting here.
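Before touching Spark at all, it's worth confirming those variables are actually visible to the process that launches the driver. A minimal check, in plain Python with nothing EMR-specific assumed:

import os

# EnvironmentVariableCredentialsProvider reads exactly these two names
# (the error message also lists AWS_ACCESS_KEY/AWS_SECRET_KEY as
# legacy aliases).
for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    print(var, "is set" if os.environ.get(var) else "is MISSING")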

Also read the AWS documentation.

There's no need for explicit credentials; using the DefaultCredentialsProvider is enough. The AWS Java SDK way:

`DefaultCredentialsProvider.create()` = uses ~/.aws/credentials or the environment

import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.S3Exception;

// Uses ~/.aws/credentials or environment variables
try (S3Client s3 = S3Client.builder()
        .credentialsProvider(DefaultCredentialsProvider.create())
        .build()) {
    // e.g. confirm the credentials resolve by listing buckets
    s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
} catch (S3Exception e) {
    System.err.println("Error occurred: " + e.awsErrorDetails().errorMessage());
}

Now, translating that into the SparkSession way:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("S3A Example Without Explicit Credentials") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain") \
    .config("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com") \
    .config("spark.hadoop.fs.s3a.fast.upload", "true") \
    .config("spark.hadoop.fs.s3a.multipart.size", "104857600") \
    .config("spark.hadoop.fs.s3a.threads.max", "10") \
    .getOrCreate()

Sequence of resolution:

1. Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
2. Java system properties: aws.accessKeyId and aws.secretKey.
3. AWS credentials file: by default, ~/.aws/credentials.
4. Instance profile credentials: automatically populated for EC2 or AWS containers. (A quick way to check which source wins is sketched below.)
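To see which of those sources actually wins on a given machine (a laptop with an SSO profile vs. an EMR node with an instance role), boto3's resolution chain is close enough to the Java SDK's to make a quick check useful. A minimal sketch, assuming boto3 is installed:

import boto3

# botocore resolves credentials through a chain much like the one above:
# env vars, shared config/credentials files, then instance metadata.
creds = boto3.Session().get_credentials()
if creds is None:
    print("No credentials resolved - S3A would fail the same way")
else:
    # .method names the winning source, e.g. 'env',
    # 'shared-credentials-file', 'sso', or 'iam-role'
    print("Resolved via:", creds.method)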
