I have an S3 bucket containing .csv.gz files and .json files. Initially, I only want the code to read the .csv.gz files and skip the .json files, but the Glue context keeps pulling in the .json file as well. Here is my code:
source_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": [f"s3://{source_bucket}/{source_prefix}/"],
        "recurse": True,
        "exclusions": ["**.json"],
        "compressionType": "GZIP"
    },
    format="csv",
    format_options={"withHeader": True, "separator": ","}
)
I've tried adding additional configuration to create_dynamic_frame.from_options, but the output does not change. When I remove the .json file from the bucket, the output CSV looks fine; with it present, the output CSV contains additional columns filled with the JSON content. This suggests that my code is not excluding the .json file.
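One thing that may be worth checking: as far as I can tell, the Glue documentation for S3 connection options describes "exclusions" as a string containing a JSON list of glob patterns, not a Python list. Below is a minimal sketch of building the options that way, assuming that is the cause. The helper name build_connection_options and the bucket and prefix values are hypothetical placeholders, not part of my job.

```python
import json


def build_connection_options(source_bucket, source_prefix, exclude_patterns):
    """Hypothetical helper: build the connection_options dict for an S3 source."""
    return {
        "paths": [f"s3://{source_bucket}/{source_prefix}/"],
        "recurse": True,
        # Serialize the glob patterns to a JSON string instead of
        # passing a Python list directly.
        "exclusions": json.dumps(exclude_patterns),
        "compressionType": "GZIP",
    }


# Placeholder bucket and prefix, for illustration only.
options = build_connection_options("my-bucket", "my-prefix", ["**.json"])
print(options["exclusions"])
```

The resulting dict would then be passed as connection_options to create_dynamic_frame.from_options in place of the inline dict above. I have not confirmed that this alone resolves the issue.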