
python - Glue Context reads additional .json files when I only want to read .csv.gz files - Stack Overflow


I have an S3 bucket that contains both .csv.gz files and .json files. I only want my code to read the .csv.gz files and skip the .json files, but the Glue context keeps pulling in the .json files as well. Here is my code:

source_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": [f"s3://{source_bucket}/{source_prefix}/"],
        "recurse": True,
        "exclusions": ["**.json"],
        "compressionType": "GZIP"
    },
    format="csv",
    format_options={"withHeader": True, "separator": ","}
)

I've tried adding more configuration to create_dynamic_frame.from_options, but the output does not change. When I remove the .json file from the bucket, the output CSV looks fine. With the .json file present, the output CSV contains additional columns filled with the JSON content, so my code is not excluding the .json file.
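One thing that may be worth checking: as far as I can tell from the AWS Glue documentation for S3 connection options, "exclusions" is described as a string containing a JSON list of glob patterns, not a Python list. Below is a minimal sketch of building the options that way. The helper name build_s3_connection_options and the bucket/prefix values are made up for illustration, and I have not confirmed that this is the cause of the behaviour described above.

```python
import json


def build_s3_connection_options(bucket, prefix, exclude_globs):
    """Hypothetical helper: build connection_options for an S3 source."""
    return {
        "paths": [f"s3://{bucket}/{prefix}/"],
        "recurse": True,
        # Assumption: Glue expects a JSON-encoded list inside a string here,
        # e.g. '["**.json"]', rather than a Python list.
        "exclusions": json.dumps(exclude_globs),
        # Kept as in the question.
        "compressionType": "GZIP",
    }


# Placeholder bucket and prefix, for illustration only.
options = build_s3_connection_options("my-bucket", "my/prefix", ["**.json"])
print(options["exclusions"])  # prints ["**.json"]

# Inside a Glue job, the dict would then be passed in unchanged:
# source_data = glueContext.create_dynamic_frame.from_options(
#     connection_type="s3",
#     connection_options=options,
#     format="csv",
#     format_options={"withHeader": True, "separator": ","},
# )
```

If the .json files are still picked up after this change, another option might be to list the bucket yourself and pass only the .csv.gz object paths in "paths".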
