I'm trying to use LangChain's S3FileLoader or S3DirectoryLoader, but both return documents with an empty page_content. I know the loader is finding the files, because the metadata source is filled in correctly.
My code is very simple:
from langchain_community.document_loaders.s3_directory import S3DirectoryLoader
from langchain_community.document_loaders.s3_file import S3FileLoader
loader = S3FileLoader(
    bucket=s3_bucket,
    key="myfile.json",
    region_name="us-east-1",
)
documents = loader.load()
And when I print documents, I get something like the following:
Loader - <langchain_community.document_loaders.s3_file.S3FileLoader object at 0x0000014CBB850B90>
Documents - [Document(metadata={'source': 's3://myS3bucket/myfile.json'}, page_content='')]
For simplicity, my access key, secret key and session token are all set in the Windows command prompt from which I run the Python script.
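To rule out a credentials problem, a small check like the one below could confirm that the variables are visible to the Python process. It is only a sketch: it assumes the credentials were set as the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` environment variables, and `missing_aws_env_vars` is a hypothetical helper name, not part of boto3 or LangChain.

```python
import os

# Standard environment variable names that boto3 reads for temporary credentials.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def missing_aws_env_vars(environ=None):
    """Return the names of the AWS credential variables that are unset or empty.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    if environ is None:
        environ = os.environ
    return [name for name in AWS_ENV_VARS if not environ.get(name)]
```

Calling `missing_aws_env_vars()` from the same prompt that runs the script should return an empty list if all three values are set there.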
In AWS, my role/policy has PutObject, GetObject and ListObject permissions on that bucket.
I can't figure out why page_content is empty. The same thing happens with both S3FileLoader and S3DirectoryLoader.
I have also tried running from both a Windows command prompt and from Ubuntu via WSL.
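To separate an S3 access problem from a parsing problem inside the loader, the object could also be read directly, bypassing LangChain. The sketch below assumes a boto3-style S3 client (anything exposing `get_object(Bucket=..., Key=...)`) is passed in and that the file is UTF-8 text; `inspect_s3_object` is a hypothetical helper name used only for illustration.

```python
import json


def inspect_s3_object(client, bucket, key):
    """Download one object and summarize its body.

    `client` is assumed to behave like boto3's S3 client: get_object()
    returns a dict whose "Body" entry has a read() method returning bytes.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    body = response["Body"].read()

    # Decode leniently so an unexpected encoding does not hide the size check.
    text = body.decode("utf-8", errors="replace")

    try:
        json.loads(text)
        is_valid_json = True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        is_valid_json = False

    return {
        "size_bytes": len(body),
        "is_valid_json": is_valid_json,
        "preview": text[:200],  # first characters, for a quick visual check
    }
```

With boto3 installed, this could be called as `inspect_s3_object(boto3.client("s3", region_name="us-east-1"), s3_bucket, "myfile.json")`. A non-zero size with valid JSON would suggest that S3 access works and that the empty page_content comes from how the loader parses the file, not from the download.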