Scenario:
1/ A writer produces a new version of a CSV file every few hours.
2/ A reader, in another application, reads the contents of that CSV file.
We want to ensure the reader is not affected if the writer publishes a new version while the reader is still reading the latest one. The reader should be able to finish reading the version it started with. When the reader requests the file again, it should get the new version.
Can I use AWS S3 versioning to achieve this? Is it possible to achieve it at a "directory" level in S3? Is it possible to stream the file contents while reading instead of downloading the full file before processing it?
Asked Mar 3 at 16:02 by sattu
- Read about the S3 consistency model and use byte-range fetches. – jarmod Commented Mar 3 at 16:45
- Amazon S3 will meet your requirements out-of-the-box. Each version is a separate object, so if you read an object and a new version is created, it will not impact the object you have been reading. This operates at the object-level. There is no concept of "directory-level" in Amazon S3. – John Rotenstein Commented Mar 4 at 0:19
1 Answer
Versioning keeps each upload as a separate object version, so the reader can stay on the version it started with until it requests the file again. S3 has no directory-level versioning, but you can manage versions yourself by putting a timestamp or version identifier in the key names. To stream the file instead of downloading it in full, read the body returned by get_object incrementally, or use S3 Select to query specific parts of the CSV.
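As a rough illustration of the get_object approach, here is a minimal sketch, assuming a versioning-enabled bucket and a boto3 S3 client. The helper name `open_pinned_csv` and the bucket and key names are hypothetical, not part of any library. It records the `VersionId` that S3 served and streams rows from the response body, so a retry can request that same version explicitly.

```python
import csv


def open_pinned_csv(s3_client, bucket, key, version_id=None):
    """Return (version_id, row_iterator) for one version of an S3 CSV object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    params = {"Bucket": bucket, "Key": key}
    if version_id is not None:
        # Pin an explicit version, e.g. when retrying after a dropped connection.
        params["VersionId"] = version_id
    response = s3_client.get_object(**params)

    # On a versioning-enabled bucket the response reports which version was served.
    served_version = response.get("VersionId")

    # Body is a streaming object; iter_lines() yields one line of bytes at a time,
    # so the whole file is never held in memory.
    # Caveat: this simple line split assumes no quoted field contains a newline.
    lines = (raw.decode("utf-8") for raw in response["Body"].iter_lines())
    return served_version, csv.reader(lines)


# Hypothetical usage (bucket and key names are placeholders):
#   import boto3
#   version_id, rows = open_pinned_csv(boto3.client("s3"), "my-bucket", "exports/data.csv")
#   for row in rows:
#       print(row)
```

Passing the returned `version_id` back into a later call re-reads the same version even if the writer has published a newer one in the meantime; omitting it fetches whatever is latest at request time.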