python - AWS - put data on S3 results in TimeOutError

I am creating a dataset on AWS S3 for their Open Data program.

I fetch audio files that are already stored on S3, segment them into smaller audio chunks, and put those chunks back on S3. The problem occurs when I PUT them (if I comment out the upload, no error is raised).
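The segmenting step described above can be sketched with only the standard library. This is a minimal sketch, assuming uncompressed PCM WAV input and using Python's built-in `wave` module in place of the `soundfile` library used in the question. `segment_wav_bytes` is a hypothetical helper, and its `start`/`end` arguments are assumed to be frame (sample) indices, mirroring `audio[start:end]`.

```python
import io
import wave


def segment_wav_bytes(wav_bytes: bytes, start: int, end: int) -> io.BytesIO:
    """Cut frames [start, end) out of an in-memory WAV file.

    Hypothetical helper: returns a BytesIO rewound to position 0,
    ready to be handed to an upload call.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        src.setpos(start)  # jump to the first frame of the segment
        frames = src.readframes(max(0, end - start))

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # copy the channel count, sample width and rate of the source
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
    out.seek(0)  # rewind, like file_obj.seek(0) in the question
    return out
```

Keeping everything in `BytesIO` avoids temporary files, the same design the question's code already uses with `soundfile.write`.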

The S3 bucket is in a US region (us-west-2).

To rule out a connectivity problem on my end, I tried:

SageMaker free lab, US region: it hung without any error, and after 4 hours there was still no progress.
Google Colab, US region: same error, but Colab has temporarily restricted my resources because of the data volume, so I cannot try again.
Local environment, EU region: it raises TimeoutError, with no progress.

Can you please help me avoid the error and, if possible, speed up the operations? I must use only basic S3 services, not Lambda or other AWS services, to stay within the allowed budget.

Below is what I tried:

```
import io

import boto3
import soundfile

# boto3 client for uploading (signed requests)
s3_client = boto3.client(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name=REGION_NAME
)
```

```
%%time
# fetch_audio(row) and put_audio_to_s3(file_obj, s3_path) are helpers defined in another cell

cache_audio = {}

for o, (lb, up) in enumerate(batches[6:]):
  for ix, row in annotated_segments.loc[lb:up].iterrows():

    # clear cache to keep memory safe
    if len(cache_audio) > 5:
      cache_audio.clear()

    # audio props
    file_name = row['File name']

    # path props
    file_folder = row['File folder']

    # segment props
    segment_name = row['segment_name']
    start = row['voice_start']
    end = row['voice_end']

    # read from cache; store the sample rate together with the audio so that
    # a cache hit does not reuse the rate of a previously fetched file
    if file_name not in cache_audio:
      audio, rate = fetch_audio(row)
      cache_audio[file_name] = (audio, rate)
    else:
      audio, rate = cache_audio[file_name]

    # slice the segment out of the full recording
    audio_segment = audio[start:end]

    try:
      s3_path = f"data/annotated_segments/{file_folder}/{file_name}/{segment_name}"

      # initialise the in-memory binary file
      file_obj = io.BytesIO()

      # write the audio segment as WAV
      # https://python-soundfile.readthedocs.io/en/latest/index.html#soundfile.write
      soundfile.write(file_obj, audio_segment, samplerate=rate, format='WAV')

      # reset the file pointer to the beginning
      file_obj.seek(0)

      # put annotated segments in S3
      put_audio_to_s3(file_obj, s3_path)

    except Exception as e:
      print(f"Error uploading file: {e}. File name: {file_name}. Batch: {lb} - {up}")

  print(f'Success! Completed {o}-th batch: {lb} - {up}')
```
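The manual cache above empties itself completely once it holds more than five files. A minimal sketch of an alternative, assuming each audio file can be looked up by a hashable key such as its S3 object key (a pandas row is not hashable): `functools.lru_cache` evicts only the least recently used entry and caches audio and rate together, so a cache hit can never pair one file's samples with another file's sample rate. `load_audio` and `_download` are hypothetical stand-ins for the question's `fetch_audio`.

```python
import functools

calls = []  # records actual downloads, for demonstration only


def _download(key: str):
    """Hypothetical stand-in for the S3 GET + decode done by fetch_audio()."""
    calls.append(key)
    audio = [0.0] * 10  # placeholder samples
    rate = 48_000       # placeholder sample rate
    return audio, rate


@functools.lru_cache(maxsize=5)
def load_audio(key: str):
    """Return (audio, rate) for an object key, downloading only on a cache miss.

    At most 5 files are kept; the least recently used one is evicted first.
    """
    return _download(key)


# Usage sketch: the second call for "a.wav" is served from the cache
load_audio("a.wav")
load_audio("a.wav")
load_audio("b.wav")
print(calls)  # ['a.wav', 'b.wav']
```

`lru_cache` keeps references to the cached arrays, so peak memory is bounded by `maxsize` times the size of the largest file.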

The following error is raised after a while:

```
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/urllib3/response.py:754, in HTTPResponse._error_catcher(self)
    753 try:
--> 754     yield
    756 except SocketTimeout as e:
    757     # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
    758     # there is yet no clean way to get at it from this context.

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/urllib3/response.py:879, in HTTPResponse._raw_read(self, amt, read1)
    878 with self._error_catcher():
--> 879     data = self._fp_read(amt, read1=read1) if not fp_closed else b""
    880 if amt is not None and amt != 0 and not data:
    881     # Platform-specific: Buggy versions of Python.
    882     # Close the connection when no data is returned
   (...)
    887     # not properly close the connection in all cases. There is
    888     # no harm in redundantly calling close.

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/urllib3/response.py:862, in HTTPResponse._fp_read(self, amt, read1)
    860 else:
    861     # StringIO doesn't like amt=None
--> 862     return self._fp.read(amt) if amt is not None else self._fp.read()

File ~/miniconda3/envs/fruitbats/lib/python3.10/http/client.py:482, in HTTPResponse.read(self, amt)
    481 try:
--> 482     s = self._safe_read(self.length)
    483 except IncompleteRead:

File ~/miniconda3/envs/fruitbats/lib/python3.10/http/client.py:631, in HTTPResponse._safe_read(self, amt)
    625 """Read the number of bytes requested.
    626
    627 This function should be used when <amt> bytes "should" be present for
    628 reading. If the bytes are truly not available (due to EOF), then the
    629 IncompleteRead exception can be used to detect the problem.
    630 """
--> 631 data = self.fp.read(amt)
    632 if len(data) < amt:

File ~/miniconda3/envs/fruitbats/lib/python3.10/socket.py:717, in SocketIO.readinto(self, b)
    716 try:
--> 717     return self._sock.recv_into(b)
    718 except timeout:

File ~/miniconda3/envs/fruitbats/lib/python3.10/ssl.py:1307, in SSLSocket.recv_into(self, buffer, nbytes, flags)
   1304     raise ValueError(
   1305       "non-zero flags not allowed in calls to recv_into() on %s" %
   1306       self.__class__)
-> 1307 return self.read(nbytes, buffer)
   1308 else:

File ~/miniconda3/envs/fruitbats/lib/python3.10/ssl.py:1163, in SSLSocket.read(self, len, buffer)
   1162 if buffer is not None:
-> 1163     return self._sslobj.read(len, buffer)
   1164 else:

TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

ReadTimeoutError                          Traceback (most recent call last)
File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/botocore/response.py:99, in StreamingBody.read(self, amt)
     98 try:
---> 99     chunk = self._raw_stream.read(amt)
    100 except URLLib3ReadTimeoutError as e:
    101     # TODO: the url will be None as urllib3 isn't setting it yet

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/urllib3/response.py:955, in HTTPResponse.read(self, amt, decode_content, cache_content)
    953     return self._decoded_buffer.get(amt)
--> 955 data = self._raw_read(amt)
    957 flush_decoder = amt is None or (amt != 0 and not data)

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/urllib3/response.py:878, in HTTPResponse._raw_read(self, amt, read1)
    876 fp_closed = getattr(self._fp, "closed", False)
--> 878 with self._error_catcher():
    879     data = self._fp_read(amt, read1=read1) if not fp_closed else b""

File ~/miniconda3/envs/fruitbats/lib/python3.10/contextlib.py:153, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    152 try:
--> 153     self.gen.throw(typ, value, traceback)
    154 except StopIteration as exc:
    155     # Suppress StopIteration *unless* it's the same exception that
    156     # was passed to throw().  This prevents a StopIteration
    157     # raised inside the "with" statement from being suppressed.

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/urllib3/response.py:759, in HTTPResponse._error_catcher(self)
    756 except SocketTimeout as e:
    757     # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
    758     # there is yet no clean way to get at it from this context.
--> 759     raise ReadTimeoutError(self._pool, None, "Read timed out.") from e  # type: ignore[arg-type]
    761 except BaseSSLError as e:
    762     # FIXME: Is there a better way to differentiate between SSLErrors?

ReadTimeoutError: AWSHTTPSConnectionPool(host='fruitbat-vocalizations.s3.us-west-2.amazonaws', port=443): Read timed out.

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
File <timed exec>:25

Cell In[25], line 15, in fetch_audio(row, sr)
     12 s3_object_key = str(s3_path.relative_to(DSLOC))
     14 response = s3_client.get_object(Bucket=BUCKET_NAME, Key=s3_object_key)
---> 15 file_content = response['Body'].read()
     17 # https://stackoverflow/questions/73350508/read-audio-file-from-s3-directly-in-python
     18 # this will read in float64 by default and multichannel if any
     19 data, rate = soundfile.read(io.BufferedReader(io.BytesIO(file_content)), always_2d=True)

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/botocore/httpchecksum.py:240, in StreamingChecksumBody.read(self, amt)
    239 def read(self, amt=None):
--> 240     chunk = super().read(amt=amt)
    241     self._checksum.update(chunk)
    242     if amt is None or (not chunk and amt > 0):

File ~/miniconda3/envs/fruitbats/lib/python3.10/site-packages/botocore/response.py:102, in StreamingBody.read(self, amt)
     99     chunk = self._raw_stream.read(amt)
    100 except URLLib3ReadTimeoutError as e:
    101     # TODO: the url will be None as urllib3 isn't setting it yet
--> 102     raise ReadTimeoutError(endpoint_url=e.url, error=e)
    103 except URLLib3ProtocolError as e:
    104     raise ResponseStreamingError(error=e)

ReadTimeoutError: Read timeout on endpoint URL: "None"
```
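In this traceback the innermost notebook frame is `Cell In[25], line 15, in fetch_audio`, i.e. `response['Body'].read()` on the object returned by `get_object`, so the timeout is raised while reading the source file. botocore's own retry and timeout settings (`botocore.config.Config(read_timeout=..., connect_timeout=..., retries={...})`) apply to the request itself; as far as I know, a timeout while streaming the body after `get_object` has returned is not retried automatically. A minimal sketch of retrying the whole fetch with exponential backoff follows. `with_retries` is a hypothetical helper, and the built-in `TimeoutError` stands in for the exception real code would most likely catch, `botocore.exceptions.ReadTimeoutError`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff.

    Hypothetical helper: fn should wrap both get_object() and
    response['Body'].read(), so a stalled stream is re-requested from scratch.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

For example, `with_retries(lambda: fetch_audio(row), retry_on=(ReadTimeoutError,))` would re-run the download up to five times before giving up.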


asked Mar 24 at 15:01 by user305883

It's saying that there is a Read Timeout on the Endpoint. Can you please confirm whether you are able to run AWS CLI commands from the same computer? Also, what is the value of REGION_NAME? – John Rotenstein, Mar 24 at 22:17

1 Answer

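I have encountered the same problem. In my case, it turned out that the AWS endpoint was not resolving over IPv6.

First, debug with an AWS CLI command, for example `aws s3 ls`:

```
aws s3 ls --debug
```

Check whether the command works, or whether it fails while resolving the endpoint to an IPv6 address.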
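A similar check can be run from Python on the machine that runs the notebook, to see which IPv4 and IPv6 addresses a hostname resolves to. This is a minimal sketch using only `socket.getaddrinfo`; `resolve_by_family` is a hypothetical helper name.

```python
import socket
from typing import Dict, List


def resolve_by_family(host: str, port: int = 443) -> Dict[str, List[str]]:
    """Return the IPv4 and IPv6 addresses `host` resolves to on this machine."""
    result: Dict[str, List[str]] = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            infos = []  # no address of this family, or resolution failed
        # sockaddr[0] is the IP string for both families; dedupe, keep order
        result[label] = list(dict.fromkeys(info[4][0] for info in infos))
    return result
```

For example, `resolve_by_family("s3.us-west-2.amazonaws.com")` (this needs network access) can be compared with the addresses that `--debug` shows the CLI connecting to.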

If that is the case, you can either switch back to IPv4 or configure `use_dualstack_endpoint`.

Verify IPv6 Status

Follow these steps:

1. Log in with root privileges.
2. Execute the following command:
```
$ ip a | grep inet6
```

If the output looks like the following, IPv6 is enabled:

```
inet6 ::1/128 scope host
inet6 fe80::e922:bcdf:e150:1abb/64 scope link
```

If IPv6 is disabled, the command produces no output.
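On Linux the same information is exposed in `/proc/net/if_inet6`, one line per IPv6 address; the file is typically empty or absent when IPv6 is disabled. A minimal sketch that parses it from Python; `parse_if_inet6` and `ipv6_addresses` are hypothetical helpers, and the assumed column layout (hex address, interface index, prefix length in hex, scope, flags, interface name) is the Linux kernel's format.

```python
import ipaddress
from pathlib import Path
from typing import List, Tuple


def parse_if_inet6(text: str) -> List[Tuple[str, str, int]]:
    """Parse /proc/net/if_inet6 content into (interface, address, prefix_len)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        hex_addr, _ifindex, hex_prefix, _scope, _flags, ifname = fields[:6]
        addr = ipaddress.IPv6Address(int(hex_addr, 16))  # 32 hex digits, no colons
        entries.append((ifname, str(addr), int(hex_prefix, 16)))
    return entries


def ipv6_addresses(path: str = "/proc/net/if_inet6") -> List[Tuple[str, str, int]]:
    """Return this machine's IPv6 addresses; empty list if the file is missing."""
    p = Path(path)
    return parse_if_inet6(p.read_text()) if p.exists() else []
```

An empty result from `ipv6_addresses()` corresponds to `ip a | grep inet6` printing nothing.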

Disable IPv6

To disable IPv6, follow these steps:

1. Log in with root privileges.
2. Open the `/etc/sysctl.conf` file with the following command:
```
$ sudo vim /etc/sysctl.conf
```
3. Add the following lines to it:
```
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```
4. Reload `sysctl.conf` with the following command:
```
sysctl -p
```
5. Verify that IPv6 is disabled by running the following command:
```
$ ip a | grep inet6
```
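The sysctl keys above correspond to files under `/proc/sys` (dots become slashes), so the current setting can also be read from Python without root. A minimal, Linux-only sketch; `sysctl_path` and `ipv6_disabled` are hypothetical helpers, and they only read the values, they do not change them.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl key such as net.ipv6.conf.all.disable_ipv6 to its /proc/sys file.

    Assumes no component of the key itself contains a dot.
    """
    return Path(root, *key.split("."))


def ipv6_disabled(root: str = "/proc/sys") -> Optional[bool]:
    """Read the 'all' and 'default' disable_ipv6 switches.

    Returns True if both are 1, False otherwise, and None if the files do not
    exist (e.g. not Linux, or IPv6 support not loaded in the kernel).
    """
    values = []
    for scope in ("all", "default"):
        path = sysctl_path(f"net.ipv6.conf.{scope}.disable_ipv6", root)
        if not path.exists():
            return None
        values.append(path.read_text().strip())
    return all(v == "1" for v in values)
```

Checking both `all` and `default` mirrors the two lines added to `/etc/sysctl.conf`.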

Alternatively, you can enable `use_dualstack_endpoint` in the AWS config file:

```
$ cat ~/.aws/config
[default]
region = eu-west-1
output = json
use_dualstack_endpoint = true
```
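Because the AWS config file is INI-formatted, the value the SDKs will read from it can be checked with the standard-library `configparser`. A minimal sketch; `dualstack_enabled` is a hypothetical helper, and it reflects only the config file, not the `AWS_USE_DUALSTACK_ENDPOINT` environment variable or a per-client `botocore.config.Config(use_dualstack_endpoint=True)`, either of which can take precedence.

```python
import configparser


def dualstack_enabled(config_text: str, profile: str = "default") -> bool:
    """Return the use_dualstack_endpoint flag for `profile` in AWS config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config the default profile is "[default]",
    # every other profile is "[profile NAME]"
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return False
    return parser.getboolean(section, "use_dualstack_endpoint", fallback=False)
```

For example, `dualstack_enabled(open(os.path.expanduser("~/.aws/config")).read())` checks the default profile of the current user.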

https://techdocs.broadcom/us/en/ca-enterprise-software/it-operations-management/network-flow-analysis/23-3/installing/system-recommendations-and-requirements/linux-servers/disable-ipv6-networking-on-linux-servers.html
