I have a Python script that connects to a server and downloads a large number of videos and photos. And when I say large, I mean in the tens of thousands; the total amount of data to be downloaded will likely be measured in terabytes. Some of the individual videos are also rather large, possibly 2GB and beyond. I recently rewrote the script to switch from requests to an asynchronous approach with asyncio (plus aiohttp and aiofiles) for reasons that should be obvious.
Since this program will be downloading so many files, I'm naturally expecting some errors. While downloading a video (photos aren't a problem), along with aiohttp.ClientConnectionError and aiohttp.ClientSSLError, I have a general error catch; every exception is treated as a failed download and is logged to an error file so the download can be attempted again later. The script then continues on to the next video. I don't want this script to fully reattempt a download, as I think that would significantly slow the program down - plus, since the content on the server is varied, the file may simply be undownloadable, and I don't want my program to keep banging its head against a wall when it should just move on. I say all this because I don't want 50 comments saying 'error swallowing is bad' when I know this. A rough sketch of the pattern is shown below.
What I want is to eliminate the error catch for TimeoutError. Again, some of these videos are very large, and the time to download them is proportionally long. For some downloads, the time it takes exceeds whatever Python's internal clock considers too long, so even though the program is functioning properly and without issue, Python kills the download, thinking something has gone wrong or is caught in an infinite loop. I could just catch the TimeoutError and have it restart the download from where it cut off, but that would take more computing and would weigh the function down for downloads that don't have this issue. I feel the far better solution would be to simply not have the interpreter throw an error when nothing has gone wrong.
Does anyone know a way I could do this? A way to completely suppress the interpreter from considering this error? Or is there a better way than the two I've suggested?
EDIT: Just to clarify, the issue is not the server timing out, but rather my Python interpreter. Asyncio thinks that a task running for too long means it's stuck in a loop, hit a race condition, or some other logical error. I need my interpreter to recognize that a task taking 20 minutes is acceptable.
1 Answer
This is the key question:
I feel that the far better solution would just be to not have the interpreter throw an error when nothing has gone wrong. Does anyone know a way I could do this? A way to completely suppress the interpreter from considering this error? Or is there a better way than the two I've suggested?
A way to achieve this with aiohttp and aiofiles is to use the timeout parameter of the aiohttp.ClientSession.get method (the same setting can also be configured on the ClientSession constructor).
See the documentation of the constructor, which lists all parameters:
timeout – a ClientTimeout settings structure, 300 seconds (5min) total timeout, 30 seconds socket connect timeout by default.
Added in version 3.3.
Changed in version 3.10.9: The default value for the sock_connect timeout has been changed to 30 seconds.
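A minimal sketch of what that looks like in practice: total=None removes the 5-minute overall default, while the sock_connect / sock_read limits still catch a genuinely stalled connection. The exact values here are illustrative, not taken from your script:

    import asyncio
    import aiohttp

    # No overall deadline; connect/read limits still guard against dead connections.
    LONG_TIMEOUT = aiohttp.ClientTimeout(total=None, sock_connect=30, sock_read=60)

    async def main() -> None:
        async with aiohttp.ClientSession(timeout=LONG_TIMEOUT) as session:
            # The session-level timeout now applies to every request; it can
            # also be overridden per request:
            #   async with session.get(url, timeout=LONG_TIMEOUT) as resp: ...
            ...

    asyncio.run(main())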
Comment: wait_for, or async with asyncio.Timeout, or another call with a timeout=... parameter set. – VPfB
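Following up on that comment: if the cancellation instead comes from an explicit asyncio deadline somewhere in your own code, passing None disables that deadline entirely. A minimal sketch, assuming the hypothetical download_one coroutine from the earlier example:

    import asyncio

    async def fetch_without_deadline(session, url, dest):
        # timeout=None means "no deadline": asyncio itself will never raise
        # TimeoutError here, no matter how long the download takes.
        await asyncio.wait_for(download_one(session, url, dest), timeout=None)

        # Python 3.11+ equivalent using the async context manager:
        # async with asyncio.timeout(None):
        #     await download_one(session, url, dest)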