I'd like to implement a custom Scrapy HTTP cache. I know how to point HTTPCACHE_STORAGE at my class and which methods to implement, but the storage I want to use is asynchronous, while the HTTPCACHE_STORAGE interface is expected to be synchronous. Is there any way I can do this?
This code isn't inside a spider, and for the storage methods to work they must be plain synchronous functions, such as def retrieve_response(...) and so on. But inside such a method I need to call await .... Since Scrapy already has an event loop running, I cannot start another one. How do I do this?
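A minimal sketch of the situation described above, not a solution. FakeAsyncBackend is a hypothetical stand-in for the real asynchronous storage, and the cache "key" is simplified to whatever object is passed as the request. The four method names are the ones Scrapy calls on a cache storage class. The naive attempt to bridge with asyncio.run works when no loop is running, but raises RuntimeError when called from inside a running loop, which is what would happen under the asyncio reactor.

```python
import asyncio


class FakeAsyncBackend:
    """Hypothetical stand-in for an asynchronous storage client."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        await asyncio.sleep(0)  # pretend to do network I/O
        return self._data.get(key)

    async def put(self, key, value):
        await asyncio.sleep(0)  # pretend to do network I/O
        self._data[key] = value


class AsyncBackedCacheStorage:
    """Skeleton with the synchronous methods Scrapy expects from a cache storage."""

    def __init__(self, settings=None):
        # Scrapy passes its settings object here; it is unused in this sketch.
        self.backend = FakeAsyncBackend()

    def open_spider(self, spider):
        pass

    def close_spider(self, spider):
        pass

    def retrieve_response(self, spider, request):
        # This is a plain `def`, so `await self.backend.get(...)` is not allowed.
        # Naive bridge: start a new event loop for the coroutine.
        coro = self.backend.get(request)
        try:
            return asyncio.run(coro)
        except RuntimeError:
            coro.close()  # avoid a "never awaited" warning
            raise

    def store_response(self, spider, request, response):
        coro = self.backend.put(request, response)
        try:
            asyncio.run(coro)
        except RuntimeError:
            coro.close()
            raise


async def call_from_running_loop(storage):
    # Mimics the asyncio reactor case: a loop is already running when
    # the synchronous storage method gets called.
    try:
        storage.retrieve_response(None, "https://example.com")
    except RuntimeError as exc:
        return exc
    return None


storage = AsyncBackedCacheStorage()
error = asyncio.run(call_from_running_loop(storage))
print(type(error).__name__)
```

The same retrieve_response call succeeds when no loop is running, so the failure comes only from trying to start a second loop inside the first one.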
I use TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor".
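For reference, the relevant part of settings.py would look roughly like this. The module path myproject.httpcache.AsyncBackedCacheStorage is a hypothetical placeholder for wherever the custom class lives; the setting names themselves are Scrapy's.

```python
# settings.py (sketch; the storage path below is a hypothetical example)
HTTPCACHE_ENABLED = True
HTTPCACHE_STORAGE = "myproject.httpcache.AsyncBackedCacheStorage"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```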