
python - Updating corpus of already generated BM25 model - Stack Overflow


I need to build a search algorithm over a set of ~90k documents, with new documents added daily. I want to combine BM25 with dense vectors to create a hybrid ranking.
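For the hybrid part, one common way to merge a BM25 ranking with a dense-vector ranking is Reciprocal Rank Fusion (RRF), which uses only rank positions and so avoids normalizing two incompatible score scales. The sketch below is one possible choice, not something the question prescribes. The function name `reciprocal_rank_fusion` and the default `k=60` (a frequently used value) are assumptions for illustration.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document receives sum(1 / (k + rank)) over every list it appears in;
    documents missing from a list simply get no contribution from that list.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 results from each retriever
bm25_ranking = ["d3", "d1", "d2"]
dense_ranking = ["d1", "d4", "d3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd4', 'd2']
```

Because RRF only needs ranks, the BM25 side can be updated or rebuilt independently of the vector index.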

While reviewing the documentation of BM25 Python implementations, I started to wonder whether a BM25 model, once generated, can be updated with new documents. From an efficiency standpoint, rebuilding the corpus and the BM25 model from the entire database every time a new document is uploaded does not seem like a good idea.

I looked through several BM25 Python implementations, but they did not answer my question.

EDIT: The exact question is: can a previously generated BM25 model be updated with new documents, or does a new model have to be built on the entire database again?
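BM25 is not trained in the machine-learning sense: its "model" is a set of corpus statistics (number of documents, per-term document frequencies, document lengths and their average). All of these are additive, so a new document can be folded in without reprocessing the old ones. The class below is a minimal pure-Python sketch of Okapi BM25 under that assumption; the class name `IncrementalBM25`, its methods, and the Lucene-style IDF are illustrative choices, not the API of any existing BM25 library.

```python
import math
from collections import Counter


class IncrementalBM25:
    """Okapi BM25 whose corpus statistics can be extended one document at a time."""

    def __init__(self, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1
        self.b = b
        self.term_counts: list[Counter] = []  # term frequencies per document
        self.doc_lens: list[int] = []         # token count per document
        self.df: Counter = Counter()          # number of documents containing each term
        self.total_len = 0                    # running sum of document lengths

    def add_document(self, tokens: list[str]) -> int:
        """Update the statistics in O(len(tokens)) and return the new document id."""
        counts = Counter(tokens)
        self.term_counts.append(counts)
        self.doc_lens.append(len(tokens))
        self.total_len += len(tokens)
        self.df.update(counts.keys())  # each distinct term counted once per document
        return len(self.term_counts) - 1

    def idf(self, term: str) -> float:
        n = len(self.term_counts)
        df = self.df.get(term, 0)
        # Lucene-style IDF: always non-negative, even for very common terms
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query: list[str], doc_id: int) -> float:
        n = len(self.term_counts)
        avgdl = self.total_len / n if self.total_len else 1.0
        counts = self.term_counts[doc_id]
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[doc_id] / avgdl)
        total = 0.0
        for term in query:
            tf = counts.get(term, 0)
            if tf:
                total += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
        return total

    def get_scores(self, query: list[str]) -> list[float]:
        return [self.score(query, i) for i in range(len(self.term_counts))]


index = IncrementalBM25()
index.add_document(["bm25", "ranking"])
index.add_document(["dense", "vectors"])
# Later, a new document arrives: no rebuild, just another add_document call
new_id = index.add_document(["bm25", "update"])
print(new_id, index.get_scores(["update"]))
```

Since scores are computed from the current N, document frequencies and average length at query time, the results are the same as for an index rebuilt from scratch on the same documents. The trade-off is that scores of older documents shift slightly as the corpus grows, which is inherent to BM25 rather than a side effect of updating incrementally.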
