I have a list of items grouped by selection, and my requirement is to search for items based on a list of selections. The items are indexed in a SolR index but the link between the selections and the items is in a postgresql table.
Today I'm running an initial SQL query to retrieve a list of item identifiers based on the selections, then the items are retrieved via a SolR query based on this list of identifiers.
The problem is that this query can be expensive depending on the number of items in the selections, which can exceed 30,000. Below is an example of a query with some identifiers (idItem) :
http://localhost:9182/solr/recherche/select?defType=lucene&q.op=OR&lowercaseOperators=false&facet.limit=-1&facet.mincount=1&facet.method=enum&fl=idObjetTouristiqueReference%2Cdate_tdt%2Cscore&mm=100%25&qf=nom_t_fr_st_ont_noidf%5E2.0&tie=0.2&pf=nom_t_fr_st_ont_noidf%5E4.0&ps=4&sort=idObjetTouristiqueReference+asc&q=*%3A*&start=0&rows=20&fq=%28idObjetTouristiqueReference%3A%285865563++OR+2916399++OR+98308++OR+5013577++OR+5701725++OR+6389867++OR+5865544++OR+884764++OR+884762++OR+7110772++OR+5701709++OR+5374025++OR+4948048++OR+6815862++OR+6324351++OR+4784233++OR+884780++OR+884782++OR+884778++OR+5996659++OR+5603453++OR+6389694++OR+7143347++OR+6062012++OR+6946696++OR+6979470++OR+6291386++OR+5996476++OR+884705++OR+7176092++OR+98293++OR+6619027++OR+6193062++OR+5078964++OR+5898147++OR+6094758++OR+98298++OR+5865381++OR+5439406+%29%29
I tried to optimise by adding the selections in the Solr index and by running a join query to directly retrieve my items based on my selections. Below is an example with some selections :
http://localhost:9182/solr/recherche/select?defType=lucene&q.op=OR&lowercaseOperators=false&facet.limit=-1&facet.mincount=1&facet.method=enum&fl=idObjetTouristiqueReference%2Cdate_tdt%2Cscore&mm=100%25&qf=nom_t_fr_st_ont_noidf%5E2.0&tie=0.2&pf=nom_t_fr_st_ont_noidf%5E4.0&ps=4&fq=%7B%21join+from%3DselectionReferenceIds_l_sf_mvt+to%3DidObjetTouristiqueReference%7D%28id%3A%28%22SELECTION71983%22++OR+%22SELECTION71984%22++OR+%22SELECTION71985%22++OR+%22SELECTION71986%22++OR+%22SELECTION71987%22++OR+%22SELECTION71988%22++OR+%22SELECTION71991%22++OR+%22SELECTION71992%22++OR+%22SELECTION71993%22++OR+%22SELECTION71994%22++OR+%22SELECTION71995%22++OR+%22SELECTION71996%22++OR+%22SELECTION71998%22++OR+%22SELECTION72000%22+%29%29&sort=idObjetTouristiqueReference+asc&q=*%3A*&start=0&rows=20
But strangely, the response time is much longer after this change. Could someone help me understand why a Joinquery query is slower than two successive queries ? And more generally, would anyone have any tips on how to optimise my search ?
Thanks !