We have a SolrCloud cluster with the following initial setup:
- Collection: A single collection containing all document types (legal, report, thesis).
- Shard: 1 shard with 3 replicas.
- Collection Size: ~30GB with 1 million documents.
- Document Distribution: Thesis documents account for 24GB of the 30GB and are generally large.
- Resource Allocation:
  - RAM Allocated: 32GB (out of 128GB total).
  - Processors: 16.
Performance Metrics (Before Splitting Collections)
- P75 Response Time: 650ms for type:thesis filter.
- P90 Response Time: 900ms for type:thesis filter.
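The post doesn't say how P75 and P90 were computed. As a minimal sketch, assuming they come from a list of per-request latencies and use the nearest-rank definition, the percentiles can be derived as follows. The function name and the sample values are hypothetical and are not measurements from this cluster:

```python
import math


def percentile_nearest_rank(latencies_ms: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) of the samples, nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(latencies_ms)
    # Nearest rank: the smallest sample such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical per-request latencies in ms (not data from this cluster).
samples = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_nearest_rank(samples, 75))  # 80
print(percentile_nearest_rank(samples, 90))  # 90
```

Other tools may interpolate between samples instead (NumPy's `percentile` defaults to linear interpolation), so the same latencies can yield slightly different P75/P90 values depending on how they are measured.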
Change Made
To improve performance, we decided to split the collection by moving thesis documents into a separate collection. The new setup is:
- Collection 1: Legal + Report documents (~6GB size, 0.75M documents).
- Collection 2: Thesis documents (~24GB size, 0.25M documents).
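Dividing each collection's size by its document count makes the size skew concrete: on average, a thesis document takes roughly 12 times as much space as a legal or report document. This is back-of-the-envelope arithmetic on the approximate figures above, treating GB as 10^9 bytes and collection size per document as a rough proxy for document size:

```python
GB = 10**9  # decimal gigabytes; the sizes in the post are approximate


def avg_doc_size_kb(size_gb: float, num_docs: int) -> float:
    """Average collection bytes per document, in kilobytes (1 KB = 1000 bytes)."""
    return size_gb * GB / num_docs / 1000


legal_report = avg_doc_size_kb(6, 750_000)  # Collection 1
thesis = avg_doc_size_kb(24, 250_000)       # Collection 2
print(f"Collection 1: ~{legal_report:.0f} KB/doc")  # Collection 1: ~8 KB/doc
print(f"Collection 2: ~{thesis:.0f} KB/doc")        # Collection 2: ~96 KB/doc
print(f"Ratio: ~{thesis / legal_report:.0f}x")      # Ratio: ~12x
```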
Both collections:
- Are in the same cluster.
- Have the same schema, SolrConfig, cache settings, etc.
- Have the same number of replicas and shards.
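For readers reproducing the setup, this is one way two collections with an identical layout could be created through the Solr Collections API `CREATE` action. It is a sketch under assumptions: the node address, the collection names, and the shared configset name `shared_config` are hypothetical, and the post does not say how the collections were actually created. The code only builds the request URLs; it does not send them:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node address


def create_collection_url(name: str, config_name: str,
                          num_shards: int = 1, replication_factor: int = 3) -> str:
    """Build a Collections API CREATE request URL (the request itself is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Pointing both collections at one configset keeps schema and solrconfig identical.
        "collection.configName": config_name,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


for collection in ("legal_report", "thesis"):  # hypothetical collection names
    print(create_collection_url(collection, "shared_config"))
```

Sharing one configset is a simple way to guarantee "same schema, SolrConfig, cache settings", though note that identical cache settings sized for the original combined collection may not be equally suitable for each of the two new collections.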
Observation (After Splitting)
Unexpectedly, the performance for Collection 2 (Thesis) degraded significantly:
- P75 Response Time: Increased from 650ms to 740ms.
- P90 Response Time: Increased from 900ms to 1.3 seconds.
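Expressed as relative changes, the slowdown is concentrated in the tail: P75 rose by about 14%, while P90 rose by about 44%. The arithmetic, using only the figures reported above:

```python
def pct_change(before_ms: float, after_ms: float) -> float:
    """Relative change in latency, in percent (positive means slower)."""
    return (after_ms - before_ms) / before_ms * 100


p75 = pct_change(650, 740)
p90 = pct_change(900, 1300)
print(f"P75: {p75:+.1f}%")  # P75: +13.8%
print(f"P90: {p90:+.1f}%")  # P90: +44.4%
```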
This was surprising, since the thesis documents now have a dedicated collection with fewer documents to process, yet performance worsened instead of improving.
Question
- What could be the potential root cause of this performance degradation?
- Given that the configurations, schema, and hardware resources are identical, why would a dedicated collection for larger documents perform worse?
- Are there any Solr-specific optimizations or cluster configurations that we may have missed when splitting the collections?
Any insights or suggestions to improve the response time would be greatly appreciated.
Thank you!