I am using the code below to add documents to a Lucene index. After indexing 23,425 documents, the folder where the index is stored is 447.4 MB. In contrast, the same 23,425 records stored in a Parquet file take only 625 KB. The Lucene index folder seems excessively large. Could someone help identify why this is happening and how to reduce it? Here is the code I am using:
    MMapDirectory indexDirectory = new MMapDirectory(Paths.get(directory));
    // Configure the IndexWriter with an analyzer
    StandardAnalyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter indexWriter = new IndexWriter(indexDirectory, config);
    for (Map.Entry<String, OperationAggregation> entry : operations.entrySet()) {
        // One document per operation; the fixed fields are indexed and stored
        Document doc1 = new Document();
        doc1.add(new StringField("namespace", namespace, Store.YES));
        doc1.add(new StringField("type", "operations", Store.YES));
        doc1.add(new StringField("data", entry.getKey(), Store.YES));
        doc1.add(new StringField("serviceName", entry.getValue().getServiceName(), Store.YES));
        // Each attribute becomes a stored-only field named after the attribute
        List<AggregationAttribute> attributes = entry.getValue().getOperationAttributes();
        for (int i = 0; i < attributes.size(); i++) {
            doc1.add(new StoredField(attributes.get(i).getName(),
                    String.valueOf(attributes.get(i).getValue())));
        }
        try {
            docCount.getAndIncrement();
            indexWriter.addDocument(doc1);
        } catch (IOException e) {
            logger.error("Error while adding document to index", e);
        }
    }
    indexWriter.commit();
    indexWriter.close();
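
To help narrow this down, a small helper along the following lines could show which files in the index folder account for the 447.4 MB. This is only a sketch, not part of my indexing code: the class name IndexSizeReport and the method sizeByExtension are made up for illustration, and it uses only java.nio, no Lucene API. It assumes the index files sit directly in the folder. Files without a dot in their name (such as the segments_N commit files) are reported under their full name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical diagnostic helper: totals file sizes in a folder by extension.
public class IndexSizeReport {

    /** Sums the sizes of regular files directly inside dir, grouped by file extension. */
    public static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files with no dot are keyed by their whole name
                String ext = dot >= 0 ? name.substring(dot + 1) : name;
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index folder as the first argument; defaults to the current folder
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        sizeByExtension(dir).forEach(
                (ext, bytes) -> System.out.printf("%-12s %,d bytes%n", ext, bytes));
    }
}
```

Running it against the index folder before and after an indexing run would show whether the size comes from a few large files or from many segment files accumulating across runs.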