I’m working on an IoT project that collects real-time sensor data from thousands of devices. Each device sends a new reading every second, and I need to insert this data into GridDB Cloud as efficiently as possible.
I’m using a TimeSeries container since the data is time-based, and my schema looks like this:
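```java
import com.toshiba.mwcloud.gs.ColumnInfo;
import com.toshiba.mwcloud.gs.ContainerInfo;
import com.toshiba.mwcloud.gs.ContainerType;
import com.toshiba.mwcloud.gs.GSType;
import java.util.Arrays;

ContainerInfo containerInfo = new ContainerInfo(
    "sensor_data",
    ContainerType.TIME_SERIES, // Create as a TimeSeries container
    Arrays.asList(
        new ColumnInfo("device_id", GSType.STRING),
        new ColumnInfo("timestamp", GSType.TIMESTAMP),
        new ColumnInfo("temperature", GSType.DOUBLE),
        new ColumnInfo("humidity", GSType.DOUBLE)
    ),
    true // Row key assigned
);
// Create the container if it does not exist yet (no schema modification)
gridstore.putContainer("sensor_data", containerInfo, false);
```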
**Issue**

When inserting data at a high rate (thousands of inserts per second), I start noticing increased latency, sometimes leading to delays or failed writes. Here’s a simplified version of the code I’m using for inserts:
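```java
// Look up the existing container through the generic Row interface
TimeSeries<Row> ts = gridstore.getTimeSeries("sensor_data");

// Populate the columns by index, in schema order
Row row = ts.createRow();
row.setString(0, "sensor_001");
row.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
row.setDouble(2, 22.5);
row.setDouble(3, 55.0);

// Write a single row
ts.put(row);
```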
If I run this in a loop for multiple sensors, the performance starts dropping significantly as the number of writes increases.
**What I’ve Tried**
- **Batch inserts:** Instead of inserting one row at a time, I tried using batch inserts:
```java
// Build 1,000 rows in memory first
List<Row> rows = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    Row r = ts.createRow();
    r.setString(0, "sensor_" + i);
    r.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
    r.setDouble(2, Math.random() * 30);
    r.setDouble(3, Math.random() * 100);
    rows.add(r);
}
// Container.put(Collection) writes the whole list in a single call
ts.put(rows);
```
This improved performance, but at higher loads, latency still increases.
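For reference, the batching idea can be sketched independently of GridDB as a small buffer that flushes once a size threshold is reached. This is only an illustration under assumptions: `BatchBuffer` and its `sink` callback are hypothetical names, and the sink stands in for whatever call actually writes the accumulated rows to the container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to a sink in fixed-size batches.
final class BatchBuffer<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink; // stand-in for the real batch write
    private List<T> pending = new ArrayList<>();

    BatchBuffer(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    // Adds one item and flushes automatically once the batch is full.
    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    // Sends whatever is pending, if anything, and starts a fresh batch.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }
}
```

In use, each incoming reading would go through `add`, with a final `flush()` on shutdown (or on a timer) so that a partially filled batch is not left waiting. The batch size itself would need to be tuned by measurement.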
- **Parallel inserts:** I tried using multiple threads to insert data in parallel, but this sometimes results in `GSException` timeout errors.
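One way to keep parallel writers from piling up requests, sketched under assumptions rather than as a confirmed fix: cap concurrency with a bounded thread pool and retry failed writes with exponential backoff. `ParallelWriteSupport`, the pool sizes, and the retry limits are hypothetical, and the `Callable` stands in for a GridDB write that may throw `GSException`. The sketch also assumes each worker uses its own client handle instead of sharing one across threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helpers for bounded, retrying parallel writes.
final class ParallelWriteSupport {
    private ParallelWriteSupport() {}

    // Fixed-size pool with a bounded queue. When the queue is full, the
    // submitting thread runs the task itself, which slows the producer down
    // instead of letting work accumulate without limit.
    static ExecutorService boundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Runs the write, retrying on failure and doubling the wait each time.
    // The last failure is rethrown once maxAttempts is exhausted.
    static <T> T callWithRetry(Callable<T> write, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return write.call();
            } catch (InterruptedException e) {
                throw e; // do not retry when the thread is being interrupted
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }
}
```

Each batch write would be submitted to the pool wrapped in `callWithRetry`. Whether retrying is safe depends on the write being repeatable, which is worth confirming for the container type in use.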
**Question**

How can I reduce write latency in GridDB Cloud when handling high-frequency inserts? Are there specific configuration settings, transaction optimizations, or best practices that can help maintain low latency at scale?
Would appreciate any insights from those who have optimized GridDB for high-throughput workloads!
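
A related modelling point, offered as an assumption rather than a confirmed diagnosis: as far as I understand, a TimeSeries container is keyed by its timestamp, so readings from different devices that share a timestamp would collide in one shared container. One pattern I have seen suggested is a container per device, with rows grouped by container name before writing. The map shape in this sketch mirrors what I believe `GridStore.multiPut` accepts. `Reading`, `ReadingGrouper`, and the `sensor_data_<deviceId>` naming scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type for one sensor reading.
final class Reading {
    final String deviceId;
    final long timestampMillis;
    final double temperature;
    final double humidity;

    Reading(String deviceId, long timestampMillis, double temperature, double humidity) {
        this.deviceId = deviceId;
        this.timestampMillis = timestampMillis;
        this.temperature = temperature;
        this.humidity = humidity;
    }
}

// Hypothetical helper: buckets readings by the container they would be written to.
final class ReadingGrouper {
    private ReadingGrouper() {}

    // Assumed naming scheme: one container per device.
    static String containerNameFor(String deviceId) {
        return "sensor_data_" + deviceId;
    }

    // Groups readings by target container, preserving arrival order within each group.
    static Map<String, List<Reading>> groupByContainer(List<Reading> readings) {
        Map<String, List<Reading>> grouped = new LinkedHashMap<>();
        for (Reading r : readings) {
            grouped.computeIfAbsent(containerNameFor(r.deviceId), k -> new ArrayList<>())
                   .add(r);
        }
        return grouped;
    }
}
```

Each group would then be converted to rows and written in one request per batch, so the number of round trips scales with the number of batches and not with the number of readings.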