I'm looking for a way to cache data using a map (or equivalent) but instead of a comparable key (like int
, string
, etc) it would be lower and upper time boundaries. I would then be able to perform a map lookup of a specific timestamp.
In SQL terms this is what I would want (note the exclusive upper bound). This can be GIST-indexed and it essentially becomes like a hash.
SELECT col_x FROM table_y WHERE NOW() <@ TSTZRANGE(start_date, end_date, '[)');
My absolutely basic solution right now consists of creating one map entry per minute-truncated timestamp, which works well but obviously generates a lot of highly redundant entries.
Alternatively I could use the lower boundary only as a key, but I'm not sure how to lookup the map for the "first match greater or equal to the key" (that's obviously contrary to the comparable requirement). Is this where slices would be more appropriate?
By the way I'm using int64
timestamps instead of time.Time
.
I'm looking for a way to cache data using a map (or equivalent) but instead of a comparable key (like int
, string
, etc) it would be lower and upper time boundaries. I would then be able to perform a map lookup of a specific timestamp.
In SQL terms this is what I would want (note the exclusive upper bound). This can be GIST-indexed and it essentially becomes like a hash.
SELECT col_x FROM table_y WHERE NOW() <@ TSTZRANGE(start_date, end_date, '[)');
My absolutely basic solution right now consists of creating one map entry per minute-truncated timestamp, which works well but obviously generates a lot of highly redundant entries.
Alternatively I could use the lower boundary only as a key, but I'm not sure how to lookup the map for the "first match greater or equal to the key" (that's obviously contrary to the comparable requirement). Is this where slices would be more appropriate?
By the way I'm using int64
timestamps instead of time.Time
.
- 1) Could you provide some sample Go code and data showing how you'd like this to work? 2) "By the way I'm using int64 timestamps instead of time.Time." To be sure I understand, this is just about having a map with a key that's two numbers? 3) I like swords. – Schwern Commented Feb 5 at 21:27
- "I would then be able to perform a map lookup of a specific timestamp." What is your plan for overlapping time ranges? You might be better off putting your data into SQLite, indexing it, and querying it. – Schwern Commented Feb 5 at 21:31
- @Schwern in my case, the date ranges are guaranteed to be exclusive since there is a constraint in postgres. I'm actually trying to cache a static list in golang instead of performing database lookups. – Jeff Commented Feb 5 at 21:46
- Before writing your own indexing scheme, try a SQLite temporary database (the data will reside in memory) and see if it performs well enough. – Schwern Commented Feb 5 at 21:51
2 Answers
Reset to default 4By the way I'm using int64 timestamps instead of time.Time.
This isn't actually about time at all, we can drop that. You want to look up integers within a given range.
Rather than trying to make up your own indexing solution, probably your best solution is to put your data into a SQLite temporary database, index, and query it.
create table data (
num biginteger not null,
-- add your data columns
);
create index data_num_idx on data(num);
Or use redis, often used as a cache, and a sorted set. zrange
has O(logn + m) time complexity, N being the number of elements in the sorted set and M the number of elements returned, equivalent to using a b-tree because redis sorted sets are b-trees (more about b-trees below).
This can be coded up quickly, and are very flexible and easier to maintain than a custom solution. Decide on your performance criteria and run some benchmarks while you're coding up a Go solution.
This can be GIST-indexed and it essentially becomes like a hash.
Sure, you could use a GiST like a hash, but that won't help optimize your range queries much. If you use microsecond time as a hash key you'd need to search 1,000,000 entries to cover a range of one second.
A GiST is like a search tree, in fact it is a search tree. GiST is a Generalized Search Tree based on B+Trees.
If you want to do this yourself, you need a data structure like a map with its quick lookups and inserts, but also like a list with its fast in-order traversal: a tree! Not just any tree, but a B-Tree just like SQL indexes. Store the timestamp as the key. Search for the node which is equal to the lower bound or until you fail to match, this is O(logn), then walk the tree until you hit a node greater than the upper bound, this is O(m). The total is O(logn + m) where n is the number of nodes in the tree and m is the number of nodes in the range.
There are several Go B-Tree implementations out there.
Look, i can suggest you a practical solution using an interval tree implementation. In my opinion, this is much more efficient than creating entries per minute, here's the code:
type TimeRange struct {
Start int64
End int64
}
type IntervalNode struct {
Range TimeRange
Value interface{}
Max int64
Left *IntervalNode
Right *IntervalNode
}
type IntervalMap struct {
root *IntervalNode
}
func NewIntervalMap() *IntervalMap {
return &IntervalMap{}
}
func (im *IntervalMap) Insert(start, end int64, value interface{}) {
if im.root == nil {
im.root = &IntervalNode{
Range: TimeRange{Start: start, End: end},
Value: value,
Max: end,
}
return
}
im.root = im.insert(im.root, TimeRange{Start: start, End: end}, value)
}
func (im *IntervalMap) insert(node *IntervalNode, r TimeRange, value interface{}) *IntervalNode {
if node == nil {
return &IntervalNode{Range: r, Value: value, Max: r.End}
}
if r.Start < node.Range.Start {
node.Left = im.insert(node.Left, r, value)
} else {
node.Right = im.insert(node.Right, r, value)
}
if node.Max < r.End {
node.Max = r.End
}
return node
}
func (im *IntervalMap) Find(timestamp int64) interface{} {
if im.root == nil {
return nil
}
return im.search(im.root, timestamp)
}
func (im *IntervalMap) search(node *IntervalNode, timestamp int64) interface{} {
if node == nil {
return nil
}
// Check if current node contains the timestamp
if timestamp >= node.Range.Start && timestamp < node.Range.End {
return node.Value
}
// Search left subtree if it could contain our timestamp
if node.Left != nil && timestamp < node.Range.Start {
return im.search(node.Left, timestamp)
}
// Search right subtree
return im.search(node.Right, timestamp)
}
How we can use it?
timeMap := NewIntervalMap()
// Insert some ranges
timeMap.Insert(
time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).Unix(),
time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC).Unix(),
"value1",
)
// Look up a timestamp
now := time.Now().Unix()
value := timeMap.Find(now)
It is beneficial because you'll get O(log n) lookup time, no redundant entries, it handles overlapping ranges correctly and it's memory efficient