I've got a data type that consists of actual data and an id:
struct Data {
int value;
int DataId;
};
I've got a collection of these data from which I have to frequently get an element based on its id and modify its value (but not it's ID).
I thus chose a std::set
in order to have an efficient sort, with the following comparator:
class Comparator {
/// dummy type for declaring is_transparent type
struct SIsTransparentTag;
public:
// tag (falsely) indicating that operator() is transparent, i.e. accepts
// any type of argument: this is a workaround to allow std::set::find to
// use a int key instead of a full Data object.
//
// /
using is_transparent = SIsTransparentTag;
// comparator between two Data
bool operator()(const Data &lhs, const Data &rhs) const {
return lhs.DataId < rhs.DataId;
}
// comparator between a Data and an id
bool operator()(const Data &lhs, const int id) const {
return lhs.DataId < id;
}
// comparator between an id and a Data
bool operator()(const int id, const Data &rhs) const {
return id < rhs.DataId;
}
};
Then I wrote a read-write getter:
Data* GetRWDataFromID(std::set<Data,Comparator> &collection,int id)
{
auto it = collection.find(id);
if (it == std::end(collection))
{
return nullptr;
}
// UB as soon as I will modify Data!!!
return const_cast<Data *>(&(*it));
}
Playground
std::set::find()
only returns an iterator to a non-mutable element. However, I only want to be able to modify the value
member, thus never changing the set order.
This design is broken because removing constness of an object and modifying it is undefined behavior: .
How can I fix my design while keeping an efficient search (typically logarithmic) and letting the id attached to a data (when I get a data, I want to get its id from it, not from some book-keeping; thus I'd like to avoid a std::map
)?
A bonus would be to have DataId
non-mutable once a Data
object is created (for this one, I imagine I can make it private and use read-only getters?).
I've got a data type that consists of actual data and an id:
struct Data {
int value;
int DataId;
};
I've got a collection of these data from which I have to frequently get an element based on its id and modify its value (but not it's ID).
I thus chose a std::set
in order to have an efficient sort, with the following comparator:
class Comparator {
/// dummy type for declaring is_transparent type
struct SIsTransparentTag;
public:
// tag (falsely) indicating that operator() is transparent, i.e. accepts
// any type of argument: this is a workaround to allow std::set::find to
// use a int key instead of a full Data object.
// https://en.cppreference/w/cpp/container/set/find
// https://www.fluentcpp/2017/06/09/search-set-another-type-key/
using is_transparent = SIsTransparentTag;
// comparator between two Data
bool operator()(const Data &lhs, const Data &rhs) const {
return lhs.DataId < rhs.DataId;
}
// comparator between a Data and an id
bool operator()(const Data &lhs, const int id) const {
return lhs.DataId < id;
}
// comparator between an id and a Data
bool operator()(const int id, const Data &rhs) const {
return id < rhs.DataId;
}
};
Then I wrote a read-write getter:
Data* GetRWDataFromID(std::set<Data,Comparator> &collection,int id)
{
auto it = collection.find(id);
if (it == std::end(collection))
{
return nullptr;
}
// UB as soon as I will modify Data!!!
return const_cast<Data *>(&(*it));
}
Playground
std::set::find()
only returns an iterator to a non-mutable element. However, I only want to be able to modify the value
member, thus never changing the set order.
This design is broken because removing constness of an object and modifying it is undefined behavior: https://en.cppreference/w/cpp/language/const_cast.
How can I fix my design while keeping an efficient search (typically logarithmic) and letting the id attached to a data (when I get a data, I want to get its id from it, not from some book-keeping; thus I'd like to avoid a std::map
)?
A bonus would be to have DataId
non-mutable once a Data
object is created (for this one, I imagine I can make it private and use read-only getters?).
4 Answers
Reset to default 4Just use std::map
directly.
when I get a data, I want to get its id from it, not from some book-keeping; thus I'd like to avoid a
std::map
Inside std::map
, it store the key and value as a std::pair
, so you do not need any book-keeping to get both key and value.
Further reading: more information about std::map
You might (ab)use of mutable
:
struct Data {
mutable int value;
int DataId;
};
then you can return a const Data*
and modify its value.
Caveat is that you no longer have direct possibility to have "const
" value
.
One way to do this is to make your data value mutable
like this:
struct Data {
mutable int value;
int DataId;
};
Then return a const*
as normal:
const Data* GetRWDataFromID(std::set<Data,Comparator> &collection,int id)
{
auto it = collection.find(id);
if (it == std::end(collection))
{
return nullptr;
}
// UB as soon as I will modify Data!!!
return &(*it);
}
Now you can still modify the mutable
element.
Alternatively you can use a different container and manage the order yourself.
Something a bit like this:
// some utility functions to insert and remove sorted
// unique values from a vector
auto insert_unique(std::vector<Data>& v, Data&& d)
-> std::pair<std::vector<Data>::iterator, bool>
{
auto lb = std::lower_bound(std::begin(v), std::end(v), d, Comparator());
if(lb != std::end(v) && lb->DataId == d.DataId)
return {lb, false};
return {v.insert(lb, std::forward<Data>(d)), true};
}
auto insert_unique(std::vector<Data>& v, Data const& d)
{ auto copy = d; return insert_unique(v, std::move(copy)); }
auto remove_unique(std::vector<Data>& v, Data const& d)
-> std::pair<std::vector<Data>::iterator, bool>
{
auto lb = std::lower_bound(std::begin(v), std::end(v), d, Comparator());
if(lb != std::end(v) && lb->DataId == d.DataId)
return {v.erase(lb), true};
return {std::end(v), false};
}
int main()
{
std::vector<Data> v;
Data d{1, 2};
insert_unique(v, d);
// do a binary search
auto lb = std::lower_bound(std::begin(v), std::end(v), d, Comparator());
if(lb != std::end(v)) // was it found?
lb->value = 9;
}
Of course that extra code can be wrapped up in a "uniquely sorted vector" class.
This answer does not give a direct answer but provides some hints towards a better performance for querying if it is a bottleneck for you. It also supposes that the collection is immutable, ie. you can look for items in the collection after every items have been put into.
Actually, std::set::find
is logarithmic in the size of the container, but you can almost reach a lookup in constant time if the collection can't grow. The idea is to store all the Data
objects in a std::vector
(which should be optimal in terms of space) and to store the keys in a succinct data structure such as Elias Fano that provides rank/select operation in nearly constant time (and with a little extra memory usage explained in the link above).
Then, from one given key k
, you can find what is the position of the corresponding entry (if any) in the vector, which can be done with rank/select operations.
Here is a sketch of a IndexedVector
class that uses the succinct library:
template<typename T, auto getkey>
class IndexedVector
{
public:
using key_type = decltype(getkey(T{}));
template<typename Range>
IndexedVector (Range range, key_type maxKey=std::numeric_limits<key_type>::max())
{
succinct::elias_fano::elias_fano_builder bvb (maxKey, range.size());
// We build both the vector of items and the Elian Fano structure holding the keys
for (auto const& item : range)
{
bvb.push_back (getkey(item));
items_.push_back (item);
}
succinct::elias_fano tmp(&bvb);
bitmap_.swap (tmp);
}
std::optional<T> find (key_type k)
{
if (bitmap_[k]) { return items_[bitmap_.rank(k+1)-1]; }
else { return {}; }
}
private:
std::vector<T> items_; // holds the items
succinct::elias_fano bitmap_; // holds the bitset containing the keys
};
Demo
With this kind of approach, one can observe a speedup (compared to std::set::find
) ranging from 3x to 10x in function of the number of items in the collection.
const_cast
has removed constness is UB only if the object wasconst
to begin with. But that is not the case here. You simply have aconst
reference/pointer to a non-const (mutable) object, so removing the constness from the reference/pointer is not UB. – Remy Lebeau Commented Mar 21 at 22:51std::map
? You get aData
from somewhere and then you can add it to your map likemy_map[data.id] = data.value;
– NathanOliver Commented Mar 21 at 23:07std::map
– catnip Commented Mar 21 at 23:56