最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

arrays - Does exist a standardized string table implementation ? Is my implementation in C good ? Looking to discuss topic - Sta

programmeradmin5浏览0评论

I'm developing a C library to write a custom binary format to store levels for a game.

I had a problem about optimizing the usage of strings for this file, I have two use cases of strings in this file: cells on a grid for tiles and metadata.

The cells specify how the game world look like and ask 3 strings for what texture to display for a floor, walls and ceiling; then the metadata is key/value pairs. With my first attempts creating a big grid filled with the same strings for the cells ends up into a file that grows exponentially, that's not good. I need to optimize this.

My attempt

The best case would be to write the string once, so I thought of a string table placed at the end of the file, each string is stored with an unique integer key. I haven't found any stardardized implementation of a string table so I've made my own.

I've made a struct that implements a dynamic set of entries, each entry contains the string with his hash and a generated unique key. The hash algorithm is really stupid, it adds the first 50 characters into one big number and collision are handled by comparing two strings with matching hashes, it's good enough.

the structure looks like this:

struct sentry_s
{
    char *value;
    uint32_t key, key_hash;
    size_t use_count; // when hit zero entry is removed.
}

// a SET that store unique strings with a key.
struct stable_s
{
    struct sentry_s *entries;
    size_t count, capacity;
}

// actual func names are longer, ofc.
struct sentry_s* tput (struct stable_s *table, char *string);
struct sentry_s* tget (struct stable_s *table, uint32_t key);
void trem (struct stable_s *table, uint32_t key);

I do really like this structure because I can use binary search to increase speed for one of these things:

  • Order the array by keys will increase reading and erase speed by his key.
  • Order the array by hash will increase inserting speed by the string value.

I want to increase reading speed to prioritize the game loading these levels.

The question

I came here to ask if there is a better implementation for a string table, what would other devs do? Do exists papers that specify a standardized string table? Could this structure be improved to be both fast at reading and writing?

I'm interested on the topic and it's not the first time I've seen the use string tables, for example compiled code have a string table for static strings.

The actual question: Is the above diatribe from:

  1. Markov chain
  2. Generative AI
  3. College Dare bet .

I'm developing a C library to write a custom binary format to store levels for a game.

I had a problem about optimizing the usage of strings for this file, I have two use cases of strings in this file: cells on a grid for tiles and metadata.

The cells specify how the game world look like and ask 3 strings for what texture to display for a floor, walls and ceiling; then the metadata is key/value pairs. With my first attempts creating a big grid filled with the same strings for the cells ends up into a file that grows exponentially, that's not good. I need to optimize this.

My attempt

The best case would be to write the string once, so I thought of a string table placed at the end of the file, each string is stored with an unique integer key. I haven't found any stardardized implementation of a string table so I've made my own.

I've made a struct that implements a dynamic set of entries, each entry contains the string with his hash and a generated unique key. The hash algorithm is really stupid, it adds the first 50 characters into one big number and collision are handled by comparing two strings with matching hashes, it's good enough.

the structure looks like this:

struct sentry_s
{
    char *value;
    uint32_t key, key_hash;
    size_t use_count; // when hit zero entry is removed.
}

// a SET that store unique strings with a key.
struct stable_s
{
    struct sentry_s *entries;
    size_t count, capacity;
}

// actual func names are longer, ofc.
struct sentry_s* tput (struct stable_s *table, char *string);
struct sentry_s* tget (struct stable_s *table, uint32_t key);
void trem (struct stable_s *table, uint32_t key);

I do really like this structure because I can use binary search to increase speed for one of these things:

  • Order the array by keys will increase reading and erase speed by his key.
  • Order the array by hash will increase inserting speed by the string value.

I want to increase reading speed to prioritize the game loading these levels.

The question

I came here to ask if there is a better implementation for a string table, what would other devs do? Do exists papers that specify a standardized string table? Could this structure be improved to be both fast at reading and writing?

I'm interested on the topic and it's not the first time I've seen the use string tables, for example compiled code have a string table for static strings.

The actual question: Is the above diatribe from:

  1. Markov chain
  2. Generative AI
  3. College Dare bet .
Share Improve this question edited Jan 19 at 0:24 mevets 10.4k1 gold badge23 silver badges35 bronze badges asked Jan 18 at 8:28 maggmagg 451 silver badge6 bronze badges 3
  • 1 If your code is working but slow, then this is more properly asked on the StackExchange site Code Review Stack Exchange. That said, a hash-table of strings isn't a bad idea for lookup speed. You want to keep your load-factor less than 0.6. The openSSL lh_strhash() is a very good string hashing function that will minimize collision for most sane strings. A Minimal Complete Reproducible Example is needed. – David C. Rankin Commented Jan 18 at 8:43
  • The other thought is if the set of strings (textures) is fixed for startup, you can simply create the strings once and then create a simple array of pointers where the 1-D index is mapped into your 2D grid for display. You would set up the array of pointers and then just iterate once mapping into your 2D grid. That would eliminate the table overhead and the single iteration and index map would be very efficient. If the strings aren't fixed, then John has a solid approach in his answer. – David C. Rankin Commented Jan 18 at 8:55
  • Unfortunately for you this site is not for discussions, it are rather explicitly off-topic. You might want to find some forum or another SE site. – the busybee Commented Jan 18 at 8:55
Add a comment  | 

1 Answer 1

Reset to default 1

First, don't malloc() each individual string. Instead, malloc() one giant array of characters and fill it with all the strings concatenated with null terminators between them. Like this:

hello world\0foo\0bar\0

Next, use the starting address of any string as its ID:

hello world\0foo\0bar\0
01234567890 1234 5678 9

So "foo" has ID 12.

Your use case doesn't seem to need support for erasing strings, and you can append any time (use realloc() to double the table capacity if needed).

The IDs should be stored in the smallest integer type that makes sense. Probably uint16_t or uint32_t. If you want a special value you can store a null terminator at the start and its ID will be 0.

Binary search will still work, but if you need O(1) lookup by string you can add a hash table where each entry is (string_hash, ID). Then to do lookup by string you hash the string, get its ID from the hash table, and use that ID to get the full string to compare (to confirm it's not a hash collision). This way you don't need to store all the strings a second time, and the hash table only contains integers so the entire solution only has two heap allocations in use at any time. The string hash can probably be 32 bits, using a full 64 bit hash is probably overkill if you use a standard string hash function instead of your weak one.

Don't bother writing your own hash table, there are many great existing ones.

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论