Safer hash table initialization macro - Mailing list pgsql-hackers

From	Bertrand Drouvot
Subject	Safer hash table initialization macro
Date	December 1, 2025 16:45:00
Msg-id	aS2b3LoUypW1/Gdz@ip-10-97-1-34.eu-west-3.compute.internal
Responses	Re: Safer hash table initialization macro
List	pgsql-hackers

Hi hackers,

Currently to create a hash table we do things like:

A) create a struct, say:

 typedef struct SeenRelsEntry
 {
    Oid   rel_id;
    int   list_index;
 } SeenRelsEntry;

where the first member is the hash key, and then later:

B)

 ctl.keysize = sizeof(Oid);
 ctl.entrysize = sizeof(SeenRelsEntry);
 ctl.hcxt = CurrentMemoryContext;

 seen_rels = hash_create("find_all_inheritors temporary table",
                         32, /* start small and extend */
                         &ctl,

I can see 2 possible issues:

1)

We manually specify the type for keysize, which could be incorrect from the
start or become incorrect if the key member's type changes.
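
To make 1) concrete, here is a minimal sketch, not PostgreSQL code: RelKey and
keys_equal() are hypothetical names used only for illustration. It assumes the
key has grown from a single Oid into a two-field struct while the keysize
still says sizeof(Oid), so only the first field takes part in the comparison:

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* Hypothetical wider key: the key type changed, the keysize did not. */
typedef struct RelKey
{
    Oid db_id;
    Oid rel_id;
} RelKey;

/* Hypothetical byte-wise key comparison over the first keysize bytes. */
static int
keys_equal(const void *a, const void *b, size_t keysize)
{
    return memcmp(a, b, keysize) == 0;
}

/*
 * Returns 1 when two distinct keys look equal under the stale keysize
 * but differ under the correct one.
 */
static int
stale_keysize_confuses_keys(void)
{
    RelKey a = {1, 100};
    RelKey b = {1, 200};

    return keys_equal(&a, &b, sizeof(Oid)) &&
           !keys_equal(&a, &b, sizeof(RelKey));
}
```

Nothing in the compiler complains here, because sizeof(Oid) is still a valid
size; it is just no longer the size of the key.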

2) 

It may be possible to remove the key member without the compiler noticing it.

Take this example and remove the key member:

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..eb11976afef 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -36,7 +36,6 @@
  */
 typedef struct SeenRelsEntry
 {
-       Oid                     rel_id;                 /* relation oid */
        int                     list_index;             /* its position in output list(s) */
 } SeenRelsEntry;

That would compile without any issues because this rel_id member is not
referenced in the code (for this particular example). That's rare but possible.

But then, on my machine, during make check:

TRAP: failed Assert("!found"), File: "nodeModifyTable.c", Line: 5157, PID: 140430

The reason is that the struct member is accessed only through byte-level
operations (within the hash-related macros). So it's easy to think that this
member is unused (because it is not referenced by name in the code).
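
As a rough illustration of that byte-level access, here is a minimal sketch
with hypothetical helpers (entry_set_key() and entry_has_key() are not dynahash
functions, and BrokenSeenRelsEntry is just the struct from the diff above under
another name). The key is only ever copied to and compared at the start of the
entry, so the member name never appears:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

typedef struct SeenRelsEntry
{
    Oid rel_id;                 /* key: only ever touched as raw bytes below */
    int list_index;
} SeenRelsEntry;

/* Same struct after the "unused" key member has been removed. */
typedef struct BrokenSeenRelsEntry
{
    int list_index;
} BrokenSeenRelsEntry;

/* Hypothetical helper: store the key in the first keysize bytes of the entry. */
static void
entry_set_key(void *entry, const void *key, size_t keysize)
{
    memcpy(entry, key, keysize);
}

/* Hypothetical helper: compare the first keysize bytes of the entry to key. */
static int
entry_has_key(const void *entry, const void *key, size_t keysize)
{
    return memcmp(entry, key, keysize) == 0;
}

/* With the key member present, writing the payload leaves the key intact. */
static int
intact_entry_keeps_key(void)
{
    SeenRelsEntry entry;
    Oid key = 16384;

    entry_set_key(&entry, &key, sizeof(Oid));
    entry.list_index = 3;
    return entry_has_key(&entry, &key, sizeof(Oid));
}

/* Without it, the payload occupies the key bytes and overwrites the key. */
static int
broken_entry_keeps_key(void)
{
    BrokenSeenRelsEntry entry;
    Oid key = 16384;

    entry_set_key(&entry, &key, sizeof(Oid));
    entry.list_index = 3;
    return entry_has_key(&entry, &key, sizeof(Oid));
}
```

In this sketch intact_entry_keeps_key() returns 1 and broken_entry_keeps_key()
returns 0: once the key member is gone, the payload and the key share the same
bytes, and a lookup on the original key no longer matches.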

I'm thinking about what kind of safety we could put in place to better deal with
1) and 2).

What about adding a macro that:

- requests the key member name
- ensures that it is at offset 0
- computes the key size based on the member

Something like:

"
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        StaticAssertStmt(offsetof(entrytype, keymember) == 0, \
                        #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *)0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)
"

That way:

- The key member is explicitly referenced in the code (preventing "unused"
false positives)
- The key size is automatically computed from the actual member type (preventing
type mismatches)
- We enforce that the key is at offset 0

An additional benefit: it avoids repeating "keysize =" followed by "entrysize ="
in many places in the code (currently about 100 times).
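
For anyone who wants to try the idea outside the tree, here is a standalone
sketch under a few assumptions: DemoHashCtl is a hypothetical stand-in for the
two HASHCTL fields involved, DEMO_HASH_ELEM_INIT mirrors the proposed macro,
and C11's _Static_assert is used in place of StaticAssertStmt.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* Hypothetical stand-in for the two HASHCTL fields the macro sets. */
typedef struct DemoHashCtl
{
    size_t keysize;
    size_t entrysize;
} DemoHashCtl;

/* Same shape as the proposed macro, with _Static_assert swapped in. */
#define DEMO_HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        _Static_assert(offsetof(entrytype, keymember) == 0, \
                       #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *) 0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)

typedef struct SeenRelsEntry
{
    Oid rel_id;                 /* hash key, named explicitly at the call site */
    int list_index;
} SeenRelsEntry;

/* Fill in the control struct for SeenRelsEntry, keyed on rel_id. */
static DemoHashCtl
make_seen_rels_ctl(void)
{
    DemoHashCtl ctl = {0, 0};

    DEMO_HASH_ELEM_INIT(ctl, SeenRelsEntry, rel_id);
    return ctl;
}
```

With this in place, removing rel_id from SeenRelsEntry makes the
DEMO_HASH_ELEM_INIT() call fail to compile, and moving it away from the first
position trips the static assertion.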

If that sounds like a good idea, I could work on a patch doing so.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
