Safer hash table initialization macro - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Safer hash table initialization macro
Date
Msg-id aS2b3LoUypW1/Gdz@ip-10-97-1-34.eu-west-3.compute.internal
List pgsql-hackers
Hi hackers,

Currently, to create a hash table, we do things like:

A) create a struct, say:

 typedef struct SeenRelsEntry
 {
    Oid   rel_id;
    int   list_index;
 } SeenRelsEntry;

where the first member is the hash key, and then later:

B)

 ctl.keysize = sizeof(Oid);
 ctl.entrysize = sizeof(SeenRelsEntry);
 ctl.hcxt = CurrentMemoryContext;

 seen_rels = hash_create("find_all_inheritors temporary table",
                         32, /* start small and extend */
                         &ctl,
                         HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

I can see 2 possible issues:

1)

We manually specify the type used for keysize. It could be wrong from the
start, or become wrong later if the key member's type changes.

2) 

It may be possible to remove the key member without the compiler noticing it.

Take this example and remove the key member:

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..eb11976afef 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -36,7 +36,6 @@
  */
 typedef struct SeenRelsEntry
 {
-       Oid                     rel_id;                 /* relation oid */
        int                     list_index;             /* its position in output list(s) */
 } SeenRelsEntry;

That would compile without any issues because this rel_id member is not
referenced in the code (for this particular example). That's rare but possible.

But then, on my machine, during make check:

TRAP: failed Assert("!found"), File: "nodeModifyTable.c", Line: 5157, PID: 140430

The reason is that the key member is only ever accessed through byte-level
operations (within the hash-related macros), never by name. So it's easy to
think that this member is unused (because it is not referenced in the code).
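
To make the byte-level access concrete, here is a minimal sketch of a toy
table. It is not dynahash: ToyTable, toy_search and toy_demo are hypothetical
names invented for this illustration, and Oid is a local stand-in typedef. The
table only ever touches the first keysize bytes of an entry, so rel_id is
never named anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

typedef struct SeenRelsEntry
{
    Oid     rel_id;             /* key: never referenced by name below */
    int     list_index;
} SeenRelsEntry;

/* Hypothetical toy table (not dynahash): a linear array of raw entries. */
typedef struct ToyTable
{
    size_t  keysize;
    size_t  entrysize;
    size_t  nused;
    size_t  capacity;
    void   *entries;
} ToyTable;

/*
 * Find the entry whose first keysize bytes match *key, or add one.
 * The key is only ever compared and copied as raw bytes.
 */
static void *
toy_search(ToyTable *t, const void *key, bool *found)
{
    char   *slot;

    for (size_t i = 0; i < t->nused; i++)
    {
        slot = (char *) t->entries + i * t->entrysize;
        if (memcmp(slot, key, t->keysize) == 0)
        {
            *found = true;
            return slot;
        }
    }

    assert(t->nused < t->capacity);
    slot = (char *) t->entries + t->nused * t->entrysize;
    memset(slot, 0, t->entrysize);
    memcpy(slot, key, t->keysize);  /* key lands in the leading bytes */
    t->nused++;
    *found = false;
    return slot;
}

/* Insert key 42, store a payload, then look the same key up again. */
static int
toy_demo(void)
{
    SeenRelsEntry storage[8];
    ToyTable    t = {sizeof(Oid), sizeof(SeenRelsEntry), 0, 8, storage};
    Oid         key = 42;
    bool        found;
    SeenRelsEntry *e;

    e = toy_search(&t, &key, &found);   /* not found: entry is created */
    e->list_index = 7;                  /* does not touch the key bytes */
    e = toy_search(&t, &key, &found);   /* found this time */
    return found ? e->list_index : -1;
}
```

In this toy, if rel_id were removed from the struct, the leading bytes of each
entry would be list_index itself, so the store in toy_demo would overwrite the
key that was just copied in, and the second lookup would no longer match.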

I'm thinking about what kind of safety we could put in place to better deal with
1) and 2).

What about adding a macro that:

- requests the key member name
- ensures that it is at offset 0
- computes the key size based on the member

Something like:

"
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        StaticAssertStmt(offsetof(entrytype, keymember) == 0, \
                        #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *)0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)
"

That way:

- The key member is explicitly referenced in the code (preventing "unused"
false positives)
- The key size is automatically computed from the actual member type (preventing
type mismatches)
- We enforce that the key is at offset 0
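
As a rough check of the three points above, here is a sketch that compiles
outside the PostgreSQL tree. It makes some assumptions: MockHashCtl is a
hypothetical stand-in for HASHCTL reduced to the two fields the macro sets,
Oid is a local typedef, and C11 _Static_assert replaces StaticAssertStmt. The
helper init_seen_rels_ctl is also made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* Hypothetical stand-in for HASHCTL: only the fields the macro sets. */
typedef struct MockHashCtl
{
    size_t  keysize;
    size_t  entrysize;
} MockHashCtl;

/*
 * Same shape as the proposed macro; C11 _Static_assert is used here in
 * place of StaticAssertStmt so the sketch builds on its own.
 */
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        _Static_assert(offsetof(entrytype, keymember) == 0, \
                       #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *) 0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)

typedef struct SeenRelsEntry
{
    Oid     rel_id;             /* hash key, must stay first */
    int     list_index;
} SeenRelsEntry;

/* Hypothetical helper showing a call site. */
static MockHashCtl
init_seen_rels_ctl(void)
{
    MockHashCtl ctl;

    HASH_ELEM_INIT(ctl, SeenRelsEntry, rel_id);

    /* The sizes follow the struct definition, not a hand-written type. */
    assert(ctl.keysize == sizeof(Oid));
    assert(ctl.entrysize == sizeof(SeenRelsEntry));
    return ctl;
}
```

With this in place, removing rel_id from SeenRelsEntry turns the call into a
compile error (no such member), and passing list_index instead trips the
static assertion because its offset is not 0. Changing the type of rel_id
changes keysize without touching the call site.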

An additional benefit: it avoids repeating "keysize =" followed by "entrysize ="
in a lot of places in the code (currently about 100 times).

If that sounds like a good idea, I could work on a patch doing so.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


