Thread: [dynahash] do not refill the hashkey after hash_search

[dynahash] do not refill the hashkey after hash_search

From
Junwang Zhao
Date:
Hi hackers,

It's not necessary to fill the key field for most cases, since
hash_search has already done that for you. For developer that
using memset to zero the entry structure after enter it, fill the
key field is a must, but IMHO that is not good coding style, we
really should not touch the key field after insert it into the
dynahash.

This patch fixed some most abnormal ones, instead of refilling the
key field of primitive types, adding some assert might be a better
choice.


-- 
Regards
Junwang Zhao

Attachment

Re: [dynahash] do not refill the hashkey after hash_search

From
John Naylor
Date:

On Wed, Sep 13, 2023 at 1:47 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Hi hackers,
>
> It's not necessary to fill the key field for most cases, since
> hash_search has already done that for you. For developer that
> using memset to zero the entry structure after enter it, fill the
> key field is a must, but IMHO that is not good coding style, we
> really should not touch the key field after insert it into the
> dynahash.

- memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
- part_entry->partoid = partOid;
+ Assert(part_entry->partoid == partOid);
+ memset(entry, 0, sizeof(LogicalRepRelMapEntry));

This is making an assumption that the non-key part of LogicalRepPartMapEntry will never get new members. Without knowing much about this code, it seems like a risk in the abstract.

> This patch fixed some most abnormal ones, instead of refilling the
> key field of primitive types, adding some assert might be a better
> choice.

Taking a quick look, I didn't happen to see any existing asserts of this sort, so the patch doesn't seem to be making things more "normal". I did see a few instances of /* hash_search already filled in the key */, so if we do anything at all here, we might prefer that.

- hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);
+ (void) hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);

I prefer explicit (void) for new code, but others may disagree. I don't think we have a preferred style for this, so changing current usage will just cause unnecessary code churn.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [dynahash] do not refill the hashkey after hash_search

From
Junwang Zhao
Date:
On Wed, Sep 13, 2023 at 4:22 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
>
> On Wed, Sep 13, 2023 at 1:47 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > Hi hackers,
> >
> > It's not necessary to fill the key field for most cases, since
> > hash_search has already done that for you. For developer that
> > using memset to zero the entry structure after enter it, fill the
> > key field is a must, but IMHO that is not good coding style, we
> > really should not touch the key field after insert it into the
> > dynahash.
>
> - memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
> - part_entry->partoid = partOid;
> + Assert(part_entry->partoid == partOid);
> + memset(entry, 0, sizeof(LogicalRepRelMapEntry));
>
> This is making an assumption that the non-key part of LogicalRepPartMapEntry will never get new members. Without
knowingmuch about this code, it seems like a risk in the abstract. 

What do you mean by 'the non-key part of LogicalRepPartMapEntry will
never get new members'?

typedef struct LogicalRepPartMapEntry
{
        Oid partoid; /* LogicalRepPartMap's key */
        LogicalRepRelMapEntry relmapentry;
} LogicalRepPartMapEntry;

partoid has already been filled by hash_search with HASH_ENTER action,
so I think the
above code should have the same effects.

>
> > This patch fixed some most abnormal ones, instead of refilling the
> > key field of primitive types, adding some assert might be a better
> > choice.
>
> Taking a quick look, I didn't happen to see any existing asserts of this sort, so the patch doesn't seem to be making
thingsmore "normal". I did see a few instances of /* hash_search already filled in the key */, so if we do anything at
allhere, we might prefer that. 

There are some code using assert for this sort, for example in
*ReorderBufferToastAppendChunk*:

```
ent = (ReorderBufferToastEnt *)
hash_search(txn->toast_hash, &chunk_id, HASH_ENTER, &found);

if (!found)
{
        Assert(ent->chunk_id == chunk_id);           <------- this
line, by Robert Haas
        ent->num_chunks = 0;
        ent->last_chunk_seq = 0;
```

and in *rebuild_database_list*, tom commented that the key has already
been filled, which I think
he was trying to tell people no need to assign the key again.

```
/* we assume it isn't found because the hash was just created */
db = hash_search(dbhash, &newdb, HASH_ENTER, NULL);

/* hash_search already filled in the key */         <------- this
line, by Tom Lane
db->adl_score = score++;
/* next_worker is filled in later */
```

>
> - hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);
> + (void) hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);
>
> I prefer explicit (void) for new code, but others may disagree. I don't think we have a preferred style for this, so
changingcurrent usage will just cause unnecessary code churn. 
>

What I am concerned about is that if we change the key after
hash_search with HASH_ENTER action, there
are chances that if we assign a wrong value, it will be impossible to
match that entry again.

> --
> John Naylor
> EDB: http://www.enterprisedb.com



--
Regards
Junwang Zhao



Re: [dynahash] do not refill the hashkey after hash_search

From
John Naylor
Date:

On Wed, Sep 13, 2023 at 3:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> On Wed, Sep 13, 2023 at 4:22 PM John Naylor
> <john.naylor@enterprisedb.com> wrote:

> > - memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
> > - part_entry->partoid = partOid;
> > + Assert(part_entry->partoid == partOid);
> > + memset(entry, 0, sizeof(LogicalRepRelMapEntry));
> >
> > This is making an assumption that the non-key part of LogicalRepPartMapEntry will never get new members. Without knowing much about this code, it seems like a risk in the abstract.
>
> What do you mean by 'the non-key part of LogicalRepPartMapEntry will
> never get new members'?

I mean, if this struct:

> typedef struct LogicalRepPartMapEntry
> {
>         Oid partoid; /* LogicalRepPartMap's key */
>         LogicalRepRelMapEntry relmapentry;
> } LogicalRepPartMapEntry;

...gets a new member, it will not get memset when memsetting "relmapentry".

> > Taking a quick look, I didn't happen to see any existing asserts of this sort, so the patch doesn't seem to be making things more "normal". I did see a few instances of /* hash_search already filled in the key */, so if we do anything at all here, we might prefer that.
>
> There are some code using assert for this sort, for example in
> *ReorderBufferToastAppendChunk*:

> and in *rebuild_database_list*, tom commented that the key has already
> been filled, which I think
> he was trying to tell people no need to assign the key again.

Okay, we have examples of each.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [dynahash] do not refill the hashkey after hash_search

From
Junwang Zhao
Date:
On Wed, Sep 13, 2023 at 5:28 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
>
> On Wed, Sep 13, 2023 at 3:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > On Wed, Sep 13, 2023 at 4:22 PM John Naylor
> > <john.naylor@enterprisedb.com> wrote:
>
> > > - memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
> > > - part_entry->partoid = partOid;
> > > + Assert(part_entry->partoid == partOid);
> > > + memset(entry, 0, sizeof(LogicalRepRelMapEntry));
> > >
> > > This is making an assumption that the non-key part of LogicalRepPartMapEntry will never get new members. Without
knowingmuch about this code, it seems like a risk in the abstract. 
> >
> > What do you mean by 'the non-key part of LogicalRepPartMapEntry will
> > never get new members'?
>
> I mean, if this struct:
>
> > typedef struct LogicalRepPartMapEntry
> > {
> >         Oid partoid; /* LogicalRepPartMap's key */
> >         LogicalRepRelMapEntry relmapentry;
> > } LogicalRepPartMapEntry;
>
> ...gets a new member, it will not get memset when memsetting "relmapentry".

ok, I see. I will leave this case as it was.

>
> > > Taking a quick look, I didn't happen to see any existing asserts of this sort, so the patch doesn't seem to be
makingthings more "normal". I did see a few instances of /* hash_search already filled in the key */, so if we do
anythingat all here, we might prefer that. 
> >
> > There are some code using assert for this sort, for example in
> > *ReorderBufferToastAppendChunk*:
>
> > and in *rebuild_database_list*, tom commented that the key has already
> > been filled, which I think
> > he was trying to tell people no need to assign the key again.
>
> Okay, we have examples of each.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com

Add a v2 with some change to fix warnings about unused-parameter.

I will add this to Commit Fest.

--
Regards
Junwang Zhao

Attachment

Re: [dynahash] do not refill the hashkey after hash_search

From
Nathan Bossart
Date:
On Thu, Sep 14, 2023 at 04:28:26PM +0800, Junwang Zhao wrote:
> Add a v2 with some change to fix warnings about unused-parameter.
> 
> I will add this to Commit Fest.

This looks reasonable to me.  I've marked the commitfest entry as
ready-for-committer.  I will plan on committing it in a couple of days
unless John has additional feedback or would like to do the honors.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [dynahash] do not refill the hashkey after hash_search

From
John Naylor
Date:
On Fri, Oct 13, 2023 at 10:07 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Thu, Sep 14, 2023 at 04:28:26PM +0800, Junwang Zhao wrote:
> > Add a v2 with some change to fix warnings about unused-parameter.
> >
> > I will add this to Commit Fest.
>
> This looks reasonable to me.  I've marked the commitfest entry as
> ready-for-committer.  I will plan on committing it in a couple of days
> unless John has additional feedback or would like to do the honors.

(I've been offline for a few weeks, and have been catching up this week.)

I agree it's reasonable, but there are a couple small loose ends I'd
like to see addressed.

- strlcpy(hentry->name, name, sizeof(hentry->name));

This might do with a comment stating we already set the value, (we've
seen in this thread that some other code does this), but I don't feel
strongly about it.

  do
  {
- hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);
+ (void) hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);
  } while (OidIsValid(*serialized));

I still consider this an unrelated and unnecessary cosmetic change.

- NotificationHash *hentry;
  bool found;

- hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab,
-   &oldn,
-   HASH_ENTER,
-   &found);
+ (void) hash_search(pendingNotifies->hashtab,
+    &oldn,
+    HASH_ENTER,
+    &found);
  Assert(!found);
- hentry->event = oldn;

I'd prefer just adding "Assert(hentry->event == oldn);" and declaring
hentry PG_USED_FOR_ASSERTS_ONLY.



Re: [dynahash] do not refill the hashkey after hash_search

From
Tom Lane
Date:
John Naylor <johncnaylorls@gmail.com> writes:
> I'd prefer just adding "Assert(hentry->event == oldn);" and declaring
> hentry PG_USED_FOR_ASSERTS_ONLY.

I'm not aware of any other places where we have Asserts checking
that hash_search() honored its contract.  Why do we need one here?

            regards, tom lane



Re: [dynahash] do not refill the hashkey after hash_search

From
John Naylor
Date:
On Wed, Oct 25, 2023 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> John Naylor <johncnaylorls@gmail.com> writes:
> > I'd prefer just adding "Assert(hentry->event == oldn);" and declaring
> > hentry PG_USED_FOR_ASSERTS_ONLY.
>
> I'm not aware of any other places where we have Asserts checking
> that hash_search() honored its contract.  Why do we need one here?

[removing old CC]
The author pointed out here that we're not consistent in this regard:

https://www.postgresql.org/message-id/CAEG8a3KEO_Kdt2Y5hFNWMEX3DpCXi9jtZOJY-GFUEE9QLgF%2Bbw%40mail.gmail.com

...but I didn't try seeing where the balance lay. We can certainly
just remove redundant assignments.



Re: [dynahash] do not refill the hashkey after hash_search

From
Nathan Bossart
Date:
On Wed, Oct 25, 2023 at 12:48:52PM +0700, John Naylor wrote:
> On Wed, Oct 25, 2023 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> John Naylor <johncnaylorls@gmail.com> writes:
>> > I'd prefer just adding "Assert(hentry->event == oldn);" and declaring
>> > hentry PG_USED_FOR_ASSERTS_ONLY.
>>
>> I'm not aware of any other places where we have Asserts checking
>> that hash_search() honored its contract.  Why do we need one here?
> 
> [removing old CC]
> The author pointed out here that we're not consistent in this regard:
> 
> https://www.postgresql.org/message-id/CAEG8a3KEO_Kdt2Y5hFNWMEX3DpCXi9jtZOJY-GFUEE9QLgF%2Bbw%40mail.gmail.com
> 
> ...but I didn't try seeing where the balance lay. We can certainly
> just remove redundant assignments.

While it probably doesn't hurt anything, IMHO it's unnecessary to verify
that hash_search() works every time it is called.  This behavior seems
unlikely to change anytime soon, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [dynahash] do not refill the hashkey after hash_search

From
John Naylor
Date:
On Thu, Sep 14, 2023 at 3:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Add a v2 with some change to fix warnings about unused-parameter.
>
> I will add this to Commit Fest.

Pushed v2 after removing asserts, as well as the unnecessary cast that
I complained about earlier.

Some advice: I was added as a reviewer in CF without my knowledge. I
see people doing it sometimes, but I don't recommend that, since the
reviewer field in CF implies the person volunteered to continue giving
feedback for patch updates.