Thread: Should we use MemSet or {0} for struct initialization?

Should we use MemSet or {0} for struct initialization?

From
Richard Guo
Date:
While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet.  I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures?  From 9fd45870
it seems that we prefer to {0}.  So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

Thanks
Richard
Attachment

Re: Should we use MemSet or {0} for struct initialization?

From
Junwang Zhao
Date:
On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
> While working on a bug in expandRecordVariable() I noticed that in the
> switch statement for case RTE_SUBQUERY we initialize struct ParseState
> with {0} while for case RTE_CTE we do that with MemSet.  I understand
> that there is nothing wrong with this, just cannot get away with the
> inconsistency inside the same function (sorry for the nitpicking).
>
> Do we have a preference for how to initialize structures?  From 9fd45870
> it seems that we prefer to {0}.  So here is a trivial patch doing that.
> And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> can also be replaced with {0}, so include that in the patch too.
>
> Thanks
> Richard

If the struct has padding or aligned, {0} only guarantee the struct
members initialized to 0, while memset sets the alignment/padding
to 0 as well, but since we will not access the alignment/padding, so
they give the same effect.

I bet {0} should be faster since there is no function call, but I'm not
100% sure ;)


--
Regards
Junwang Zhao



Re: Should we use MemSet or {0} for struct initialization?

From
John Naylor
Date:
> On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
> >
> > While working on a bug in expandRecordVariable() I noticed that in the
> > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > with {0} while for case RTE_CTE we do that with MemSet.  I understand
> > that there is nothing wrong with this, just cannot get away with the
> > inconsistency inside the same function (sorry for the nitpicking).
> >
> > Do we have a preference for how to initialize structures?  From 9fd45870
> > it seems that we prefer to {0}.  So here is a trivial patch doing that.

It seems to have been deliberately left that way in the wake of that commit, see:

https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com

(If so, it deserves a comment to keep people from trying to change it...)

> > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> > can also be replaced with {0}, so include that in the patch too.

I _believe_ that's harmless to change.

On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

> If the struct has padding or aligned, {0} only guarantee the struct
> members initialized to 0, while memset sets the alignment/padding
> to 0 as well, but since we will not access the alignment/padding, so
> they give the same effect.

See above -- if it's used as a hash key, for example, you must clear everything.

> I bet {0} should be faster since there is no function call, but I'm not
> 100% sure ;)

Neither has a function call. MemSet is a PG macro. You're thinking of memset, the libc library function, but a decent compiler can easily turn that into something else for fixed-size inputs.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Should we use MemSet or {0} for struct initialization?

From
Junwang Zhao
Date:
On Thu, Aug 31, 2023 at 7:07 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> > On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > >
> > > While working on a bug in expandRecordVariable() I noticed that in the
> > > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > > with {0} while for case RTE_CTE we do that with MemSet.  I understand
> > > that there is nothing wrong with this, just cannot get away with the
> > > inconsistency inside the same function (sorry for the nitpicking).
> > >
> > > Do we have a preference for how to initialize structures?  From 9fd45870
> > > it seems that we prefer to {0}.  So here is a trivial patch doing that.
>
> It seems to have been deliberately left that way in the wake of that commit, see:
>
> https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com
>
> (If so, it deserves a comment to keep people from trying to change it...)
>
> > > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> > > can also be replaced with {0}, so include that in the patch too.
>
> I _believe_ that's harmless to change.
>
> On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > If the struct has padding or aligned, {0} only guarantee the struct
> > members initialized to 0, while memset sets the alignment/padding
> > to 0 as well, but since we will not access the alignment/padding, so
> > they give the same effect.
>
> See above -- if it's used as a hash key, for example, you must clear everything.

Yeah, if memcmp was used as the key comparison function, there is a problem.

>
> > I bet {0} should be faster since there is no function call, but I'm not
> > 100% sure ;)
>
> Neither has a function call. MemSet is a PG macro. You're thinking of memset, the libc library function, but a decent
compilercan easily turn that into something else for fixed-size inputs. 

good to know, thanks.

>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>
>


--
Regards
Junwang Zhao



Re: Should we use MemSet or {0} for struct initialization?

From
Richard Guo
Date:

On Thu, Aug 31, 2023 at 7:07 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
> >
> > While working on a bug in expandRecordVariable() I noticed that in the
> > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > with {0} while for case RTE_CTE we do that with MemSet.  I understand
> > that there is nothing wrong with this, just cannot get away with the
> > inconsistency inside the same function (sorry for the nitpicking).
> >
> > Do we have a preference for how to initialize structures?  From 9fd45870
> > it seems that we prefer to {0}.  So here is a trivial patch doing that.

It seems to have been deliberately left that way in the wake of that commit, see:

https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com

(If so, it deserves a comment to keep people from trying to change it...)

Thanks for pointing this out.  Yeah, struct initialization does not work
for some cases with padding bits, such as for a hash key we need to
clear the padding too.

The case in expandRecordVariable() mentioned here should be safe though,
maybe this is an omission from 9fd45870?

Thanks
Richard

Re: Should we use MemSet or {0} for struct initialization?

From
Jelte Fennema
Date:
On Thu, 31 Aug 2023 at 13:35, Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > If the struct has padding or aligned, {0} only guarantee the struct
> > > members initialized to 0, while memset sets the alignment/padding
> > > to 0 as well, but since we will not access the alignment/padding, so
> > > they give the same effect.
> >
> > See above -- if it's used as a hash key, for example, you must clear everything.
>
> Yeah, if memcmp was used as the key comparison function, there is a problem.

The C standard says:
> When a value is stored in an object of structure or union type, including in a member object, the bytes of the object
representationthat correspond to any padding bytes take unspecified values.
 

So if you set any of the fields after a MemSet, the values of the
padding bytes that were set to 0 are now unspecified. It seems much
safer to actually spell out the padding fields of a hash key.



Re: Should we use MemSet or {0} for struct initialization?

From
John Naylor
Date:

On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:

> The C standard says:
> > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
>
> So if you set any of the fields after a MemSet, the values of the
> padding bytes that were set to 0 are now unspecified. It seems much
> safer to actually spell out the padding fields of a hash key.

No, the standard is telling you why you need to memset if consistency of padding bytes matters.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Should we use MemSet or {0} for struct initialization?

From
Chapman Flack
Date:
On 2023-09-01 09:25, John Naylor wrote:
> On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> 
> wrote:
>> The C standard says:
>> > When a value is stored in an object of structure or union type,
>> > including in a member object, the bytes of the object representation that
>> > correspond to any padding bytes take unspecified values.
>> 
>> So if you set any of the fields after a MemSet, the values of the
>> padding bytes that were set to 0 are now unspecified. It seems much
>> safer to actually spell out the padding fields of a hash key.
> 
> No, the standard is telling you why you need to memset if consistency 
> of
> padding bytes matters.

Um, I'm in no way a language lawyer for recent C specs, but the language
Jelte Fennema quoted is also present in the rather old 9899 TC2 draft
I still have around from years back, and in particular it does say
that upon assignment, padding bytes ▶take◀ unspecified values, not
merely that they retain whatever unspecified values they may have had
before. There is a footnote attached (in 9899 TC2) that says "Thus,
for example, structure assignment need not copy any padding bits."
If that footnote example were normative, it would be reassuring,
because you could assume that padding bits not copied are unchanged
and remember what you originally memset() them to. So that would be
nice. But everything about the form and phrasing of the footnote
conveys that it isn't normative. And the normative text does appear
to be saying that those padding bytes ▶take◀ unspecified values upon,
assignment to the object, even if you may have memset() them before.
Or at least to be saying that's what could happen, in some 
implementation
on some architecture, and it would be standard-conformant if it did.

Perhaps there is language elsewhere in the standard that pins it down
to the way you've interpreted it? If you know where that language
is, could you point to it?

Regards,
-Chap



Re: Should we use MemSet or {0} for struct initialization?

From
Jelte Fennema
Date:
On Fri, 1 Sept 2023 at 15:25, John Naylor <john.naylor@enterprisedb.com> wrote:
> On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:
> > The C standard says:
> > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
> >
> > So if you set any of the fields after a MemSet, the values of the
> > padding bytes that were set to 0 are now unspecified. It seems much
> > safer to actually spell out the padding fields of a hash key.
>
> No, the standard is telling you why you need to memset if consistency of padding bytes matters.

Maybe I'm misunderstanding the sentence from the C standard I quoted. But under my interpretation it means that even an assignment to a field of a struct causes the padding bytes to take unspecified (but not undefined) values, because of the "including in a member object" part of the sentence. It's ofcourse possible that all compilers relevant to Postgres never actually change padding when assigning to a field.

Re: Should we use MemSet or {0} for struct initialization?

From
Peter Eisentraut
Date:
On 31.08.23 10:32, Richard Guo wrote:
> While working on a bug in expandRecordVariable() I noticed that in the
> switch statement for case RTE_SUBQUERY we initialize struct ParseState
> with {0} while for case RTE_CTE we do that with MemSet.  I understand
> that there is nothing wrong with this, just cannot get away with the
> inconsistency inside the same function (sorry for the nitpicking).
> 
> Do we have a preference for how to initialize structures?  From 9fd45870
> it seems that we prefer to {0}.  So here is a trivial patch doing that.
> And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> can also be replaced with {0}, so include that in the patch too.

The first part (parse_target.c) was already addressed by e0e492e5a9.  I 
have applied the second part (pgstatfuncs.c).




Re: Should we use MemSet or {0} for struct initialization?

From
Richard Guo
Date:

On Tue, Sep 19, 2023 at 5:37 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 31.08.23 10:32, Richard Guo wrote:
> While working on a bug in expandRecordVariable() I noticed that in the
> switch statement for case RTE_SUBQUERY we initialize struct ParseState
> with {0} while for case RTE_CTE we do that with MemSet.  I understand
> that there is nothing wrong with this, just cannot get away with the
> inconsistency inside the same function (sorry for the nitpicking).
>
> Do we have a preference for how to initialize structures?  From 9fd45870
> it seems that we prefer to {0}.  So here is a trivial patch doing that.
> And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> can also be replaced with {0}, so include that in the patch too.

The first part (parse_target.c) was already addressed by e0e492e5a9.  I
have applied the second part (pgstatfuncs.c).

Thanks for pushing this.

Thanks
Richard