Thread: Should we use MemSet or {0} for struct initialization?
While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).
Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.
Thanks
Richard
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).
Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.
Thanks
Richard
Attachment
On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote: > > While working on a bug in expandRecordVariable() I noticed that in the > switch statement for case RTE_SUBQUERY we initialize struct ParseState > with {0} while for case RTE_CTE we do that with MemSet. I understand > that there is nothing wrong with this, just cannot get away with the > inconsistency inside the same function (sorry for the nitpicking). > > Do we have a preference for how to initialize structures? From 9fd45870 > it seems that we prefer to {0}. So here is a trivial patch doing that. > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact() > can also be replaced with {0}, so include that in the patch too. > > Thanks > Richard If the struct has padding or aligned, {0} only guarantee the struct members initialized to 0, while memset sets the alignment/padding to 0 as well, but since we will not access the alignment/padding, so they give the same effect. I bet {0} should be faster since there is no function call, but I'm not 100% sure ;) -- Regards Junwang Zhao
> On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
> >
> > While working on a bug in expandRecordVariable() I noticed that in the
> > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > with {0} while for case RTE_CTE we do that with MemSet. I understand
> > that there is nothing wrong with this, just cannot get away with the
> > inconsistency inside the same function (sorry for the nitpicking).
> >
> > Do we have a preference for how to initialize structures? From 9fd45870
> > it seems that we prefer to {0}. So here is a trivial patch doing that.
It seems to have been deliberately left that way in the wake of that commit, see:
https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com
(If so, it deserves a comment to keep people from trying to change it...)
> > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> > can also be replaced with {0}, so include that in the patch too.
I _believe_ that's harmless to change.
On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> If the struct has padding or aligned, {0} only guarantee the struct
> members initialized to 0, while memset sets the alignment/padding
> to 0 as well, but since we will not access the alignment/padding, so
> they give the same effect.
See above -- if it's used as a hash key, for example, you must clear everything.
> I bet {0} should be faster since there is no function call, but I'm not
> 100% sure ;)
Neither has a function call. MemSet is a PG macro. You're thinking of memset, the libc library function, but a decent compiler can easily turn that into something else for fixed-size inputs.
--
John Naylor
EDB: http://www.enterprisedb.com
> >
> > While working on a bug in expandRecordVariable() I noticed that in the
> > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > with {0} while for case RTE_CTE we do that with MemSet. I understand
> > that there is nothing wrong with this, just cannot get away with the
> > inconsistency inside the same function (sorry for the nitpicking).
> >
> > Do we have a preference for how to initialize structures? From 9fd45870
> > it seems that we prefer to {0}. So here is a trivial patch doing that.
It seems to have been deliberately left that way in the wake of that commit, see:
https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com
(If so, it deserves a comment to keep people from trying to change it...)
> > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> > can also be replaced with {0}, so include that in the patch too.
I _believe_ that's harmless to change.
On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> If the struct has padding or aligned, {0} only guarantee the struct
> members initialized to 0, while memset sets the alignment/padding
> to 0 as well, but since we will not access the alignment/padding, so
> they give the same effect.
See above -- if it's used as a hash key, for example, you must clear everything.
> I bet {0} should be faster since there is no function call, but I'm not
> 100% sure ;)
Neither has a function call. MemSet is a PG macro. You're thinking of memset, the libc library function, but a decent compiler can easily turn that into something else for fixed-size inputs.
--
John Naylor
EDB: http://www.enterprisedb.com
On Thu, Aug 31, 2023 at 7:07 PM John Naylor <john.naylor@enterprisedb.com> wrote: > > > On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote: > > > > > > While working on a bug in expandRecordVariable() I noticed that in the > > > switch statement for case RTE_SUBQUERY we initialize struct ParseState > > > with {0} while for case RTE_CTE we do that with MemSet. I understand > > > that there is nothing wrong with this, just cannot get away with the > > > inconsistency inside the same function (sorry for the nitpicking). > > > > > > Do we have a preference for how to initialize structures? From 9fd45870 > > > it seems that we prefer to {0}. So here is a trivial patch doing that. > > It seems to have been deliberately left that way in the wake of that commit, see: > > https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com > > (If so, it deserves a comment to keep people from trying to change it...) > > > > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact() > > > can also be replaced with {0}, so include that in the patch too. > > I _believe_ that's harmless to change. > > On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > If the struct has padding or aligned, {0} only guarantee the struct > > members initialized to 0, while memset sets the alignment/padding > > to 0 as well, but since we will not access the alignment/padding, so > > they give the same effect. > > See above -- if it's used as a hash key, for example, you must clear everything. Yeah, if memcmp was used as the key comparison function, there is a problem. > > > I bet {0} should be faster since there is no function call, but I'm not > > 100% sure ;) > > Neither has a function call. MemSet is a PG macro. You're thinking of memset, the libc library function, but a decent compilercan easily turn that into something else for fixed-size inputs. good to know, thanks. > > -- > John Naylor > EDB: http://www.enterprisedb.com > > -- Regards Junwang Zhao
On Thu, Aug 31, 2023 at 7:07 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
> >
> > While working on a bug in expandRecordVariable() I noticed that in the
> > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > with {0} while for case RTE_CTE we do that with MemSet. I understand
> > that there is nothing wrong with this, just cannot get away with the
> > inconsistency inside the same function (sorry for the nitpicking).
> >
> > Do we have a preference for how to initialize structures? From 9fd45870
> > it seems that we prefer to {0}. So here is a trivial patch doing that.
It seems to have been deliberately left that way in the wake of that commit, see:
https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com
(If so, it deserves a comment to keep people from trying to change it...)
Thanks for pointing this out. Yeah, struct initialization does not work
for some cases with padding bits, such as for a hash key we need to
clear the padding too.
The case in expandRecordVariable() mentioned here should be safe though,
maybe this is an omission from 9fd45870?
Thanks
Richard
for some cases with padding bits, such as for a hash key we need to
clear the padding too.
The case in expandRecordVariable() mentioned here should be safe though,
maybe this is an omission from 9fd45870?
Thanks
Richard
On Thu, 31 Aug 2023 at 13:35, Junwang Zhao <zhjwpku@gmail.com> wrote: > > > If the struct has padding or aligned, {0} only guarantee the struct > > > members initialized to 0, while memset sets the alignment/padding > > > to 0 as well, but since we will not access the alignment/padding, so > > > they give the same effect. > > > > See above -- if it's used as a hash key, for example, you must clear everything. > > Yeah, if memcmp was used as the key comparison function, there is a problem. The C standard says: > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representationthat correspond to any padding bytes take unspecified values. So if you set any of the fields after a MemSet, the values of the padding bytes that were set to 0 are now unspecified. It seems much safer to actually spell out the padding fields of a hash key.
On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:
> The C standard says:
> > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
>
> So if you set any of the fields after a MemSet, the values of the
> padding bytes that were set to 0 are now unspecified. It seems much
> safer to actually spell out the padding fields of a hash key.
No, the standard is telling you why you need to memset if consistency of padding bytes matters.
--
John Naylor
EDB: http://www.enterprisedb.com
On 2023-09-01 09:25, John Naylor wrote: > On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> > wrote: >> The C standard says: >> > When a value is stored in an object of structure or union type, >> > including in a member object, the bytes of the object representation that >> > correspond to any padding bytes take unspecified values. >> >> So if you set any of the fields after a MemSet, the values of the >> padding bytes that were set to 0 are now unspecified. It seems much >> safer to actually spell out the padding fields of a hash key. > > No, the standard is telling you why you need to memset if consistency > of > padding bytes matters. Um, I'm in no way a language lawyer for recent C specs, but the language Jelte Fennema quoted is also present in the rather old 9899 TC2 draft I still have around from years back, and in particular it does say that upon assignment, padding bytes ▶take◀ unspecified values, not merely that they retain whatever unspecified values they may have had before. There is a footnote attached (in 9899 TC2) that says "Thus, for example, structure assignment need not copy any padding bits." If that footnote example were normative, it would be reassuring, because you could assume that padding bits not copied are unchanged and remember what you originally memset() them to. So that would be nice. But everything about the form and phrasing of the footnote conveys that it isn't normative. And the normative text does appear to be saying that those padding bytes ▶take◀ unspecified values upon, assignment to the object, even if you may have memset() them before. Or at least to be saying that's what could happen, in some implementation on some architecture, and it would be standard-conformant if it did. Perhaps there is language elsewhere in the standard that pins it down to the way you've interpreted it? If you know where that language is, could you point to it? Regards, -Chap
On Fri, 1 Sept 2023 at 15:25, John Naylor <john.naylor@enterprisedb.com> wrote:
> On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:
> > The C standard says:
> > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
> >
> > So if you set any of the fields after a MemSet, the values of the
> > padding bytes that were set to 0 are now unspecified. It seems much
> > safer to actually spell out the padding fields of a hash key.
>
> No, the standard is telling you why you need to memset if consistency of padding bytes matters.
Maybe I'm misunderstanding the sentence from the C standard I quoted. But under my interpretation it means that even an assignment to a field of a struct causes the padding bytes to take unspecified (but not undefined) values, because of the "including in a member object" part of the sentence. It's ofcourse possible that all compilers relevant to Postgres never actually change padding when assigning to a field.
> On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:
> > The C standard says:
> > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
> >
> > So if you set any of the fields after a MemSet, the values of the
> > padding bytes that were set to 0 are now unspecified. It seems much
> > safer to actually spell out the padding fields of a hash key.
>
> No, the standard is telling you why you need to memset if consistency of padding bytes matters.
Maybe I'm misunderstanding the sentence from the C standard I quoted. But under my interpretation it means that even an assignment to a field of a struct causes the padding bytes to take unspecified (but not undefined) values, because of the "including in a member object" part of the sentence. It's ofcourse possible that all compilers relevant to Postgres never actually change padding when assigning to a field.
On 31.08.23 10:32, Richard Guo wrote: > While working on a bug in expandRecordVariable() I noticed that in the > switch statement for case RTE_SUBQUERY we initialize struct ParseState > with {0} while for case RTE_CTE we do that with MemSet. I understand > that there is nothing wrong with this, just cannot get away with the > inconsistency inside the same function (sorry for the nitpicking). > > Do we have a preference for how to initialize structures? From 9fd45870 > it seems that we prefer to {0}. So here is a trivial patch doing that. > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact() > can also be replaced with {0}, so include that in the patch too. The first part (parse_target.c) was already addressed by e0e492e5a9. I have applied the second part (pgstatfuncs.c).
On Tue, Sep 19, 2023 at 5:37 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 31.08.23 10:32, Richard Guo wrote:
> While working on a bug in expandRecordVariable() I noticed that in the
> switch statement for case RTE_SUBQUERY we initialize struct ParseState
> with {0} while for case RTE_CTE we do that with MemSet. I understand
> that there is nothing wrong with this, just cannot get away with the
> inconsistency inside the same function (sorry for the nitpicking).
>
> Do we have a preference for how to initialize structures? From 9fd45870
> it seems that we prefer to {0}. So here is a trivial patch doing that.
> And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> can also be replaced with {0}, so include that in the patch too.
The first part (parse_target.c) was already addressed by e0e492e5a9. I
have applied the second part (pgstatfuncs.c).
Thanks for pushing this.
Thanks
Richard
Thanks
Richard