Thread: Strange coding in mvdistinct.c

Strange coding in mvdistinct.c

From
Tom Lane
Date:
In the wake of the discussion at [1] I went looking for structs that
should be using FLEXIBLE_ARRAY_MEMBER and are not, by dint of grepping
for size calculations of the form "offsetof(struct,fld) + n * sizeof(...)"
and then seeing how "fld" is declared.  I haven't yet found anything
like that that I want to change, but I did come across this bit in
mvdistinct.c's statext_ndistinct_serialize():

    len = VARHDRSZ + SizeOfMVNDistinct +
        ndistinct->nitems * (offsetof(MVNDistinctItem, attrs) + sizeof(int));

Given the way that the subsequent code looks, I would argue that
offsetof(MVNDistinctItem, attrs) has got basically nothing to do with
this calculation, and that the right way to phrase it is just

    len = VARHDRSZ + SizeOfMVNDistinct +
        ndistinct->nitems * (sizeof(double) + sizeof(int));

Consider if there happened to be alignment padding in MVNDistinctItem:
as the code stands it'd overestimate the space needed.  (There won't be
padding on any machine we support, I believe, so this isn't a live bug ---
but it's overcomplicated code, and could become buggy if any
less-than-double-width fields get added to MVNDistinctItem.)

For largely the same reason, I do not think that SizeOfMVNDistinct is
a helpful way to compute the space needed for those fields --- any
alignment padding that might be included is irrelevant for this purpose.
In short I'd be inclined to phrase this just as

    len = VARHDRSZ + 3 * sizeof(uint32) +
        ndistinct->nitems * (sizeof(double) + sizeof(int));

It looks to me actually like all the uses of both SizeOfMVNDistinctItem
and SizeOfMVNDistinct are wrong, because the code using those symbols
is really thinking about the size of this serialized representation,
which is guaranteed not to have any inter-field padding, unlike the
structs.

Thoughts?

            regards, tom lane

[1] https://postgr.es/m/a620f85a-42ab-e0f3-3337-b04b97e2e2f5@redhat.com



Re: Strange coding in mvdistinct.c

From
Tom Lane
Date:
Oh, and as I continue to grep, I found this in dependencies.c:

            dependencies = (MVDependencies *) repalloc(dependencies,
                                                       offsetof(MVDependencies, deps)
                                                       + dependencies->ndeps * sizeof(MVDependency));

I'm pretty sure this is an actual bug: the calculation should be

                       offsetof(MVDependencies, deps)
                       + dependencies->ndeps * sizeof(MVDependency *));

because deps is an array of MVDependency* not MVDependency.

This would lead to an overallocation not underallocation, and it's
probably pretty harmless because ndeps can't get too large (I hope;
if it could, this would have O(N^2) performance problems).  Still,
you oughta fix it.

(There's a similar calculation later in the file that gets it right.)

            regards, tom lane



Re: Strange coding in mvdistinct.c

From
Tomas Vondra
Date:
On Mon, Apr 15, 2019 at 06:12:24PM -0400, Tom Lane wrote:
>Oh, and as I continue to grep, I found this in dependencies.c:
>
>            dependencies = (MVDependencies *) repalloc(dependencies,
>                                                       offsetof(MVDependencies, deps)
>                                                       + dependencies->ndeps * sizeof(MVDependency));
>
>I'm pretty sure this is an actual bug: the calculation should be
>
>                       offsetof(MVDependencies, deps)
>                       + dependencies->ndeps * sizeof(MVDependency *));
>
>because deps is an array of MVDependency* not MVDependency.
>
>This would lead to an overallocation not underallocation, and it's
>probably pretty harmless because ndeps can't get too large (I hope;
>if it could, this would have O(N^2) performance problems).  Still,
>you oughta fix it.
>
>(There's a similar calculation later in the file that gets it right.)
>

Thanks. I noticed some of the bugs while investigating the recent MCV
serialization, and I plan to fix them soon. This week, hopefully.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Strange coding in mvdistinct.c

From
Tomas Vondra
Date:
On Mon, Apr 15, 2019 at 06:12:24PM -0400, Tom Lane wrote:
>Oh, and as I continue to grep, I found this in dependencies.c:
>
>            dependencies = (MVDependencies *) repalloc(dependencies,
>                                                       offsetof(MVDependencies, deps)
>                                                       + dependencies->ndeps * sizeof(MVDependency));
>
>I'm pretty sure this is an actual bug: the calculation should be
>
>                       offsetof(MVDependencies, deps)
>                       + dependencies->ndeps * sizeof(MVDependency *));
>
>because deps is an array of MVDependency* not MVDependency.
>
>This would lead to an overallocation not underallocation, and it's
>probably pretty harmless because ndeps can't get too large (I hope;
>if it could, this would have O(N^2) performance problems).  Still,
>you oughta fix it.
>
>(There's a similar calculation later in the file that gets it right.)
>

I've pushed a fix correcting those issues - both for mvndistinct and
functional dependencies. I've reworked the macros used to compute the
serialized sizes not to use offsetof(), which however made them pretty
useless for other purposes. So I've made them private by moving them to
the .c files where they are moved them to the .c files.

I don't think we need to backpatch this - we're not going to add new
fields in backbranches, so there's little danger of introducing padding.
And I'm not sure we want to remove the macros, although it's unlikely
anyone else uses them.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services