Re: mcvstats serialization code is still shy of a load - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: mcvstats serialization code is still shy of a load
Date
Msg-id 20190626160808.3icyhgv5c6pusdp3@development
Whole thread Raw
In response to Re: mcvstats serialization code is still shy of a load  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: mcvstats serialization code is still shy of a load
List pgsql-hackers
On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Attached is a patch that should (hopefully) fix this. It essentially
>> treats the item as (char *) and does all pointer arithmetics without any
>> additional casts. So there are no intermediate casts.
>
>This passes the eyeball test, and it also allows my OpenBSD/hppa
>installation to get through the core regression tests, so I think
>it's good as far as it goes.  Please push.
>
>However ... nosing around in mcv.c, I noticed that the next macro:
>
>/*
> * Used to compute size of serialized MCV list representation.
> */
>#define MinSizeOfMCVList        \
>    (VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
>
>#define SizeOfMCVList(ndims,nitems)    \
>    (MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
>     MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
>     MAXALIGN((nitems) * ITEM_SIZE(ndims)))
>
>is both woefully underdocumented and completely at variance with
>reality.  It doesn't seem to be accounting for the actual data values.
>No doubt this is why it's not used in the places where it'd matter;
>the tests that do use it are testing much weaker conditions than they
>should.
>

I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's variable amount of data - and it's encoded in not
yet verified part of the data.

So this only includes parts with known lengths, and then the code does
this:

    for (dim = 0; dim < ndims; dim++)
    {
        ...
        expected_size += MAXALIGN(info[dim].nbytes);
    }

and uses that to check the actual length.

    if (VARSIZE_ANY(data) != expected_size)
        elog(ERROR, ...);

That being said, maybe this is unnecessarily defensive and we should just
trust the values not being corrupted. So if we get pg_mcv_list value, we'd
simply assume it's OK.

>> The fix keeps the binary format as is, so the serialized MCV items are
>> max-aligned. That means we can access the uint16 indexes directly, but we
>> need to copy the rest of the fields (because those may not be aligned). In
>> hindsight that seems a bit silly, we might as well copy everything, not
>> care about the alignment and maybe save a few more bytes.
>
>I think that part of the problem here is that the way this code is
>written, "maxaligned" is no such thing.  What you're actually maxaligning
>seems to be the offset from the start of the data area of a varlena value,
>which is generally going to be a maxaligned palloc result plus 4 bytes.
>So "aligned" double values are actually guaranteed to be on odd word
>boundaries not even ones.
>

I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to. At least that was the
intent of code like this:

    raw = palloc0(total_length);

    ...

    /* the header may not be exactly aligned, so make sure it is */
    ptr = raw + MAXALIGN(ptr - raw);

If it's not like that in some place, it's a bug.

>What's more, it's difficult to convince oneself that the maxaligns done
>in different parts of the code are all enforcing the same choices about
>which substructures get pseudo-maxaligned and which don't, because the
>logic doesn't line up very well.
>

Not sure. If there's a way to make it clearer, I'm ready to do the work.
Unfortunately it's hard for me to judge that, because I've spent so much
time on that code that it seems fairly clear to me.

>If we do need another catversion bump before v12, I'd vote for ripping
>out Every Single One of the "maxalign" operations in this code, just
>on the grounds of code simplicity and bug reduction.
>

Hmmm, OK. The original reason to keep the parts aligned was to be able to
reference the parts directly during processing. If we get rid of the
alignment, we'll have to memcpy everything during deserialization. But
if it makes the code simpler, it might be worth it - this part of the
code was clearly the weakest part of the patch.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




pgsql-hackers by date:

Previous
From: pguser
Date:
Subject: Re: Index Skip Scan - attempting to evalutate patch
Next
From: Tom Lane
Date:
Subject: Re: mcvstats serialization code is still shy of a load