Re: mcvstats serialization code is still shy of a load - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: mcvstats serialization code is still shy of a load
Date
Msg-id 20190626222918.72fprjvs6fvmjkup@development
Whole thread Raw
In response to Re: mcvstats serialization code is still shy of a load  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: mcvstats serialization code is still shy of a load
List pgsql-hackers
On Wed, Jun 26, 2019 at 12:31:13PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>>> #define SizeOfMCVList(ndims,nitems)    \
>>> is both woefully underdocumented and completely at variance with
>>> reality.  It doesn't seem to be accounting for the actual data values.
>
>> I agree about the macro being underdocumented, but AFAICS it's used
>> correctly to check the expected length. It can't include the data values
>> directly, because that's variable amount of data - and it's encoded in not
>> yet verified part of the data.
>
>Well, it should have some other name then.  Or *at least* a comment.
>It's unbelievably misleading as it stands.
>

True.

>> That being said, maybe this is unnecessarily defensive and we should just
>> trust the values not being corrupted.
>
>No, I'm on board with checking the lengths.  I just don't like how
>hard it is to discern what's being checked.
>

Understood.

>>> I think that part of the problem here is that the way this code is
>>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>>> seems to be the offset from the start of the data area of a varlena value,
>
>> I don't think so. The pointers should be maxaligned with respect to the
>> whole varlena value, which is what 'raw' points to.
>
>[ squint ... ]  OK, I think I misread this:
>
>statext_mcv_deserialize(bytea *data)
>{
>...
>    /* pointer to the data part (skip the varlena header) */
>    ptr = VARDATA_ANY(data);
>    raw = (char *) data;
>
>I think this is confusing in itself --- I read it as "raw = (char *) ptr"
>and I think most other people would assume that too based on the order
>of operations.  It'd read better as
>
>    /* remember start of datum for maxalign reference */
>    raw = (char *) data;
>
>    /* pointer to the data part (skip the varlena header) */
>    ptr = VARDATA_ANY(data);
>

Yeah, that'd have been better.

>Another problem with this code is that it flat doesn't work for
>non-4-byte-header varlenas: it'd do the alignment differently than the
>serialization side did.  That's okay given that the two extant call sites
>are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
>completely misleading appearance of being ready to deal with short-header
>varlenas, and heaven forbid there should be any comment to discourage
>future coders from trying.  So really what I'd like to see here is
>
>    /* remember start of datum for maxalign reference */
>    raw = (char *) data;
>
>    /* alignment logic assumes full-size datum header */
>    Assert(VARATT_IS_4B(data));
>
>    /* pointer to the data part (skip the varlena header) */
>    ptr = VARDATA_ANY(data);
>
>Or, of course, this could all go away if we got rid of the
>bogus maxaligning...
>

OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.

The main complication is with varlena values, which may or may not have
4B headers (for now there's the PG_DETOAST_DATUM call, but as you
mentioned we may want to remove it in the future). So I've stored the
length as uint32 separately, followed by the full varlena value (thanks
to that the deserialization is simpler). Not sure if that's the best
solution, though, because this way we store the length twice.

I've kept the alignment in the deserialization code, because there it
allows us to allocate the whole value as a single chunk, which I think
is useful (I admit I don't have any measurements to demonstrate that).
But if we decide to rework this later, we can - it's just in-memory
representation, not on-disk.

Is this roughly what you had in mind?

FWIW I'm sure some of the comments are stale and/or need clarification,
but it's a bit too late over here, so I'll look into that tomorrow.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Dent John
Date:
Subject: Re: [PATCH] src/test/modules/dummy_index -- way to test reloptionsfrom inside of access method
Next
From: Andrew Gierth
Date:
Subject: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)