Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1JGDTzRt-ORya0C-2G_hw67eZc0BR-9Vkr6NfJg9YVrww@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Sep 24, 2015 at 8:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Sep 3, 2015 at 6:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > [ new patches ]
>
> Still more review comments:
>
> +                               /* Allow space for terminating zero-byte */
> +                               size = add_size(size, 1);
>
> This is pointless.  The length is already stored separately, and if it
> weren't, this wouldn't be adequate anyway because a varlena can
> contain NUL bytes.  It won't if it's text, but it could be bytea or
> numeric or whatever.
>

Agreed and I think we can do without that as well.

> RestoreBoundParams is broken, because it can do unaligned reads, which
> will core dump on some architectures (and merely be slow on others).
> If there are two or more parameters, and the first one is a varlena
> with a length that is not a multiple of MAXIMUM_ALIGNOF, the second
> SerializedParamExternData will be misaligned.
>

Agreed, but can't we fix that problem even in the current patch by using
*ALIGN (TYPEALIGN) macro?

> Also, it's pretty lame that we send the useless pointer even for a
> pass-by-reference data type and then overwrite the bad pointer with a
> good one a few lines later.  It would be better to design the
> serialization format so that we don't send the bogus pointer over the
> wire in the first place.
>
> It's also problematic in my view that there is so much duplicated code
> here. SerializedParamExternData and SerializedParamExecData are very
> similar and there are large swaths of very similar code to handle each
> case.  Both structures contain Datum value, Size length, bool isnull,
> and Oid ptype, albeit not in the same order for some reason.  The only
> difference is that SerializedParamExternData contains uint16 pflags
> and SerializedParamExecData contains int paramid.  I think we need to
> refactor this code to get rid of all this duplication.

The reason of having different structures is that both the structures are
derived from their non-serialized versions (ParamExternData and
ParamExecData).  I agree that we can avoid code duplication incase of
serialization and restoration of both structures, however doing it with
structure compatible to non-serialized version seems to have better code
readability and easier to enhance (think if we need to add a new parameter
in non-serialized version).  This can somewhat vary from each individual's
perspective and I won't argue much here if you say that having native format
as you are suggesting is better in terms of code readability and maintenance.


>  I suggest that
> we decide to represent a datum here in a uniform fashion, perhaps like
> this:
>
> First, store a 4-byte header word.  If this is -2, the value is NULL
> and no data follows.  If it's -1, the value is pass-by-value and
> sizeof(Datum) bytes follow.  If it's >0, the value is
> pass-by-reference and the value gives the number of following bytes
> that should be copied into a brand-new palloc'd chunk.
>
> Using a format like this, we can serialize and restore datums in
> various contexts - including bind and exec params - without having to
> rewrite the code each time.  For example, for param extern data, you
> can dump an array of all the ptypes and paramids and then follow it
> with all of the params one after another.  For param exec data, you
> can dump an array of all the ptypes and paramids and then follow it
> with the values one after another.  The code that reads and writes the
> datums in both cases can be the same.  If we need to send datums in
> other contexts, we can use the same code for it.
>
> The attached patch - which I even tested! - shows what I have in mind.
> It can save and restore the ParamListInfo (bind parameters).  I
> haven't tried to adapt it to the exec parameters because I don't quite
> understand what you are doing there yet, but you can see that the
> datum-serialization logic is separated from the stuff that knows about
> the details of ParamListInfo, so datumSerialize() should be reusable
> for other purposes.

I think we can adapt this for Param exec data parameters.  I can take
care of integrating this with funnel patch and changing the param exec data
parameters related code if we decide to go via this way.

>  This also doesn't have the other problems
> mentioned above.
>

I have a question here which is why this format doesn't have a similar problem
as the current version, basically in current patch the second read of
SerializedParamExternData can be misaligned and for same reason in your
patch the second read of Oid could by misaligned?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Nikolay Shaplov
Date:
Subject: Re: pageinspect patch, for showing tuple data
Next
From: Michael Paquier
Date:
Subject: Re: pageinspect patch, for showing tuple data