Re: Verifying embedded oids in *recv is a bad idea - Mailing list pgsql-hackers

From Jelte Fennema
Subject Re: Verifying embedded oids in *recv is a bad idea
Date
Msg-id CAGECzQQ=JRMptKFqvMpEtKQfK5fM4PK6iORJPAs32grjBcoAqg@mail.gmail.com
Whole thread Raw
In response to Re: Verifying embedded oids in *recv is a bad idea  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
So I ran into this as well and I think this should actually be considered a bug, since COPY ... BINARY does not work between nodes now.
Which is why I have posted it to the pgsql-bugs list with some reproducible steps to trigger the bug: https://www.postgresql.org/message-id/16485-f7b2dddca52ef2ae%40postgresql.org

Like Andres said, the oids are not checked for the text protocol versions either. And like  Kyotaro says, if the type that has a different OID would actually have a different binary encoding, parsing will simply fail at that point.
I don't think there's any practical reason to keep these checks. They only get in the way of using the more efficient binary encoding for anything other than simple types.
The only case where these *_recv functions would currently work is if you COPY ... BINARY to the exact same postgres instance (when not dropping and recreating types). But in that case there's little reason to use COPY at all.

On Tue, 9 Jun 2020 at 18:33, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,

At Mon, 25 Apr 2016 17:17:13 -0700, Andres Freund <andres@anarazel.de> wrote in <20160426001713.hbqdiwvf4mkzkg55@alap3.anarazel.de>
> Hi,
>
> for performance reasons it's a good idea to use the binary protocol. But
> doing so between two postgres installations is made unnecessarily hard
> by the choice of embedding and verifying oids in composite and array
> types.  When using extensions, even commonly used ones like hstore, the
> datatype oids will often not be the same between systems, even when
> using exactly the same version on the same OS.
>
> Specifically I'm talking about
>
> Datum
> array_recv(PG_FUNCTION_ARGS)
> {
> ...
>       element_type = pq_getmsgint(buf, sizeof(Oid));
>       if (element_type != spec_element_type)
>       {
>               /* XXX Can we allow taking the input element type in any cases? */
>               ereport(ERROR,
>                               (errcode(ERRCODE_DATATYPE_MISMATCH),
>                                errmsg("wrong element type")));
>       }
> ...
> }
>
> and
>
> Datum
> record_recv(PG_FUNCTION_ARGS)
> {
> ...
>       /* Process each column */
>       for (i = 0; i < ncolumns; i++)
> ...
>               /* Verify column datatype */
>               coltypoid = pq_getmsgint(buf, sizeof(Oid));
>               if (coltypoid != column_type)
>                       ereport(ERROR,
>                                       (errcode(ERRCODE_DATATYPE_MISMATCH),
>                                        errmsg("wrong data type: %u, expected %u",
>                                                       coltypoid, column_type)));
> ...
> }
>
>
> given that we're giving up quite some speed and adding complexity to
> make send/recv functions portable, this seems like a shame.
>
> I'm failing to see what these checks are buying us? I mean the text
> representation of a composite doesn't include type information about
> contained columns either, I don't see why the binary version has to?
>
> I'm basically thinking that we should remove the checks on the receiving
> side, but leave the sending of oids in place for backward compat.

FWIW, I think that typsend/typreceive are intended for the use by
COPY FROM/TO and the doc says that the following.

http://www.postgresql.org/docs/devel/static/sql-copy.html

> Binary Format
>
> The binary format option causes all data to be stored/read as
> binary format rather than as text. It is somewhat faster than the
> text and CSV formats, but a binary-format file is less portable
> across machine architectures and PostgreSQL versions. Also, the
> binary format is very data type specific; for example it will not
> work to output binary data from a smallint column and read it
> into an integer column, even though that would work fine in text
> format.

I haven't considered much about the usefulness of this
restriction, but receiving into an array of a different type
easily leads to a crash.

# However, false matching of type oids also would do so.

I suppose that we should reconsider here if removing the checks.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Inlining of couple of functions in pl_exec.c improves performance
Next
From: Andres Freund
Date:
Subject: Re: Physical replication slot advance is not persistent