Thread: About the stability of COPY BINARY data

About the stability of COPY BINARY data

From
Dominique Devienne
Date:
From https://www.postgresql.org/docs/current/sql-copy.html:
|> binary-format file is less portable across machine architectures
and PostgreSQL versions

In my experience, the binary encoding of binding/resultset/copy is
endian neutral (network byte order), so what is the less portable
across machine architectures that warning about?

Also, does the code for per-type _send() and _recv() functions really change
across versions of PostgreSQL? How common are instances of such
changes across versions? Any examples of such backward-incompatible
changes, in the past?

The binary data contains OIDs, but if sticking to built-in types,
which OIDs are unlikely to change across versions?

I'm obviously storing COPY BINARY data (we have lots of bytea
columns), and I wonder how bad it is long term, and across PostgreSQL
versions.

Thanks for any insights, --DD



Re: About the stability of COPY BINARY data

From
Adrian Klaver
Date:
On 11/6/24 08:20, Dominique Devienne wrote:
>>From https://www.postgresql.org/docs/current/sql-copy.html:
> |> binary-format file is less portable across machine architectures
> and PostgreSQL versions
> 
> In my experience, the binary encoding of binding/resultset/copy is
> endian neutral (network byte order), so what is the less portable
> across machine architectures that warning about?
> 
> Also, does the code for per-type _send() and _recv() functions really change
> across versions of PostgreSQL? How common are instances of such
> changes across versions? Any examples of such backward-incompatible
> changes, in the past?
> 
> The binary data contains OIDs, but if sticking to built-in types,
> which OIDs are unlikely to change across versions?
> 
> I'm obviously storing COPY BINARY data (we have lots of bytea
> columns), and I wonder how bad it is long term, and across PostgreSQL
> versions.

If I where to hazard a guess this plays a part:

https://www.postgresql.org/docs/current/sql-copy.html

"To determine the appropriate binary format for the actual tuple data 
you should consult the PostgreSQL source, in particular the *send and 
*recv functions for each column's data type (typically these functions 
are found in the src/backend/utils/adt/ directory of the source 
distribution)."

> 
> Thanks for any insights, --DD
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: About the stability of COPY BINARY data

From
Dominique Devienne
Date:
On Thu, Nov 7, 2024 at 5:37 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 11/6/24 08:20, Dominique Devienne wrote:
> >>From https://www.postgresql.org/docs/current/sql-copy.html:
> > |> binary-format file is less portable across machine architectures
> > and PostgreSQL versions
> >
> > In my experience, the binary encoding of binding/resultset/copy is
> > endian neutral (network byte order), so what is the less portable
> > across machine architectures that warning about?
> >
> > Also, does the code for per-type _send() and _recv() functions really change
> > across versions of PostgreSQL? How common are instances of such
> > changes across versions? Any examples of such backward-incompatible
> > changes, in the past?
> >
> > The binary data contains OIDs, but if sticking to built-in types,
> > which OIDs are unlikely to change across versions?
> >
> > I'm obviously storing COPY BINARY data (we have lots of bytea
> > columns), and I wonder how bad it is long term, and across PostgreSQL
> > versions.
>
> If I where to hazard a guess this plays a part:
>
> https://www.postgresql.org/docs/current/sql-copy.html
>
> "To determine the appropriate binary format for the actual tuple data
> you should consult the PostgreSQL source, in particular the *send and
> *recv functions for each column's data type (typically these functions
> are found in the src/backend/utils/adt/ directory of the source
> distribution)."

Hi Adrian.

Well, sure. The questions above are whether those type-specific formats are:
1) architecture dependent. (that's not my experience).
2) change across PostgreSQL versions.

Not what the actual formats are. --DD

PS: I'm surprised I didn't get answers. Seems to me to doc is overly "careful"
    about COPY BINARY's stability, thus my asking for confirmation here.



Re: About the stability of COPY BINARY data

From
"Daniel Verite"
Date:
    Dominique Devienne wrote:

> Also, does the code for per-type _send() and _recv() functions
> really change across versions of PostgreSQL? How common are
> instances of such changes across versions? Any examples of such
> backward-incompatible changes, in the past?

For the timestamp types, I think these functions were
sending/expecting float8 (before version 7.3), and then float8 or
int64 depending on the server configuration up until 9.6, and since
then int64 only.
The same for the "time" field of the interval type.
There is still an "integer_datetimes" GUC reflecting this.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



Re: About the stability of COPY BINARY data

From
Dominique Devienne
Date:
On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>         Dominique Devienne wrote:
> > Also, does the code for per-type _send() and _recv() functions
> > really change across versions of PostgreSQL? How common are
> > instances of such changes across versions? Any examples of such
> > backward-incompatible changes, in the past?
>
> For the timestamp types, I think these functions were
> sending/expecting float8 (before version 7.3), and then float8 or
> int64 depending on the server configuration up until 9.6, and since
> then int64 only.
> The same for the "time" field of the interval type.
> There is still an "integer_datetimes" GUC reflecting this.

Thanks. So it did happen in a distant past.
Anything below 14 is of no concern to me though.
So again, it does sound like changes are unlikely.

And I haven't seen anything not network-byte-order,
as far architecture is concerned.



Re: About the stability of COPY BINARY data

From
Adrian Klaver
Date:
On 11/7/24 09:55, Dominique Devienne wrote:
> On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>>          Dominique Devienne wrote:
>>> Also, does the code for per-type _send() and _recv() functions
>>> really change across versions of PostgreSQL? How common are
>>> instances of such changes across versions? Any examples of such
>>> backward-incompatible changes, in the past?
>>
>> For the timestamp types, I think these functions were
>> sending/expecting float8 (before version 7.3), and then float8 or
>> int64 depending on the server configuration up until 9.6, and since
>> then int64 only.
>> The same for the "time" field of the interval type.
>> There is still an "integer_datetimes" GUC reflecting this.
> 
> Thanks. So it did happen in a distant past.
> Anything below 14 is of no concern to me though.
> So again, it does sound like changes are unlikely.

Yeah that is implied by:

https://www.postgresql.org/docs/current/pgupgrade.html

"Major PostgreSQL releases regularly add new features that often change 
the layout of the system tables, but the internal data storage format 
rarely changes. "

The COPY warning is there as heads up that it is a possibility.

> 
> And I haven't seen anything not network-byte-order,
> as far architecture is concerned.
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: About the stability of COPY BINARY data

From
Dominique Devienne
Date:
On Thu, Nov 7, 2024 at 7:04 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 11/7/24 09:55, Dominique Devienne wrote:
> > On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <daniel@manitou-mail.org> wrote:
> >>          Dominique Devienne wrote:
> >>> Also, does the code for per-type _send() and _recv() functions
> >>> really change across versions of PostgreSQL? How common are
> >>> instances of such changes across versions? Any examples of such
> >>> backward-incompatible changes, in the past?
> >>
> >> For the timestamp types, I think these functions were
> >> sending/expecting float8 (before version 7.3), and then float8 or
> >> int64 depending on the server configuration up until 9.6, and since
> >> then int64 only.
> >> The same for the "time" field of the interval type.
> >> There is still an "integer_datetimes" GUC reflecting this.
> >
> > Thanks. So it did happen in a distant past.
> > Anything below 14 is of no concern to me though.
> > So again, it does sound like changes are unlikely.
>
> Yeah that is implied by:
>
> https://www.postgresql.org/docs/current/pgupgrade.html
>
> "Major PostgreSQL releases regularly add new features that often change
> the layout of the system tables, but the internal data storage format
> rarely changes. "
>
> The COPY warning is there as heads up that it is a possibility.
> >
> > And I haven't seen anything not network-byte-order,
> > as far architecture is concerned.

But the COPY BINARY format and "the internal data storage format"
are two separate things Adrian, AFAIK.

Using binds and result sets in binary mode is part of the protocol in a way,
and not an internal implementation detail, like internal format for
tables on disk.

I'm sure I'm not the only one that using binary mode for PostgreSQL,
and any change in _send and _recv functions across versions, or have their
results be architecture dependent, would break many client codes, if
it happened.
And COPY BINARY's outer format is also public and documented, so can't
change either.

That's why I'm insisting on that phrase in the documentation, which
gives the wrong impressions IMHO.
Shouldn't it be removed or amended? Thanks, --DD