Thread: About the stability of COPY BINARY data
From https://www.postgresql.org/docs/current/sql-copy.html: |> binary-format file is less portable across machine architectures and PostgreSQL versions In my experience, the binary encoding of binding/resultset/copy is endian neutral (network byte order), so what is the less portable across machine architectures that warning about? Also, does the code for per-type _send() and _recv() functions really change across versions of PostgreSQL? How common are instances of such changes across versions? Any examples of such backward-incompatible changes, in the past? The binary data contains OIDs, but if sticking to built-in types, which OIDs are unlikely to change across versions? I'm obviously storing COPY BINARY data (we have lots of bytea columns), and I wonder how bad it is long term, and across PostgreSQL versions. Thanks for any insights, --DD
On 11/6/24 08:20, Dominique Devienne wrote: >>From https://www.postgresql.org/docs/current/sql-copy.html: > |> binary-format file is less portable across machine architectures > and PostgreSQL versions > > In my experience, the binary encoding of binding/resultset/copy is > endian neutral (network byte order), so what is the less portable > across machine architectures that warning about? > > Also, does the code for per-type _send() and _recv() functions really change > across versions of PostgreSQL? How common are instances of such > changes across versions? Any examples of such backward-incompatible > changes, in the past? > > The binary data contains OIDs, but if sticking to built-in types, > which OIDs are unlikely to change across versions? > > I'm obviously storing COPY BINARY data (we have lots of bytea > columns), and I wonder how bad it is long term, and across PostgreSQL > versions. If I where to hazard a guess this plays a part: https://www.postgresql.org/docs/current/sql-copy.html "To determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source, in particular the *send and *recv functions for each column's data type (typically these functions are found in the src/backend/utils/adt/ directory of the source distribution)." > > Thanks for any insights, --DD > > -- Adrian Klaver adrian.klaver@aklaver.com
On Thu, Nov 7, 2024 at 5:37 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > On 11/6/24 08:20, Dominique Devienne wrote: > >>From https://www.postgresql.org/docs/current/sql-copy.html: > > |> binary-format file is less portable across machine architectures > > and PostgreSQL versions > > > > In my experience, the binary encoding of binding/resultset/copy is > > endian neutral (network byte order), so what is the less portable > > across machine architectures that warning about? > > > > Also, does the code for per-type _send() and _recv() functions really change > > across versions of PostgreSQL? How common are instances of such > > changes across versions? Any examples of such backward-incompatible > > changes, in the past? > > > > The binary data contains OIDs, but if sticking to built-in types, > > which OIDs are unlikely to change across versions? > > > > I'm obviously storing COPY BINARY data (we have lots of bytea > > columns), and I wonder how bad it is long term, and across PostgreSQL > > versions. > > If I where to hazard a guess this plays a part: > > https://www.postgresql.org/docs/current/sql-copy.html > > "To determine the appropriate binary format for the actual tuple data > you should consult the PostgreSQL source, in particular the *send and > *recv functions for each column's data type (typically these functions > are found in the src/backend/utils/adt/ directory of the source > distribution)." Hi Adrian. Well, sure. The questions above are whether those type-specific formats are: 1) architecture dependent. (that's not my experience). 2) change across PostgreSQL versions. Not what the actual formats are. --DD PS: I'm surprised I didn't get answers. Seems to me to doc is overly "careful" about COPY BINARY's stability, thus my asking for confirmation here.
Dominique Devienne wrote: > Also, does the code for per-type _send() and _recv() functions > really change across versions of PostgreSQL? How common are > instances of such changes across versions? Any examples of such > backward-incompatible changes, in the past? For the timestamp types, I think these functions were sending/expecting float8 (before version 7.3), and then float8 or int64 depending on the server configuration up until 9.6, and since then int64 only. The same for the "time" field of the interval type. There is still an "integer_datetimes" GUC reflecting this. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <daniel@manitou-mail.org> wrote: > Dominique Devienne wrote: > > Also, does the code for per-type _send() and _recv() functions > > really change across versions of PostgreSQL? How common are > > instances of such changes across versions? Any examples of such > > backward-incompatible changes, in the past? > > For the timestamp types, I think these functions were > sending/expecting float8 (before version 7.3), and then float8 or > int64 depending on the server configuration up until 9.6, and since > then int64 only. > The same for the "time" field of the interval type. > There is still an "integer_datetimes" GUC reflecting this. Thanks. So it did happen in a distant past. Anything below 14 is of no concern to me though. So again, it does sound like changes are unlikely. And I haven't seen anything not network-byte-order, as far architecture is concerned.
On 11/7/24 09:55, Dominique Devienne wrote: > On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <daniel@manitou-mail.org> wrote: >> Dominique Devienne wrote: >>> Also, does the code for per-type _send() and _recv() functions >>> really change across versions of PostgreSQL? How common are >>> instances of such changes across versions? Any examples of such >>> backward-incompatible changes, in the past? >> >> For the timestamp types, I think these functions were >> sending/expecting float8 (before version 7.3), and then float8 or >> int64 depending on the server configuration up until 9.6, and since >> then int64 only. >> The same for the "time" field of the interval type. >> There is still an "integer_datetimes" GUC reflecting this. > > Thanks. So it did happen in a distant past. > Anything below 14 is of no concern to me though. > So again, it does sound like changes are unlikely. Yeah that is implied by: https://www.postgresql.org/docs/current/pgupgrade.html "Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. " The COPY warning is there as heads up that it is a possibility. > > And I haven't seen anything not network-byte-order, > as far architecture is concerned. > > -- Adrian Klaver adrian.klaver@aklaver.com
On Thu, Nov 7, 2024 at 7:04 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 11/7/24 09:55, Dominique Devienne wrote: > > On Thu, Nov 7, 2024 at 6:39 PM Daniel Verite <daniel@manitou-mail.org> wrote: > >> Dominique Devienne wrote: > >>> Also, does the code for per-type _send() and _recv() functions > >>> really change across versions of PostgreSQL? How common are > >>> instances of such changes across versions? Any examples of such > >>> backward-incompatible changes, in the past? > >> > >> For the timestamp types, I think these functions were > >> sending/expecting float8 (before version 7.3), and then float8 or > >> int64 depending on the server configuration up until 9.6, and since > >> then int64 only. > >> The same for the "time" field of the interval type. > >> There is still an "integer_datetimes" GUC reflecting this. > > > > Thanks. So it did happen in a distant past. > > Anything below 14 is of no concern to me though. > > So again, it does sound like changes are unlikely. > > Yeah that is implied by: > > https://www.postgresql.org/docs/current/pgupgrade.html > > "Major PostgreSQL releases regularly add new features that often change > the layout of the system tables, but the internal data storage format > rarely changes. " > > The COPY warning is there as heads up that it is a possibility. > > > > And I haven't seen anything not network-byte-order, > > as far architecture is concerned. But the COPY BINARY format and "the internal data storage format" are two separate things Adrian, AFAIK. Using binds and result sets in binary mode is part of the protocol in a way, and not an internal implementation detail, like internal format for tables on disk. I'm sure I'm not the only one that using binary mode for PostgreSQL, and any change in _send and _recv functions across versions, or have their results be architecture dependent, would break many client codes, if it happened. And COPY BINARY's outer format is also public and documented, so can't change either. That's why I'm insisting on that phrase in the documentation, which gives the wrong impressions IMHO. Shouldn't it be removed or amended? Thanks, --DD