Binary Cursors, and the COPY command - Mailing list pgsql-hackers
From | pgsql@mohawksoft.com |
---|---|
Subject | Binary Cursors, and the COPY command |
Date | |
Msg-id | 10519.64.119.142.34.1090863707.squirrel@mail.mohawksoft.com Whole thread Raw |
Responses |
Re: Binary Cursors, and the COPY command
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: Binary Cursors, and the COPY command (Oliver Jowett <oliver@opencloud.com>) |
List | pgsql-hackers |
OK, I wrote a utility for 7.3 that takes the output of a select command in a Binary cursor and creates a binary "COPY" file. The premise of the utility is to take the results of two or more selects from external databases and create a single unified table. Here are the issues: In 7.3, COPY BINARY was machine specific and so was the output of a binary cursor. Everything just worked fine. In 7.4, COPY BINARY uses "network byte order," i.e. native data types are altered to big endian if nessisary. The documentation for binary cursors does not specify whether or not the "binary" data is native or "network byte order." I have a few issues with "COPY BINARY" using "network byte order," first, it is pointless. The problem it intends to solve, i.e. transfering across different machine types is already answered using the tried and true ascii method. Second, it actually makes the COPY functionality less usable. You can not create the data outside of the database because all the data type definitions and manipulation functions are inside the database. (Unless you only use simple data types, of course.) Third, if a binary cursor does not encode the binary data as "network byte order" a binary copy can ONLY communicate between two postgreSQL databases because the information required to go from native ordering to network ordering is only in the database. Lastly, the vast majority of machines in use today are intel. Meaning that they are small endian. Except in a very rare circumstance, two machines that would normally be able to communicate in native byte order, will ALWAYS have to convert data. The only use case network byte order fixes is a BINARY COPY between different machine types, but in doing that, it forces anyone trying to add value to postgresql or create a utility that uses COPY to reimplement all the data type handlers outside of the database, even if they never need to interpret or inspect the values, because they have to do this to put them in network byte order. I would say that the history of the word "BINARY" would tend more to indicate incompatible machine specific data. I would submit that the 7.4 format of data, i.e. one data size int32 instead of an int16 followed by the optional int32 is cleaner, but I would remove the "network byte order" and put the byte order int32 back in the header for 7.5
pgsql-hackers by date: