Thread: Binary params in libpq
Hello, I'm thinking about adding support for PQexecParams and PQprepare in Psycopg. I've posted more details yesterday on the Psycopg mailing list <http://archives.postgresql.org/psycopg/2011-02/msg00076.php>. I have a few preliminary questions: How stable is the binary representation for the PostgreSQL types? We may just pass bytea data in binary format and pass everything else as text parameters, or pass different types too as binary, if performace would benefit. Did binary format ever changed in the past (at least since the introduction of the V3 protocol)? Is such binary representation more likely to change in the future respect to the textual one, in a way that forward/backward compatibility between server and client would be compromised? Query plans for prepared statements may be less efficient than ones for queries with bound parameters. Are query plans generated for PQexecParams calls as efficient as the ones generated for PQexec? Or do they have the same limitation of the ones generated for PQprepare? Thank you, -- Daniele
On Sun, Feb 27, 2011 at 1:13 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > Hello, > > I'm thinking about adding support for PQexecParams and PQprepare in > Psycopg. I've posted more details yesterday on the Psycopg mailing > list <http://archives.postgresql.org/psycopg/2011-02/msg00076.php>. I > have a few preliminary questions: If you are wrapping libpq, there really is no good reason not to use the parametrized interfaces exclusively...not doing that is just asking for SQL injection type problems. > How stable is the binary representation for the PostgreSQL types? We > may just pass bytea data in binary format and pass everything else as > text parameters, or pass different types too as binary, if performace > would benefit. Did binary format ever changed in the past (at least > since the introduction of the V3 protocol)? Is such binary > representation more likely to change in the future respect to the > textual one, in a way that forward/backward compatibility between > server and client would be compromised? The binary representations of types are mostly stable. On average of about one per release you might see a wire format change...it doesn't happen that often but you have to be prepared to deal with it. 'money' type is one example -- it was moved from 32 to 64 bit..there are a couple of others. Backwards compatibility is no problem but forwards compatibility is going to be problematic because you have no support from the server. In practice, it usually works ok. > Query plans for prepared statements may be less efficient than ones > for queries with bound parameters. Are query plans generated for > PQexecParams calls as efficient as the ones generated for PQexec? Or > do they have the same limitation of the ones generated for PQprepare? Query plans for PQexec is the same for PQexecParams. PQexec can be a tiny bit more efficient for simple queries but is not worth the risk unless you are sending 0 parameters. merlin
On 28/02/2011 7:48 AM, Merlin Moncure wrote: >> How stable is the binary representation for the PostgreSQL types? We >> may just pass bytea data in binary format and pass everything else as >> text parameters, or pass different types too as binary, if performace >> would benefit. Did binary format ever changed in the past (at least >> since the introduction of the V3 protocol)? AFAIK, the `timestamp' type moved from a floating-point to an integer representation internally, which would've affected the binary protocol representation. That was even a compile-time config option, so it could be different between two different Pg installs with the same version. >> Is such binary >> representation more likely to change in the future respect to the >> textual one, in a way that forward/backward compatibility between >> server and client would be compromised? History to date suggests that the binary representation certainly does change. Then again, so has the textual one at least in the case of bytea. > The binary representations of types are mostly stable. On average of > about one per release you might see a wire format change...it doesn't > happen that often but you have to be prepared to deal with it. Refusing to talk to databases newer than the version the driver was tested with *might* be a good idea, just to be safe, though some kind of user override would be needed. At least warn about it when you see a new and unfamiliar Pg version. You can't always *tell* that a new representation is incompatible, as the text-format bytea affair demonstrated. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 28/02/2011 7:48 AM, Merlin Moncure wrote: > >>> How stable is the binary representation for the PostgreSQL types? We >>> may just pass bytea data in binary format and pass everything else as >>> text parameters, or pass different types too as binary, if performace >>> would benefit. Did binary format ever changed in the past (at least >>> since the introduction of the V3 protocol)? > > AFAIK, the `timestamp' type moved from a floating-point to an integer > representation internally, which would've affected the binary protocol > representation. That was even a compile-time config option, so it could be > different between two different Pg installs with the same version. Actually, this has always been a compile time option on the server as far as i remember and there is protocol support for it -- libpq tells you how it has been set...you've always had to deal with this (libpqtypes, which I've mentioned several times as the correct way to 'do' binary with libpq, handles this for you). Only the default setting was changed. Timestamps are the only type afaik that are like this so they are something of a special case. There *was* a binary wire format change for timestamps some time back iirc that was unrelated to the int/float property. They are unlikely to change again unless the server's internal representation changes. Small aside about timestamps: aside from bytea they stand to benefit the most out of all the common types by moving to binary, especially if you are moving them to/from something like a C struct tm on the client. int, float, etc are hardly noticeable and barely worth the trouble. Long term, imnsho significant portions of the type library on the server should be abstracted out into a library that (optionally) the client and the server share, with the client portions wrapped by libpqtypes (which then becomes a simple sytnax extension to libpq), the not frequently mentioned but very excellent ecpg, and other drivers like php, Psycopg, etc. merlin
On Mon, Feb 28, 2011 at 2:42 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer >> AFAIK, the `timestamp' type moved from a floating-point to an integer >> representation internally, which would've affected the binary protocol >> representation. That was even a compile-time config option, so it could be >> different between two different Pg installs with the same version. > > Actually, this has always been a compile time option on the server as > far as i remember and there is protocol support for it -- libpq tells > you how it has been set...you've always had to deal with this It should be 'integer_datetimes' setting reported by PQparameterStatus I assume. > (libpqtypes, which I've mentioned several times as the correct way to > 'do' binary with libpq, handles this for you). Only the default > setting was changed. Timestamps are the only type afaik that are like > this so they are something of a special case. There *was* a binary > wire format change for timestamps some time back iirc that was > unrelated to the int/float property. They are unlikely to change > again unless the server's internal representation changes. > > Small aside about timestamps: aside from bytea they stand to benefit > the most out of all the common types by moving to binary, especially > if you are moving them to/from something like a C struct tm on the > client. int, float, etc are hardly noticeable and barely worth the > trouble. Not only barely noticeable: I've talked with somebody who has done some performance test and has found better performance in text mode for small integers: apparently the time spent parsing the integer text format was less than the overhead to pass the 8 bits of an int64 just to represent '3'. Probably an edge case but it restates that for ints there isn't a lot to gain from the binary format. So it looks like we could assume the bin representation relatively stable, but still at each postgresql major version it should be checked if the internal representation of any data type changed. OTOH not so many data types would really benefit from being transferred as binary (from a quick scan to the data types list, the datetime types look indeed the ones where there would be the most to gain). -- Daniele
On 28/02/2011 10:42 PM, Merlin Moncure wrote: > Actually, this has always been a compile time option on the server as > far as i remember and there is protocol support for it -- libpq tells > you how it has been set...you've always had to deal with this > (libpqtypes, which I've mentioned several times as the correct way to > 'do' binary with libpq, handles this for you). Only the default > setting was changed. Ah. Thanks for clarifying that. It sounds like binary transfer is best for date, timestamp, and bytea. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
On Mon, Feb 28, 2011 at 8:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer >> AFAIK, the `timestamp' type moved from a floating-point to an integer >> representation internally, which would've affected the binary protocol >> representation. That was even a compile-time config option, so it could be >> different between two different Pg installs with the same version. > > Actually, this has always been a compile time option on the server as > far as i remember and there is protocol support for it -- libpq tells > you how it has been set...you've always had to deal with this I don't see any libpq calls that can, at run-time, tell you things like what format the timestamp is and what endian-ness the server is. Is there something I'm missing? The only thing I could figure out is to do something like 'select 123::int4' or select a known date and determine the nature of the server from what you get back. -K
On Tue, Mar 1, 2011 at 8:19 AM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote: > On Mon, Feb 28, 2011 at 8:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer >>> AFAIK, the `timestamp' type moved from a floating-point to an integer >>> representation internally, which would've affected the binary protocol >>> representation. That was even a compile-time config option, so it could be >>> different between two different Pg installs with the same version. >> >> Actually, this has always been a compile time option on the server as >> far as i remember and there is protocol support for it -- libpq tells >> you how it has been set...you've always had to deal with this > > I don't see any libpq calls that can, at run-time, tell you things > like what format the timestamp is and what endian-ness the server is. > Is there something I'm missing? The only thing I could figure out is > to do something like 'select 123::int4' or select a known date and > determine the nature of the server from what you get back. You don't need to know endian-ness: it is always in network order (big endian). As noted above, you can check datetime format with PQparameterStatus. Really, my advice to you and the countless others who seem to continually want to re-engineer this problem is to either use or crib from two libraries that have completely solved it...namely libpqtypes and ecpg. Some of the wire formats are non-trivial to convert to C native types. merlin
Kelly Burkhart <kelly.burkhart@gmail.com> writes: > On Mon, Feb 28, 2011 at 8:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Actually, this has always been a compile time option on the server as >> far as i remember and there is protocol support for it -- libpq tells >> you how it has been set...you've always had to deal with this > I don't see any libpq calls that can, at run-time, tell you things > like what format the timestamp is and what endian-ness the server is. > Is there something I'm missing? The timestamp format can be determined by querying PQparameterStatus for the value of "integer_datetimes". Server endianness is irrelevant because values are always sent big-endian. regards, tom lane