Thread: Binary params in libpq

Binary params in libpq

From
Daniele Varrazzo
Date:
Hello,

I'm thinking about adding support for PQexecParams and PQprepare in
Psycopg. I've posted more details yesterday on the Psycopg mailing
list <http://archives.postgresql.org/psycopg/2011-02/msg00076.php>. I
have a few preliminary questions:

How stable is the binary representation for the PostgreSQL types? We
may just pass bytea data in binary format and pass everything else as
text parameters, or pass different types too as binary, if performace
would benefit. Did binary format ever changed in the past (at least
since the introduction of the V3 protocol)? Is such binary
representation more likely to change in the future respect to the
textual one, in a way that forward/backward compatibility between
server and client would be compromised?

Query plans for prepared statements may be less efficient than ones
for queries with bound parameters. Are query plans generated for
PQexecParams calls as efficient as the ones generated for PQexec? Or
do they have the same limitation of the ones generated for PQprepare?

Thank you,

-- Daniele

Re: Binary params in libpq

From
Merlin Moncure
Date:
On Sun, Feb 27, 2011 at 1:13 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> Hello,
>
> I'm thinking about adding support for PQexecParams and PQprepare in
> Psycopg. I've posted more details yesterday on the Psycopg mailing
> list <http://archives.postgresql.org/psycopg/2011-02/msg00076.php>. I
> have a few preliminary questions:

If you are wrapping libpq, there really is no good reason not to use
the parametrized interfaces exclusively...not doing that is just
asking for SQL injection type problems.

> How stable is the binary representation for the PostgreSQL types? We
> may just pass bytea data in binary format and pass everything else as
> text parameters, or pass different types too as binary, if performace
> would benefit. Did binary format ever changed in the past (at least
> since the introduction of the V3 protocol)? Is such binary
> representation more likely to change in the future respect to the
> textual one, in a way that forward/backward compatibility between
> server and client would be compromised?

The binary representations of types are mostly stable.  On average of
about one per release you might see a wire format change...it doesn't
happen that often but you have to be prepared to deal with it.
'money' type is one example -- it was moved from 32 to 64 bit..there
are a couple of others.  Backwards compatibility is no problem but
forwards compatibility is going to be problematic because you have no
support from the server.  In practice, it usually works ok.

> Query plans for prepared statements may be less efficient than ones
> for queries with bound parameters. Are query plans generated for
> PQexecParams calls as efficient as the ones generated for PQexec? Or
> do they have the same limitation of the ones generated for PQprepare?

Query plans for PQexec is the same for PQexecParams.  PQexec can be a
tiny bit more efficient for simple queries but is not worth the risk
unless you are sending 0 parameters.

merlin

Re: Binary params in libpq

From
Craig Ringer
Date:
On 28/02/2011 7:48 AM, Merlin Moncure wrote:

>> How stable is the binary representation for the PostgreSQL types? We
>> may just pass bytea data in binary format and pass everything else as
>> text parameters, or pass different types too as binary, if performace
>> would benefit. Did binary format ever changed in the past (at least
>> since the introduction of the V3 protocol)?

AFAIK, the `timestamp' type moved from a floating-point to an integer
representation internally, which would've affected the binary protocol
representation. That was even a compile-time config option, so it could
be different between two different Pg installs with the same version.

>> Is such binary
>> representation more likely to change in the future respect to the
>> textual one, in a way that forward/backward compatibility between
>> server and client would be compromised?

History to date suggests that the binary representation certainly does
change. Then again, so has the textual one at least in the case of bytea.

> The binary representations of types are mostly stable.  On average of
> about one per release you might see a wire format change...it doesn't
> happen that often but you have to be prepared to deal with it.

Refusing to talk to databases newer than the version the driver was
tested with *might* be a good idea, just to be safe, though some kind of
user override would be needed. At least warn about it when you see a new
and unfamiliar Pg version. You can't always *tell* that a new
representation is incompatible, as the text-format bytea affair
demonstrated.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Binary params in libpq

From
Merlin Moncure
Date:
On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 28/02/2011 7:48 AM, Merlin Moncure wrote:
>
>>> How stable is the binary representation for the PostgreSQL types? We
>>> may just pass bytea data in binary format and pass everything else as
>>> text parameters, or pass different types too as binary, if performace
>>> would benefit. Did binary format ever changed in the past (at least
>>> since the introduction of the V3 protocol)?
>
> AFAIK, the `timestamp' type moved from a floating-point to an integer
> representation internally, which would've affected the binary protocol
> representation. That was even a compile-time config option, so it could be
> different between two different Pg installs with the same version.

Actually, this has always been a compile time option on the server as
far as i remember and there is protocol support for it -- libpq tells
you how it has been set...you've always had to deal with this
(libpqtypes, which I've mentioned several times as the correct way to
'do' binary with libpq, handles this for you).  Only the default
setting was changed.  Timestamps are the only type afaik that are like
this so they are something of a special case.  There *was* a binary
wire format change for timestamps some time back iirc that was
unrelated to the int/float property.  They are unlikely to change
again unless the server's internal representation changes.

Small aside about timestamps: aside from bytea they stand to benefit
the most out of all the common types by moving to binary, especially
if you are moving them to/from something like a C struct tm on the
client.  int, float, etc are hardly noticeable and barely worth the
trouble.

Long term, imnsho significant portions of the type library on the
server should be abstracted out into a library that (optionally) the
client and the server share, with the client portions wrapped by
libpqtypes (which then becomes a simple sytnax extension to libpq),
the not frequently mentioned but very excellent ecpg, and other
drivers like php, Psycopg, etc.

merlin

Re: Binary params in libpq

From
Daniele Varrazzo
Date:
On Mon, Feb 28, 2011 at 2:42 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer

>> AFAIK, the `timestamp' type moved from a floating-point to an integer
>> representation internally, which would've affected the binary protocol
>> representation. That was even a compile-time config option, so it could be
>> different between two different Pg installs with the same version.
>
> Actually, this has always been a compile time option on the server as
> far as i remember and there is protocol support for it -- libpq tells
> you how it has been set...you've always had to deal with this

It should be 'integer_datetimes' setting reported by PQparameterStatus I assume.

> (libpqtypes, which I've mentioned several times as the correct way to
> 'do' binary with libpq, handles this for you).  Only the default
> setting was changed.  Timestamps are the only type afaik that are like
> this so they are something of a special case.  There *was* a binary
> wire format change for timestamps some time back iirc that was
> unrelated to the int/float property.  They are unlikely to change
> again unless the server's internal representation changes.
>
> Small aside about timestamps: aside from bytea they stand to benefit
> the most out of all the common types by moving to binary, especially
> if you are moving them to/from something like a C struct tm on the
> client.  int, float, etc are hardly noticeable and barely worth the
> trouble.

Not only barely noticeable: I've talked with somebody who has done
some performance test and has found better performance in text mode
for small integers: apparently the time spent parsing the integer text
format was less than the overhead to pass the 8 bits of an int64 just
to represent '3'. Probably an edge case but it restates that for ints
there isn't a lot to gain from the binary format.

So it looks like we could assume the bin representation relatively
stable, but still at each postgresql major version it should be
checked if the internal representation of any data type changed. OTOH
not so many data types would really benefit from being transferred as
binary (from a quick scan to the data types list, the datetime types
look indeed the ones where there would be the most to gain).

-- Daniele

Re: Binary params in libpq

From
Craig Ringer
Date:
On 28/02/2011 10:42 PM, Merlin Moncure wrote:

> Actually, this has always been a compile time option on the server as
> far as i remember and there is protocol support for it -- libpq tells
> you how it has been set...you've always had to deal with this
> (libpqtypes, which I've mentioned several times as the correct way to
> 'do' binary with libpq, handles this for you).  Only the default
> setting was changed.

Ah. Thanks for clarifying that.

It sounds like binary transfer is best for date, timestamp, and bytea.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Binary params in libpq

From
Kelly Burkhart
Date:
On Mon, Feb 28, 2011 at 8:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer
>> AFAIK, the `timestamp' type moved from a floating-point to an integer
>> representation internally, which would've affected the binary protocol
>> representation. That was even a compile-time config option, so it could be
>> different between two different Pg installs with the same version.
>
> Actually, this has always been a compile time option on the server as
> far as i remember and there is protocol support for it -- libpq tells
> you how it has been set...you've always had to deal with this

I don't see any libpq calls that can, at run-time, tell you things
like what format the timestamp is and what endian-ness the server is.
Is there something I'm missing?  The only thing I could figure out is
to do something like 'select 123::int4' or select a known date and
determine the nature of the server from what you get back.

-K

Re: Binary params in libpq

From
Merlin Moncure
Date:
On Tue, Mar 1, 2011 at 8:19 AM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
> On Mon, Feb 28, 2011 at 8:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Sun, Feb 27, 2011 at 7:39 PM, Craig Ringer
>>> AFAIK, the `timestamp' type moved from a floating-point to an integer
>>> representation internally, which would've affected the binary protocol
>>> representation. That was even a compile-time config option, so it could be
>>> different between two different Pg installs with the same version.
>>
>> Actually, this has always been a compile time option on the server as
>> far as i remember and there is protocol support for it -- libpq tells
>> you how it has been set...you've always had to deal with this
>
> I don't see any libpq calls that can, at run-time, tell you things
> like what format the timestamp is and what endian-ness the server is.
> Is there something I'm missing?  The only thing I could figure out is
> to do something like 'select 123::int4' or select a known date and
> determine the nature of the server from what you get back.

You don't need to know endian-ness: it is always in network order (big
endian).  As noted above, you can check datetime format with
PQparameterStatus.

Really, my advice to you and the countless others who seem to
continually want to re-engineer this problem is to either use or crib
from two libraries that have completely solved it...namely libpqtypes
and ecpg.  Some of the wire formats are non-trivial to convert to C
native types.

merlin

Re: Binary params in libpq

From
Tom Lane
Date:
Kelly Burkhart <kelly.burkhart@gmail.com> writes:
> On Mon, Feb 28, 2011 at 8:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Actually, this has always been a compile time option on the server as
>> far as i remember and there is protocol support for it -- libpq tells
>> you how it has been set...you've always had to deal with this

> I don't see any libpq calls that can, at run-time, tell you things
> like what format the timestamp is and what endian-ness the server is.
> Is there something I'm missing?

The timestamp format can be determined by querying PQparameterStatus
for the value of "integer_datetimes".  Server endianness is irrelevant
because values are always sent big-endian.

            regards, tom lane