Thread: Stuff for 2.4.1

Stuff for 2.4.1

From

Daniele Varrazzo

Date:

27 March 2011, 07:33:36

I've integrated a few patches in my repos. The most import is own
parser for the bytea hex format, so that clients are no more dependant
on libpq 9 to talk with a 9 server. By the way the best way to make
this patch was to have our own parser for the "escape" format as well:
this has generally improved performance in receiving bytea
(PQunescapeBytea required a null-terminated string even if we know the
length of the string to decode, so we dropped unneeded memcpy/strlen
of the received data).

Other things are minor, see the news file
(https://github.com/dvarrazzo/psycopg/blob/devel/NEWS).

If I haven't forgotten anything I think we can have a 2.4.1 release.

Cheers,

-- Daniele

Re: Stuff for 2.4.1

From

Harald Armin Massa

Date:

27 March 2011, 08:51:14

Daniele,

> own parser for the bytea hex format, so that clients are no more >dependant on libpq 9 to talk with a 9 server.
>By the way the best way to make  this patch
>was to have our own parser for the "escape" format as well:
> this has generally improved performance in receiving bytea

compliments for doing performance improvements!

Are you really sure that psycopg2 should go the road of having own
parsers in addition to libpq-s routines? As much as I am happy about
the robustness when having other libpqs, and about the performance
benefit, as much I fear to have some new areas for possible bugs -
especially security-relevant things like SQL-injections.

Harald

--
Harald Armin Massa     www.2ndQuadrant.com
PostgreSQL  Training, Services  and Support

2ndQuadrant Deutschland GmbH
GF: Harald Armin Massa
Amtsgericht Stuttgart, HRB 736399

Re: Stuff for 2.4.1

From

Daniele Varrazzo

Date:

27 March 2011, 10:46:51

On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa
<harald@2ndquadrant.com> wrote:
> Daniele,
>
>> own parser for the bytea hex format, so that clients are no more >dependant on libpq 9 to talk with a 9 server.
>>By the way the best way to make  this patch
>>was to have our own parser for the "escape" format as well:
>> this has generally improved performance in receiving bytea
>
> compliments for doing performance improvements!
>
> Are you really sure that psycopg2 should go the road of having own
> parsers in addition to libpq-s routines? As much as I am happy about
> the robustness when having other libpqs, and about the performance
> benefit, as much I fear to have some new areas for possible bugs -
> especially security-relevant things like SQL-injections.

Hi Harald,

The will to stick as much as possible to the libpq functions has been
the reason I had not written the above parser before (releasing
2.4.0). Unfortunately the bytea problem has proven trickier to handle
for many psycopg users. I've changed my mind as I think psycopg has
the responsibility to provide a set of feature in a robust way, and if
the libpq is just not reliable for bytea parsing (for me the hex
format should have been backported to the the client libraries of the
previous versions) I think we have to provide a solution, not just to
propagate the problem.

Please note that I have not written a parser for user input: this is a
parser specifically used to receive data from the database and is only
used to parse the bytea *output* format
(http://www.postgresql.org/docs/9.0/static/datatype-binary.html). I
would be very concerned in replacing PQescapeString/PQescapeBytea for
the reason you mention, and I would never do it to gain performance:
I've just replaced PQunescapeBytea to solve the problem of PG 9.0
communication, and the performance improvement is only a side effect,
not something I was after.

Of course the code is available for review
<https://github.com/dvarrazzo/psycopg/blob/devel/psycopg/typecast_binary.c>.

Regards,

-- Daniele

Re: Stuff for 2.4.1

From

David Blewett

Date:

27 March 2011, 14:32:36

On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa
> <harald@2ndquadrant.com> wrote:
>> Are you really sure that psycopg2 should go the road of having own
>> parsers in addition to libpq-s routines? As much as I am happy about
>> the robustness when having other libpqs, and about the performance
>> benefit, as much I fear to have some new areas for possible bugs -
>> especially security-relevant things like SQL-injections.
>
> The will to stick as much as possible to the libpq functions has been
> the reason I had not written the above parser before (releasing
> 2.4.0). Unfortunately the bytea problem has proven trickier to handle
> for many psycopg users. I've changed my mind as I think psycopg has
> the responsibility to provide a set of feature in a robust way, and if
> the libpq is just not reliable for bytea parsing (for me the hex
> format should have been backported to the the client libraries of the
> previous versions) I think we have to provide a solution, not just to
> propagate the problem.

I think I agree with Harald here. In my opinion, this shouldn't be
done at the driver level. There never has been a guarantee from the
database side that applications compiled against older libpq will be
able to communicate with newer versions. Emulating this in the driver
only propagates this mis-conception. What has been the problem in the
past? Maybe the documentation should be improved so that people are
sure to build against the appropriate version of libpq for the version
of the server they intend to communicate with?

--
Thanks,

David Blewett

Re: Stuff for 2.4.1

From

Karsten Hilbert

Date:

27 March 2011, 14:41:08

> [...] psycopg2's own bytea output parsing [...]

I think it's conceptually better to endure the extra
code/query(s) needed to determine at .connect() time whether
"set bytea_output to ..." is needed or not.

After all this is only going to be relevant within the
window where pre-9 libpq's try to connect to 9+ servers.

For psycopg2's compiled against 9+ libpq's this would IFDEF
out to a noop, essentially.

Either way, thanks for the work on psycopg2 !

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Stuff for 2.4.1

From

Daniele Varrazzo

Date:

27 March 2011, 15:25:29

On Sun, Mar 27, 2011 at 6:23 PM, David Blewett <david@dawninglight.net> wrote:
> On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>> On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa
>> <harald@2ndquadrant.com> wrote:
>>> Are you really sure that psycopg2 should go the road of having own
>>> parsers in addition to libpq-s routines? As much as I am happy about
>>> the robustness when having other libpqs, and about the performance
>>> benefit, as much I fear to have some new areas for possible bugs -
>>> especially security-relevant things like SQL-injections.
>>
>> The will to stick as much as possible to the libpq functions has been
>> the reason I had not written the above parser before (releasing
>> 2.4.0). Unfortunately the bytea problem has proven trickier to handle
>> for many psycopg users. I've changed my mind as I think psycopg has
>> the responsibility to provide a set of feature in a robust way, and if
>> the libpq is just not reliable for bytea parsing (for me the hex
>> format should have been backported to the the client libraries of the
>> previous versions) I think we have to provide a solution, not just to
>> propagate the problem.
>
> I think I agree with Harald here. In my opinion, this shouldn't be
> done at the driver level.

No? At which level do you think this should be done?

> There never has been a guarantee from the
> database side that applications compiled against older libpq will be
> able to communicate with newer versions. Emulating this in the driver
> only propagates this mis-conception. What has been the problem in the
> past?

There have been several examples in the not so long story of this
mailing list, other were in the previous ml. Here's a couple. You can
google yourself to discover that GnuMed, Drupal and who knows how many
not public projects have been bitten by the transition to hex.

http://archives.postgresql.org/psycopg/2011-02/msg00020.php
http://archives.postgresql.org/pgsql-general/2010-10/msg00146.php
(from Harald, interesting)

> Maybe the documentation should be improved so that people are
> sure to build against the appropriate version of libpq for the version
> of the server they intend to communicate with?

The documentation does a pretty good job in describing the problem.
Unfortunately not everybody is willing/able/allowed to install an
up-to-date client library. PG 9 is not so widespread on the clients
yet (e.g. Natty still deploys 8.4, so you will have to wait at least
Oct 2011 for a libpq 9 on the most widespread linux distro).

Half of Psycopg job is to convert the textual representation of
postgres data type into python object. The libpq provides no
assistance in doing that for any other format than bytea. There is no
PQunescapeDateTime whatsoever: should we ask Python users to parse the
dates from string themselves? Not to mention composite types, hstore
and other delicacies, for which we provide parsers to Python objects.
The bytea is in a sense "blessed" for having a dedicated unescape
function, but if using it causes more trouble than it solves I think a
less problematic parser is a good answer.

-- Daniele

Re: Stuff for 2.4.1

From

Daniele Varrazzo

Date:

27 March 2011, 15:38:01

On Sun, Mar 27, 2011 at 6:40 PM, Karsten Hilbert
<Karsten.Hilbert@gmx.net> wrote:
>> [...] psycopg2's own bytea output parsing [...]
>
> I think it's conceptually better to endure the extra
> code/query(s) needed to determine at .connect() time whether
> "set bytea_output to ..." is needed or not.

This would imply an extra roundtrip to the server per connection:
we've already had enough people complaining we do too many of them
(they used to be 3, now there is a single one and I would have dropped
it as well, but there's not been consensus on this point).

I think we've already discussed about querying the server and we
agreed not to do this.

> After all this is only going to be relevant within the
> window where pre-9 libpq's try to connect to 9+ servers.
>
> For psycopg2's compiled against 9+ libpq's this would IFDEF
> out to a noop, essentially.

Unfortunately there's no reliable way to do this. The problem is in
the runtime library, not in the one used at compile time, and there is
no way to know the runtime version - not for libraries before 9.0.

Even if I could have the option to write "if version < 9 then use our
parser, else use libpq's", we should then track who allocated the
memory returned by the function to correctly free it (PQunescapeBytea
allocates its own memory to be released with PQfreemem, and there is
no "PQmalloc" exposed we could use to make my buffer appear like a
libpq's one): so the result would have amounted to such an ugly and
convoluted code that I simply prefer to have my own parser.

-- Daniele

Re: Stuff for 2.4.1

From

Karsten Hilbert

Date:

27 March 2011, 15:51:40

On Sun, Mar 27, 2011 at 07:25:20PM +0100, Daniele Varrazzo wrote:

> > There never has been a guarantee from the
> > database side that applications compiled against older libpq will be
> > able to communicate with newer versions. Emulating this in the driver
> > only propagates this mis-conception. What has been the problem in the
> > past?
>
> There have been several examples in the not so long story of this
> mailing list, other were in the previous ml. Here's a couple. You can
> google yourself to discover that GnuMed, Drupal and who knows how many
> not public projects have been bitten by the transition to hex.

And, thanks to the help of this list, we've GNUmed has had a
fix out within minutes after learning the reason of the
problem.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Stuff for 2.4.1

From

Harald Armin Massa

Date:

27 March 2011, 15:57:03

Daniele,

as you found correctly, I was allready biten by that bytea-escape-bug.
The aftermath led to the PQlibVersion() function for libpq, committed
by Magnus @ http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=de9a4c27fefcc0d104bc9c97f4a93a49a25bf66d

> Please note that I have not written a parser for user input: this is a
> parser specifically used to receive data from the database and is only
> used to parse the bytea *output* format
> (http://www.postgresql.org/docs/9.0/static/datatype-binary.html).
> I would be very concerned in replacing >PQescapeString/PQescapeBytea for
> the reason you mention, and I would never do it to gain performance:

your arguments are sound. And a line at "nothing from the user, just
stuff from the database" is a line correctly drawn. Parsing things
that come from the database should be save.

Thanks for taking the time to answer my fears,

best wishes

Harald





--
Harald Armin Massa     www.2ndQuadrant.com
PostgreSQL  Training, Services  and Support

2ndQuadrant Deutschland GmbH
GF: Harald Armin Massa
Amtsgericht Stuttgart, HRB 736399

Re: Stuff for 2.4.1

From

Adrian Klaver

Date:

27 March 2011, 16:01:27

On Sunday, March 27, 2011 11:25:20 am Daniele Varrazzo wrote:

> On Sun, Mar 27, 2011 at 6:23 PM, David Blewett <david@dawninglight.net> wrote:

> > On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo

> >

> > <daniele.varrazzo@gmail.com> wrote:

> >> On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa

> >>

> >> <harald@2ndquadrant.com> wrote:

> >>> Are you really sure that psycopg2 should go the road of having own

> >>> parsers in addition to libpq-s routines? As much as I am happy about

> >>> the robustness when having other libpqs, and about the performance

> >>> benefit, as much I fear to have some new areas for possible bugs -

> >>> especially security-relevant things like SQL-injections.

> >>

> >> The will to stick as much as possible to the libpq functions has been

> >> the reason I had not written the above parser before (releasing

> >> 2.4.0). Unfortunately the bytea problem has proven trickier to handle

> >> for many psycopg users. I've changed my mind as I think psycopg has

> >> the responsibility to provide a set of feature in a robust way, and if

> >> the libpq is just not reliable for bytea parsing (for me the hex

> >> format should have been backported to the the client libraries of the

> >> previous versions) I think we have to provide a solution, not just to

> >> propagate the problem.

> >

> > I think I agree with Harald here. In my opinion, this shouldn't be

> > done at the driver level.

> No? At which level do you think this should be done?

> > There never has been a guarantee from the

> > database side that applications compiled against older libpq will be

> > able to communicate with newer versions. Emulating this in the driver

> > only propagates this mis-conception. What has been the problem in the

> > past?

> There have been several examples in the not so long story of this

> mailing list, other were in the previous ml. Here's a couple. You can

> google yourself to discover that GnuMed, Drupal and who knows how many

> not public projects have been bitten by the transition to hex.

This is fixed by:

bytea_output = escape

Major upgrades are just that, major. Failure to read the docs is on the user. I know, I got bit by the change in behavior of pl/python with regards to 't' or 'f'.

> http://archives.postgresql.org/psycopg/2011-02/msg00020.php

> http://archives.postgresql.org/pgsql-general/2010-10/msg00146.php

> (from Harald, interesting)

> > Maybe the documentation should be improved so that people are

> > sure to build against the appropriate version of libpq for the version

> > of the server they intend to communicate with?

> The documentation does a pretty good job in describing the problem.

> Unfortunately not everybody is willing/able/allowed to install an

> up-to-date client library. PG 9 is not so widespread on the clients

> yet (e.g. Natty still deploys 8.4, so you will have to wait at least

> Oct 2011 for a libpq 9 on the most widespread linux distro).

I am not entirely against your argument, but if you are going to make the above statement at least compare apples to apples. In Natty the released version of Psycopg2 is 2.2.1 , so 2.4.1 is not an option by your rules. Also, it is possible for people to run a Pg 9.0 on newer distributions:

https://launchpad.net/~pitti/+archive/postgresql

> Half of Psycopg job is to convert the textual representation of

> postgres data type into python object. The libpq provides no

> assistance in doing that for any other format than bytea. There is no

> PQunescapeDateTime whatsoever: should we ask Python users to parse the

> dates from string themselves? Not to mention composite types, hstore

> and other delicacies, for which we provide parsers to Python objects.

> The bytea is in a sense "blessed" for having a dedicated unescape

> function, but if using it causes more trouble than it solves I think a

> less problematic parser is a good answer.

Like I said I am not entirely against what you are trying to do. I am concerned though that hiding a implementation detail does not really serve the user. Sooner or later they will have to come to grips with the change in behavior, whether via Psycopg or some other interface.

> -- Daniele

Adrian Klaver

adrian.klaver@gmail.com

Re: Stuff for 2.4.1

From

Karsten Hilbert

Date:

27 March 2011, 16:18:49

On Sun, Mar 27, 2011 at 07:37:52PM +0100, Daniele Varrazzo wrote:

> > After all this is only going to be relevant within the
> > window where pre-9 libpq's try to connect to 9+ servers.
> >
> > For psycopg2's compiled against 9+ libpq's this would IFDEF
> > out to a noop, essentially.
>
> Unfortunately there's no reliable way to do this. The problem is in
> the runtime library, not in the one used at compile time, and there is
> no way to know the runtime version - not for libraries before 9.0.

Well, one could roundtrip a bit of binary data, no ?

(I realize this wouldn't help all that much.)

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Stuff for 2.4.1

From

Daniele Varrazzo

Date:

27 March 2011, 16:36:58

On Sun, Mar 27, 2011 at 8:01 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:

> I am not entirely against your argument, but if you are going to make the
> above statement at least compare apples to apples. In Natty the released
> version of Psycopg2 is 2.2.1 , so 2.4.1 is not an option by your rules.

Once you have system packages libpq-dev and python-dev, psycopg can be
installed just with "easy_install psycopg2" and can be easily put in
per-user or per-project directories. Using non-system python packages
for a project is not uncommon at all, whereas I find always more
resistance from people to compile stuff for their /usr/local/.

Even if you wouldn't be able to upgrade system packages, you could
always compile the libpq yourself, but then you have to be careful to
the LD_LIBRARY_PATH to link the correct library at runtime. It would
be easy to have software passing all the tests and then failing when
deployed because of an upstart script not configured properly. It's
just too brittle and I feel we can avoid annoying situations like
this.

> Also, it is possible for people to run a Pg 9.0 on newer distributions:
>
> https://launchpad.net/~pitti/+archive/postgresql

Of course this is the best option to get PG9 today.

-- Daniele

Re: Stuff for 2.4.1

From

Federico Di Gregorio

Date:

28 March 2011, 04:11:25

On 27/03/11 19:23, David Blewett wrote:
> On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>> > On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa
>> > <harald@2ndquadrant.com> wrote:
>>> >> Are you really sure that psycopg2 should go the road of having own
>>> >> parsers in addition to libpq-s routines? As much as I am happy about
>>> >> the robustness when having other libpqs, and about the performance
>>> >> benefit, as much I fear to have some new areas for possible bugs -
>>> >> especially security-relevant things like SQL-injections.
>> >
>> > The will to stick as much as possible to the libpq functions has been
>> > the reason I had not written the above parser before (releasing
>> > 2.4.0). Unfortunately the bytea problem has proven trickier to handle
>> > for many psycopg users. I've changed my mind as I think psycopg has
>> > the responsibility to provide a set of feature in a robust way, and if
>> > the libpq is just not reliable for bytea parsing (for me the hex
>> > format should have been backported to the the client libraries of the
>> > previous versions) I think we have to provide a solution, not just to
>> > propagate the problem.
> I think I agree with Harald here. In my opinion, this shouldn't be
> done at the driver level. There never has been a guarantee from the
> database side that applications compiled against older libpq will be
> able to communicate with newer versions. Emulating this in the driver
> only propagates this mis-conception. What has been the problem in the
> past? Maybe the documentation should be improved so that people are
> sure to build against the appropriate version of libpq for the version
> of the server they intend to communicate with?

What Daniele did is fine:

1) There is no security problem, because the code only work in the
   database->user direction.

2) Allows communication with different combinations of backend/libpq
   versions without adding the overhead of extra quesries when
   establishing the connection (i.e., it just works and this is
   very important for the user).

Also, while I am writing very few new code I am reviewing everything and
I am confident to say that psycopg is much safe now than 2 years ago
when I was the only developer.

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
 When people say things are a lot more complicated than that, they
  means they're getting worried that they won't like the truth.
                                                    -- Granny Weatherwax

Re: Stuff for 2.4.1

From

"A.M."

Date:

28 March 2011, 12:01:05

On Mar 28, 2011, at 3:11 AM, Federico Di Gregorio wrote:

>
> What Daniele did is fine:
>
> 1) There is no security problem, because the code only work in the
>   database->user direction.
>
> 2) Allows communication with different combinations of backend/libpq
>   versions without adding the overhead of extra quesries when
>   establishing the connection (i.e., it just works and this is
>   very important for the user).
>
> Also, while I am writing very few new code I am reviewing everything and
> I am confident to say that psycopg is much safe now than 2 years ago
> when I was the only developer.

Wouldn't it make more sense to simply bundle the latest version of libpq with psycopg2? As far as I can tell, there is
noadvantage to compiling against an older libpq- they are all backwards compatible. 

Cheers,
M

Re: Stuff for 2.4.1

From

Federico Di Gregorio

Date:

28 March 2011, 12:53:48

On 28/03/11 17:00, A.M. wrote:
> On Mar 28, 2011, at 3:11 AM, Federico Di Gregorio wrote:
>
>>>
>>> What Daniele did is fine:
>>>
>>> 1) There is no security problem, because the code only work in
>>> the database->user direction.
>>>
>>> 2) Allows communication with different combinations of
>>> backend/libpq versions without adding the overhead of extra
>>> quesries when establishing the connection (i.e., it just works
>>> and this is very important for the user).
>>>
>>> Also, while I am writing very few new code I am reviewing
>>> everything and I am confident to say that psycopg is much safe
>>> now than 2 years ago when I was the only developer.
> Wouldn't it make more sense to simply bundle the latest version of
> libpq with psycopg2? As far as I can tell, there is no advantage to
> compiling against an older libpq- they are all backwards compatible.

You mean bundling the libpq source code and build it as part of psycopg?
Don't know whay but that idea sends creeps up my spine... :)

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
                 Ma nostro di chi? Cosa abbiamo in comune io e te? -- Md

Re: Stuff for 2.4.1

From

"Karsten Hilbert"

Date:

28 March 2011, 12:56:16

> You mean bundling the libpq source code and build it as part of psycopg?
> Don't know why but that idea sends creeps up my spine... :)

Why ?  Because that's called DLL Hell.

Also, psycopg2 wouldn't be packageable for Debian anymore without
first removing again the meticulously included libpq code.

Karsten
--
Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir
belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de

Re: Stuff for 2.4.1

From

"P. Christeas"

Date:

28 March 2011, 16:14:01

On Monday 28 March 2011, Karsten Hilbert wrote:
> > You mean bundling the libpq source code and build it as part of psycopg?
> > Don't know why but that idea sends creeps up my spine... :)
>
> Why ?  Because that's called DLL Hell.
>
> Also, psycopg2 wouldn't be packageable for Debian anymore...

Cool !

:P

--
Say NO to spam and viruses. Stop using Microsoft Windows!