Thread: Stuff for 2.4.1
I've integrated a few patches in my repos. The most import is own parser for the bytea hex format, so that clients are no more dependant on libpq 9 to talk with a 9 server. By the way the best way to make this patch was to have our own parser for the "escape" format as well: this has generally improved performance in receiving bytea (PQunescapeBytea required a null-terminated string even if we know the length of the string to decode, so we dropped unneeded memcpy/strlen of the received data). Other things are minor, see the news file (https://github.com/dvarrazzo/psycopg/blob/devel/NEWS). If I haven't forgotten anything I think we can have a 2.4.1 release. Cheers, -- Daniele
Daniele, > own parser for the bytea hex format, so that clients are no more >dependant on libpq 9 to talk with a 9 server. >By the way the best way to make this patch >was to have our own parser for the "escape" format as well: > this has generally improved performance in receiving bytea compliments for doing performance improvements! Are you really sure that psycopg2 should go the road of having own parsers in addition to libpq-s routines? As much as I am happy about the robustness when having other libpqs, and about the performance benefit, as much I fear to have some new areas for possible bugs - especially security-relevant things like SQL-injections. Harald -- Harald Armin Massa www.2ndQuadrant.com PostgreSQL Training, Services and Support 2ndQuadrant Deutschland GmbH GF: Harald Armin Massa Amtsgericht Stuttgart, HRB 736399
On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa <harald@2ndquadrant.com> wrote: > Daniele, > >> own parser for the bytea hex format, so that clients are no more >dependant on libpq 9 to talk with a 9 server. >>By the way the best way to make this patch >>was to have our own parser for the "escape" format as well: >> this has generally improved performance in receiving bytea > > compliments for doing performance improvements! > > Are you really sure that psycopg2 should go the road of having own > parsers in addition to libpq-s routines? As much as I am happy about > the robustness when having other libpqs, and about the performance > benefit, as much I fear to have some new areas for possible bugs - > especially security-relevant things like SQL-injections. Hi Harald, The will to stick as much as possible to the libpq functions has been the reason I had not written the above parser before (releasing 2.4.0). Unfortunately the bytea problem has proven trickier to handle for many psycopg users. I've changed my mind as I think psycopg has the responsibility to provide a set of feature in a robust way, and if the libpq is just not reliable for bytea parsing (for me the hex format should have been backported to the the client libraries of the previous versions) I think we have to provide a solution, not just to propagate the problem. Please note that I have not written a parser for user input: this is a parser specifically used to receive data from the database and is only used to parse the bytea *output* format (http://www.postgresql.org/docs/9.0/static/datatype-binary.html). I would be very concerned in replacing PQescapeString/PQescapeBytea for the reason you mention, and I would never do it to gain performance: I've just replaced PQunescapeBytea to solve the problem of PG 9.0 communication, and the performance improvement is only a side effect, not something I was after. Of course the code is available for review <https://github.com/dvarrazzo/psycopg/blob/devel/psycopg/typecast_binary.c>. Regards, -- Daniele
On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa > <harald@2ndquadrant.com> wrote: >> Are you really sure that psycopg2 should go the road of having own >> parsers in addition to libpq-s routines? As much as I am happy about >> the robustness when having other libpqs, and about the performance >> benefit, as much I fear to have some new areas for possible bugs - >> especially security-relevant things like SQL-injections. > > The will to stick as much as possible to the libpq functions has been > the reason I had not written the above parser before (releasing > 2.4.0). Unfortunately the bytea problem has proven trickier to handle > for many psycopg users. I've changed my mind as I think psycopg has > the responsibility to provide a set of feature in a robust way, and if > the libpq is just not reliable for bytea parsing (for me the hex > format should have been backported to the the client libraries of the > previous versions) I think we have to provide a solution, not just to > propagate the problem. I think I agree with Harald here. In my opinion, this shouldn't be done at the driver level. There never has been a guarantee from the database side that applications compiled against older libpq will be able to communicate with newer versions. Emulating this in the driver only propagates this mis-conception. What has been the problem in the past? Maybe the documentation should be improved so that people are sure to build against the appropriate version of libpq for the version of the server they intend to communicate with? -- Thanks, David Blewett
> [...] psycopg2's own bytea output parsing [...] I think it's conceptually better to endure the extra code/query(s) needed to determine at .connect() time whether "set bytea_output to ..." is needed or not. After all this is only going to be relevant within the window where pre-9 libpq's try to connect to 9+ servers. For psycopg2's compiled against 9+ libpq's this would IFDEF out to a noop, essentially. Either way, thanks for the work on psycopg2 ! Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Sun, Mar 27, 2011 at 6:23 PM, David Blewett <david@dawninglight.net> wrote: > On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo > <daniele.varrazzo@gmail.com> wrote: >> On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa >> <harald@2ndquadrant.com> wrote: >>> Are you really sure that psycopg2 should go the road of having own >>> parsers in addition to libpq-s routines? As much as I am happy about >>> the robustness when having other libpqs, and about the performance >>> benefit, as much I fear to have some new areas for possible bugs - >>> especially security-relevant things like SQL-injections. >> >> The will to stick as much as possible to the libpq functions has been >> the reason I had not written the above parser before (releasing >> 2.4.0). Unfortunately the bytea problem has proven trickier to handle >> for many psycopg users. I've changed my mind as I think psycopg has >> the responsibility to provide a set of feature in a robust way, and if >> the libpq is just not reliable for bytea parsing (for me the hex >> format should have been backported to the the client libraries of the >> previous versions) I think we have to provide a solution, not just to >> propagate the problem. > > I think I agree with Harald here. In my opinion, this shouldn't be > done at the driver level. No? At which level do you think this should be done? > There never has been a guarantee from the > database side that applications compiled against older libpq will be > able to communicate with newer versions. Emulating this in the driver > only propagates this mis-conception. What has been the problem in the > past? There have been several examples in the not so long story of this mailing list, other were in the previous ml. Here's a couple. You can google yourself to discover that GnuMed, Drupal and who knows how many not public projects have been bitten by the transition to hex. http://archives.postgresql.org/psycopg/2011-02/msg00020.php http://archives.postgresql.org/pgsql-general/2010-10/msg00146.php (from Harald, interesting) > Maybe the documentation should be improved so that people are > sure to build against the appropriate version of libpq for the version > of the server they intend to communicate with? The documentation does a pretty good job in describing the problem. Unfortunately not everybody is willing/able/allowed to install an up-to-date client library. PG 9 is not so widespread on the clients yet (e.g. Natty still deploys 8.4, so you will have to wait at least Oct 2011 for a libpq 9 on the most widespread linux distro). Half of Psycopg job is to convert the textual representation of postgres data type into python object. The libpq provides no assistance in doing that for any other format than bytea. There is no PQunescapeDateTime whatsoever: should we ask Python users to parse the dates from string themselves? Not to mention composite types, hstore and other delicacies, for which we provide parsers to Python objects. The bytea is in a sense "blessed" for having a dedicated unescape function, but if using it causes more trouble than it solves I think a less problematic parser is a good answer. -- Daniele
On Sun, Mar 27, 2011 at 6:40 PM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: >> [...] psycopg2's own bytea output parsing [...] > > I think it's conceptually better to endure the extra > code/query(s) needed to determine at .connect() time whether > "set bytea_output to ..." is needed or not. This would imply an extra roundtrip to the server per connection: we've already had enough people complaining we do too many of them (they used to be 3, now there is a single one and I would have dropped it as well, but there's not been consensus on this point). I think we've already discussed about querying the server and we agreed not to do this. > After all this is only going to be relevant within the > window where pre-9 libpq's try to connect to 9+ servers. > > For psycopg2's compiled against 9+ libpq's this would IFDEF > out to a noop, essentially. Unfortunately there's no reliable way to do this. The problem is in the runtime library, not in the one used at compile time, and there is no way to know the runtime version - not for libraries before 9.0. Even if I could have the option to write "if version < 9 then use our parser, else use libpq's", we should then track who allocated the memory returned by the function to correctly free it (PQunescapeBytea allocates its own memory to be released with PQfreemem, and there is no "PQmalloc" exposed we could use to make my buffer appear like a libpq's one): so the result would have amounted to such an ugly and convoluted code that I simply prefer to have my own parser. -- Daniele
On Sun, Mar 27, 2011 at 07:25:20PM +0100, Daniele Varrazzo wrote: > > There never has been a guarantee from the > > database side that applications compiled against older libpq will be > > able to communicate with newer versions. Emulating this in the driver > > only propagates this mis-conception. What has been the problem in the > > past? > > There have been several examples in the not so long story of this > mailing list, other were in the previous ml. Here's a couple. You can > google yourself to discover that GnuMed, Drupal and who knows how many > not public projects have been bitten by the transition to hex. And, thanks to the help of this list, we've GNUmed has had a fix out within minutes after learning the reason of the problem. Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Daniele, as you found correctly, I was allready biten by that bytea-escape-bug. The aftermath led to the PQlibVersion() function for libpq, committed by Magnus @ http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=de9a4c27fefcc0d104bc9c97f4a93a49a25bf66d > Please note that I have not written a parser for user input: this is a > parser specifically used to receive data from the database and is only > used to parse the bytea *output* format > (http://www.postgresql.org/docs/9.0/static/datatype-binary.html). > I would be very concerned in replacing >PQescapeString/PQescapeBytea for > the reason you mention, and I would never do it to gain performance: your arguments are sound. And a line at "nothing from the user, just stuff from the database" is a line correctly drawn. Parsing things that come from the database should be save. Thanks for taking the time to answer my fears, best wishes Harald -- Harald Armin Massa www.2ndQuadrant.com PostgreSQL Training, Services and Support 2ndQuadrant Deutschland GmbH GF: Harald Armin Massa Amtsgericht Stuttgart, HRB 736399
On Sunday, March 27, 2011 11:25:20 am Daniele Varrazzo wrote:
> On Sun, Mar 27, 2011 at 6:23 PM, David Blewett <david@dawninglight.net> wrote:
> > On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo
> >
> > <daniele.varrazzo@gmail.com> wrote:
> >> On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa
> >>
> >> <harald@2ndquadrant.com> wrote:
> >>> Are you really sure that psycopg2 should go the road of having own
> >>> parsers in addition to libpq-s routines? As much as I am happy about
> >>> the robustness when having other libpqs, and about the performance
> >>> benefit, as much I fear to have some new areas for possible bugs -
> >>> especially security-relevant things like SQL-injections.
> >>
> >> The will to stick as much as possible to the libpq functions has been
> >> the reason I had not written the above parser before (releasing
> >> 2.4.0). Unfortunately the bytea problem has proven trickier to handle
> >> for many psycopg users. I've changed my mind as I think psycopg has
> >> the responsibility to provide a set of feature in a robust way, and if
> >> the libpq is just not reliable for bytea parsing (for me the hex
> >> format should have been backported to the the client libraries of the
> >> previous versions) I think we have to provide a solution, not just to
> >> propagate the problem.
> >
> > I think I agree with Harald here. In my opinion, this shouldn't be
> > done at the driver level.
>
> No? At which level do you think this should be done?
>
> > There never has been a guarantee from the
> > database side that applications compiled against older libpq will be
> > able to communicate with newer versions. Emulating this in the driver
> > only propagates this mis-conception. What has been the problem in the
> > past?
>
> There have been several examples in the not so long story of this
> mailing list, other were in the previous ml. Here's a couple. You can
> google yourself to discover that GnuMed, Drupal and who knows how many
> not public projects have been bitten by the transition to hex.
This is fixed by:
bytea_output = escape
Major upgrades are just that, major. Failure to read the docs is on the user. I know, I got bit by the change in behavior of pl/python with regards to 't' or 'f'.
>
> http://archives.postgresql.org/psycopg/2011-02/msg00020.php
> http://archives.postgresql.org/pgsql-general/2010-10/msg00146.php
> (from Harald, interesting)
>
> > Maybe the documentation should be improved so that people are
> > sure to build against the appropriate version of libpq for the version
> > of the server they intend to communicate with?
>
> The documentation does a pretty good job in describing the problem.
> Unfortunately not everybody is willing/able/allowed to install an
> up-to-date client library. PG 9 is not so widespread on the clients
> yet (e.g. Natty still deploys 8.4, so you will have to wait at least
> Oct 2011 for a libpq 9 on the most widespread linux distro).
I am not entirely against your argument, but if you are going to make the above statement at least compare apples to apples. In Natty the released version of Psycopg2 is 2.2.1 , so 2.4.1 is not an option by your rules. Also, it is possible for people to run a Pg 9.0 on newer distributions:
https://launchpad.net/~pitti/+archive/postgresql
>
> Half of Psycopg job is to convert the textual representation of
> postgres data type into python object. The libpq provides no
> assistance in doing that for any other format than bytea. There is no
> PQunescapeDateTime whatsoever: should we ask Python users to parse the
> dates from string themselves? Not to mention composite types, hstore
> and other delicacies, for which we provide parsers to Python objects.
> The bytea is in a sense "blessed" for having a dedicated unescape
> function, but if using it causes more trouble than it solves I think a
> less problematic parser is a good answer.
Like I said I am not entirely against what you are trying to do. I am concerned though that hiding a implementation detail does not really serve the user. Sooner or later they will have to come to grips with the change in behavior, whether via Psycopg or some other interface.
>
> -- Daniele
--
Adrian Klaver
adrian.klaver@gmail.com
On Sun, Mar 27, 2011 at 07:37:52PM +0100, Daniele Varrazzo wrote: > > After all this is only going to be relevant within the > > window where pre-9 libpq's try to connect to 9+ servers. > > > > For psycopg2's compiled against 9+ libpq's this would IFDEF > > out to a noop, essentially. > > Unfortunately there's no reliable way to do this. The problem is in > the runtime library, not in the one used at compile time, and there is > no way to know the runtime version - not for libraries before 9.0. Well, one could roundtrip a bit of binary data, no ? (I realize this wouldn't help all that much.) Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Sun, Mar 27, 2011 at 8:01 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote: > I am not entirely against your argument, but if you are going to make the > above statement at least compare apples to apples. In Natty the released > version of Psycopg2 is 2.2.1 , so 2.4.1 is not an option by your rules. Once you have system packages libpq-dev and python-dev, psycopg can be installed just with "easy_install psycopg2" and can be easily put in per-user or per-project directories. Using non-system python packages for a project is not uncommon at all, whereas I find always more resistance from people to compile stuff for their /usr/local/. Even if you wouldn't be able to upgrade system packages, you could always compile the libpq yourself, but then you have to be careful to the LD_LIBRARY_PATH to link the correct library at runtime. It would be easy to have software passing all the tests and then failing when deployed because of an upstart script not configured properly. It's just too brittle and I feel we can avoid annoying situations like this. > Also, it is possible for people to run a Pg 9.0 on newer distributions: > > https://launchpad.net/~pitti/+archive/postgresql Of course this is the best option to get PG9 today. -- Daniele
On 27/03/11 19:23, David Blewett wrote: > On Sun, Mar 27, 2011 at 9:46 AM, Daniele Varrazzo > <daniele.varrazzo@gmail.com> wrote: >> > On Sun, Mar 27, 2011 at 12:51 PM, Harald Armin Massa >> > <harald@2ndquadrant.com> wrote: >>> >> Are you really sure that psycopg2 should go the road of having own >>> >> parsers in addition to libpq-s routines? As much as I am happy about >>> >> the robustness when having other libpqs, and about the performance >>> >> benefit, as much I fear to have some new areas for possible bugs - >>> >> especially security-relevant things like SQL-injections. >> > >> > The will to stick as much as possible to the libpq functions has been >> > the reason I had not written the above parser before (releasing >> > 2.4.0). Unfortunately the bytea problem has proven trickier to handle >> > for many psycopg users. I've changed my mind as I think psycopg has >> > the responsibility to provide a set of feature in a robust way, and if >> > the libpq is just not reliable for bytea parsing (for me the hex >> > format should have been backported to the the client libraries of the >> > previous versions) I think we have to provide a solution, not just to >> > propagate the problem. > I think I agree with Harald here. In my opinion, this shouldn't be > done at the driver level. There never has been a guarantee from the > database side that applications compiled against older libpq will be > able to communicate with newer versions. Emulating this in the driver > only propagates this mis-conception. What has been the problem in the > past? Maybe the documentation should be improved so that people are > sure to build against the appropriate version of libpq for the version > of the server they intend to communicate with? What Daniele did is fine: 1) There is no security problem, because the code only work in the database->user direction. 2) Allows communication with different combinations of backend/libpq versions without adding the overhead of extra quesries when establishing the connection (i.e., it just works and this is very important for the user). Also, while I am writing very few new code I am reviewing everything and I am confident to say that psycopg is much safe now than 2 years ago when I was the only developer. federico -- Federico Di Gregorio federico.digregorio@dndg.it Studio Associato Di Nunzio e Di Gregorio http://dndg.it When people say things are a lot more complicated than that, they means they're getting worried that they won't like the truth. -- Granny Weatherwax
On Mar 28, 2011, at 3:11 AM, Federico Di Gregorio wrote: > > What Daniele did is fine: > > 1) There is no security problem, because the code only work in the > database->user direction. > > 2) Allows communication with different combinations of backend/libpq > versions without adding the overhead of extra quesries when > establishing the connection (i.e., it just works and this is > very important for the user). > > Also, while I am writing very few new code I am reviewing everything and > I am confident to say that psycopg is much safe now than 2 years ago > when I was the only developer. Wouldn't it make more sense to simply bundle the latest version of libpq with psycopg2? As far as I can tell, there is noadvantage to compiling against an older libpq- they are all backwards compatible. Cheers, M
On 28/03/11 17:00, A.M. wrote: > On Mar 28, 2011, at 3:11 AM, Federico Di Gregorio wrote: > >>> >>> What Daniele did is fine: >>> >>> 1) There is no security problem, because the code only work in >>> the database->user direction. >>> >>> 2) Allows communication with different combinations of >>> backend/libpq versions without adding the overhead of extra >>> quesries when establishing the connection (i.e., it just works >>> and this is very important for the user). >>> >>> Also, while I am writing very few new code I am reviewing >>> everything and I am confident to say that psycopg is much safe >>> now than 2 years ago when I was the only developer. > Wouldn't it make more sense to simply bundle the latest version of > libpq with psycopg2? As far as I can tell, there is no advantage to > compiling against an older libpq- they are all backwards compatible. You mean bundling the libpq source code and build it as part of psycopg? Don't know whay but that idea sends creeps up my spine... :) federico -- Federico Di Gregorio federico.digregorio@dndg.it Studio Associato Di Nunzio e Di Gregorio http://dndg.it Ma nostro di chi? Cosa abbiamo in comune io e te? -- Md
> You mean bundling the libpq source code and build it as part of psycopg? > Don't know why but that idea sends creeps up my spine... :) Why ? Because that's called DLL Hell. Also, psycopg2 wouldn't be packageable for Debian anymore without first removing again the meticulously included libpq code. Karsten -- Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de
On Monday 28 March 2011, Karsten Hilbert wrote: > > You mean bundling the libpq source code and build it as part of psycopg? > > Don't know why but that idea sends creeps up my spine... :) > > Why ? Because that's called DLL Hell. > > Also, psycopg2 wouldn't be packageable for Debian anymore... Cool ! :P -- Say NO to spam and viruses. Stop using Microsoft Windows!