Thread: Confusion over Python drivers
My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.

First, the sheer number of libraries is confusing. This is the page from the Postgres wiki:

	http://wiki.postgresql.org/wiki/Python

The first one listed, Psycopg, is noted as "preferred libpq-based driver", but the license is GPL. Isn't that a problem for many client applications?

The next one, PyGreSQL, is BSD licensed, but only has documentation for the "classic" interface. The DB-API module says about documentation:

	http://www.pygresql.org/pgdb.html

	This section of the documentation still needs to be written.

The other three are pure Python drivers, which I guess can be good, but why three, and then there are three more listed as "obsolete/stalled". Clearly something is wrong here.

The Python-hosted PostgreSQL page has similar problems:

	http://wiki.python.org/moin/PostgreSQL

Does Perl have a similar mess?

While I realize experienced people can easily navigate this confusion, I am concerned about new Postgres adopters being very confused by this and it is hurting our database adoption in general.

What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce,
> http://wiki.postgresql.org/wiki/Python
>
> The first one listed, Psycopg, is noted as "preferred libpq-based
> driver", but the license is GPL. Isn't that a problem for many client
> applications?
The licence of psycopg2 is a little more complicated; the "GPL" in that summary just tries to sum it up. The actual licence is [from the LICENCE file in the source distribution]:
---------------------------------------------
psycopg and the GPL
===================
psycopg is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version. See file COPYING for details.
As a special exception, specific permission is granted for the GPLed
code in this distribution to be linked to OpenSSL and PostgreSQL libpq
without invoking GPL clause 2(b).
Note that the GPL was chosen to avoid proprietary adapters based on
psycopg code. Using psycopg in a proprietary product (even bundling
psycopg with the proprietary product) is fine as long as:
1. psycopg is called from Python only using only the provided API
(i.e., no linking with C code and no C modules based on it); and
2. all the other points of the GPL are respected (you offer a copy
of psycopg's source code, and so on.)
Alternative licenses
====================
If you prefer you can use the Zope Database Adapter ZPsycopgDA (i.e.,
every file inside the ZPsycopgDA directory) under the ZPL license as
published on the Zope web site, http://www.zope.org/Resources/ZPL.
Also, the following BSD-like license applies (at your option) to the
files following the pattern psycopg/adapter*.{h,c} and
psycopg/microprotocol*.{h,c}:
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not
be misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
psycopg is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Proprietary licenses
====================
A non-exclusive license is available for companies that want to include
psycopg in their proprietary products without respecting the spirit of the
GPL. The price of the license is one day of development done by the author,
at the consulting fee he applies to his usual customers at the day of the
request.
---------------------------------------------
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
Massa, Harald Armin wrote:
> Bruce,
>
> > http://wiki.postgresql.org/wiki/Python
> >
> > The first one listed, Psycopg, is noted as "preferred libpq-based
> > driver", but the license is GPL. Isn't that a problem for many client
> > applications?
>
> The licence of psycopg2 is a little more complicated; the "GPL" in that
> summary just tries to sum it up. The actual licence is [from the LICENCE
> in the source distribution]:

Wow, that is super-confusing. I am dealing with an issue right now where some companies are concerned that individual names appear with our own BSD copyrights in small files. That pales in comparison to this copyright. I would never be able to recommend software with that complex a copyright. I think the community needs to look at other alternatives.

(I am amazed that not only are there a confusing number of drivers, but there is also a confusing number of copyright options for this single driver.)

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On fre, 2010-02-05 at 09:19 -0500, Bruce Momjian wrote:
> While I realize experienced people can easily navigate this confusion, I am concerned about new Postgres adopters being very confused by this and it is hurting our database adoption in general.
>
> What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.

The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".
On Fri, Feb 05, 2010 at 09:19:26AM -0500, Bruce Momjian wrote:
> My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.
>
> http://wiki.postgresql.org/wiki/Python
>
> The Python-hosted PostgreSQL page has similar problems:
> http://wiki.python.org/moin/PostgreSQL
>
> Does Perl have a similar mess?

I don't think so.

The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/

The only non-DBI interfaces I could find (by skimming the 384 results for postgresql on search.cpan.org) were:

	Postgres: http://search.cpan.org/dist/Postgres/      last updated in 1998.
	Pg:       http://search.cpan.org/dist/pgsql_perl5/   last updated in 2000.

Tim.
Peter Eisentraut wrote:
> On fre, 2010-02-05 at 09:19 -0500, Bruce Momjian wrote:
> > While I realize experienced people can easily navigate this confusion, I am concerned about new Postgres adopters being very confused by this and it is hurting our database adoption in general.
> >
> > What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.
>
> The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".

Agreed, but can't we do a better job of steering new users and improving the novice experience for the most popular of them (think Ubuntu)?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Tim Bunce wrote:
> On Fri, Feb 05, 2010 at 09:19:26AM -0500, Bruce Momjian wrote:
> > My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.
> >
> > http://wiki.postgresql.org/wiki/Python
> > The Python-hosted PostgreSQL page has similar problems:
> > http://wiki.python.org/moin/PostgreSQL
> >
> > Does Perl have a similar mess?
>
> I don't think so.
>
> The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/
>
> The only non-DBI interfaces I could find (by skimming the 384 results for postgresql on search.cpan.org) were:
> Postgres: http://search.cpan.org/dist/Postgres/ last updated in 1998.
> Pg: http://search.cpan.org/dist/pgsql_perl5/ last updated in 2000.

Yes, that's what I thought, and am glad to hear it.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On fre, 2010-02-05 at 14:45 +0000, Tim Bunce wrote:
> > Does Perl have a similar mess?
>
> I don't think so.
>
> The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/

I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.
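To make the "quite minimal" point concrete, the sketch below is roughly the entire surface that PEP 249 (DB-API 2.0) guarantees; psycopg2 and the connection string are only illustrative assumptions here, and everything beyond these calls is a driver-specific extension.

    # A minimal DB-API 2.0 session, using psycopg2 purely as an example driver.
    import psycopg2

    conn = psycopg2.connect("dbname=test user=postgres")   # example DSN
    cur = conn.cursor()
    cur.execute("SELECT relname FROM pg_class WHERE relkind = %s", ("r",))
    for (relname,) in cur.fetchall():
        print(relname)
    conn.commit()    # DB-API opens an implicit transaction; commit/rollback live on the connection
    cur.close()
    conn.close()
    # Server-side cursors, prepared statements, COPY, notifications and type
    # customization are all outside the spec and therefore driver-specific.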
Peter Eisentraut wrote:
> On fre, 2010-02-05 at 14:45 +0000, Tim Bunce wrote:
> > > Does Perl have a similar mess?
> >
> > I don't think so.
> >
> > The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/
>
> I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.

I assumed it would be Perl that had confusion because of Perl 6, but obviously it is Python. The second PL/Python server-side language that was proposed recently is taking on a whole new appearance for me. :-(

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
> I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.

For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.

Given that SQLAlchemy isn't for everyone, of course ... it couldn't be, or Django would use it, no?

--Josh Berkus
> The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".

I'm not as concerned about "confusion" as the fact that *all* of the various Python drivers suck in different, and crippling, ways. I don't care how many drivers we have, as long as we have at least one 1st-class driver.

--Josh Berkus
Josh Berkus wrote:
> > The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".
>
> I'm not as concerned about "confusion" as the fact that *all* of the various Python drivers suck in different, and crippling, ways. I don't care how many drivers we have, as long as we have at least one 1st-class driver.

Agreed, and we can point newbies to that one great driver.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Marko Kreen wrote:
> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.
>
> Best path forward would be to talk with Psycopg guys about license clarification/change.
>
> I suspect GPL does not extend anyway to Python code that imports it dynamically, and it does not seem to be their intention - they even try to tell it in their current clarification, but it is not as clear as it could be.

Yes, this is a good analysis and has some good suggestions for moving forward.

In summary, I don't think the current PG/Python driver situation is helping PG or Python, and I believe there are enough people who care about those two to get a group together to sort out a viable direction and start working toward the goal of improving things.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> While I realize experienced people can easily navigate this confusion...

No, that's the worst part--the more you know and the deeper you dig into it, the more broken you realize the whole thing is. When one of the best drivers (in some respects) has a web page that looks like this: http://initd.org/ that doesn't seem so bad at first--but if you're experienced, you know that the page has been in that disturbing state since late 2006. You start digging into the driver mess, figure you just need to learn how things fit together, but the hole keeps getting bigger as you dig.

The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window. If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
Greg Smith wrote:
> Bruce Momjian wrote:
> > While I realize experienced people can easily navigate this confusion...
>
> No, that's the worst part--the more you know and the deeper you dig into it, the more broken you realize the whole thing is. When one of the best drivers (in some respects) has a web page that looks like this: http://initd.org/ that doesn't seem so bad at first--but if you're experienced, you know that the page has been in that disturbing state since late 2006. You start digging into the driver mess, figure you just need to learn how things fit together, but the hole keeps getting bigger as you dig.
>
> The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window. If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it.

Yes, it is going to take serious time, and it is going to take more than one person to be involved, but I think there are enough people who care that something serious can be done to improve its current state.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Fri, 2010-02-05 at 09:38 -0500, Bruce Momjian wrote:
> Wow, that is super-confusing.

Agreed. Standardization among licenses is useful, and I think it's important to have a driver with a license that people already understand.

Regards,
	Jeff Davis
On 2/5/10, Josh Berkus <josh@agliodbs.com> wrote:
> > I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.
>
> For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.

I took a quick glance at DBI, and compared to that, DB-API does not define a specific API for:

- resultset cursors
- prepared plans
- db structure examination

OTOH, my guess is that the DB-API authors assumed the first two are used transparently by the driver, and I see no reason why they cannot be, with the current DB-API. The last item is "rich" indeed, but seems slightly overengineered.

Now, none of the differences between drivers and the current confusion seem to be related to the above points. "Hysterical Raisins" is the only good explanation for the current situation.

The pg8000 / bpgsql seem to be toy projects, and anyway you don't want to use pure-Python drivers in high-performance environments. We are not talking about C#/Java here.

py-postgresql seems to be more serious, but it's Python 3 only, which makes it irrelevant today.

PyGreSQL is the oldest, older than DB-API, and so its DB-API interface seems an afterthought and is untested/underused - e.g. it does not support bytea.

Psycopg was the leader, especially in web environments, but it has a non-obvious license and with a dead website it does not seem that attractive. Although it is still well-maintained.

The best path forward would be to talk with the Psycopg guys about license clarification/change.

I suspect the GPL does not extend to Python code that imports it dynamically anyway, and that does not seem to be their intention either - they even try to say so in their current clarification, but it is not as clear as it could be.

-- marko
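On the point that resultset cursors and prepared plans can be handled transparently without leaving the DB-API shape, here is a small sketch of how psycopg2 already does the first one via its named-cursor extension; the table name and DSN are assumptions made up for the illustration.

    # A server-side cursor behind an ordinary DB-API cursor object.
    # The name= argument is a psycopg2 extension; everything else is plain DB-API.
    import psycopg2

    conn = psycopg2.connect("dbname=test")       # example DSN
    cur = conn.cursor(name="big_scan")           # DECLAREs a cursor on the server
    cur.execute("SELECT * FROM big_table")       # rows stay on the server side
    while True:
        rows = cur.fetchmany(1000)               # pulled over in batches
        if not rows:
            break
        for row in rows:
            pass                                 # ... process each row ...
    cur.close()
    conn.close()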
> Imho a big problem is that it does way too much itself - i.e. it does not use things like PQExecParams but does escaping/parsing itself... Other people may think thats a good idea - I definitely do not think so.

It also has issues with transaction control which cause idle transactions if the Django front-end times out due to load.

--Josh Berkus
On Friday 05 February 2010 21:34:53 Marko Kreen wrote:
> On 2/5/10, Josh Berkus <josh@agliodbs.com> wrote:
> > > I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.
> >
> > For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.
>
> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.

Imho a big problem is that it does way too much itself - i.e. it does not use things like PQExecParams but does escaping/parsing itself... Other people may think that's a good idea - I definitely do not think so.

Andres
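To illustrate the distinction Andres is drawing, the sketch below uses psycopg2's mogrify() to show that the driver builds one fully escaped SQL string on the client, whereas a PQexecParams-style driver would ship the statement and the values separately; the table name and DSN are made up for the example, and the printed output is only approximate.

    # Client-side escaping (what psycopg2 does) vs. server-side parameter binding.
    import psycopg2

    conn = psycopg2.connect("dbname=test")    # example DSN
    cur = conn.cursor()

    # psycopg2 interpolates an escaped literal into the statement before sending it:
    print(cur.mogrify("SELECT * FROM users WHERE name = %s", ("O'Brien",)))
    # -> something like: SELECT * FROM users WHERE name = 'O''Brien'

    # A PQexecParams-based driver would instead send the two pieces separately,
    # so it never needs its own quoting rules:
    #     statement: SELECT * FROM users WHERE name = $1
    #     params:    ["O'Brien"]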
On 2/5/10, Greg Smith <greg@2ndquadrant.com> wrote:
> The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window.

Well, no. You are overplaying the DBI angle. If the driver can offer via the minimal DB-API the same features the DBI driver can offer, then the DBI API is bloated, not the minimal API weak...

Not that DB-API is perfect - a way to toggle prepares and db cursors on/off on a per-query basis would be good to have, but that's mostly it. The rest of the extra features (COPY) need to be done as extensions anyway.

But all that is off-topic to the current driver suckiness. The problems with the Python drivers are that they fail to implement even the minimal DB-API 2.0:

- correctly, across all interesting data types. PyGreSQL (bytea). pg8000 (does rollback after a query fails - WTF). None of the drivers seem to support both array and tuple.

- robustly. Psycopg2 - their misguided attempt to turn DB-API calls optionally async cost their stability big-time, as it complexified their codebase. It got stable around 2.0.[678]. I'm not holding my breath on the other ones. Fear the fancy APIs.

- with threadsafety level > 1. (I don't care, but it seems in some circles it's necessary.)

Note how many ways a driver can suck without stepping above basic DB-API 2.0.

> If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it.

Yeah, but the problem here is that there actually is not a problem. The Psycopg license is not bad, just confusing on first sight. And the website problem can be fixed by advertising http://wiki.postgresql.org/wiki/Psycopg as the new website...

If your requirement is a BSD license, then probably getting PyGreSQL into shape would be faster than writing from scratch. The main dev effort would be writing a Postgres-specific DB-API 2.0 test suite anyway. It's likely that the actual improvements needed would be small.

Also I would suggest matching the Psycopg extensions API; that would make it usable to a much broader audience. (E.g. PyGreSQL does not offer COPY extensions via the DB-API 2.0 API. It would be good to match Psycopg here, instead of inventing a new API.)

-- marko
On Feb 5, 2010, at 1:34 PM, Marko Kreen wrote:
> py-postgresql seems to be more serious, but as it's python3 only which makes it irrelevant today.

Furthermore, if it did work on Python 2, it's *not* something that's going to appeal to mainstream users (Python heavy webframeworks), as it *partially* suffers from the same problem that pg8000 does. It's mostly pure-Python, but it has some C optimizations (notably, the PQ message buffer). I have done some profiling, and *with a few tweaks* it's about 2x-3x *slower than psycopg2* for the retrieval of a single int column. I think it could go faster, but I don't think it's worth the work.

ISTM that the target audience are folk who are married to PG, and are generally unhappy with DB-API, but do not want to buy into a "big" abstraction layer like SQLAlchemy. Sure, it supports DB-API like other drivers, so it *would be* usable with frameworks, but why take the 3x *or greater* hit over a properly implemented libpq version?

Finally, I just don't see the existing (often PG specific) goals that I have in mind for it appealing to the majority of [web framework/abstraction] users.

> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.
>
> Best path forward would be to talk with Psycopg guys about license clarification/change.

Yep.
On Feb 5, 2010, at 11:34 AM, Josh Berkus wrote:
> For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.
>
> Given that SQLAlchemy isn't for everyone, of course ... it couldn't be, or Django would use it, no?

Here are some to start with:

- paramstyle
- No direct support of prepared statements [they *tried* to compensate for this with "cached statements", but it's inadequate]
- Too many *optional* extensions (Cursor Method .__iter__() being rather notable)

http://www.python.org/dev/peps/pep-0249/
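The paramstyle complaint is easiest to see with an example: PEP 249 allows five different placeholder styles and each driver picks one, so "portable" DB-API code has to branch on module.paramstyle. The sketch below is illustrative only; the table name and values are made up.

    # The same query under each of the five PEP 249 paramstyles.
    QUERY_BY_PARAMSTYLE = {
        "qmark":    ("SELECT * FROM t WHERE a = ? AND b = ?",         (1, 2)),
        "numeric":  ("SELECT * FROM t WHERE a = :1 AND b = :2",       (1, 2)),
        "format":   ("SELECT * FROM t WHERE a = %s AND b = %s",       (1, 2)),
        "named":    ("SELECT * FROM t WHERE a = :a AND b = :b",       {"a": 1, "b": 2}),
        "pyformat": ("SELECT * FROM t WHERE a = %(a)s AND b = %(b)s", {"a": 1, "b": 2}),
    }

    def run_portably(dbapi_module, conn):
        # Truly portable code has to consult the module's paramstyle attribute
        # before it can even write the query text.
        sql, params = QUERY_BY_PARAMSTYLE[dbapi_module.paramstyle]
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()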
On Feb 5, 2010, at 8:00 AM, Peter Eisentraut wrote:
> I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions.

Yep.

> The DB-SIG at Python that ought to drive all this is also quite dead,

From reading messages that come into that list (not so lately), I've gotten the impression that they are content with DB-API 2. Aside from the TPC interfaces, the last round of activity that I know of was dbapi3 [1] (which was a long while ago).

> possibly because everyone has moved on to SQLAlchemy.

Yeah. AFAICT, that is the general direction encouraged by DB-SIG. Write an abstraction on top of DB-API. SQLAlchemy, SQLObject, anysql, and I'm sure there are others.

[1] http://wiki.python.org/moin/Aug2001DbApi3Strawman
    http://wiki.python.org/moin/DbApi3
On Fri, 2010-02-05 at 10:35 -0800, Josh Berkus wrote:
> I'm not as concerned about "confusion" as the fact that *all* of the various Python drivers suck in different, and crippling, ways. I don't care how many drivers we have, as long as we have at least one 1st-class driver.

Absolutely. And I would prefer that it can provide all (or nearly all) of the capabilities of libpq.

PyGreSQL apparently doesn't even offer parameterized queries! http://pygresql.org/pg.html

That was based on a quick glance at the document; I hope I'm mistaken.

Regards,
	Jeff Davis
Marko Kreen wrote:
> The pg8000 / bpgsql seem to be toy projects, and anyway you dont want to use pure-Python drivers in high-performance environments. We are not talking about C#/java here.

Right, and the comments from James reinforce this general idea: there is little value to the people who most want Python+PostgreSQL support in working on any of the pure Python implementations, because best case performance is still half or less of the ones that more directly wrap libpq. Even the best case with psycopg is enough of a performance hit as it is.

> py-postgresql seems to be more serious, but as it's python3 only which makes it irrelevant today.

Also true.

> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.
> Best path forward would be to talk with Psycopg guys about license clarification/change.

Agreed. A relicensed Psycopg, with a more professional looking introduction/documentation page (like the budding page on our Wiki) than what the initd web site has, seems like the best platform to hack on top of to me as well. The fact that they've moved to git recently makes it that much easier for another branch to exist even outside of their somewhat troubled infrastructure.

To summarize what I saw on this thread, the primary wishlist of changes to it are:

-License change
-Consider refactoring to better follow standard driver practices, such as using PQExecParams
-Improvement in transaction control to resolve issues that cause idle transactions
-Possible simplifications in how it has implemented async operations, to improve robustness/reduce code complexity
-Confirm/add multi-threaded support
-Confirm/add support for the most common standard types (such as array)

> PyGreSQL is the oldest, older than DB-API, and so it's DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea.

And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:

-Test/complete/refactor for full DB-API 2.0 support
-Add bytea support
-Add extension support, perhaps modeled on what Psycopg provides
-Build a COPY extension
-Confirm/add multi-threaded support
-Confirm/add support for the most common standard types (such as array)

Any other suggestions before I turn the above into a roadmap page on the wiki?

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Greg Smith wrote:
> To summarize what I saw on this thread, the primary wishlist of changes to it are:
>
> -License change
> -Consider refactoring to better follow standard driver practices, such as using PQExecParams
> -Improvement in transaction control to resolve issues that cause idle transactions
> -Possible simplifications in how it's implemented async operations, to improve robustness/reduce code complexity
> -Confirm/add multi-threaded support
> -Confirm/add support for the most common standard types (such as array)
>
> > PyGreSQL is the oldest, older than DB-API, and so it's DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea.
>
> And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:
>
> -Test/complete/refactor for full DB-API 2.0 support
> -Add bytea support
> -Add extension support, perhaps modeled on what Psycopg.
> -Build a COPY extension
> -Confirm/add multi-threaded support
> -Confirm/add support for the most common standard types (such as array)
>
> Any other suggestions before I turn the above into a roadmap page on the wiki?

Great summary! This is what I was hoping for.

One suggestion on the license issue is that the LGPL seems like the type of license intended by the author, and I think it is an acceptable license to most client application programmers.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On 2/7/10, Greg Smith <greg@2ndquadrant.com> wrote:
> Marko Kreen wrote:
> > Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still. Best path forward would be to talk with Psycopg guys about license clarification/change.
>
> Agreed. A relicensed Psycopg, with a more professional looking introduction/documentation page (like the budding page on our Wiki) than what the initd web site has, seems like the best platform to hack on top of to me as well. The fact that they've moved to git recently makes it that much easier for another branch to exist even outside of their somewhat troubled infrastructure.
>
> To summarize what I saw on this thread, the primary wishlist of changes to it are:
>
> -License change

Note that as long as they use a non-BSD licence, they need the exception for the horror that is the OpenSSL licence. So you cannot 100% satisfy the one-click lawyers.

> -Consider refactoring to better follow standard driver practices, such as using PQExecParams

This is a long-term todo item for psycopg, seems off-topic to the "driver situation".

> -Improvement in transaction control to resolve issues that cause idle transactions

This is a routine bug in either the app or psycopg, we have no reason to touch it. The guy should report to the appropriate lists.

> -Possible simplifications in how it's implemented async operations, to improve robustness/reduce code complexity

Long-term todo item for psycopg2, off-topic for the "driver situation".

> -Confirm/add multi-threaded support.

Seems psycopg2 already has good enough threading.

> -Confirm/add support for the most common standard types (such as array)

psycopg2 has array support, I'd like to have tuple/record also.

Minor todo item for psycopg, mostly but not completely off-topic for the "driver situation".

> > PyGreSQL is the oldest, older than DB-API, and so it's DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea.
>
> And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:
>
> -Test/complete/refactor for full DB-API 2.0 support
> -Add bytea support

timestamp/timestamptz also...

> -Add extension support, perhaps modeled on what Psycopg.
> -Build a COPY extension

They seem to have the low-level parts; they are just missing from the DB-API interface.

> -Confirm/add multi-threaded support
> -Confirm/add support for the most common standard types (such as array)
>
> Any other suggestions before I turn the above into a roadmap page on the wiki?

I think we should concentrate on the PR problem and the technical issues related to that, and keep the other low-level and non-user-visible issues out. Or at least separate. (PsycopgTodo wiki page?)

-- marko
> Finally, I just don't see the existing (often PG specific) goals that I have in mind for it appealing to the majority of [web framework/abstraction] users.

What are those goals?

--Josh Berkus
On Sat, Feb 6, 2010 at 7:48 PM, Marko Kreen <markokr@gmail.com> wrote:
> This is long-term todo item for psycopg, seems offtopic to the "driver situation".
[...]
> This is routine bug in either app or psycopg, we have no reason to touch it. The guy should report to appropriate lists.
[...]
> Long-term todo item for psycopg2, offtopic for "driver situation".
[...]
> psycopg2 has array support, I'd like to have tuple/record also.
>
> Minor todo item for psycopg, mostly but not completely offtopic for "driver situation".

I'm not a Python user myself, but I have trouble understanding how you can describe bugs in one of the Python drivers as off-topic to the Python driver situation.

...Robert
On Feb 6, 2010, at 5:51 PM, Josh Berkus wrote:
> > Finally, I just don't see the existing (often PG specific) goals that I have in mind for it appealing to the majority of [web framework/abstraction] users.
>
> What are those goals?

I think the most interesting one that has yet to be implemented is fast, multiple destination COPY support. Currently, COPY is supported, but a bytes() object is allocated for each row, so it's currently not efficient for moving data (pg-to-pg ETL sans the T? =). While some C is still needed to make it properly efficient, it's primarily to keep track of the COPY's state and to update stats. This is pretty useless to a django user... Well, I suppose it might be interesting if COPY OUT could target [or transform into] JSON, but idk...

The general, ongoing goal is to implement and document *convenient* Python interfaces to PostgreSQL features. A, perhaps uninteresting, case being "supporting" advisory locks. I was thinking a context manager, but it might just be something as trivial as an additional method on the connection (very simple/direct binding).

Some "fuzzy" goals: twisted support, some asynchronous interfaces, and greater user control over type I/O. The first, twisted, mostly interests me as an exercise. The second, async interfaces, scares me as it took me some time just to feel "not unhappy" with the blocking APIs. The third will probably happen, but it's going to be a while.

I also have some goals not directly related to a driver. postgresql.unittest is currently only used internally, but I hope to document it some day soon so that people can write Python unittest.TestCase's that auto-magically build out a target cluster (~pg_regress/pgTap for Python?). Well, it works, but it's not documented and the APIs haven't been given much thought. Generally, basic cluster management tools for Python. (At one point I tried to write a programmer's HBA editor, but I think I hurt myself trying to figure out rule reduction.. That is, it was trying to be smarter than "insert/delete rule at position x".)

Well, these are the ones that come to mind, atm, but I don't think there's much beyond them.
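For readers unfamiliar with the COPY idea being discussed, here is a rough sketch of the "pg-to-pg ETL" pattern using psycopg2's copy_expert(), which accepts any file-like object; the table name, the DSNs, and the use of an in-memory buffer are illustrative assumptions rather than anything proposed in the thread.

    # Stream a table out of one database and into another via COPY.
    import io
    import psycopg2

    src = psycopg2.connect("dbname=source")        # example DSNs
    dst = psycopg2.connect("dbname=destination")

    buf = io.StringIO()                            # with Python 2-era psycopg2, use StringIO.StringIO
    src.cursor().copy_expert("COPY events TO STDOUT", buf)
    buf.seek(0)
    dst.cursor().copy_expert("COPY events FROM STDIN", buf)
    dst.commit()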
On Fri, 2010-02-05 at 09:19 -0500, Bruce Momjian wrote:
> My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.

I have written up a set of guidelines for driver development based on what I learned working on ruby-pg:

	http://wiki.postgresql.org/wiki/Driver_development

Whether we take one of the existing projects and improve upon it, or start a complete rewrite, I hope these guidelines will be a useful destination. Note that the ruby-pg driver doesn't 100% adhere to those standards (encoding is the primary problem, and that will be fixed).

I would appreciate comments by anyone (Greg Sabino Mullane: I included you in the CC because I thought you may have some input). Even if the python driver doesn't go in that direction, it will help me improve ruby-pg.

Regards,
	Jeff Davis
Marko Kreen wrote:
> I think we should concentrate on the PR problem and technical issues related to that, keep the other low-level and non-user-visible issues out. Or at least separate. (PsycopgTodo wiki page?)

That's just a matter of prioritizing the issues. Put the big ones at the top, the trivia at the bottom, and if you knock stuff off there somewhere along the way, you discover you've made enough progress that the PR stuff starts going away, because people are able to get their work done with less drama.

Here's a full TODO page that includes everything mentioned here as best I could summarize it:

	http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO

Looks like the first action item is to talk with the Psycopg people about their license.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
On 2/7/10, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Feb 6, 2010 at 7:48 PM, Marko Kreen <markokr@gmail.com> wrote:
> > This is long-term todo item for psycopg, seems offtopic to the "driver situation".
> [...]
> > This is routine bug in either app or psycopg, we have no reason to touch it. The guy should report to appropriate lists.
> [...]
> > Long-term todo item for psycopg2, offtopic for "driver situation".
> [...]
> > psycopg2 has array support, I'd like to have tuple/record also.
> >
> > Minor todo item for psycopg, mostly but not completely offtopic for "driver situation".
>
> I'm not a Python user myself, but I have trouble understanding how you can describe bugs in one of the Python drivers as off-topic to the Python driver situation.

I thought the topic was "Confusion over Python drivers"?

The only bug there was likely an app one, or at least it's not widespread, so off-topic. The rest are more like non-essential cool features, so again off-topic.

-- marko
Marko,
The lack of those "non-essential cool features" is right on topic, because what one developer sees as "non-essential" is the most important feature for the next developer.
Exactly these kinds of issues are the source of the confusion we now face: developer B needing feature X, which was ignored by driver A.
So I concur with putting them on the discussion agenda; if they drop out of priority, at least it is documented WHY, and that will save other developers these thoughts.
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
>> I'm not a Python user myself, but I have trouble understanding how you can describe bugs in one of the Python drivers as off-topic to the Python driver situation.
>
> I thought the topic was "Confusion over Python drivers"?
>
> The only bug there was likely app one, or at least its not widespread so off-topic. Rest are more like non-essential cool features, so again off-topic.

Not at all, except that maybe this discussion belongs on -advocacy rather than here.

Driver quality is partly performance and stability, but also partly what features it implements and how well it implements them, *particularly* for PostgreSQL, which is a database with lots of cool features; if the driver doesn't support our cool features, then they don't matter to Python developers.

Anyway, I don't yet have a full diagnosis on the transaction control issue or I'd already have posted it to psycopg -- it may be a toxic interaction between Django and Psycopg2 rather than psycopg2 alone. I'd not have brought it up except for this discussion.

--Josh Berkus
Greg Smith wrote:
> Here's a full TODO page that includes everything mentioned here as best I could summarize it:
> http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO
>
> Looks like the first action item is to talk with the Psycopg people about their license.

Oh: and I'm going to take care of this. License changes can be a very sensitive topic and I'm told that discussion probably needs to happen in Italian too; I can arrange that.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Josh Berkus wrote:
> Anyway, I don't yet have a full diagnosis on the transaction control issue or I'd already have posted it to psycopg -- it may be a toxic interaction between Django and Psycopg2 rather than psycopg2 alone. I'd not have brought it up except for this discussion.

I'm going to remove it from the list on the wiki then for now. I don't want to annoy the developers by adding a more speculative bug that might not even be in their software. If you get to where it's confirmed and info posted to their list, please add a link back into the page once it's reported, i.e. link to their mailing list archives or something like that.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
> Any other suggestions before I turn the above into a roadmap page on the wiki?

I got sick of the constant stream of escaping bugs impacting on psycopg and pyPgSQL, and wrote my own DB-API driver, using the more modern libpq/binary/protocol 3 APIs wherever possible. The result is BSD licensed:

	http://code.google.com/p/ocpgdb/

As well as using the newer APIs, I have attempted to keep the code as simple as possible, eschewing things like threading as adding too much complexity for too little gain (particularly true of Python threading), and I kept to just the core DB-API functionality. The C code exists mainly to present a pythonic view of libpq. I found that type conversion and marshalling could generally be done from Python with more than acceptable performance (via the C-coded "struct" module in the standard library for common types).

In my tests, ocpgdb has performed at least as well as pyPgSQL and psycopg, often a lot better, primarily due to the use of the libpq binary protocols, I think.

I'm not proposing my module as your canonical implementation, although you're welcome to it if you like. Rather, it demonstrates another viable approach, minimal, and using newer libpq APIs.

BTW, with respect to the discussion of the Python DB-API - I see it as specifying a lowest common denominator, or the subset of functionality that should be available from most databases without requiring contortions. Like eating at McDonalds, it does the job, but it's never going to delight or surprise. A PostgreSQL-blessed adapter really should provide access to all the features in libpq, and I'm not sure this is directly compatible with DBAPI. Instead, the DBAPI compliance should be layered on top.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
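As a rough illustration of the struct-based type conversion Andrew describes (not taken from ocpgdb itself): PostgreSQL sends binary-format values in network byte order, so the standard-library struct module can decode the fixed-width types without any C code. The OIDs in the comments are the usual built-in ones, listed only for orientation.

    # Decoding a few binary-format PostgreSQL values with the struct module.
    import struct

    def unpack_int4(data):       # int4, oid 23: 4-byte big-endian signed integer
        return struct.unpack("!i", data)[0]

    def unpack_float8(data):     # float8, oid 701: 8-byte big-endian IEEE double
        return struct.unpack("!d", data)[0]

    def unpack_bool(data):       # bool, oid 16: a single byte, 0 or 1
        return data != b"\x00"

    # Example: the four bytes 00 00 00 2a decode to the integer 42.
    assert unpack_int4(b"\x00\x00\x00\x2a") == 42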
Andrew McNamara wrote:
> I got sick of the constant stream of escaping bugs impacting on psycopg and pyPgSQL, and wrote my own DB-API driver, using the more modern libpq/binary/protocol 3 APIs where ever possible. The result is BSD licensed:
> http://code.google.com/p/ocpgdb/

I added you into the list at http://wiki.postgresql.org/wiki/Python

Can you check what I put in there, confirm Windows compatibility, and comment on Python 3.X support?

I'd be curious to hear more about the escaping bugs you ran into as well. We already have some notes on the TODO that pushing more of this work toward the standard libpq routines would seem appropriate for things passing between the driver and libpq. Were the issues you ran into on that side, or more on the Python side of how things were being formatted?

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
> I added you into the list at http://wiki.postgresql.org/wiki/Python

Thanks.

> Can you check what I put in there, confirm Windows compatibility, and comment on Python 3.X support?

I haven't tried it under Windows and I haven't had any feedback either way from Windows users. For now, ocpgdb has no Python 3 support (I don't foresee any real problems, however).

> I'd be curious to hear more about the escaping bugs you ran into as well. We already have some notes on the TODO that pushing more of this work toward the standard libpq routines would seem appropriate for things passing between the driver and libpq. Were the issues you ran into on that side, or more on the Python side of how things were being formatted?

It was a while ago now and I can't remember the specific details - it was more a general feeling that the existing offerings were going about it the wrong way (with respect to parameter passing and escaping). I suspect this was a historical artifact (presumably libpq didn't provide escaping facilities or parameterised queries when the adapters were written). Essentially, I just wanted a pyPgSQL with a more modern implementation.

Psycopg was (is?) also using Protocol 2. I felt that the way forward was to switch to the Protocol 3 API features, in particular, parameterised queries, and none of the existing Python adapters had done that (I got the impression while writing my module that nobody was exercising the new features).

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
On Mon, 2010-02-08 at 12:25 +1100, Andrew McNamara wrote:
> For now, ocpgdb has no Python 3 support (I don't foresee any real problems, however).

Encoding issues are the big one. There are a couple gotchas, and I provided the details here:

	http://wiki.postgresql.org/wiki/Driver_development#Text_Encoding

Regards,
	Jeff Davis
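A sketch of the kind of encoding handling Jeff is pointing at: under Python 3 a driver has to decode the bytes libpq returns using the connection's client_encoding, and the PostgreSQL encoding names do not always match Python codec names. The mapping below is a small illustrative subset, not a complete or authoritative table.

    # Decoding result text under an explicit client_encoding (Python 3 sketch).
    PG_TO_PYTHON_CODEC = {
        "UTF8": "utf-8",
        "LATIN1": "iso8859-1",
        "WIN1252": "cp1252",
        "SQL_ASCII": None,        # "no conversion": the driver has to decide what to do
    }

    def decode_text(raw: bytes, client_encoding: str) -> str:
        codec = PG_TO_PYTHON_CODEC.get(client_encoding)
        if codec is None:
            # SQL_ASCII or an unknown encoding: don't guess; hand back bytes
            # or raise, depending on the driver's policy.
            raise ValueError("cannot decode under encoding %r" % client_encoding)
        return raw.decode(codec)

    print(decode_text(b"caf\xc3\xa9", "UTF8"))    # -> café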
> The pg8000 / bpgsql seem to be toy projects, and anyway you dont want to use pure-Python drivers in high-performance environments.

I agree that there are some performance challenges with pure-Python drivers.

And we should not forget to look for the reasons for the incubation of that many pure-Python drivers:

a) Python is no longer one-language, one-implementation. There are (at least) CPython (the original), Jython (on the JVM), IronPython (from Microsoft, on the CLR), PyPy (Python on Python), Unladen Swallow (from Google, on LLVM). In addition there are the nearly-Pythons, as in Cython, RPython and ShedSkin.

Everything apart from CPython (and possibly Unladen Swallow) has its challenges dealing with non-Python extension modules. From a developer standpoint it can be tempting to be able to rely on the same database adapter across more than one implementation.

b) The stabilization of a Python Application Binary Interface is in an early discussion stage; meaning: it will take some time until a non-Python extension can be usable across Python versions. C extensions are always a major stumbling block on Python-n to Python-(n+1) versions.

c) Stability. Python code is, same-on-same, more robust than C code, as some of the crash-friendly problems are eliminated (you cannot allocate memory wrongly within Python, you cannot erroneously access memory across array boundaries...).

Especially a) is a point to consider when standardizing on a PostgreSQL-blessed Python-PostgreSQL driver. How will the blessing extend to Jython / IronPython / PyPy?

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
Massa, Harald Armin wrote:
> I agree that there are some performance challenges with pure-Python drivers.
> And we should not forget to look for the reasons for the incubation of that many pure-Python drivers:
> a) Python is no longer one-language, one-implementation. There are (at least) CPython (the original), Jython (on the JVM), IronPython (from Microsoft, on the CLR), PyPy (Python on Python), Unladen Swallow (from Google, on LLVM). In addition there are the nearly-Pythons, as in Cython, RPython and ShedSkin...
> Especially a) is a point to consider when standardizing on a PostgreSQL-blessed Python-PostgreSQL driver. How will the blessing extend to Jython / IronPython / PyPy?

The point isn't so much "standardizing", it's getting one driver that satisfies the needs of the people most likely to use it. Having a low performance Python driver turns into a PostgreSQL PR issue. Last thing we need is the old "PostgreSQL is slow" meme to crop back up again via the Python community, if the driver suggested by the community isn't written with performance as a goal so that, say, PostgreSQL+Python looks really slow compared to MySQL+Python. And if you're writing a database driver with performance as a goal, native Python is simply not an option.

Now, once *that* problem is under control, and there's a nicely licensed, well documented, major feature complete, and good performing driver, at that point it would be completely appropriate to ask "what about people who want support for other Python platforms and don't care if it's slower?". And as you say, nurturing the "incubation" of such drivers is completely worthwhile. I just fear that losing focus by wandering too far in that direction, before resolving the main problem here, is just going to extend resolving the parts of the Python driver situation I feel people are most displeased with.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
Hi there,

Greg Smith wrote:
> > Looks like the first action item is to talk with the Psycopg people about their license.
>
> Oh: and I'm going to take care of this. License changes can be a very sensitive topic and I'm told that discussion probably needs to happen in Italian too; I can arrange that.

I can try and help with this issue, given my role with the Italian PostgreSQL community and PostgreSQL business with 2ndQuadrant Italia. I have met Psycopg's developer a couple of times at open-source conferences. I have great respect for his work and his contribution in the open-source community, and I will be very happy to try and explain the situation to him.

I will keep you posted.

Ciao,
Gabriele

--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
Greg,
> The point isn't so much "standardizing". Having a low performance Python driver turns into a PostgreSQL PR issue.

I totally agree.

> And if you're writing a database driver with performance as a goal, native Python is simply not an option.

Yes. Additionally: performance is not the only challenge. A native Python implementation, without using libpq, will have to reimplement much of libpq - to single out just one piece, proper escaping - and will have its own bugs.

> Now, once *that* problem is under control, and there's a nicely licensed, well documented, major feature complete, and good performing driver, at that point it would be completely appropriate to ask "what about people who want support for other Python platforms and don't care if it's slower?".

Pure Python drivers do exist now, and they are already discussed in the summaries - which is a good thing. With my remarks I just want to recommend that we should at least document a position for them, and a way ahead. And I need a place to point out that Python grew an FFI with ctypes. Maybe someone will think of a DBAPI 2.0 compatible ctypes libpq wrapper - see the rough sketch below ...
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > I have written up a set of guidelines for driver development > based on what I learned working on ruby-pg: > > http://wiki.postgresql.org/wiki/Driver_development ... > I would appreciate comments by anyone (Greg Sabino Mullane: I included > you in the CC because I thought you may have some input). Good page. I looked it over but have nothing to add at the moment. I may do so later once my head is in dbdpg mode (working on other project at the moment :) - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201002080931 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAktwIEYACgkQvJuQZxSWSsjczQCgkU5b6iHPREJYMtAdWlFRDkYI cS4An3AMyc+O06HzN8MYkfq8HG62371y =+WCV -----END PGP SIGNATURE-----
* Jeff Davis:

> I have written up a set of guidelines for driver development based on
> what I learned working on ruby-pg:
>
> http://wiki.postgresql.org/wiki/Driver_development

Interesting, thanks.

I'm contemplating creating a new language binding for libpq (or, to be more precise, turning an existing language binding into something that can be published). I've been agonizing a bit over how to create a bridge between the host language type system and the PostgreSQL type system. If I understand you correctly, you suggest leaving everything as strings. This solution has the appeal of being easy to implement. It also sidesteps a lot of issues revolving around different representation choices for numbers.

Do you really suggest preserving the PQexecParams API verbatim, that is, passing in three arrays containing type, value, and format? That seems to be a bit problematic. I suspect the common case will be to use unknown types, text format, and the default conversion from values to strings. Only for BYTEA values is something else required, and I'm wondering how to encode that (the host language doesn't offer a distinction between text and binary strings). Conversely, for result sets, I'm tempted to transparently decode escaped BYTEA columns.

> Note that the ruby-pg driver doesn't 100% adhere to those standards
> (encoding is the primary problem, and that will be fixed).

Lack of Unicode support means that I can punt that to application authors, I guess.

By the way, the downside of using strings everywhere is that your binding API will most likely not work with SQLite (or any other SQL-like database which lacks column type information).
* Andrew McNamara:

>> Any other suggestions before I turn the above into a roadmap page on the
>> wiki?
>
> I got sick of the constant stream of escaping bugs impacting on psycopg
> and pyPgSQL, and wrote my own DB-API driver, using the more modern
> libpq/binary/protocol 3 APIs wherever possible. The result is BSD
> licensed:
>
> http://code.google.com/p/ocpgdb/

I saw your note that you have to specify the types for date values etc. Is this really desirable or even necessary? Can't you specify the type as unknown (OID 705, I believe)?

At work, we recently used the typelessness of Perl's DBD::Pg with great effect, introducing a more compact, type-safe representation for a few columns, without having to change all the existing Perl scripts accessing the database. That's why I'm wondering...

(And we might be using Python instead of Perl today. Lack of a decent PostgreSQL module for Python meant it was very hard to argue against using Perl ...)
On Mon, 2010-02-08 at 20:29 +0100, Florian Weimer wrote:
> I'm contemplating creating a new language binding for libpq (or, to be more precise, turning an existing language binding into something that can be published). I've been agonizing a bit over how to create a bridge between the host language type system and the PostgreSQL type system. If I understand you correctly, you suggest leaving everything as strings. This solution has the appeal of being easy to implement. It also sidesteps a lot of issues revolving around different representation choices for numbers.

Agreed. Ultimately, the conversion has to be done somewhere, but I don't believe the driver is the place for it. Type conversions are always going to be imperfect, and this has some important consequences:
* The type conversion system will be endlessly tweaked to improve it.
* Developers will always run into problems with it in any complex application, so we need to allow them to circumvent the system and do it themselves when necessary.

Both of these things point to another layer on top of the driver itself. It could be some extra convenience functions that come with the driver, or an entirely separate layer (like ActiveRecord). But if we always let the developer have access to the full power of libpq, it limits the damage that can be done by a slightly-too-creative API on top of it.

> Do you really suggest preserving the PQexecParams API verbatim, that is, passing in three arrays containing type, value, and format? That seems to be a bit problematic. I suspect the common case will be to use unknown types, text format, and the default conversion from values to strings.

I tried to address this specifically in the document: "For example: it should be easy to pass parameters so that PQexecParams (and others) can be used, avoiding SQL injection risks. The important thing is to maintain close to a one-to-one mapping between libpq and the driver's API, and to provide all of the functionality of libpq."

In ruby-pg, you can just do:

    conn.exec("INSERT INTO foo VALUES($1)", ["Jeff"])

And I think that's appropriate. What I'm saying is that there should still exist some way to pass explicit types or formats (although that should still be easier than it is in C ;). Here's the long form:

    conn.exec("INSERT INTO foo VALUES($1)",
              [{:value => "Jeff", :format => 0, :type => 0}])

The nice thing about that format is that you can do the "easy" thing for most of the parameters in a query, but then choose binary format for that one BYTEA parameter. That's because, in ruby, you can mix strings and hashes in the same array. So I'm not saying we should make everyone code ruby that looks like C. I'm saying that the job of the driver is to provide full access to libpq, and anything beyond that should be an optional convenience routine, and should be free of magic and cleverness (that's the job of a higher layer).

> Conversely, for result sets, I'm tempted to transparently decode
> escaped BYTEA columns.

Consider the following ruby-pg program, where you have two empty tables foo and bar, each with a single BYTEA column "b":

    conn = PGconn.connect(...)
    conn.exec("INSERT INTO foo VALUES($1)", ["\\\\000"])
    # copy the single value in foo into bar
    val = conn.exec("SELECT b FROM foo LIMIT 1")[0]["b"]
    conn.exec("INSERT INTO bar VALUES($1)", [val])

That copies the value so that foo and bar have the same contents: a 4-byte value "\000". What would happen, though, if val was transparently decoded? It would decode it once in ruby, and again inside of postgres (in byteain), leaving you with a one-byte value in bar, even though foo has a four-byte value. I really think that only higher layers should implement that kind of magic, no matter how "obvious" it may seem that the user wants something extra.

> > Note that the ruby-pg driver doesn't 100% adhere to those standards
> > (encoding is the primary problem, and that will be fixed).
>
> Lack of Unicode support means that I can punt that to application
> authors, I guess.

Ruby 1.9+ and Python 3.0+ both have string encoding models that can't just be ignored. We could punt it by always returning byte sequences rather than strings, but I think that's a particularly extreme version of my philosophy of not trying to convert between types.

> By the way, the downside of using strings everywhere is that your
> binding API will most likely not work with SQLite (or any other
> SQL-like database which lacks column type information).

I am trying to develop standards suitable for PostgreSQL drivers based on libpq. These are not meant to be standards for a database-agnostic API, standards for a high-level database adapter, or even standards for a driver written against something other than libpq (like the JDBC driver).

Thank you for your comments. I will try to integrate these thoughts into the document.

Regards,
Jeff Davis
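The double-decoding Jeff describes can be seen without a server at all. The small Python sketch below uses a simplified stand-in for PostgreSQL's traditional "escape" bytea text format (it is not a routine from any real driver) and shows the 4-byte value shrinking to one byte when it is decoded twice:

    def bytea_unescape(text):
        # Simplified decoder for the traditional "escape" bytea text format:
        # "\\" is a literal backslash, "\ooo" is an octal byte escape.
        out = bytearray()
        i = 0
        while i < len(text):
            if text[i] == '\\':
                if text[i + 1] == '\\':
                    out.append(ord('\\'))
                    i += 2
                else:
                    out.append(int(text[i + 1:i + 4], 8))
                    i += 4
            else:
                out.append(ord(text[i]))
                i += 1
        return bytes(out)

    wire = r"\\000"                       # what SELECT returns for foo's value
    stored = bytea_unescape(wire)         # the value actually in foo: 4 bytes
    # A "transparent" driver hands the app the decoded value; re-inserting it
    # as a text parameter makes byteain decode it a second time:
    reinserted = bytea_unescape(stored.decode())
    print(len(stored), len(reinserted))   # 4 1 -- silently shrunk to one NUL byte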
>> http://code.google.com/p/ocpgdb/
>
> I saw your note that you have to specify the types for date values
> etc. Is this really desirable or even necessary? Can't you specify
> the type as unknown (OID 705, I believe)?
>
> At work, we recently used the typelessness of Perl's DBD::Pg with great
> effect, introducing a more compact, type-safe representation for a few
> columns, without having to change all the existing Perl scripts
> accessing the database. That's why I'm wondering...

I can't see how this would work with binary query parameters - the server will see a blob of binary data and have no way to know what it represents. I presume DBD::Pg is using text parameters, rather than binary.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
On Tue, 2010-02-09 at 09:15 +1100, Andrew McNamara wrote:
> I can't see how this would work with binary query parameters - the server
> will see a blob of binary data and have no way to know what it represents.

Unknown is unknown, whether in binary or text format. As far as I know, PostgreSQL never looks inside a literal of unknown type to try to determine its type -- it only looks at the context (to what function is it an argument?). For instance:

    SELECT '5';       -- has no idea what type it is

    SELECT '5' + 1;   -- it's an int

    SELECT 'a' + 1;   -- it's still an int
    ERROR: invalid input syntax for integer: "a"
    LINE 1: SELECT 'a' + 1;

    SELECT '5.0' + 1; -- still an int, bad input format
    ERROR: invalid input syntax for integer: "5.0"
    LINE 1: SELECT '5.0' + 1;

Regards,
Jeff Davis
>On Tue, 2010-02-09 at 09:15 +1100, Andrew McNamara wrote: >> I can't see how this would work with binary query parameters - the server >> will see a blob of binary data and have no way to know what it represents. > >Unknown is unknown, whether in binary or text format. As far as I know, >PostgreSQL never looks inside a literal of unknown type to try to >determine its type -- it only looks at the context (to what function is >it an argument?). > >For instance: > > SELECT '5'; -- has no idea what type it is > > SELECT '5' + 1; -- it's an int > > SELECT 'a' + 1; -- it's still an int > ERROR: invalid input syntax for integer: "a" > LINE 1: SELECT 'a' + 1; > > SELECT '5.0' + 1; -- still an int, bad input format > ERROR: invalid input syntax for integer: "5.0" > LINE 1: SELECT '5.0' + 1; The problem is deeper than that - when query parameters use the binary option, the server has no way to decode the binary parameter without an appropriate type OID. As you say, postgres will cast types depending on context, however this is stricter when binary parameters are used (because they only have one valid interpretation, whereas a text parameter may have several). -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
On Mon, 2010-02-08 at 09:14 +0100, Massa, Harald Armin wrote: > And we should not forget to look for the reasons for the incubation of > that many pure-Python drivers: All very good points. That's why the doc I wrote: http://wiki.postgresql.org/wiki/Driver_development is specifically targeted at libpq-based drivers (which is repeated several times). I think it would be valuable to have a complete, pure-python driver available (like the JDBC driver). That's a large project, however. People who use a different python implementation understand that libraries might not be as plentiful. It will be a while before there are as many pure-python libraries as there are pure-java libraries. Right now what we need is a driver toward which we can confidently direct cPython users. It's faster to wrap libpq than to write a complete driver. And if we don't have such a driver, we risk alienating an important community for postgresql growth. So, the cost is lower and the benefits are higher for wrapping libpq for the cPython users. At least, that seems to be the case right now; things may change in the future. Regards,Jeff Davis
On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote:
> The problem is deeper than that - when query parameters use the binary
> option, the server has no way to decode the binary parameter without an
> appropriate type OID.

Postgres does not attempt to decode anything (text or binary format) until it figures out what type it is.

> As you say, postgres will cast types depending on context, however this
> is stricter when binary parameters are used (because they only have one
> valid interpretation, whereas a text parameter may have several).

Type casts are a different matter; they are only done after the unknown literals' types have been determined:

    create table n(i int);

    -- insert numeric literal, which is cast to int (assignment cast)
    insert into n values(5.0);   -- succeeds

    -- insert unknown literal, which is inferred to be of type int
    insert into n values('5.0'); -- fails on the integer type input function
    ERROR: invalid input syntax for integer: "5.0"
    LINE 1: insert into n values('5.0');

Can you provide a concrete example in which the text versus binary format changes the type inference behavior?

Regards,
Jeff Davis
>On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote: >> The problem is deeper than that - when query parameters use the binary >> option, the server has no way to decode the binary parameter without an >> appropriate type OID. > >Postgres does not attempt to decode anything (text or binary format) >until it figures out what type it is. How does it figure out what type it is? Either by the type oid passed by the caller, or by the context if the type oid is "unknown". Now, with the text format parameters, the parser usually does the right thing, since text formats have plenty of hints for us humans. However, with the binary format, unless the caller tells us, there's no way to tell whether we're correctly parsing the data. If the context implies one type, but the user passes another, we'll either get an ugly error or, worse, silently misparse their data. Generally this isn't a big problem with python, as we have good type information available. It's only an issue because people have gotten used to the text parameter parsing being so forgiving. Using my ocpgdb module, and interacting directly with the libpq wrapping code, you can see how postgres reacts to various inputs: >>> from oclibpq import * >>> from ocpgdb import pgoid >>> db=PgConnection('') No parameters: >>> r=db.execute('select 1', ()) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23,modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)] Int4 parameter, type specified: >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCellname '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)] Int4 parameter, type unknown, can't be determined from context: >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_FATAL_ERROR >>> r.errorMessage 'ERROR: could not determine data type of parameter $1\n' Int4 parameter, type unknown, can be determined from context: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)] Text parameter, type unknown, mismatching context - surprising: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCellname '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)] Date parameter, type unknown, int context, the value gets misinterpreted: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)] -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
On Tue, 2010-02-09 at 12:51 +1100, Andrew McNamara wrote:
> Now, with the text format parameters, the parser usually does the right
> thing, since text formats have plenty of hints for us humans.

The parser doesn't care whether it's text format or binary format when determining the type.

> However, with the binary format, unless the caller tells us, there's no way
> to tell whether we're correctly parsing the data. If the context implies
> one type, but the user passes another, we'll either get an ugly error or,
> worse, silently misparse their data.

The difference between text and binary format is this: after it has already determined the type of the parameter, (a) if the format is text, it passes it to the type input function to construct the value; or (b) if the format is binary, it passes it to the type recv function to construct the value.

The argument to the input or recv functions may: (a) be valid input; or (b) be invalid input, and be detected as an error by the input or recv function; or (c) be invalid input, and not be detected as an error by the input or recv function.

For a given type, the input function may be more likely to catch an input error than the recv function; or the reverse. Either way, it is very type-specific, and the only difference is whether the input is misinterpreted (type error not caught; bad) or an error is thrown (type error caught; better).

> Using my ocpgdb module, and interacting directly with the libpq wrapping
> code, you can see how postgres reacts to various inputs:

None of the examples show a difference in the inferred type of a text versus binary parameter for the same query.

> No parameters:
>
> >>> r=db.execute('select 1', ())
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)]

Expected, because the literal 1 (without quotes) is an integer literal, not an unknown literal.

> Int4 parameter, type specified:
>
> >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)]

Expected, because you specified the type, and sent the binary data to the integer recv function, and it was valid input.

> Int4 parameter, type unknown, can't be determined from context:
>
> >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_FATAL_ERROR
> >>> r.errorMessage
> 'ERROR: could not determine data type of parameter $1\n'

Expected -- there is no context to determine the type. Why do you call it an int4 parameter? It's just bytes, and you never told postgres what they are (as you did in the previous example).

> Int4 parameter, type unknown, can be determined from context:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)]

Expected: the function + provides the context that allows the server to interpret the left argument as an integer. (Again, it's not an int4 parameter, it's unknown.)

> Text parameter, type unknown, mismatching context - surprising:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)]

Expected, because this is exactly the same as the previous one except for the data you pass in. Notice that the same type is inferred (23). Why do you call this "mismatching context" when the context is exactly the same as above? The only difference is which 4 bytes you provide. You never told postgres that the bytes were text bytes anywhere. You may think that it's doing 1111 + 1, but it's actually doing addition on the bytes. That is apparent in the next example:

> Date parameter, type unknown, int context, the value gets misinterpreted:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)]

Expected, because the only thing that could possibly detect the error is the int4recv function, which happens to accept any 4-byte input (so it will never fail on any 4 bytes of data).

Regards,
Jeff Davis
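The last example can be reproduced without a server: the four bytes Andrew sent are a valid int4 and, at the same time, a valid binary date. The Python sketch below assumes PostgreSQL's binary date format (an int32 count of days since 2000-01-01) and only illustrates the ambiguity:

    import datetime
    import struct

    wire = b'\x00\x00\x01n'                 # the 4 bytes sent as "unknown"
    as_int = struct.unpack('!i', wire)[0]   # int4recv's reading
    as_date = datetime.date(2000, 1, 1) + datetime.timedelta(days=as_int)
    print(as_int, as_date)                  # 366 2001-01-01
    # Both readings are "valid", so if the server guesses the wrong type,
    # nothing fails -- the value is just silently misinterpreted.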
> For a given type, the input function may be more likely to catch an
> input error than the recv function; or the reverse. Either way, it is
> very type-specific, and the only difference is whether the input is
> misinterpreted (type error not caught; bad) or an error is thrown (type
> error caught; better).

This is the crux of the matter: the type input functions are universally more forgiving since, by their nature, text formats are designed for us fuzzy humans, and users of adapters have come to expect this.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
On Tue, 2010-02-09 at 13:56 +1100, Andrew McNamara wrote:
> > For a given type, the input function may be more likely to catch an
> > input error than the recv function; or the reverse. Either way, it is
> > very type-specific, and the only difference is whether the input is
> > misinterpreted (type error not caught; bad) or an error is thrown (type
> > error caught; better).
>
> This is the crux of the matter: the type input functions are universally
> more forgiving since, by their nature, text formats are designed for us
> fuzzy humans, and users of adapters have come to expect this.

Except that it's exactly the opposite with integers. Pass any 4 bytes to int4recv(), and it will accept it. However, try passing '4.0' to int4in(), and you get an error.

If I had to make an educated guess about the forgiveness of various type input and type recv functions, I would say that the recv functions are more forgiving. After all, you would expect the binary format to be less redundant, and therefore less likely to catch inconsistencies. I don't see much of a universal truth there, however.

This is getting pretty far off-topic, so let's just leave it at that. The drivers should support both formats; the type inference logic doesn't care at all about the contents of the unknown literals (text or binary); and queries should be written in such a way that the types are unambiguous and unsurprising.

Regards,
Jeff Davis
On Mon, 2010-02-08 at 20:50 +0100, Florian Weimer wrote: > I saw your note that you have to specify the types for date values > etc. Is this really desirable or even necessary? Can't you specify > the type as unknown (OID 705, I believe)? I believe the problem that Andrew is describing is that: SELECT $1 + 1; will infer that $1 is of type int4. But if you really intended $1 to be a date (which is also valid), it will cause a problem. If the date is passed in text format, it will cause an error in int4in(), because the text representation of a date isn't a valid text representation for an integer. If the date is passed in binary format, it will pass it to int4recv() -- but because the date is 4 bytes, and int4recv is defined for any 4-byte input, it won't cause an error; it will produce a wrong result. In other words, the binary representation for a date _is_ a valid binary representation for an integer. The type inference has found the wrong type, but the recv function still accepts it, which causes a problem. The solution is to write the query in an unambiguous way: SELECT $1::date + 1; which is good practice, anyway. If it's not obvious to the type inference system, it's probably not obvious to you, and will probably surprise you ;) Or, as Andrew suggests, you can pass the type oid along with the parameter so that postgresql knows the right type. Either way, relying on a type input or a recv function to cause a type error is much more fragile. Regards,Jeff Davis
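As one concrete illustration of the "write the query unambiguously" advice, here is what it looks like from an existing text-format driver such as psycopg2; the connection string is just a placeholder, and the point is only that the explicit cast travels with the query rather than depending on type inference:

    import datetime
    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder connection string
    cur = conn.cursor()
    # The cast in the SQL removes any ambiguity about what the parameter is:
    cur.execute("SELECT %s::date + 1", (datetime.date(2010, 2, 9),))
    print(cur.fetchone()[0])                 # 2010-02-10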
> If the date is passed in binary format, it will pass it to int4recv() --
> but because the date is 4 bytes, and int4recv is defined for any 4-byte
> input, it won't cause an error; it will produce a wrong result. In other
> words, the binary representation for a date _is_ a valid binary
> representation for an integer. The type inference has found the wrong
> type, but the recv function still accepts it, which causes a problem.

Yes - of the worst kind: silent data corruption.

> The solution is to write the query in an unambiguous way:
>
>     SELECT $1::date + 1;
>
> which is good practice, anyway. If it's not obvious to the type
> inference system, it's probably not obvious to you, and will probably
> surprise you ;)

That addresses this specific case, but it's ugly and not general. The right thing is to set the correct type when you're marshalling the parameters...

> Or, as Andrew suggests, you can pass the type oid along with the
> parameter so that postgresql knows the right type.

That's right - if you are using binary parameters, you *must* pass an appropriate type oid for the data you send to the server. If you use the "unknown" oid, bad things will happen (sooner or later). While this is strictly true of both binary and text parameters, text parameters have enough redundancy built into the format that it's rarely a problem. Users have come to expect this leniency.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
>That's just a matter of prioritizing the issues. Put the big ones at >the top, the trivia at the bottom, [...] I'd like to see a requirement for the use of PQexecParams() over PQexec() - even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Andrew McNamara <andrewm@object-craft.com.au> writes: >> That's just a matter of prioritizing the issues. Put the big ones at >> the top, the trivia at the bottom, [...] > I'd like to see a requirement for the use of PQexecParams() over PQexec() - > even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy. Such a rule seems pretty entirely pointless, unless you have a way to enforce that the query string passed to the function hasn't been assembled from parts somewhere along the way. regards, tom lane
>> I'd like to see a requirement for the use of PQexecParams() over PQexec() -
>> even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy.
>
> Such a rule seems pretty entirely pointless, unless you have a way to
> enforce that the query string passed to the function hasn't been
> assembled from parts somewhere along the way.

The point is that if the driver is doing the right thing, the user of the driver at least has the choice to do things safely.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
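At the level a Python user sees, that choice is the difference between interpolating values into the SQL text and handing them to the driver as parameters. A small sketch with psycopg2 - the connection string, the users table, and the name value are all made up for the example, and how the driver passes the parameter to the server is the driver's business:

    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder
    cur = conn.cursor()
    name = "Jeff"                             # imagine hostile input here

    # Unsafe: the value becomes part of the SQL text, so quoting and
    # injection are the application's problem.
    cur.execute("SELECT * FROM users WHERE name = '%s'" % name)

    # Safe: the value is handed to the driver as a parameter and is never
    # concatenated into the SQL string by hand.
    cur.execute("SELECT * FROM users WHERE name = %s", (name,))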
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

>> The solution is to write the query in an unambiguous way:
>>
>>     SELECT $1::date + 1;
>>
>> which is good practice, anyway. If it's not obvious to the type
>> inference system, it's probably not obvious to you, and will probably
>> surprise you ;)
>
> That addresses this specific case, but it's ugly and not general. The right
> thing is to set the correct type when you're marshalling the parameters...

Well, ugly is in the eye of the beholder, and it certainly is a general solution. Any query with ambiguity in its parameters should explicitly declare the types, inside the query itself. Having the driver indicate the type should be the exception, not the rule.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201002091811
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
>>> The solution is to write the query in an unambiguous way:
>>>
>>>     SELECT $1::date + 1;
>>>
>>> which is good practice, anyway. If it's not obvious to the type
>>> inference system, it's probably not obvious to you, and will probably
>>> surprise you ;)
>>
>> That addresses this specific case, but it's ugly and not general. The right
>> thing is to set the correct type when you're marshalling the parameters...
>
> Well, ugly is in the eye of the beholder, and it certainly is a general
> solution. Any query with ambiguity in its parameters should explicitly
> declare the types, inside the query itself. Having the driver indicate
> the type should be the exception, not the rule.

You are missing the point: this is not about what types the SQL execution sees. It is about making sure the correct recv function is applied to the binary parameter data. The server cannot reliably infer which recv function to use based on query context (although it tries). A wrong guess can lead to silent data corruption, which is utterly unacceptable.

If the client (driver) sets the type OID to match the format of the binary parameter it sends, the server can unambiguously decode the data (and cast the type, if need be).

I would go as far as to suggest that postgres should not accept binary parameters with an "unknown" OID - it's dangerous, unreliable and serves no purpose.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Andrew McNamara <andrewm@object-craft.com.au> writes: >>> The solution is to write the query in an unambiguous way: >>> SELECT $1::date + 1; > You are missing the point: this is not about what types the SQL execution > sees. It is about making sure the correct recv function is applied to > the binary parameter data. Indeed, and the above locution does set that. regards, tom lane
> Andrew McNamara <andrewm@object-craft.com.au> writes:
>>>> The solution is to write the query in an unambiguous way:
>>>> SELECT $1::date + 1;
>
>> You are missing the point: this is not about what types the SQL execution
>> sees. It is about making sure the correct recv function is applied to
>> the binary parameter data.
>
> Indeed, and the above locution does set that.

Sure, but it requires the driver to modify the query - that isn't reasonable or practical. Expecting the user of the driver to know and correctly set the type the driver will ultimately see is a recipe for disaster.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
* Jeff Davis: > Agreed. Ultimately, the conversion has to be done somewhere, but I don't > believe the driver is the place for it. Type conversions are always > going to be imperfect, and this has some important consequences: > * The type conversion system will be endlessly tweaked to improve it > * Developers will always run into problems with it in any complex > application, so we need to allow them to circumvent the system and do it > themselves when necessary. The downside is that passing strings up to the application may have distinctly worse performance characteristics than passing a number. > In ruby-pg, you can just do: > > conn.exec("INSERT INTO foo VALUES($1)", ["Jeff"]) > > And I think that's appropriate. What I'm saying is that there should > still exist some way to pass explicit types or formats (although that > should still be easier than it is in C ;). Here's the long form: > > conn.exec("INSERT INTO foo VALUES($1)", > [{:value => "Jeff", :format => 0, :type => 0}]) Okay, this isn't too bad an API. I will use the same approach. In my case, it means no transparent support for arrays, but per your own guidelines, this is okay. > That copies value so that foo and bar have the same contents: a 4 byte > value "\000". What would happen though, if val was transparently > decoded? It would decode it once in ruby, and again inside of postgres > (in byteain), leaving you with a one byte value in bar, even though foo > has a four-byte value. I've never viewed it from this angle, and I agree that it makes sense. Thanks for your observations and explanations, they were helpful.
On Sun, 2010-02-14 at 20:43 +0100, Florian Weimer wrote:
> The downside is that passing strings up to the application may have
> distinctly worse performance characteristics than passing a number.

Yes, that is a good point. I tried to clarify this in the doc. I think this would fall under the optional type conversion convenience functions. As long as it's explicit that the conversion is happening, I think it's OK.

Regards,
Jeff Davis
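One way such an optional, explicit conversion layer could look in Python is sketched below. The OIDs are PostgreSQL's built-in type OIDs (23 = int4, 701 = float8, 1082 = date); the helper is only an illustration of the idea, not an API from any existing driver:

    import datetime

    # Explicit, opt-in converters from the driver's string values, keyed by type OID.
    CONVERTERS = {
        23: int,
        701: float,
        1082: lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").date(),
    }

    def convert_row(row, type_oids, converters=CONVERTERS):
        """Convert one row of text values; unknown OIDs are passed through as strings."""
        return [None if value is None else converters.get(oid, str)(value)
                for value, oid in zip(row, type_oids)]

    print(convert_row(["42", "3.5", "2010-02-14"], [23, 701, 1082]))
    # [42, 3.5, datetime.date(2010, 2, 14)]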