Thread: Confusion over Python drivers
My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.

First, the sheer number of libraries is confusing. This is the page from the Postgres wiki:

	http://wiki.postgresql.org/wiki/Python

The first one listed, Psycopg, is noted as "preferred libpq-based driver", but the license is GPL. Isn't that a problem for many client applications?

The next one, PyGreSQL, is BSD licensed, but only has documentation for the "classic" interface. The DB-API module says about documentation:

	http://www.pygresql.org/pgdb.html

	This section of the documentation still needs to be written.

The other three are pure Python drivers, which I guess can be good, but why three, and then there are three more listed as "obsolete/stalled". Clearly something is wrong here.

The Python-hosted PostgreSQL page has similar problems:

	http://wiki.python.org/moin/PostgreSQL

Does Perl have a similar mess?

While I realize experienced people can easily navigate this confusion, I am concerned about new Postgres adopters being very confused by this and it is hurting our database adoption in general.

What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce,
> http://wiki.postgresql.org/wiki/Python
>
> The first one listed, Psycopg, is noted as "preferred libpq-based
> driver", but the license is GPL. Isn't that a problem for many client
> applications?
The licence of psycopg2 is a little more complicated; the "GPL" in that summary just tries to sum it up. The actual licence is [from the LICENCE file in the source distribution]:
---------------------------------------------
psycopg and the GPL
===================
psycopg is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version. See file COPYING for details.
As a special exception, specific permission is granted for the GPLed
code in this distribution to be linked to OpenSSL and PostgreSQL libpq
without invoking GPL clause 2(b).
Note that the GPL was chosen to avoid proprietary adapters based on
psycopg code. Using psycopg in a proprietary product (even bundling
psycopg with the proprietary product) is fine as long as:
1. psycopg is called from Python only using only the provided API
(i.e., no linking with C code and no C modules based on it); and
2. all the other points of the GPL are respected (you offer a copy
of psycopg's source code, and so on.)
Alternative licenses
====================
If you prefer you can use the Zope Database Adapter ZPsycopgDA (i.e.,
every file inside the ZPsycopgDA directory) under the ZPL license as
published on the Zope web site, http://www.zope.org/Resources/ZPL.
Also, the following BSD-like license applies (at your option) to the
files following the pattern psycopg/adapter*.{h,c} and
psycopg/microprotocol*.{h,c}:
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not
be misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
psycopg is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Proprietary licenses
====================
A non-exclusive license is available for companies that want to include
psycopg in their proprietary products without respecting the spirit of the
GPL. The price of the license is one day of development done by the author,
at the consulting fee he applies to his usual customers at the day of the
request.
---------------------------------------------
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
Massa, Harald Armin wrote:
> Bruce,
>
> > http://wiki.postgresql.org/wiki/Python
> >
> > The first one listed, Psycopg, is noted as "preferred libpq-based
> > driver", but the license is GPL. Isn't that a problem for many client
> > applications?
>
> The licence of psycopg2 is a little more complicated; the "GPL" in that
> summary just tries to sum it up. The actual licence is [from the LICENCE
> in the source distribution]:

Wow, that is super-confusing. I am dealing with an issue right now where some companies are concerned that individual names appear with our own BSD copyrights in small files. That pales in comparison to this copyright. I would never be able to recommend software with that complex a copyright. I think the community needs to look at other alternatives.

(I am amazed that not only are there a confusing number of drivers, but there is also a confusing number of copyright options for this single driver.)

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On fre, 2010-02-05 at 09:19 -0500, Bruce Momjian wrote:
> While I realize experienced people can easily navigate this confusion, I am concerned about new Postgres adopters being very confused by this and it is hurting our database adoption in general.
>
> What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.

The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".
On Fri, Feb 05, 2010 at 09:19:26AM -0500, Bruce Momjian wrote:
> My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.
>
> http://wiki.postgresql.org/wiki/Python
>
> The Python-hosted PostgreSQL page has similar problems:
> http://wiki.python.org/moin/PostgreSQL
>
> Does Perl have a similar mess?

I don't think so.

The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/

The only non-DBI interfaces I could find (by skimming the 384 results for postgresql on search.cpan.org) were:

	Postgres: http://search.cpan.org/dist/Postgres/      last updated in 1998.
	Pg:       http://search.cpan.org/dist/pgsql_perl5/   last updated in 2000.

Tim.
Peter Eisentraut wrote:
> On fre, 2010-02-05 at 09:19 -0500, Bruce Momjian wrote:
> > While I realize experienced people can easily navigate this confusion, I am concerned about new Postgres adopters being very confused by this and it is hurting our database adoption in general.
> >
> > What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.
>
> The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".

Agreed, but can't we do a better job of steering new users and improving the novice experience for the most popular of them (think Ubuntu)?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Tim Bunce wrote:
> On Fri, Feb 05, 2010 at 09:19:26AM -0500, Bruce Momjian wrote:
> > My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.
> >
> > http://wiki.postgresql.org/wiki/Python
> > The Python-hosted PostgreSQL page has similar problems:
> > http://wiki.python.org/moin/PostgreSQL
> >
> > Does Perl have a similar mess?
>
> I don't think so.
>
> The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/
>
> The only non-DBI interfaces I could find (by skimming the 384 results for postgresql on search.cpan.org) were:
> Postgres: http://search.cpan.org/dist/Postgres/ last updated in 1998.
> Pg: http://search.cpan.org/dist/pgsql_perl5/ last updated in 2000.

Yes, that's what I thought, and am glad to hear it.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On fre, 2010-02-05 at 14:45 +0000, Tim Bunce wrote:
> > Does Perl have a similar mess?
>
> I don't think so.
>
> The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/

I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.
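To make the "quite minimal" point concrete, the sketch below is roughly the entire surface that PEP 249 (DB-API 2.0) guarantees; psycopg2 and the connection string are only illustrative assumptions here, and everything beyond these calls is a driver-specific extension.

    # A minimal DB-API 2.0 session, using psycopg2 purely as an example driver.
    import psycopg2

    conn = psycopg2.connect("dbname=test user=postgres")   # example DSN
    cur = conn.cursor()
    cur.execute("SELECT relname FROM pg_class WHERE relkind = %s", ("r",))
    for (relname,) in cur.fetchall():
        print(relname)
    conn.commit()    # DB-API opens an implicit transaction; commit/rollback live on the connection
    cur.close()
    conn.close()
    # Server-side cursors, prepared statements, COPY, notifications and type
    # customization are all outside the spec and therefore driver-specific.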
Peter Eisentraut wrote:
> On fre, 2010-02-05 at 14:45 +0000, Tim Bunce wrote:
> > > Does Perl have a similar mess?
> >
> > I don't think so.
> >
> > The primary database interface is DBI and as far as I can see there's only one DBI PostgreSQL driver: http://search.cpan.org/dist/DBD-Pg/
>
> I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.

I assumed it would be Perl that had confusion because of Perl 6, but obviously it is Python. The second PL/Python server-side language that was proposed recently is taking on a whole new appearance for me. :-(

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
> I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.

For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.

Given that SQLAlchemy isn't for everyone, of course ... it couldn't be, or Django would use it, no?

--Josh Berkus
> The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".

I'm not as concerned about "confusion" as the fact that *all* of the various Python drivers suck in different, and crippling, ways. I don't care how many drivers we have, as long as we have at least one 1st-class driver.

--Josh Berkus
Josh Berkus wrote:
> > The situation is unfortunate, but you might as well argue that too many Linux desktops or Linux distributions confuse new users and hurt adoption. These alternatives all exist for a reason, and it will be difficult to get some of them to abandon their work with the aim of reducing the overall "confusion".
>
> I'm not as concerned about "confusion" as the fact that *all* of the various Python drivers suck in different, and crippling, ways. I don't care how many drivers we have, as long as we have at least one 1st-class driver.

Agreed, and we can point newbies to that one great driver.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Marko Kreen wrote:
> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.
>
> Best path forward would be to talk with Psycopg guys about license clarification/change.
>
> I suspect GPL does not extend anyway to Python code that imports it dynamically, and it does not seem to be their intention - they even try to tell it in their current clarification, but it is not as clear as it could be.

Yes, this is a good analysis and has some good suggestions for moving forward.

In summary, I don't think the current PG/Python driver situation is helping PG or Python, and I believe there are enough people who care about those two to get a group together to sort out a viable direction and start working toward the goal of improving things.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> While I realize experienced people can easily navigate this confusion...

No, that's the worst part--the more you know and the deeper you dig into it, the more broken you realize the whole thing is. When one of the best drivers (in some respects) has a web page that looks like this: http://initd.org/ that doesn't seem so bad at first--but if you're experienced, you know that the page has been in that disturbing state since late 2006. You start digging into the driver mess, figure you just need to learn how things fit together, but the hole keeps getting bigger as you dig.

The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window. If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
Greg Smith wrote:
> Bruce Momjian wrote:
> > While I realize experienced people can easily navigate this confusion...
>
> No, that's the worst part--the more you know and the deeper you dig into it, the more broken you realize the whole thing is. When one of the best drivers (in some respects) has a web page that looks like this: http://initd.org/ that doesn't seem so bad at first--but if you're experienced, you know that the page has been in that disturbing state since late 2006. You start digging into the driver mess, figure you just need to learn how things fit together, but the hole keeps getting bigger as you dig.
>
> The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window. If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it.

Yes, it is going to take serious time, and it is going to take more than one person to be involved, but I think there are enough people who care that something serious can be done to improve its current state.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Fri, 2010-02-05 at 09:38 -0500, Bruce Momjian wrote:
> Wow, that is super-confusing.

Agreed. Standardization among licenses is useful, and I think it's important to have a driver with a license that people already understand.

Regards,
	Jeff Davis
On 2/5/10, Josh Berkus <josh@agliodbs.com> wrote:
> > I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.
>
> For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.

I took a quick glance at DBI, and compared to that, DB-API does not define a specific API for:

- resultset cursors
- prepared plans
- db structure examination

OTOH, my guess is that the DB-API authors assumed the first two are used transparently by the driver, and I see no reason why they cannot be, with the current DB-API. The last item is "rich" indeed, but seems slightly overengineered.

Now, none of the differences between drivers and the current confusion seem to be related to the above points. "Hysterical Raisins" is the only good explanation for the current situation.

The pg8000 / bpgsql seem to be toy projects, and anyway you don't want to use pure-Python drivers in high-performance environments. We are not talking about C#/Java here.

py-postgresql seems to be more serious, but it's Python 3 only, which makes it irrelevant today.

PyGreSQL is the oldest, older than DB-API, and so its DB-API interface seems an afterthought and is untested/underused - e.g. it does not support bytea.

Psycopg was the leader, especially in web environments, but it has a non-obvious license and with a dead website it does not seem that attractive. Although it is still well-maintained.

The best path forward would be to talk with the Psycopg guys about license clarification/change.

I suspect the GPL does not extend to Python code that imports it dynamically anyway, and that does not seem to be their intention either - they even try to say so in their current clarification, but it is not as clear as it could be.

-- marko
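On the point that resultset cursors and prepared plans can be handled transparently without leaving the DB-API shape, here is a small sketch of how psycopg2 already does the first one via its named-cursor extension; the table name and DSN are assumptions made up for the illustration.

    # A server-side cursor behind an ordinary DB-API cursor object.
    # The name= argument is a psycopg2 extension; everything else is plain DB-API.
    import psycopg2

    conn = psycopg2.connect("dbname=test")       # example DSN
    cur = conn.cursor(name="big_scan")           # DECLAREs a cursor on the server
    cur.execute("SELECT * FROM big_table")       # rows stay on the server side
    while True:
        rows = cur.fetchmany(1000)               # pulled over in batches
        if not rows:
            break
        for row in rows:
            pass                                 # ... process each row ...
    cur.close()
    conn.close()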
> Imho a big problem is that it does way too much itself - i.e. it does not use things like PQExecParams but does escaping/parsing itself... Other people may think thats a good idea - I definitely do not think so.

It also has issues with transaction control which cause idle transactions if the Django front-end times out due to load.

--Josh Berkus
On Friday 05 February 2010 21:34:53 Marko Kreen wrote:
> On 2/5/10, Josh Berkus <josh@agliodbs.com> wrote:
> > > I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions. The DB-SIG at Python that ought to drive all this is also quite dead, possibly because everyone has moved on to SQLAlchemy.
> >
> > For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.
>
> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.

Imho a big problem is that it does way too much itself - i.e. it does not use things like PQExecParams but does escaping/parsing itself... Other people may think that's a good idea - I definitely do not think so.

Andres
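To illustrate the distinction Andres is drawing, the sketch below uses psycopg2's mogrify() to show that the driver builds one fully escaped SQL string on the client, whereas a PQexecParams-style driver would ship the statement and the values separately; the table name and DSN are made up for the example, and the printed output is only approximate.

    # Client-side escaping (what psycopg2 does) vs. server-side parameter binding.
    import psycopg2

    conn = psycopg2.connect("dbname=test")    # example DSN
    cur = conn.cursor()

    # psycopg2 interpolates an escaped literal into the statement before sending it:
    print(cur.mogrify("SELECT * FROM users WHERE name = %s", ("O'Brien",)))
    # -> something like: SELECT * FROM users WHERE name = 'O''Brien'

    # A PQexecParams-based driver would instead send the two pieces separately,
    # so it never needs its own quoting rules:
    #     statement: SELECT * FROM users WHERE name = $1
    #     params:    ["O'Brien"]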
On 2/5/10, Greg Smith <greg@2ndquadrant.com> wrote:
> The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window.

Well, no. You are overplaying the DBI angle. If the driver can offer via the minimal DB-API the same features the DBI driver can offer, then the DBI API is bloated, not the minimal API weak...

Not that DB-API is perfect - a way to toggle prepares and db cursors on/off on a per-query basis would be good to have, but that's mostly it. The rest of the extra features (COPY) need to be done as extensions anyway.

But all that is off-topic to the current driver suckiness. The problems with the Python drivers are that they fail to implement even the minimal DB-API 2.0:

- correctly, across all interesting data types. PyGreSQL (bytea). pg8000 (does rollback after a query fails - WTF). None of the drivers seem to support both array and tuple.

- robustly. Psycopg2 - their misguided attempt to turn DB-API calls optionally async cost their stability big-time, as it complexified their codebase. It got stable around 2.0.[678]. I'm not holding my breath on the other ones. Fear the fancy APIs.

- with threadsafety level > 1. (I don't care, but it seems in some circles it's necessary.)

Note how many ways a driver can suck without stepping above basic DB-API 2.0.

> If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it.

Yeah, but the problem here is that there actually is not a problem. The Psycopg license is not bad, just confusing on first sight. And the website problem can be fixed by advertising http://wiki.postgresql.org/wiki/Psycopg as the new website...

If your requirement is a BSD license, then probably getting PyGreSQL into shape would be faster than writing from scratch. The main dev effort would be writing a Postgres-specific DB-API 2.0 test suite anyway. It's likely that the actual improvements needed would be small.

Also I would suggest matching the Psycopg extensions API; that would make it usable to a much broader audience. (E.g. PyGreSQL does not offer COPY extensions via the DB-API 2.0 API. It would be good to match Psycopg here, instead of inventing a new API.)

-- marko
On Feb 5, 2010, at 1:34 PM, Marko Kreen wrote:
> py-postgresql seems to be more serious, but as it's python3 only which makes it irrelevant today.

Furthermore, if it did work on Python 2, it's *not* something that's going to appeal to mainstream users (Python heavy webframeworks), as it *partially* suffers from the same problem that pg8000 does. It's mostly pure-Python, but it has some C optimizations (notably, the PQ message buffer). I have done some profiling, and *with a few tweaks* it's about 2x-3x *slower than psycopg2* for the retrieval of a single int column. I think it could go faster, but I don't think it's worth the work.

ISTM that the target audience are folk who are married to PG, and are generally unhappy with DB-API, but do not want to buy into a "big" abstraction layer like SQLAlchemy. Sure, it supports DB-API like other drivers, so it *would be* usable with frameworks, but why take the 3x *or greater* hit over a properly implemented libpq version?

Finally, I just don't see the existing (often PG specific) goals that I have in mind for it appealing to the majority of [web framework/abstraction] users.

> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.
>
> Best path forward would be to talk with Psycopg guys about license clarification/change.

Yep.
On Feb 5, 2010, at 11:34 AM, Josh Berkus wrote:
> For people who use Python a lot, could I have a list of the deficiencies in DBAPI? I've got my horse and lance ready.
>
> Given that SQLAlchemy isn't for everyone, of course ... it couldn't be, or Django would use it, no?

Here are some to start with:

- paramstyle
- No direct support of prepared statements [they *tried* to compensate for this with "cached statements", but it's inadequate]
- Too many *optional* extensions (Cursor Method .__iter__() being rather notable)

http://www.python.org/dev/peps/pep-0249/
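The paramstyle complaint is easiest to see with an example: PEP 249 allows five different placeholder styles and each driver picks one, so "portable" DB-API code has to branch on module.paramstyle. The sketch below is illustrative only; the table name and values are made up.

    # The same query under each of the five PEP 249 paramstyles.
    QUERY_BY_PARAMSTYLE = {
        "qmark":    ("SELECT * FROM t WHERE a = ? AND b = ?",         (1, 2)),
        "numeric":  ("SELECT * FROM t WHERE a = :1 AND b = :2",       (1, 2)),
        "format":   ("SELECT * FROM t WHERE a = %s AND b = %s",       (1, 2)),
        "named":    ("SELECT * FROM t WHERE a = :a AND b = :b",       {"a": 1, "b": 2}),
        "pyformat": ("SELECT * FROM t WHERE a = %(a)s AND b = %(b)s", {"a": 1, "b": 2}),
    }

    def run_portably(dbapi_module, conn):
        # Truly portable code has to consult the module's paramstyle attribute
        # before it can even write the query text.
        sql, params = QUERY_BY_PARAMSTYLE[dbapi_module.paramstyle]
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()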
On Feb 5, 2010, at 8:00 AM, Peter Eisentraut wrote:
> I think another difference is that the Perl DBI interface is very rich, whereas the Python DB-API is quite minimal and almost forces people to write (incompatible) extensions.

Yep.

> The DB-SIG at Python that ought to drive all this is also quite dead,

From reading messages that come into that list (not so lately), I've gotten the impression that they are content with DB-API 2. Aside from the TPC interfaces, the last round of activity that I know of was dbapi3 [1] (which was a long while ago).

> possibly because everyone has moved on to SQLAlchemy.

Yeah. AFAICT, that is the general direction encouraged by DB-SIG. Write an abstraction on top of DB-API. SQLAlchemy, SQLObject, anysql, and I'm sure there are others.

[1] http://wiki.python.org/moin/Aug2001DbApi3Strawman
    http://wiki.python.org/moin/DbApi3
On Fri, 2010-02-05 at 10:35 -0800, Josh Berkus wrote:
> I'm not as concerned about "confusion" as the fact that *all* of the various Python drivers suck in different, and crippling, ways. I don't care how many drivers we have, as long as we have at least one 1st-class driver.

Absolutely. And I would prefer that it can provide all (or nearly all) of the capabilities of libpq.

PyGreSQL apparently doesn't even offer parameterized queries! http://pygresql.org/pg.html

That was based on a quick glance at the document; I hope I'm mistaken.

Regards,
	Jeff Davis
Marko Kreen wrote:
> The pg8000 / bpgsql seem to be toy projects, and anyway you dont want to use pure-Python drivers in high-performance environments. We are not talking about C#/java here.

Right, and the comments from James reinforce this general idea: there is little value to the people who most want Python+PostgreSQL support in working on any of the pure Python implementations, because best case performance is still half or less of the ones that more directly wrap libpq. Even the best case with psycopg is enough of a performance hit as it is.

> py-postgresql seems to be more serious, but as it's python3 only which makes it irrelevant today.

Also true.

> Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still.
> Best path forward would be to talk with Psycopg guys about license clarification/change.

Agreed. A relicensed Psycopg, with a more professional looking introduction/documentation page (like the budding page on our Wiki) than what the initd web site has, seems like the best platform to hack on top of to me as well. The fact that they've moved to git recently makes it that much easier for another branch to exist even outside of their somewhat troubled infrastructure.

To summarize what I saw on this thread, the primary wishlist of changes to it are:

-License change
-Consider refactoring to better follow standard driver practices, such as using PQExecParams
-Improvement in transaction control to resolve issues that cause idle transactions
-Possible simplifications in how it has implemented async operations, to improve robustness/reduce code complexity
-Confirm/add multi-threaded support
-Confirm/add support for the most common standard types (such as array)

> PyGreSQL is the oldest, older than DB-API, and so it's DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea.

And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:

-Test/complete/refactor for full DB-API 2.0 support
-Add bytea support
-Add extension support, perhaps modeled on what Psycopg provides
-Build a COPY extension
-Confirm/add multi-threaded support
-Confirm/add support for the most common standard types (such as array)

Any other suggestions before I turn the above into a roadmap page on the wiki?

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Greg Smith wrote:
> To summarize what I saw on this thread, the primary wishlist of changes to it are:
>
> -License change
> -Consider refactoring to better follow standard driver practices, such as using PQExecParams
> -Improvement in transaction control to resolve issues that cause idle transactions
> -Possible simplifications in how it's implemented async operations, to improve robustness/reduce code complexity
> -Confirm/add multi-threaded support
> -Confirm/add support for the most common standard types (such as array)
>
> > PyGreSQL is the oldest, older than DB-API, and so it's DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea.
>
> And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:
>
> -Test/complete/refactor for full DB-API 2.0 support
> -Add bytea support
> -Add extension support, perhaps modeled on what Psycopg.
> -Build a COPY extension
> -Confirm/add multi-threaded support
> -Confirm/add support for the most common standard types (such as array)
>
> Any other suggestions before I turn the above into a roadmap page on the wiki?

Great summary! This is what I was hoping for.

One suggestion on the license issue is that the LGPL seems like the type of license intended by the author, and I think it is an acceptable license to most client application programmers.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On 2/7/10, Greg Smith <greg@2ndquadrant.com> wrote:
> Marko Kreen wrote:
> > Psycopg was the leader, especially in web-environments, but it has non-obvious license and with dead website it does not seem that attractive. Although it is well-maintained still. Best path forward would be to talk with Psycopg guys about license clarification/change.
>
> Agreed. A relicensed Psycopg, with a more professional looking introduction/documentation page (like the budding page on our Wiki) than what the initd web site has, seems like the best platform to hack on top of to me as well. The fact that they've moved to git recently makes it that much easier for another branch to exist even outside of their somewhat troubled infrastructure.
>
> To summarize what I saw on this thread, the primary wishlist of changes to it are:
>
> -License change

Note that as long as they use a non-BSD licence, they need the exception for the horror that is the OpenSSL licence. So you cannot 100% satisfy the one-click lawyers.

> -Consider refactoring to better follow standard driver practices, such as using PQExecParams

This is a long-term todo item for psycopg, seems off-topic to the "driver situation".

> -Improvement in transaction control to resolve issues that cause idle transactions

This is a routine bug in either the app or psycopg, we have no reason to touch it. The guy should report to the appropriate lists.

> -Possible simplifications in how it's implemented async operations, to improve robustness/reduce code complexity

Long-term todo item for psycopg2, off-topic for the "driver situation".

> -Confirm/add multi-threaded support.

Seems psycopg2 already has good enough threading.

> -Confirm/add support for the most common standard types (such as array)

psycopg2 has array support, I'd like to have tuple/record also.

Minor todo item for psycopg, mostly but not completely off-topic for the "driver situation".

> > PyGreSQL is the oldest, older than DB-API, and so it's DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea.
>
> And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:
>
> -Test/complete/refactor for full DB-API 2.0 support
> -Add bytea support

timestamp/timestamptz also...

> -Add extension support, perhaps modeled on what Psycopg.
> -Build a COPY extension

They seem to have the low-level parts; they are just missing from the DB-API interface.

> -Confirm/add multi-threaded support
> -Confirm/add support for the most common standard types (such as array)
>
> Any other suggestions before I turn the above into a roadmap page on the wiki?

I think we should concentrate on the PR problem and the technical issues related to that, and keep the other low-level and non-user-visible issues out. Or at least separate. (PsycopgTodo wiki page?)

-- marko
> Finally, I just don't see the existing (often PG specific) goals that I have in mind for it appealing to the majority of [web framework/abstraction] users.

What are those goals?

--Josh Berkus
On Sat, Feb 6, 2010 at 7:48 PM, Marko Kreen <markokr@gmail.com> wrote:
> This is long-term todo item for psycopg, seems offtopic to the "driver situation".
[...]
> This is routine bug in either app or psycopg, we have no reason to touch it. The guy should report to appropriate lists.
[...]
> Long-term todo item for psycopg2, offtopic for "driver situation".
[...]
> psycopg2 has array support, I'd like to have tuple/record also.
>
> Minor todo item for psycopg, mostly but not completely offtopic for "driver situation".

I'm not a Python user myself, but I have trouble understanding how you can describe bugs in one of the Python drivers as off-topic to the Python driver situation.

...Robert
On Feb 6, 2010, at 5:51 PM, Josh Berkus wrote:
> > Finally, I just don't see the existing (often PG specific) goals that I have in mind for it appealing to the majority of [web framework/abstraction] users.
>
> What are those goals?

I think the most interesting one that has yet to be implemented is fast, multiple destination COPY support. Currently, COPY is supported, but a bytes() object is allocated for each row, so it's currently not efficient for moving data (pg-to-pg ETL sans the T? =). While some C is still needed to make it properly efficient, it's primarily to keep track of the COPY's state and to update stats. This is pretty useless to a django user... Well, I suppose it might be interesting if COPY OUT could target [or transform into] JSON, but idk...

The general, ongoing goal is to implement and document *convenient* Python interfaces to PostgreSQL features. A, perhaps uninteresting, case being "supporting" advisory locks. I was thinking a context manager, but it might just be something as trivial as an additional method on the connection (very simple/direct binding).

Some "fuzzy" goals: twisted support, some asynchronous interfaces, and greater user control over type I/O. The first, twisted, mostly interests me as an exercise. The second, async interfaces, scares me as it took me some time just to feel "not unhappy" with the blocking APIs. The third will probably happen, but it's going to be a while.

I also have some goals not directly related to a driver. postgresql.unittest is currently only used internally, but I hope to document it some day soon so that people can write Python unittest.TestCase's that auto-magically build out a target cluster (~pg_regress/pgTap for Python?). Well, it works, but it's not documented and the APIs haven't been given much thought. Generally, basic cluster management tools for Python. (At one point I tried to write a programmer's HBA editor, but I think I hurt myself trying to figure out rule reduction.. That is, it was trying to be smarter than "insert/delete rule at position x".)

Well, these are the ones that come to mind, atm, but I don't think there's much beyond them.
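For readers unfamiliar with the COPY idea being discussed, here is a rough sketch of the "pg-to-pg ETL" pattern using psycopg2's copy_expert(), which accepts any file-like object; the table name, the DSNs, and the use of an in-memory buffer are illustrative assumptions rather than anything proposed in the thread.

    # Stream a table out of one database and into another via COPY.
    import io
    import psycopg2

    src = psycopg2.connect("dbname=source")        # example DSNs
    dst = psycopg2.connect("dbname=destination")

    buf = io.StringIO()                            # with Python 2-era psycopg2, use StringIO.StringIO
    src.cursor().copy_expert("COPY events TO STDOUT", buf)
    buf.seek(0)
    dst.cursor().copy_expert("COPY events FROM STDIN", buf)
    dst.commit()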
On Fri, 2010-02-05 at 09:19 -0500, Bruce Momjian wrote:
> My son has brought to my attention that our current crop of Python client libraries is inadequate/confusing. I took a look myself, and asked on our IRC channel, and am now convinced this area needs attention.

I have written up a set of guidelines for driver development based on what I learned working on ruby-pg:

	http://wiki.postgresql.org/wiki/Driver_development

Whether we take one of the existing projects and improve upon it, or start a complete rewrite, I hope these guidelines will be a useful destination. Note that the ruby-pg driver doesn't 100% adhere to those standards (encoding is the primary problem, and that will be fixed).

I would appreciate comments by anyone (Greg Sabino Mullane: I included you in the CC because I thought you may have some input). Even if the python driver doesn't go in that direction, it will help me improve ruby-pg.

Regards,
	Jeff Davis
Marko Kreen wrote:
> I think we should concentrate on the PR problem and technical issues related to that, keep the other low-level and non-user-visible issues out. Or at least separate. (PsycopgTodo wiki page?)

That's just a matter of prioritizing the issues. Put the big ones at the top, the trivia at the bottom, and if you knock stuff off there somewhere along the way, you discover you've made enough progress that the PR stuff starts going away, because people are able to get their work done with less drama.

Here's a full TODO page that includes everything mentioned here as best I could summarize it:

	http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO

Looks like the first action item is to talk with the Psycopg people about their license.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
On 2/7/10, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Feb 6, 2010 at 7:48 PM, Marko Kreen <markokr@gmail.com> wrote:
> > This is long-term todo item for psycopg, seems offtopic to the "driver situation".
> [...]
> > This is routine bug in either app or psycopg, we have no reason to touch it. The guy should report to appropriate lists.
> [...]
> > Long-term todo item for psycopg2, offtopic for "driver situation".
> [...]
> > psycopg2 has array support, I'd like to have tuple/record also.
> >
> > Minor todo item for psycopg, mostly but not completely offtopic for "driver situation".
>
> I'm not a Python user myself, but I have trouble understanding how you can describe bugs in one of the Python drivers as off-topic to the Python driver situation.

I thought the topic was "Confusion over Python drivers"?

The only bug there was likely an app one, or at least it's not widespread, so off-topic. The rest are more like non-essential cool features, so again off-topic.

-- marko
Marko,
The lack of those "non-essential cool features" is right on topic, because what one developer sees as "non-essential" is the most important feature for the next developer.
Exactly these kinds of issues are the source of the confusion we now face: developer B needing feature X, which was ignored by driver A.
So I concur with putting them on the discussion agenda; if they drop out of priority, at least it is documented WHY, and that will save other developers these thoughts.
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
>> I'm not a Python user myself, but I have trouble understanding how you can describe bugs in one of the Python drivers as off-topic to the Python driver situation.
>
> I thought the topic was "Confusion over Python drivers"?
>
> The only bug there was likely app one, or at least its not widespread so off-topic. Rest are more like non-essential cool features, so again off-topic.

Not at all, except that maybe this discussion belongs on -advocacy rather than here.

Driver quality is partly performance and stability, but also partly what features it implements and how well it implements them, *particularly* for PostgreSQL, which is a database with lots of cool features; if the driver doesn't support our cool features, then they don't matter to Python developers.

Anyway, I don't yet have a full diagnosis on the transaction control issue or I'd already have posted it to psycopg -- it may be a toxic interaction between Django and Psycopg2 rather than psycopg2 alone. I'd not have brought it up except for this discussion.

--Josh Berkus
Greg Smith wrote:
> Here's a full TODO page that includes everything mentioned here as best I could summarize it:
> http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO
>
> Looks like the first action item is to talk with the Psycopg people about their license.

Oh: and I'm going to take care of this. License changes can be a very sensitive topic and I'm told that discussion probably needs to happen in Italian too; I can arrange that.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Josh Berkus wrote:
> Anyway, I don't yet have a full diagnosis on the transaction control issue or I'd already have posted it to psycopg -- it may be a toxic interaction between Django and Psycopg2 rather than psycopg2 alone. I'd not have brought it up except for this discussion.

I'm going to remove it from the list on the wiki then for now. I don't want to annoy the developers by adding a more speculative bug that might not even be in their software. If you get to where it's confirmed and info posted to their list, please add a link back into the page once it's reported, i.e. link to their mailing list archives or something like that.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
> Any other suggestions before I turn the above into a roadmap page on the wiki?

I got sick of the constant stream of escaping bugs impacting on psycopg and pyPgSQL, and wrote my own DB-API driver, using the more modern libpq/binary/protocol 3 APIs wherever possible. The result is BSD licensed:

	http://code.google.com/p/ocpgdb/

As well as using the newer APIs, I have attempted to keep the code as simple as possible, eschewing things like threading as adding too much complexity for too little gain (particularly true of Python threading), and I kept to just the core DB-API functionality. The C code exists mainly to present a pythonic view of libpq. I found that type conversion and marshalling could generally be done from Python with more than acceptable performance (via the C-coded "struct" module in the standard library for common types).

In my tests, ocpgdb has performed at least as well as pyPgSQL and psycopg, often a lot better, primarily due to the use of the libpq binary protocols, I think.

I'm not proposing my module as your canonical implementation, although you're welcome to it if you like. Rather, it demonstrates another viable approach, minimal, and using newer libpq APIs.

BTW, with respect to the discussion of the Python DB-API - I see it as specifying a lowest common denominator, or the subset of functionality that should be available from most databases without requiring contortions. Like eating at McDonalds, it does the job, but it's never going to delight or surprise. A PostgreSQL-blessed adapter really should provide access to all the features in libpq, and I'm not sure this is directly compatible with DBAPI. Instead, the DBAPI compliance should be layered on top.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
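As a rough illustration of the struct-based type conversion Andrew describes (not taken from ocpgdb itself): PostgreSQL sends binary-format values in network byte order, so the standard-library struct module can decode the fixed-width types without any C code. The OIDs in the comments are the usual built-in ones, listed only for orientation.

    # Decoding a few binary-format PostgreSQL values with the struct module.
    import struct

    def unpack_int4(data):       # int4, oid 23: 4-byte big-endian signed integer
        return struct.unpack("!i", data)[0]

    def unpack_float8(data):     # float8, oid 701: 8-byte big-endian IEEE double
        return struct.unpack("!d", data)[0]

    def unpack_bool(data):       # bool, oid 16: a single byte, 0 or 1
        return data != b"\x00"

    # Example: the four bytes 00 00 00 2a decode to the integer 42.
    assert unpack_int4(b"\x00\x00\x00\x2a") == 42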
Andrew McNamara wrote:
> I got sick of the constant stream of escaping bugs impacting on psycopg and pyPgSQL, and wrote my own DB-API driver, using the more modern libpq/binary/protocol 3 APIs where ever possible. The result is BSD licensed:
> http://code.google.com/p/ocpgdb/

I added you into the list at http://wiki.postgresql.org/wiki/Python

Can you check what I put in there, confirm Windows compatibility, and comment on Python 3.X support?

I'd be curious to hear more about the escaping bugs you ran into as well. We already have some notes on the TODO that pushing more of this work toward the standard libpq routines would seem appropriate for things passing between the driver and libpq. Were the issues you ran into on that side, or more on the Python side of how things were being formatted?

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
> I added you into the list at http://wiki.postgresql.org/wiki/Python

Thanks.

> Can you check what I put in there, confirm Windows compatibility, and comment on Python 3.X support?

I haven't tried it under Windows and I haven't had any feedback either way from Windows users. For now, ocpgdb has no Python 3 support (I don't foresee any real problems, however).

> I'd be curious to hear more about the escaping bugs you ran into as well. We already have some notes on the TODO that pushing more of this work toward the standard libpq routines would seem appropriate for things passing between the driver and libpq. Were the issues you ran into on that side, or more on the Python side of how things were being formatted?

It was a while ago now and I can't remember the specific details - it was more a general feeling that the existing offerings were going about it the wrong way (with respect to parameter passing and escaping). I suspect this was a historical artifact (presumably libpq didn't provide escaping facilities or parameterised queries when the adapters were written). Essentially, I just wanted a pyPgSQL with a more modern implementation.

Psycopg was (is?) also using Protocol 2. I felt that the way forward was to switch to the Protocol 3 API features, in particular, parameterised queries, and none of the existing Python adapters had done that (I got the impression while writing my module that nobody was exercising the new features).

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
On Mon, 2010-02-08 at 12:25 +1100, Andrew McNamara wrote:
> For now, ocpgdb has no Python 3 support (I don't foresee any real problems, however).

Encoding issues are the big one. There are a couple gotchas, and I provided the details here:

	http://wiki.postgresql.org/wiki/Driver_development#Text_Encoding

Regards,
	Jeff Davis
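A sketch of the kind of encoding handling Jeff is pointing at: under Python 3 a driver has to decode the bytes libpq returns using the connection's client_encoding, and the PostgreSQL encoding names do not always match Python codec names. The mapping below is a small illustrative subset, not a complete or authoritative table.

    # Decoding result text under an explicit client_encoding (Python 3 sketch).
    PG_TO_PYTHON_CODEC = {
        "UTF8": "utf-8",
        "LATIN1": "iso8859-1",
        "WIN1252": "cp1252",
        "SQL_ASCII": None,        # "no conversion": the driver has to decide what to do
    }

    def decode_text(raw: bytes, client_encoding: str) -> str:
        codec = PG_TO_PYTHON_CODEC.get(client_encoding)
        if codec is None:
            # SQL_ASCII or an unknown encoding: don't guess; hand back bytes
            # or raise, depending on the driver's policy.
            raise ValueError("cannot decode under encoding %r" % client_encoding)
        return raw.decode(codec)

    print(decode_text(b"caf\xc3\xa9", "UTF8"))    # -> café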
> The pg8000 / bpgsql seem to be toy projects, and anyway you dont want to use pure-Python drivers in high-performance environments.

I agree that there are some performance challenges with pure-Python drivers.

And we should not forget to look for the reasons for the incubation of that many pure-Python drivers:

a) Python is no longer one-language, one-implementation. There are (at least) CPython (the original), Jython (on the JVM), IronPython (from Microsoft, on the CLR), PyPy (Python on Python), Unladen Swallow (from Google, on LLVM). In addition there are the nearly-Pythons, as in Cython, RPython and ShedSkin.

Everything apart from CPython (and possibly Unladen Swallow) has its challenges dealing with non-Python extension modules. From a developer standpoint it can be tempting to be able to rely on the same database adapter across more than one implementation.

b) The stabilization of a Python Application Binary Interface is in an early discussion stage; meaning: it will take some time until a non-Python extension can be usable across Python versions. C extensions are always a major stumbling block on Python-n to Python-(n+1) versions.

c) Stability. Python code is, same-on-same, more robust than C code, as some of the crash-friendly problems are eliminated (you cannot allocate memory wrongly within Python, you cannot erroneously access memory across array boundaries...).

Especially a) is a point to consider when standardizing on a PostgreSQL-blessed Python-PostgreSQL driver. How will the blessing extend to Jython / IronPython / PyPy?

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
Massa, Harald Armin wrote:
> I agree that there are some performance challenges with pure-Python drivers.
> And we should not forget to look for the reasons for the incubation of that many pure-Python drivers:
> a) Python is no longer one-language, one-implementation. There are (at least) CPython (the original), Jython (on the JVM), IronPython (from Microsoft, on the CLR), PyPy (Python on Python), Unladen Swallow (from Google, on LLVM). In addition there are the nearly-Pythons, as in Cython, RPython and ShedSkin...
> Especially a) is a point to consider when standardizing on a PostgreSQL-blessed Python-PostgreSQL driver. How will the blessing extend to Jython / IronPython / PyPy?

The point isn't so much "standardizing", it's getting one driver that satisfies the needs of the people most likely to use it. Having a low performance Python driver turns into a PostgreSQL PR issue. Last thing we need is the old "PostgreSQL is slow" meme to crop back up again via the Python community, if the driver suggested by the community isn't written with performance as a goal so that, say, PostgreSQL+Python looks really slow compared to MySQL+Python. And if you're writing a database driver with performance as a goal, native Python is simply not an option.

Now, once *that* problem is under control, and there's a nicely licensed, well documented, major feature complete, and good performing driver, at that point it would be completely appropriate to ask "what about people who want support for other Python platforms and don't care if it's slower?". And as you say, nurturing the "incubation" of such drivers is completely worthwhile. I just fear that losing focus by wandering too far in that direction, before resolving the main problem here, is just going to extend resolving the parts of the Python driver situation I feel people are most displeased with.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
Hi there,

Greg Smith wrote:
> > Looks like the first action item is to talk with the Psycopg people about their license.
>
> Oh: and I'm going to take care of this. License changes can be a very sensitive topic and I'm told that discussion probably needs to happen in Italian too; I can arrange that.

I can try and help with this issue, given my role with the Italian PostgreSQL community and PostgreSQL business with 2ndQuadrant Italia. I have met Psycopg's developer a couple of times at open-source conferences. I have great respect for his work and his contribution in the open-source community, and I will be very happy to try and explain the situation to him.

I will keep you posted.

Ciao,
Gabriele

--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
Greg,
> The point isn't so much "standardizing". Having a low performance Python driver turns into a PostgreSQL PR issue.

I totally agree.

> And if you're writing a database driver with performance as a goal, native Python is simply not an option.

Yes. Additionally: performance is not the only challenge. A native Python implementation, without using libpq, will have to reimplement much of libpq - to single out just one piece, proper escaping - and will have its own bugs.

> Now, once *that* problem is under control, and there's a nicely licensed, well documented, major feature complete, and good performing driver, at that point it would be completely appropriate to ask "what about people who want support for other Python platforms and don't care if it's slower?".

Pure Python drivers do exist now, and they are already discussed in the summaries - which is a good thing. With my remarks I just want to recommend that we should at least document a position for them, and a way ahead. And I need a place to point out that Python grew an FFI with ctypes. Maybe someone will think of a DBAPI 2.0 compatible ctypes libpq wrapper - see the rough sketch below ...
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > I have written up a set of guidelines for driver development > based on what I learned working on ruby-pg: > > http://wiki.postgresql.org/wiki/Driver_development ... > I would appreciate comments by anyone (Greg Sabino Mullane: I included > you in the CC because I thought you may have some input). Good page. I looked it over but have nothing to add at the moment. I may do so later once my head is in dbdpg mode (working on other project at the moment :) - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201002080931 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAktwIEYACgkQvJuQZxSWSsjczQCgkU5b6iHPREJYMtAdWlFRDkYI cS4An3AMyc+O06HzN8MYkfq8HG62371y =+WCV -----END PGP SIGNATURE-----
* Jeff Davis:

> I have written up a set of guidelines for driver development based on
> what I learned working on ruby-pg:
>
> http://wiki.postgresql.org/wiki/Driver_development

Interesting, thanks.

I'm contemplating creating a new language binding for libpq (or, to be more precise, turning an existing language binding into something that can be published). I've been agonizing a bit over how to create a bridge between the host language type system and the PostgreSQL type system. If I understand you correctly, you suggest leaving everything as strings. This solution has the appeal of being easy to implement. It also sidesteps a lot of issues revolving around different representation choices for numbers.

Do you really suggest preserving the PQexecParams API verbatim, that is, passing in three arrays containing type, value, and format? That seems to be a bit problematic. I suspect the common case will be to use unknown types, text format, and the default conversion from values to strings. Only for BYTEA values is something else required, and I'm wondering how to encode that (the host language doesn't offer a distinction between text and binary strings). Conversely, for result sets, I'm tempted to transparently decode escaped BYTEA columns.

> Note that the ruby-pg driver doesn't 100% adhere to those standards
> (encoding is the primary problem, and that will be fixed).

Lack of Unicode support means that I can punt that to application authors, I guess.

By the way, the downside of using strings everywhere is that your binding API will most likely not work with SQLite (or any other SQL-like database which lacks column type information).
* Andrew McNamara:

>> Any other suggestions before I turn the above into a roadmap page on the
>> wiki?
>
> I got sick of the constant stream of escaping bugs impacting on psycopg
> and pyPgSQL, and wrote my own DB-API driver, using the more modern
> libpq/binary/protocol 3 APIs wherever possible. The result is BSD
> licensed:
>
> http://code.google.com/p/ocpgdb/

I saw your note that you have to specify the types for date values etc. Is this really desirable or even necessary? Can't you specify the type as unknown (OID 705, I believe)?

At work, we recently used the typelessness of Perl's DBD::Pg with great effect, introducing a more compact, type-safe representation for a few columns, without having to change all the existing Perl scripts accessing the database. That's why I'm wondering...

(And we might be using Python instead of Perl today. Lack of a decent PostgreSQL module for Python meant it was very hard to argue against using Perl ...)
On Mon, 2010-02-08 at 20:29 +0100, Florian Weimer wrote:
> I'm contemplating creating a new language binding for libpq (or, to be more precise, turning an existing language binding into something that can be published). I've been agonizing a bit over how to create a bridge between the host language type system and the PostgreSQL type system. If I understand you correctly, you suggest leaving everything as strings. This solution has the appeal of being easy to implement. It also sidesteps a lot of issues revolving around different representation choices for numbers.

Agreed. Ultimately, the conversion has to be done somewhere, but I don't believe the driver is the place for it. Type conversions are always going to be imperfect, and this has some important consequences:
* The type conversion system will be endlessly tweaked to improve it.
* Developers will always run into problems with it in any complex application, so we need to allow them to circumvent the system and do it themselves when necessary.

Both of these things point to another layer on top of the driver itself. It could be some extra convenience functions that come with the driver, or an entirely separate layer (like ActiveRecord). But if we always let the developer have access to the full power of libpq, it limits the damage that can be done by a slightly-too-creative API on top of it.

> Do you really suggest preserving the PQexecParams API verbatim, that is, passing in three arrays containing type, value, and format? That seems to be a bit problematic. I suspect the common case will be to use unknown types, text format, and the default conversion from values to strings.

I tried to address this specifically in the document: "For example: it should be easy to pass parameters so that PQexecParams (and others) can be used, avoiding SQL injection risks. The important thing is to maintain close to a one-to-one mapping between libpq and the driver's API, and to provide all of the functionality of libpq."

In ruby-pg, you can just do:

    conn.exec("INSERT INTO foo VALUES($1)", ["Jeff"])

And I think that's appropriate. What I'm saying is that there should still exist some way to pass explicit types or formats (although that should still be easier than it is in C ;). Here's the long form:

    conn.exec("INSERT INTO foo VALUES($1)",
              [{:value => "Jeff", :format => 0, :type => 0}])

The nice thing about that format is that you can do the "easy" thing for most of the parameters in a query, but then choose binary format for that one BYTEA parameter. That's because, in ruby, you can mix strings and hashes in the same array. So I'm not saying we should make everyone code ruby that looks like C. I'm saying that the job of the driver is to provide full access to libpq, and anything beyond that should be an optional convenience routine, and should be free of magic and cleverness (that's the job of a higher layer).

> Conversely, for result sets, I'm tempted to transparently decode
> escaped BYTEA columns.

Consider the following ruby-pg program, where you have two empty tables foo and bar, each with a single BYTEA column "b":

    conn = PGconn.connect(...)
    conn.exec("INSERT INTO foo VALUES($1)", ["\\\\000"])
    # copy the single value in foo into bar
    val = conn.exec("SELECT b FROM foo LIMIT 1")[0]["b"]
    conn.exec("INSERT INTO bar VALUES($1)", [val])

That copies the value so that foo and bar have the same contents: a 4-byte value "\000". What would happen, though, if val was transparently decoded? It would decode it once in ruby, and again inside of postgres (in byteain), leaving you with a one-byte value in bar, even though foo has a four-byte value. I really think that only higher layers should implement that kind of magic, no matter how "obvious" it may seem that the user wants something extra.

> > Note that the ruby-pg driver doesn't 100% adhere to those standards
> > (encoding is the primary problem, and that will be fixed).
>
> Lack of Unicode support means that I can punt that to application
> authors, I guess.

Ruby 1.9+ and Python 3.0+ both have string encoding models that can't just be ignored. We could punt it by always returning byte sequences rather than strings, but I think that's a particularly extreme version of my philosophy of not trying to convert between types.

> By the way, the downside of using strings everywhere is that your
> binding API will most likely not work with SQLite (or any other
> SQL-like database which lacks column type information).

I am trying to develop standards suitable for PostgreSQL drivers based on libpq. These are not meant to be standards for a database-agnostic API, standards for a high-level database adapter, or even standards for a driver written against something other than libpq (like the JDBC driver).

Thank you for your comments. I will try to integrate these thoughts into the document.

Regards,
Jeff Davis
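The double-decoding Jeff describes can be seen without a server at all. The small Python sketch below uses a simplified stand-in for PostgreSQL's traditional "escape" bytea text format (it is not a routine from any real driver) and shows the 4-byte value shrinking to one byte when it is decoded twice:

    def bytea_unescape(text):
        # Simplified decoder for the traditional "escape" bytea text format:
        # "\\" is a literal backslash, "\ooo" is an octal byte escape.
        out = bytearray()
        i = 0
        while i < len(text):
            if text[i] == '\\':
                if text[i + 1] == '\\':
                    out.append(ord('\\'))
                    i += 2
                else:
                    out.append(int(text[i + 1:i + 4], 8))
                    i += 4
            else:
                out.append(ord(text[i]))
                i += 1
        return bytes(out)

    wire = r"\\000"                       # what SELECT returns for foo's value
    stored = bytea_unescape(wire)         # the value actually in foo: 4 bytes
    # A "transparent" driver hands the app the decoded value; re-inserting it
    # as a text parameter makes byteain decode it a second time:
    reinserted = bytea_unescape(stored.decode())
    print(len(stored), len(reinserted))   # 4 1 -- silently shrunk to one NUL byte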
>> http://code.google.com/p/ocpgdb/
>
> I saw your note that you have to specify the types for date values
> etc. Is this really desirable or even necessary? Can't you specify
> the type as unknown (OID 705, I believe)?
>
> At work, we recently used the typelessness of Perl's DBD::Pg with great
> effect, introducing a more compact, type-safe representation for a few
> columns, without having to change all the existing Perl scripts
> accessing the database. That's why I'm wondering...

I can't see how this would work with binary query parameters - the server will see a blob of binary data and have no way to know what it represents. I presume DBD::Pg is using text parameters, rather than binary.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
On Tue, 2010-02-09 at 09:15 +1100, Andrew McNamara wrote:
> I can't see how this would work with binary query parameters - the server
> will see a blob of binary data and have no way to know what it represents.

Unknown is unknown, whether in binary or text format. As far as I know, PostgreSQL never looks inside a literal of unknown type to try to determine its type -- it only looks at the context (to what function is it an argument?). For instance:

    SELECT '5';       -- has no idea what type it is

    SELECT '5' + 1;   -- it's an int

    SELECT 'a' + 1;   -- it's still an int
    ERROR: invalid input syntax for integer: "a"
    LINE 1: SELECT 'a' + 1;

    SELECT '5.0' + 1; -- still an int, bad input format
    ERROR: invalid input syntax for integer: "5.0"
    LINE 1: SELECT '5.0' + 1;

Regards,
Jeff Davis
>On Tue, 2010-02-09 at 09:15 +1100, Andrew McNamara wrote: >> I can't see how this would work with binary query parameters - the server >> will see a blob of binary data and have no way to know what it represents. > >Unknown is unknown, whether in binary or text format. As far as I know, >PostgreSQL never looks inside a literal of unknown type to try to >determine its type -- it only looks at the context (to what function is >it an argument?). > >For instance: > > SELECT '5'; -- has no idea what type it is > > SELECT '5' + 1; -- it's an int > > SELECT 'a' + 1; -- it's still an int > ERROR: invalid input syntax for integer: "a" > LINE 1: SELECT 'a' + 1; > > SELECT '5.0' + 1; -- still an int, bad input format > ERROR: invalid input syntax for integer: "5.0" > LINE 1: SELECT '5.0' + 1; The problem is deeper than that - when query parameters use the binary option, the server has no way to decode the binary parameter without an appropriate type OID. As you say, postgres will cast types depending on context, however this is stricter when binary parameters are used (because they only have one valid interpretation, whereas a text parameter may have several). -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
On Mon, 2010-02-08 at 09:14 +0100, Massa, Harald Armin wrote: > And we should not forget to look for the reasons for the incubation of > that many pure-Python drivers: All very good points. That's why the doc I wrote: http://wiki.postgresql.org/wiki/Driver_development is specifically targeted at libpq-based drivers (which is repeated several times). I think it would be valuable to have a complete, pure-python driver available (like the JDBC driver). That's a large project, however. People who use a different python implementation understand that libraries might not be as plentiful. It will be a while before there are as many pure-python libraries as there are pure-java libraries. Right now what we need is a driver toward which we can confidently direct cPython users. It's faster to wrap libpq than to write a complete driver. And if we don't have such a driver, we risk alienating an important community for postgresql growth. So, the cost is lower and the benefits are higher for wrapping libpq for the cPython users. At least, that seems to be the case right now; things may change in the future. Regards,Jeff Davis
On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote:
> The problem is deeper than that - when query parameters use the binary
> option, the server has no way to decode the binary parameter without an
> appropriate type OID.

Postgres does not attempt to decode anything (text or binary format) until it figures out what type it is.

> As you say, postgres will cast types depending on context, however this
> is stricter when binary parameters are used (because they only have one
> valid interpretation, whereas a text parameter may have several).

Type casts are a different matter; they are only done after the unknown literals' types have been determined:

    create table n(i int);

    -- insert numeric literal, which is cast to int (assignment cast)
    insert into n values(5.0);   -- succeeds

    -- insert unknown literal, which is inferred to be of type int
    insert into n values('5.0'); -- fails on the integer type input function
    ERROR: invalid input syntax for integer: "5.0"
    LINE 1: insert into n values('5.0');

Can you provide a concrete example in which the text versus binary format changes the type inference behavior?

Regards,
Jeff Davis
>On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote: >> The problem is deeper than that - when query parameters use the binary >> option, the server has no way to decode the binary parameter without an >> appropriate type OID. > >Postgres does not attempt to decode anything (text or binary format) >until it figures out what type it is. How does it figure out what type it is? Either by the type oid passed by the caller, or by the context if the type oid is "unknown". Now, with the text format parameters, the parser usually does the right thing, since text formats have plenty of hints for us humans. However, with the binary format, unless the caller tells us, there's no way to tell whether we're correctly parsing the data. If the context implies one type, but the user passes another, we'll either get an ugly error or, worse, silently misparse their data. Generally this isn't a big problem with python, as we have good type information available. It's only an issue because people have gotten used to the text parameter parsing being so forgiving. Using my ocpgdb module, and interacting directly with the libpq wrapping code, you can see how postgres reacts to various inputs: >>> from oclibpq import * >>> from ocpgdb import pgoid >>> db=PgConnection('') No parameters: >>> r=db.execute('select 1', ()) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23,modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)] Int4 parameter, type specified: >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCellname '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)] Int4 parameter, type unknown, can't be determined from context: >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_FATAL_ERROR >>> r.errorMessage 'ERROR: could not determine data type of parameter $1\n' Int4 parameter, type unknown, can be determined from context: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)] Text parameter, type unknown, mismatching context - surprising: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCellname '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)] Date parameter, type unknown, int context, the value gets misinterpreted: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)] -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
On Tue, 2010-02-09 at 12:51 +1100, Andrew McNamara wrote:
> Now, with the text format parameters, the parser usually does the right
> thing, since text formats have plenty of hints for us humans.

The parser doesn't care whether it's text format or binary format when determining the type.

> However, with the binary format, unless the caller tells us, there's no way
> to tell whether we're correctly parsing the data. If the context implies
> one type, but the user passes another, we'll either get an ugly error or,
> worse, silently misparse their data.

The difference between text and binary format is this: after it has already determined the type of the parameter, (a) if the format is text, it passes it to the type input function to construct the value; or (b) if the format is binary, it passes it to the type recv function to construct the value.

The argument to the input or recv functions may: (a) be valid input; or (b) be invalid input, and be detected as an error by the input or recv function; or (c) be invalid input, and not be detected as an error by the input or recv function.

For a given type, the input function may be more likely to catch an input error than the recv function; or the reverse. Either way, it is very type-specific, and the only difference is whether the input is misinterpreted (type error not caught; bad) or an error is thrown (type error caught; better).

> Using my ocpgdb module, and interacting directly with the libpq wrapping
> code, you can see how postgres reacts to various inputs:

None of the examples show a difference in the inferred type of a text versus binary parameter for the same query.

> No parameters:
>
> >>> r=db.execute('select 1', ())
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)]

Expected, because the literal 1 (without quotes) is an integer literal, not an unknown literal.

> Int4 parameter, type specified:
>
> >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)]

Expected, because you specified the type, and sent the binary data to the integer recv function, and it was valid input.

> Int4 parameter, type unknown, can't be determined from context:
>
> >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_FATAL_ERROR
> >>> r.errorMessage
> 'ERROR: could not determine data type of parameter $1\n'

Expected -- there is no context to determine the type. Why do you call it an int4 parameter? It's just bytes, and you never told postgres what they are (as you did in the previous example).

> Int4 parameter, type unknown, can be determined from context:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)]

Expected: the function + provides the context that allows the server to interpret the left argument as an integer. (Again, it's not an int4 parameter, it's unknown.)

> Text parameter, type unknown, mismatching context - surprising:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)]

Expected, because this is exactly the same as the previous one except for the data you pass in. Notice that the same type is inferred (23). Why do you call this "mismatching context" when the context is exactly the same as above? The only difference is which 4 bytes you provide. You never told postgres that the bytes were text bytes anywhere. You may think that it's doing 1111 + 1, but it's actually doing addition on the bytes. That is apparent in the next example:

> Date parameter, type unknown, int context, the value gets misinterpreted:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)]

Expected, because the only thing that could possibly detect the error is the int4recv function, which happens to accept any 4-byte input (so it will never fail on any 4 bytes of data).

Regards,
Jeff Davis
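The last example can be reproduced without a server: the four bytes Andrew sent are a valid int4 and, at the same time, a valid binary date. The Python sketch below assumes PostgreSQL's binary date format (an int32 count of days since 2000-01-01) and only illustrates the ambiguity:

    import datetime
    import struct

    wire = b'\x00\x00\x01n'                 # the 4 bytes sent as "unknown"
    as_int = struct.unpack('!i', wire)[0]   # int4recv's reading
    as_date = datetime.date(2000, 1, 1) + datetime.timedelta(days=as_int)
    print(as_int, as_date)                  # 366 2001-01-01
    # Both readings are "valid", so if the server guesses the wrong type,
    # nothing fails -- the value is just silently misinterpreted.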
> For a given type, the input function may be more likely to catch an
> input error than the recv function; or the reverse. Either way, it is
> very type-specific, and the only difference is whether the input is
> misinterpreted (type error not caught; bad) or an error is thrown (type
> error caught; better).

This is the crux of the matter: the type input functions are universally more forgiving since, by their nature, text formats are designed for us fuzzy humans, and users of adapters have come to expect this.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
On Tue, 2010-02-09 at 13:56 +1100, Andrew McNamara wrote:
> > For a given type, the input function may be more likely to catch an
> > input error than the recv function; or the reverse. Either way, it is
> > very type-specific, and the only difference is whether the input is
> > misinterpreted (type error not caught; bad) or an error is thrown (type
> > error caught; better).
>
> This is the crux of the matter: the type input functions are universally
> more forgiving since, by their nature, text formats are designed for us
> fuzzy humans, and users of adapters have come to expect this.

Except that it's exactly the opposite with integers. Pass any 4 bytes to int4recv(), and it will accept it. However, try passing '4.0' to int4in(), and you get an error.

If I had to make an educated guess about the forgiveness of various type input and type recv functions, I would say that the recv functions are more forgiving. After all, you would expect the binary format to be less redundant, and therefore less likely to catch inconsistencies. I don't see much of a universal truth there, however.

This is getting pretty far off-topic, so let's just leave it at that. The drivers should support both formats; the type inference logic doesn't care at all about the contents of the unknown literals (text or binary); and queries should be written in such a way that the types are unambiguous and unsurprising.

Regards,
Jeff Davis
On Mon, 2010-02-08 at 20:50 +0100, Florian Weimer wrote: > I saw your note that you have to specify the types for date values > etc. Is this really desirable or even necessary? Can't you specify > the type as unknown (OID 705, I believe)? I believe the problem that Andrew is describing is that: SELECT $1 + 1; will infer that $1 is of type int4. But if you really intended $1 to be a date (which is also valid), it will cause a problem. If the date is passed in text format, it will cause an error in int4in(), because the text representation of a date isn't a valid text representation for an integer. If the date is passed in binary format, it will pass it to int4recv() -- but because the date is 4 bytes, and int4recv is defined for any 4-byte input, it won't cause an error; it will produce a wrong result. In other words, the binary representation for a date _is_ a valid binary representation for an integer. The type inference has found the wrong type, but the recv function still accepts it, which causes a problem. The solution is to write the query in an unambiguous way: SELECT $1::date + 1; which is good practice, anyway. If it's not obvious to the type inference system, it's probably not obvious to you, and will probably surprise you ;) Or, as Andrew suggests, you can pass the type oid along with the parameter so that postgresql knows the right type. Either way, relying on a type input or a recv function to cause a type error is much more fragile. Regards,Jeff Davis
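As one concrete illustration of the "write the query unambiguously" advice, here is what it looks like from an existing text-format driver such as psycopg2; the connection string is just a placeholder, and the point is only that the explicit cast travels with the query rather than depending on type inference:

    import datetime
    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder connection string
    cur = conn.cursor()
    # The cast in the SQL removes any ambiguity about what the parameter is:
    cur.execute("SELECT %s::date + 1", (datetime.date(2010, 2, 9),))
    print(cur.fetchone()[0])                 # 2010-02-10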
> If the date is passed in binary format, it will pass it to int4recv() --
> but because the date is 4 bytes, and int4recv is defined for any 4-byte
> input, it won't cause an error; it will produce a wrong result. In other
> words, the binary representation for a date _is_ a valid binary
> representation for an integer. The type inference has found the wrong
> type, but the recv function still accepts it, which causes a problem.

Yes - of the worst kind: silent data corruption.

> The solution is to write the query in an unambiguous way:
>
>     SELECT $1::date + 1;
>
> which is good practice, anyway. If it's not obvious to the type
> inference system, it's probably not obvious to you, and will probably
> surprise you ;)

That addresses this specific case, but it's ugly and not general. The right thing is to set the correct type when you're marshalling the parameters...

> Or, as Andrew suggests, you can pass the type oid along with the
> parameter so that postgresql knows the right type.

That's right - if you are using binary parameters, you *must* pass an appropriate type oid for the data you send to the server. If you use the "unknown" oid, bad things will happen (sooner or later). While this is strictly true of both binary and text parameters, text parameters have enough redundancy built into the format that it's rarely a problem. Users have come to expect this leniency.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
>That's just a matter of prioritizing the issues. Put the big ones at >the top, the trivia at the bottom, [...] I'd like to see a requirement for the use of PQexecParams() over PQexec() - even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Andrew McNamara <andrewm@object-craft.com.au> writes: >> That's just a matter of prioritizing the issues. Put the big ones at >> the top, the trivia at the bottom, [...] > I'd like to see a requirement for the use of PQexecParams() over PQexec() - > even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy. Such a rule seems pretty entirely pointless, unless you have a way to enforce that the query string passed to the function hasn't been assembled from parts somewhere along the way. regards, tom lane
>> I'd like to see a requirement for the use of PQexecParams() over PQexec() -
>> even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy.
>
> Such a rule seems pretty entirely pointless, unless you have a way to
> enforce that the query string passed to the function hasn't been
> assembled from parts somewhere along the way.

The point is that if the driver is doing the right thing, the user of the driver at least has the choice to do things safely.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
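At the level a Python user sees, that choice is the difference between interpolating values into the SQL text and handing them to the driver as parameters. A small sketch with psycopg2 - the connection string, the users table, and the name value are all made up for the example, and how the driver passes the parameter to the server is the driver's business:

    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder
    cur = conn.cursor()
    name = "Jeff"                             # imagine hostile input here

    # Unsafe: the value becomes part of the SQL text, so quoting and
    # injection are the application's problem.
    cur.execute("SELECT * FROM users WHERE name = '%s'" % name)

    # Safe: the value is handed to the driver as a parameter and is never
    # concatenated into the SQL string by hand.
    cur.execute("SELECT * FROM users WHERE name = %s", (name,))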
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

>> The solution is to write the query in an unambiguous way:
>>
>>     SELECT $1::date + 1;
>>
>> which is good practice, anyway. If it's not obvious to the type
>> inference system, it's probably not obvious to you, and will probably
>> surprise you ;)
>
> That addresses this specific case, but it's ugly and not general. The right
> thing is to set the correct type when you're marshalling the parameters...

Well, ugly is in the eye of the beholder, and it certainly is a general solution. Any query with ambiguity in its parameters should explicitly declare the types, inside the query itself. Having the driver indicate the type should be the exception, not the rule.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201002091811
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
>>> The solution is to write the query in an unambiguous way:
>>>
>>>     SELECT $1::date + 1;
>>>
>>> which is good practice, anyway. If it's not obvious to the type
>>> inference system, it's probably not obvious to you, and will probably
>>> surprise you ;)
>>
>> That addresses this specific case, but it's ugly and not general. The right
>> thing is to set the correct type when you're marshalling the parameters...
>
> Well, ugly is in the eye of the beholder, and it certainly is a general
> solution. Any query with ambiguity in its parameters should explicitly
> declare the types, inside the query itself. Having the driver indicate
> the type should be the exception, not the rule.

You are missing the point: this is not about what types the SQL execution sees. It is about making sure the correct recv function is applied to the binary parameter data. The server cannot reliably infer which recv function to use based on query context (although it tries). A wrong guess can lead to silent data corruption, which is utterly unacceptable.

If the client (driver) sets the type OID to match the format of the binary parameter it sends, the server can unambiguously decode the data (and cast the type, if need be).

I would go as far as to suggest that postgres should not accept binary parameters with an "unknown" OID - it's dangerous, unreliable and serves no purpose.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Andrew McNamara <andrewm@object-craft.com.au> writes: >>> The solution is to write the query in an unambiguous way: >>> SELECT $1::date + 1; > You are missing the point: this is not about what types the SQL execution > sees. It is about making sure the correct recv function is applied to > the binary parameter data. Indeed, and the above locution does set that. regards, tom lane
> Andrew McNamara <andrewm@object-craft.com.au> writes:
>>>> The solution is to write the query in an unambiguous way:
>>>> SELECT $1::date + 1;
>
>> You are missing the point: this is not about what types the SQL execution
>> sees. It is about making sure the correct recv function is applied to
>> the binary parameter data.
>
> Indeed, and the above locution does set that.

Sure, but it requires the driver to modify the query - that isn't reasonable or practical. Expecting the user of the driver to know and correctly set the type the driver will ultimately see is a recipe for disaster.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
* Jeff Davis: > Agreed. Ultimately, the conversion has to be done somewhere, but I don't > believe the driver is the place for it. Type conversions are always > going to be imperfect, and this has some important consequences: > * The type conversion system will be endlessly tweaked to improve it > * Developers will always run into problems with it in any complex > application, so we need to allow them to circumvent the system and do it > themselves when necessary. The downside is that passing strings up to the application may have distinctly worse performance characteristics than passing a number. > In ruby-pg, you can just do: > > conn.exec("INSERT INTO foo VALUES($1)", ["Jeff"]) > > And I think that's appropriate. What I'm saying is that there should > still exist some way to pass explicit types or formats (although that > should still be easier than it is in C ;). Here's the long form: > > conn.exec("INSERT INTO foo VALUES($1)", > [{:value => "Jeff", :format => 0, :type => 0}]) Okay, this isn't too bad an API. I will use the same approach. In my case, it means no transparent support for arrays, but per your own guidelines, this is okay. > That copies value so that foo and bar have the same contents: a 4 byte > value "\000". What would happen though, if val was transparently > decoded? It would decode it once in ruby, and again inside of postgres > (in byteain), leaving you with a one byte value in bar, even though foo > has a four-byte value. I've never viewed it from this angle, and I agree that it makes sense. Thanks for your observations and explanations, they were helpful.
On Sun, 2010-02-14 at 20:43 +0100, Florian Weimer wrote:
> The downside is that passing strings up to the application may have
> distinctly worse performance characteristics than passing a number.

Yes, that is a good point. I tried to clarify this in the doc. I think this would fall under the optional type conversion convenience functions. As long as it's explicit that the conversion is happening, I think it's OK.

Regards,
Jeff Davis
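One way such an optional, explicit conversion layer could look in Python is sketched below. The OIDs are PostgreSQL's built-in type OIDs (23 = int4, 701 = float8, 1082 = date); the helper is only an illustration of the idea, not an API from any existing driver:

    import datetime

    # Explicit, opt-in converters from the driver's string values, keyed by type OID.
    CONVERTERS = {
        23: int,
        701: float,
        1082: lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").date(),
    }

    def convert_row(row, type_oids, converters=CONVERTERS):
        """Convert one row of text values; unknown OIDs are passed through as strings."""
        return [None if value is None else converters.get(oid, str)(value)
                for value, oid in zip(row, type_oids)]

    print(convert_row(["42", "3.5", "2010-02-14"], [23, 701, 1082]))
    # [42, 3.5, datetime.date(2010, 2, 14)]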