Thread: libpq URL syntax vs SQLAlchemy

libpq URL syntax vs SQLAlchemy

From
Peter Eisentraut
Date:
I have been reviewing how our new libpq URL syntax compares against
existing implementations of URL syntaxes in other drivers or
higher-level access libraries.  In the case of SQLAlchemy, there is an
incompatibility regarding how Unix-domain sockets are specified.

First, here is the documentation on that:
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html

The recommended way to access a server over a Unix-domain socket is to
leave off the host, as in:
   postgresql://user:password@/dbname

In libpq, this is parsed as host='/dbname', no database.

To specify a socket path in SQLAlchemy, you use:
   postgresql://user:password@/dbname?host=/var/lib/postgresql

This also works in libpq (bizarrely, perhaps, considering the previous
case).

This libpq behavior is a problem for several reasons:

- It's incompatible with a popular existing implementation.

- It violates RFC 3986, which doesn't allow slashes in the "authority" (host, port, user, password) part.

- As a consequence of this, URLs like this will be parsed differently (or will fail to be parsed) by existing URL
parsinglibraries (tried Perl URI and Python urllib, for instance).
 

- Moreover, if these libraries can't parse the URL, it might mean those drivers can't adopt that URL syntax.

- It's internally inconsistent, as shown above.

- In most places in PostgreSQL clients, no host means Unix-domain socket, but not here.

- It favors the case of non-default Unix-domain socket plus default database over default Unix-domain socket plus
non-defaultdatabase.
 

- It's not obvious how to get to the default Unix-domain socket at all. "postgresql:///dbname" doesn't work, but
"postgresql:///dbname?host="does.
 

I think this whole approach of using unescaped slashes in the "host"
part of the URL is going to cause lots of problems like this.  We should
consider one or more of:

- Requiring percent escapes

- Requiring specifying the socket path as a parameter, like in the above example

- Requiring some delimiters like for IPv6 addresses (which had the same problem of reusing a reserved character)
(probablya bad idea, since we can't make existing URL parsing libraries understand this)
 




Re: libpq URL syntax vs SQLAlchemy

From
Robert Haas
Date:
On Wed, May 9, 2012 at 2:17 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>    postgresql://user:password@/dbname
>
> In libpq, this is parsed as host='/dbname', no database.

That is flat wrong.

> - Requiring percent escapes

And this is, IMHO, the right fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: libpq URL syntax vs SQLAlchemy

From
Alex
Date:
Peter Eisentraut <peter_e@gmx.net> writes:

> I have been reviewing how our new libpq URL syntax compares against
> existing implementations of URL syntaxes in other drivers or
> higher-level access libraries.  In the case of SQLAlchemy, there is an
> incompatibility regarding how Unix-domain sockets are specified.
>
> First, here is the documentation on that:
> http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
>
> The recommended way to access a server over a Unix-domain socket is to
> leave off the host, as in:
>
>     postgresql://user:password@/dbname
>
> In libpq, this is parsed as host='/dbname', no database.

Ah, good catch: thanks for heads up.

I believe this was introduced lately in the dev cycle when we've noticed
that users will have to specify some defaults explicitly to be able to
override other defaults, while avoiding the whole "?keyword=value&..."
business.

I'll give this another look and will get back with a proposal to fix
this in form of a patch.

--
Regards,
Alex


Re: libpq URL syntax vs SQLAlchemy

From
Simon Riggs
Date:
On 9 May 2012 19:17, Peter Eisentraut <peter_e@gmx.net> wrote:

> I have been reviewing how our new libpq URL syntax compares against
> existing implementations of URL syntaxes in other drivers or
> higher-level access libraries.  In the case of SQLAlchemy, there is an
> incompatibility regarding how Unix-domain sockets are specified.

Is there an open standard that already defines this? If there is an
existing standard we should follow it, so we can quote "we now follow
standard X".

If there isn't one, can we create one? Can we propose an RFC that
works for many data stores?

If somebody can define that, I can push that through the relevant
processes. Not because I wish PostgreSQL syntax to be badged as a
standard, but because the world clearly needs a useful, open standard
here.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: libpq URL syntax vs SQLAlchemy

From
"Kevin Grittner"
Date:
>>Simon Riggs  wrote:
>>On 9 May 2012 19:17, Peter Eisentraut  wrote:
>>
>>> I have been reviewing how our new libpq URL syntax compares
>>> against existing implementations of URL syntaxes in other drivers
>>> or higher-level access libraries. In the case of SQLAlchemy,
>>> there is an incompatibility regarding how Unix-domain sockets are
>>> specified.
>>
>> Is there an open standard that already defines this?
There are many.  The most recent, as far as I know is RFC 3986, which
updates one previous RFC and obsoletes three others.
http://tools.ietf.org/html/rfc3986
We should also take the JDBC URL requirements into consideration. 
One unpleasant aspect of this is that what JDBC calls a "URL" is
"jdbc:" followed by what could be a valid URI; but I don't see how
the *whole thing* (including the leading "jdbc:" qualifies as a URI
or URL).  Unless someone has a better idea, I suggest that we make
what follows the "jdbc:" portion of the JDBC "URL" match what we use
for a URI for everything else.
>> If there is an existing standard we should follow it
+1

I don't know whether recent work on this has respected the standards.
I hope so.
-Kevin


Re: libpq URL syntax vs SQLAlchemy

From
Peter Eisentraut
Date:
On lör, 2012-05-12 at 10:32 +0100, Simon Riggs wrote:
> On 9 May 2012 19:17, Peter Eisentraut <peter_e@gmx.net> wrote:
> 
> > I have been reviewing how our new libpq URL syntax compares against
> > existing implementations of URL syntaxes in other drivers or
> > higher-level access libraries.  In the case of SQLAlchemy, there is
> an
> > incompatibility regarding how Unix-domain sockets are specified.
> 
> Is there an open standard that already defines this?

As I wrote upthread, RFC 3986 is the latest version of the standard for
URIs.  But it's a multileveled matter, because in the simplest instance,
a URI is

scheme:something

(compare mailto:), so in theory almost any URI can comply.  But now that
I read it once again, since our "something" starts with "//", we are
bound to the more specific syntax defined there, and that makes our
current implementation just plain invalid on the matter that I
complained about in my earlier message.




Re: libpq URL syntax vs SQLAlchemy

From
Alex Shulgin
Date:
Alex <ash@commandprompt.com> writes:

> Peter Eisentraut <peter_e@gmx.net> writes:
>
>> I have been reviewing how our new libpq URL syntax compares against
>> existing implementations of URL syntaxes in other drivers or
>> higher-level access libraries.  In the case of SQLAlchemy, there is an
>> incompatibility regarding how Unix-domain sockets are specified.
>>
>> First, here is the documentation on that:
>> http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
>>
>> The recommended way to access a server over a Unix-domain socket is to
>> leave off the host, as in:
>>
>>     postgresql://user:password@/dbname
>>
>> In libpq, this is parsed as host='/dbname', no database.
>
> Ah, good catch: thanks for heads up.
>
> I believe this was introduced lately in the dev cycle when we've noticed
> that users will have to specify some defaults explicitly to be able to
> override other defaults, while avoiding the whole "?keyword=value&..."
> business.
>
> I'll give this another look and will get back with a proposal to fix
> this in form of a patch.

Upon closer inspection of the issue I came to believe that the proper
fix is to drop support for special treatment of "host part" starting
with slash altogether.

Attached is a patch to do that.

While the following type of URIs is still valid, the interpretation is
now different and is standards-conforming: the part after the
double-slash is treated as "path" specification and not authority
("host".)

  postgres:///path-spec

Since the path from a URI translates into dbname connection option, the
only meaningful use of this type of URIs is the following:

  postgres:///mydb

The host part in this case is empty (it is "hidden" between the "//" and
the following "/",) thus local socket connection is employed for this
type of URIs.  To specify non-standard path to the local sockets
directory use the familiar URI parameter:

  postgres:///db?host=/path/to/socket/dir


Also, if my reading of the RFC is correct, the username, password and
port specifiers may be omitted even if the corresponding designators are
present in URI, so we need to remove some checks on empty URI parts.

The test input and expected output files, the docs and code comments are
also updated, of course.

Finally, to complete the RFC compliance, I've added code to properly
handle percent-encoding in query parameter keywords.


At this point, I feel rather silly that we've produced and committed an
"almost" compliant version, which still requires quite a bit of
patching to become an RFC-conforming implementation...

--
Regards,
Alex


Attachment

Re: libpq URL syntax vs SQLAlchemy

From
karavelov@mail.bg
Date:
----- Цитат от Alex Shulgin (ash@commandprompt.com), на 14.05.2012 в 18:16 ----- <br /><br />> Alex  writes: <br
/>><br />> <br />> The host part in this case is empty (it is "hidden" between the "//" and <br />> the
following"/",) thus local socket connection is employed for this <br />> type of URIs. To specify non-standard path
tothe local sockets <br />> directory use the familiar URI parameter: <br />> <br />>
postgres:///db?host=/path/to/socket/dir<br />> <br /><br />And why are we calling "host" the parameter that
specifiesthe path for socket <br />dir - it is not host and could be confused with the host part of the URI (the <br
/>partbetween // and /). Why do not call it "path" ? So it will become: <br /><br
/>postgres:///db?path=/path/to/socket/dir<br /><br />Best regards <br /><br />-- <br />Luben Karavelov 

Re: libpq URL syntax vs SQLAlchemy

From
Alex
Date:
karavelov@mail.bg writes:

> ----- Цитат от Alex Shulgin (ash@commandprompt.com), на 14.05.2012 в 18:16 -----
>
>> Alex <ash@commandprompt.com> writes:
>>
>>
>> The host part in this case is empty (it is "hidden" between the "//" and
>> the following "/",) thus local socket connection is employed for this
>> type of URIs.  To specify non-standard path to the local sockets
>> directory use the familiar URI parameter:
>>
>>   postgres:///db?host=/path/to/socket/dir
>>
>
> And why are we calling "host" the parameter that specifies the path for socket
> dir - it is not host and could be confused with the  host part of the URI (the
> part between // and /). Why do not call it "path" ? So it will become:
>
> postgres:///db?path=/path/to/socket/dir

We call it that way since we rely on existing libpq code to interpret
the value of every parameter in the URI (well, almost: with notable
exception of translating "ssl=true" for JDBC compatibility.)

I don't think anyone would confuse host part of the URI with URI
parameter "?host=..." if we care to express things clearly in the
documentation (which we do I believe.)

Existing implementations, like that mentioned by Peter in the top
message of this thread (SQLAlchemy or was it psycopg2?) already use this
notation, so I don't think we can or should do anything about this,
i.e. there's little point in renaming to "path" or merely supporting it
as an alternative syntax.

--
Alex



Re: libpq URL syntax vs SQLAlchemy

From
Alex
Date:
Alex Shulgin <ash@commandprompt.com> writes:
>
> Upon closer inspection of the issue I came to believe that the proper
> fix is to drop support for special treatment of "host part" starting
> with slash altogether.
>
> Attached is a patch to do that.

Well, I understand I might be asking for too much, but did anyone had a
chance to look at the correcting patch?  We're having the first
CommitFest of 9.3 in three weeks, so this better be dealt with before
it's too late.

I believe the correcting patch makes our implementation comply to RFC
3986.

I can produce a patch against 9.1 for improved readability: the removal
of special handling of '/' at the start of URI created a bit of mess in
the correcting patch, so it might be easier to look at the combined
effect of the committed and this one.

Comments please?

--
Regards,
Alex


Re: libpq URL syntax vs SQLAlchemy

From
Peter Eisentraut
Date:
On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote:
> Upon closer inspection of the issue I came to believe that the proper
> fix is to drop support for special treatment of "host part" starting
> with slash altogether.
> 
> Attached is a patch to do that. 

Committed.

I also updated the documentation and tests to show that percent-encoding
a host part starting with slash also works, as discussed upthread.




Re: libpq URL syntax vs SQLAlchemy

From
Alex
Date:
Peter Eisentraut <peter_e@gmx.net> writes:

> On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote:
>> Upon closer inspection of the issue I came to believe that the proper
>> fix is to drop support for special treatment of "host part" starting
>> with slash altogether.
>>
>> Attached is a patch to do that.
>
> Committed.

Many thanks!

> I also updated the documentation and tests to show that percent-encoding
> a host part starting with slash also works, as discussed upthread.

Yes, that's a side-effect, but still might be useful to know.

--
Regards,
Alex