Thread: libpq URL syntax vs SQLAlchemy
I have been reviewing how our new libpq URL syntax compares against existing implementations of URL syntaxes in other drivers or higher-level access libraries. In the case of SQLAlchemy, there is an incompatibility regarding how Unix-domain sockets are specified. First, here is the documentation on that: http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html The recommended way to access a server over a Unix-domain socket is to leave off the host, as in: postgresql://user:password@/dbname In libpq, this is parsed as host='/dbname', no database. To specify a socket path in SQLAlchemy, you use: postgresql://user:password@/dbname?host=/var/lib/postgresql This also works in libpq (bizarrely, perhaps, considering the previous case). This libpq behavior is a problem for several reasons: - It's incompatible with a popular existing implementation. - It violates RFC 3986, which doesn't allow slashes in the "authority" (host, port, user, password) part. - As a consequence of this, URLs like this will be parsed differently (or will fail to be parsed) by existing URL parsinglibraries (tried Perl URI and Python urllib, for instance). - Moreover, if these libraries can't parse the URL, it might mean those drivers can't adopt that URL syntax. - It's internally inconsistent, as shown above. - In most places in PostgreSQL clients, no host means Unix-domain socket, but not here. - It favors the case of non-default Unix-domain socket plus default database over default Unix-domain socket plus non-defaultdatabase. - It's not obvious how to get to the default Unix-domain socket at all. "postgresql:///dbname" doesn't work, but "postgresql:///dbname?host="does. I think this whole approach of using unescaped slashes in the "host" part of the URL is going to cause lots of problems like this. We should consider one or more of: - Requiring percent escapes - Requiring specifying the socket path as a parameter, like in the above example - Requiring some delimiters like for IPv6 addresses (which had the same problem of reusing a reserved character) (probablya bad idea, since we can't make existing URL parsing libraries understand this)
On Wed, May 9, 2012 at 2:17 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > postgresql://user:password@/dbname > > In libpq, this is parsed as host='/dbname', no database. That is flat wrong. > - Requiring percent escapes And this is, IMHO, the right fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Eisentraut <peter_e@gmx.net> writes: > I have been reviewing how our new libpq URL syntax compares against > existing implementations of URL syntaxes in other drivers or > higher-level access libraries. In the case of SQLAlchemy, there is an > incompatibility regarding how Unix-domain sockets are specified. > > First, here is the documentation on that: > http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html > > The recommended way to access a server over a Unix-domain socket is to > leave off the host, as in: > > postgresql://user:password@/dbname > > In libpq, this is parsed as host='/dbname', no database. Ah, good catch: thanks for heads up. I believe this was introduced lately in the dev cycle when we've noticed that users will have to specify some defaults explicitly to be able to override other defaults, while avoiding the whole "?keyword=value&..." business. I'll give this another look and will get back with a proposal to fix this in form of a patch. -- Regards, Alex
On 9 May 2012 19:17, Peter Eisentraut <peter_e@gmx.net> wrote: > I have been reviewing how our new libpq URL syntax compares against > existing implementations of URL syntaxes in other drivers or > higher-level access libraries. In the case of SQLAlchemy, there is an > incompatibility regarding how Unix-domain sockets are specified. Is there an open standard that already defines this? If there is an existing standard we should follow it, so we can quote "we now follow standard X". If there isn't one, can we create one? Can we propose an RFC that works for many data stores? If somebody can define that, I can push that through the relevant processes. Not because I wish PostgreSQL syntax to be badged as a standard, but because the world clearly needs a useful, open standard here. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
>>Simon Riggs wrote: >>On 9 May 2012 19:17, Peter Eisentraut wrote: >> >>> I have been reviewing how our new libpq URL syntax compares >>> against existing implementations of URL syntaxes in other drivers >>> or higher-level access libraries. In the case of SQLAlchemy, >>> there is an incompatibility regarding how Unix-domain sockets are >>> specified. >> >> Is there an open standard that already defines this? There are many. The most recent, as far as I know is RFC 3986, which updates one previous RFC and obsoletes three others. http://tools.ietf.org/html/rfc3986 We should also take the JDBC URL requirements into consideration. One unpleasant aspect of this is that what JDBC calls a "URL" is "jdbc:" followed by what could be a valid URI; but I don't see how the *whole thing* (including the leading "jdbc:" qualifies as a URI or URL). Unless someone has a better idea, I suggest that we make what follows the "jdbc:" portion of the JDBC "URL" match what we use for a URI for everything else. >> If there is an existing standard we should follow it +1 I don't know whether recent work on this has respected the standards. I hope so. -Kevin
On lör, 2012-05-12 at 10:32 +0100, Simon Riggs wrote: > On 9 May 2012 19:17, Peter Eisentraut <peter_e@gmx.net> wrote: > > > I have been reviewing how our new libpq URL syntax compares against > > existing implementations of URL syntaxes in other drivers or > > higher-level access libraries. In the case of SQLAlchemy, there is > an > > incompatibility regarding how Unix-domain sockets are specified. > > Is there an open standard that already defines this? As I wrote upthread, RFC 3986 is the latest version of the standard for URIs. But it's a multileveled matter, because in the simplest instance, a URI is scheme:something (compare mailto:), so in theory almost any URI can comply. But now that I read it once again, since our "something" starts with "//", we are bound to the more specific syntax defined there, and that makes our current implementation just plain invalid on the matter that I complained about in my earlier message.
Alex <ash@commandprompt.com> writes: > Peter Eisentraut <peter_e@gmx.net> writes: > >> I have been reviewing how our new libpq URL syntax compares against >> existing implementations of URL syntaxes in other drivers or >> higher-level access libraries. In the case of SQLAlchemy, there is an >> incompatibility regarding how Unix-domain sockets are specified. >> >> First, here is the documentation on that: >> http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html >> >> The recommended way to access a server over a Unix-domain socket is to >> leave off the host, as in: >> >> postgresql://user:password@/dbname >> >> In libpq, this is parsed as host='/dbname', no database. > > Ah, good catch: thanks for heads up. > > I believe this was introduced lately in the dev cycle when we've noticed > that users will have to specify some defaults explicitly to be able to > override other defaults, while avoiding the whole "?keyword=value&..." > business. > > I'll give this another look and will get back with a proposal to fix > this in form of a patch. Upon closer inspection of the issue I came to believe that the proper fix is to drop support for special treatment of "host part" starting with slash altogether. Attached is a patch to do that. While the following type of URIs is still valid, the interpretation is now different and is standards-conforming: the part after the double-slash is treated as "path" specification and not authority ("host".) postgres:///path-spec Since the path from a URI translates into dbname connection option, the only meaningful use of this type of URIs is the following: postgres:///mydb The host part in this case is empty (it is "hidden" between the "//" and the following "/",) thus local socket connection is employed for this type of URIs. To specify non-standard path to the local sockets directory use the familiar URI parameter: postgres:///db?host=/path/to/socket/dir Also, if my reading of the RFC is correct, the username, password and port specifiers may be omitted even if the corresponding designators are present in URI, so we need to remove some checks on empty URI parts. The test input and expected output files, the docs and code comments are also updated, of course. Finally, to complete the RFC compliance, I've added code to properly handle percent-encoding in query parameter keywords. At this point, I feel rather silly that we've produced and committed an "almost" compliant version, which still requires quite a bit of patching to become an RFC-conforming implementation... -- Regards, Alex
Attachment
----- Цитат от Alex Shulgin (ash@commandprompt.com), на 14.05.2012 в 18:16 ----- <br /><br />> Alex writes: <br />><br />> <br />> The host part in this case is empty (it is "hidden" between the "//" and <br />> the following"/",) thus local socket connection is employed for this <br />> type of URIs. To specify non-standard path tothe local sockets <br />> directory use the familiar URI parameter: <br />> <br />> postgres:///db?host=/path/to/socket/dir<br />> <br /><br />And why are we calling "host" the parameter that specifiesthe path for socket <br />dir - it is not host and could be confused with the host part of the URI (the <br />partbetween // and /). Why do not call it "path" ? So it will become: <br /><br />postgres:///db?path=/path/to/socket/dir<br /><br />Best regards <br /><br />-- <br />Luben Karavelov
karavelov@mail.bg writes: > ----- Цитат от Alex Shulgin (ash@commandprompt.com), на 14.05.2012 в 18:16 ----- > >> Alex <ash@commandprompt.com> writes: >> >> >> The host part in this case is empty (it is "hidden" between the "//" and >> the following "/",) thus local socket connection is employed for this >> type of URIs. To specify non-standard path to the local sockets >> directory use the familiar URI parameter: >> >> postgres:///db?host=/path/to/socket/dir >> > > And why are we calling "host" the parameter that specifies the path for socket > dir - it is not host and could be confused with the host part of the URI (the > part between // and /). Why do not call it "path" ? So it will become: > > postgres:///db?path=/path/to/socket/dir We call it that way since we rely on existing libpq code to interpret the value of every parameter in the URI (well, almost: with notable exception of translating "ssl=true" for JDBC compatibility.) I don't think anyone would confuse host part of the URI with URI parameter "?host=..." if we care to express things clearly in the documentation (which we do I believe.) Existing implementations, like that mentioned by Peter in the top message of this thread (SQLAlchemy or was it psycopg2?) already use this notation, so I don't think we can or should do anything about this, i.e. there's little point in renaming to "path" or merely supporting it as an alternative syntax. -- Alex
Alex Shulgin <ash@commandprompt.com> writes: > > Upon closer inspection of the issue I came to believe that the proper > fix is to drop support for special treatment of "host part" starting > with slash altogether. > > Attached is a patch to do that. Well, I understand I might be asking for too much, but did anyone had a chance to look at the correcting patch? We're having the first CommitFest of 9.3 in three weeks, so this better be dealt with before it's too late. I believe the correcting patch makes our implementation comply to RFC 3986. I can produce a patch against 9.1 for improved readability: the removal of special handling of '/' at the start of URI created a bit of mess in the correcting patch, so it might be easier to look at the combined effect of the committed and this one. Comments please? -- Regards, Alex
On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote: > Upon closer inspection of the issue I came to believe that the proper > fix is to drop support for special treatment of "host part" starting > with slash altogether. > > Attached is a patch to do that. Committed. I also updated the documentation and tests to show that percent-encoding a host part starting with slash also works, as discussed upthread.
Peter Eisentraut <peter_e@gmx.net> writes: > On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote: >> Upon closer inspection of the issue I came to believe that the proper >> fix is to drop support for special treatment of "host part" starting >> with slash altogether. >> >> Attached is a patch to do that. > > Committed. Many thanks! > I also updated the documentation and tests to show that percent-encoding > a host part starting with slash also works, as discussed upthread. Yes, that's a side-effect, but still might be useful to know. -- Regards, Alex