Re: WIP: URI connection string support for libpq - Mailing list pgsql-hackers

From Alexander Shulgin
Subject Re: WIP: URI connection string support for libpq
Date
Msg-id 1323852045-sup-7596@moon
Whole thread Raw
In response to Re: WIP: URI connection string support for libpq  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: WIP: URI connection string support for libpq
List pgsql-hackers
Excerpts from Greg Smith's message of Wed Dec 14 02:54:14 +0200 2011:
>
> Initial quick review of your patch:  you suggested this as the general form:
>
> psql -d postgresql://user@pw:host:port/dbname?param1=value1¶m2=value2...
>
> That's presumably supposed to be:
>
> psql -d postgresql://user:pw@host:port/dbname?param1=value1¶m2=value2...

Yes, that was clearly a typo, so "user:pw@host:port".

> If we had to pick one URI prefix, it should be "postgres".  But given
> the general name dysfunction around this project, I can't see how anyone
> would complain if we squat on "postgresql" too.

That'd be true if we've started afresh in the absence of any existing URI implementations.

IMO, what makes a connection URI useful is: a) it keeps all the connection parameters in a single string, so you can
easilysend it to other people to use, and b) it works everywhere, so the people who've got the URI can use it and
expectto get the same results as you do. 

(Well, not without some quirks, like effects of locally-set environment variables or presence of .pgpass or service
files,or different nameserver opinions about which hostname resolves to which IP address, but that is pretty much the
casewith any sort of URIs.) 

This is not in objection to what you say, but rather an important thing to keep in mind for the purpose of this
discussion.

Whatever decision we make here, the libpq-binding connectors are going to be compatible with each other automatically
ifthey just pass the URI to libpq.  However, should we stick to using "postgresql://" URI prefix exclusively, these
mightneed to massage the URI a bit before passing further (like replacing "postgres://" with "postgresql://", also
acceptingthe latter should be reasonable.)  With proper recommendations from our side, the new client code will use the
longerprefix, thus achieving compatibility with the only(?) driver not based on libpq (that is, JDBC) in the long run. 

> Attached patch modifies
> yours to prove we can trivially support both, in hopes of detonating
> this argument before it rages on further.  Tested like this:
>
> $ psql -d postgres://gsmith@localhost:5432/gsmith
>
> And that works too now.  I doubt either of us like what I did to the
> handoff between conninfo_uri_parse and conninfo_uri_parse_options to
> achieve that, but this feature is still young.

Yes, the caller could just do the pointer arithmetics itself, since the exact URI prefix is known at the time, then
passit to conninfo_uri_parse. 

> After this bit of tinkering with the code, it feels to me like this
> really wants a split() function to break out the two sides of a string
> across a delimiter, eating it in the process.  Adding the level of
> paranoia I'd like around every bit of code I see that does that type of
> operation right now would take a while.  Refactoring in terms of split
> and perhaps a few similarly higher-level string parsing operations,
> targeted for this job, might make it easier to focus on fortifying those
> library routines instead.  For example, instead of the gunk I just added
> that moves past either type of protocol prefix, I'd like to just say
> "split(buf,"://",&left,&right) and then move on with processing the
> right side.

A search with cscope over my repo clone doesn't give any results for "split", so I assume you're talking about a new
functionwith a signature similar to the following: 

split(char *buf, const char *delim, char **left, char **right)

Note, there should be no need for parameter "left", since that will be pointing to the start of "buf".  Also, we might
justreturn "right" as a function's value instead of using out-parameter, with NULL meaning delimiter was not found in
thebuffer. 

Now, if you look carefully at the patch's code, there are numerous places where it accepts either of two delimiting
charactersand needs to examine one before zeroing it out, so it'll need something more like this: 

char *need_a_good_name_for_this(char *buf, const char *possible_delims, char *actual_delim)

where it will store a copy of encountered delimiting char in *actual_delim before modifying the buffer.

> I agree with your comment that we need to add some sort of regression
> tests for this.  Given how the parsing is done right now, we'd want to
> come up with some interesting invalid strings too.  Making sure this
> fails gracefully (and not in a buffer overflow way) might even use
> something like fuzz testing too.  Around here we've just been building
> some Python scripting to do that sort of thing, tests that aren't
> practical to do with pg_regress.

I'd appreciate if you could point me to any specific example of such existing tests to take some inspiration from.

--
Regards,
Alex


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: pg_dump --exclude-table-data
Next
From: Tom Lane
Date:
Subject: Re: LibreOffice driver 1: Building libpq with Mozilla LDAP instead of OpenLDAP