Thread: WIP: URI connection string support for libpq
Hello Hackers, Attached is a work-in-progress patch for URI connection string syntax support in libpq. The recent discussion (also pointing to the original one) is here: http://archives.postgresql.org/message-id/1321899990-sup-1235@moon The patch adds support for the following syntax in psql, by adding special handling of dbname parameter, when it starts with "postgresql://", e.g: psql -d postgresql://user@pw:host:port/dbname?param1=value1&param2=value2... Virtually every component of the above syntax is optional, ultimately allowing for, e.g: psql -d postgresql:/// to specify local connection via Unix socket, with default port, user name, dbname, etc. URI percent-encoding is handled, in particular, allowing to include special symbols in the embedded password string, or to specify non-standard Unix socket location, like the following: psql -d postgresql://%2Fvar%2Fpgsql%2Ftmp/mydb The patch applies cleanly against the master branch and compiles w/o errors or warnings. No tests were broken by this patch on my box, as far as I can tell. The patch follows design initially proposed and tries to address feedback gathered from the recent discussion. Special provision was made to improve compatibility with JDBC's connection URIs, by treating "ssl=true" parameter as equivalent of "sslmode=require". The patch intentionally omits documentation changes, to focus on the desired behavior and new code design. I've put reasonable effort into testing the new code by feeding various parameters to "psql -d". However, if there's a facility for writing formal regression tests against psql, I'd be happy to use that. I'm also adding this to the next open CommitFest: 2012-01. -- Regards, Alex
Attachment
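The percent-encoding handling described above can be illustrated with a small standalone sketch. This is not the code from the attached patch; the names uri_percent_decode and hex_value are hypothetical, and rejecting "%00" is an assumption made here so the result stays a valid C string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store the value of one hex digit; return 1 on success, 0 otherwise. */
static int
hex_value(char c, int *value)
{
	if (c >= '0' && c <= '9')
		*value = c - '0';
	else if (c >= 'A' && c <= 'F')
		*value = c - 'A' + 10;
	else if (c >= 'a' && c <= 'f')
		*value = c - 'a' + 10;
	else
		return 0;
	return 1;
}

/*
 * Hypothetical helper: return a malloc'd copy of str with every %XX
 * sequence replaced by the byte it encodes.  Returns NULL on out of
 * memory, on a truncated or non-hex escape, and on "%00" (assumption:
 * an embedded NUL is never wanted in a connection parameter).
 */
static char *
uri_percent_decode(const char *str)
{
	char	   *buf;
	char	   *out;
	int			hi;
	int			lo;

	assert(str != NULL);

	/* Decoding never makes the string longer, so this is enough room. */
	buf = malloc(strlen(str) + 1);
	if (buf == NULL)
		return NULL;
	out = buf;

	while (*str)
	{
		if (*str != '%')
		{
			*out++ = *str++;
			continue;
		}
		/* Short-circuit evaluation stops at the terminator of a truncated escape. */
		if (!hex_value(str[1], &hi) || !hex_value(str[2], &lo) ||
			(hi == 0 && lo == 0))
		{
			free(buf);
			return NULL;
		}
		*out++ = (char) (hi * 16 + lo);
		str += 3;
	}
	*out = '\0';
	return buf;
}
```

With this sketch, the host part "%2Fvar%2Fpgsql%2Ftmp" from the example above decodes to "/var/pgsql/tmp"; the caller owns the returned buffer and must free() it.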
On Mon, Dec 12, 2011 at 2:06 PM, Alexander Shulgin <ash@commandprompt.com> wrote: > > > psql -d postgresql://user@pw:host:port/dbname?param1=value1&param2=value2... > I'd like to make the controversial proposal that the URL prefix should be "postgres:" instead of "postgresql:". Postgres is a widely accepted nickname for the project, and is eminently more pronounceable. Once the url is established it will be essentially impossible to change later, but right now only a nearly insurmountable mailing list thread prevents it. Excluding references to the "postgresql.org" domain, there are already 5x as many references in the source code to "postgres" (2583 lines) than to "postgresql" (539 lines). Taking into account that the name of the binary and the usual Unix user are already postgres, having one less place which would eventually need changing seems like a good plan overall. Here is, for those who have understandably blocked this argument from their memory, a link to the existing wiki document on the pros and cons of the two names: http://wiki.postgresql.org/wiki/Postgres Over at Heroku, we decided to side with Tom's assessment that "arguably, the 1996 decision to call it PostgreSQL instead of reverting to plain Postgres was the single worst mistake this project ever made." (And we at Heroku have also frustratingly split our references and occasionally used the "SQL" form.) Although I do not have the stomach to push for a full renaming blitz, I felt I must at least make a case for not making the situation any worse. My apologies in advance for re-opening this can of worms. Best regards, -pvh PS: It is not in any way shape or form relevant to my argument, nor do I claim that anyone else should care, but in the spirit of full disclosure, and depending on how you count, we currently have somewhere between 250,000 and 500,000 URLs which begin with postgres:// in our care. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." 
-- Kurt Vonnegut
On Dec 12, 2011, at 3:55 PM, Peter van Hardenberg wrote: > I'd like to make the controversial proposal that the URL prefix should > be "postgres:" instead of "postgresql:". Postgres is a widely accepted > nickname for the project, and is eminently more pronounceable. Once > the url is established it will be essentially impossible to change > later, but right now only a nearly insurmountable mailing list thread > prevents it. What happened to SexQL? David
On Mon, Dec 12, 2011 at 5:05 PM, David E. Wheeler <david@justatheory.com> wrote: > On Dec 12, 2011, at 3:55 PM, Peter van Hardenberg wrote: >> only a nearly insurmountable mailing list thread >> prevents it. > > What happened to SexQL? > Case in point. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Mon, Dec 12, 2011 at 6:55 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > I'd like to make the controversial proposal that the URL prefix should > be "postgres:" instead of "postgresql:". Postgres is a widely accepted > nickname for the project, and is eminently more pronounceable. Once > the url is established it will be essentially impossible to change > later, but right now only a nearly insurmountable mailing list thread > prevents it. That, and the fact the JDBC is already doing it the other way. A reasonable compromise might be to accept either one. AIUI, part of what Alexander was aiming for here was to "unite the clans", so to speak, and it would seem a bit unfriendly (and certainly counter-productive as regards that goal) to pull the rug out from him by refusing to support that syntax over what is basically a supermassive bikeshed. However, being generous in what we accept won't cost anything, so why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Tue Dec 13 23:31:32 +0200 2011: > > On Mon, Dec 12, 2011 at 6:55 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > > I'd like to make the controversial proposal that the URL prefix should > > be "postgres:" instead of "postgresql:". Postgres is a widely accepted > > nickname for the project, and is eminently more pronounceable. Once > > the url is established it will be essentially impossible to change > > later, but right now only a nearly insurmountable mailing list thread > > prevents it. > > That, and the fact the JDBC is already doing it the other way. A > reasonable compromise might be to accept either one. AIUI, part of > what Alexander was aiming for here was to "unite the clans", so to > speak, and it would seem a bit unfriendly (and certainly > counter-productive as regards that goal) to pull the rug out from him > by refusing to support that syntax over what is basically a > supermassive bikeshed. However, being generous in what we accept > won't cost anything, so why not? (oops, misfired... now sending to the list) I was going to put a remark about "adding to the soup" here, but realized that if this is actually committed, "the soup" is gonna be like this: libpq-supported syntax vs. everything else (think JDBC, or is there any other driver in the wild not using libpq?) This is in the ideal world, where every binding is updated to embrace the new syntax and users have updated all of their systems, etc. Before that, why don't also accept "psql://", "pgsql://", "postgre://" and anything else? Or wait, aren't we adding to the soup again (or rather putting the soup right into libpq?)
On 12/13/2011 05:45 PM, Alexander Shulgin wrote: > Before that, why don't also accept "psql://", "pgsql://", "postgre://" > and anything else? Or wait, aren't we adding to the soup again (or > rather putting the soup right into libpq?) There are multiple URI samples within PostgreSQL drivers in the field, here are two I know of what I believe to be a larger number of samples that all match in this regard: http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html These two are using "postgres". One of the hopes in adding URI support was to make it possible for the libpq spec to look similar to the ones already floating around, so that they'd all converge. Using a different prefix than the most popular ones have already adopted isn't a good way to start that. Now, whenever the URI discussion wanders off into copying the JDBC driver I wonder again why that's relevant. But making the implementation look like what people have already deployed surely is, particularly if there's no downside to doing that. Initial quick review of your patch: you suggested this as the general form: psql -d postgresql://user@pw:host:port/dbname?param1=value1&param2=value2... That's presumably supposed to be: psql -d postgresql://user:pw@host:port/dbname?param1=value1&param2=value2... This variation worked here: $ psql -d postgresql://gsmith@localhost:5432/gsmith If we had to pick one URI prefix, it should be "postgres". But given the general name dysfunction around this project, I can't see how anyone would complain if we squat on "postgresql" too. Attached patch modifies yours to prove we can trivially support both, in hopes of detonating this argument before it rages on further. Tested like this: $ psql -d postgres://gsmith@localhost:5432/gsmith And that works too now. I doubt either of us like what I did to the handoff between conninfo_uri_parse and conninfo_uri_parse_options to achieve that, but this feature is still young. 
After this bit of tinkering with the code, it feels to me like this really wants a split() function to break out the two sides of a string across a delimiter, eating it in the process. Adding the level of paranoia I'd like around every bit of code I see that does that type of operation right now would take a while. Refactoring in terms of split and perhaps a few similarly higher-level string parsing operations, targeted for this job, might make it easier to focus on fortifying those library routines instead. For example, instead of the gunk I just added that moves past either type of protocol prefix, I'd like to just say "split(buf,"://",&left,&right) and then move on with processing the right side. I agree with your comment that we need to add some sort of regression tests for this. Given how the parsing is done right now, we'd want to come up with some interesting invalid strings too. Making sure this fails gracefully (and not in a buffer overflow way) might even use something like fuzz testing too. Around here we've just been building some Python scripting to do that sort of thing, tests that aren't practical to do with pg_regress. Probably be better from the project's perspective if such things were in Perl instead; so far no one has ever paid me enough to stomach writing non-trivial things in Perl. Perhaps you are more diverse. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Attachment
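Accepting both prefixes, as argued in the message above, comes down to a prefix comparison before parsing begins. The following is a minimal sketch under the assumption that the parser only needs a pointer to the text after the prefix; the function name skip_uri_prefix is hypothetical and does not come from either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if s starts with one of the accepted URI
 * designators, return a pointer to the first character after it;
 * otherwise return NULL so the caller can fall back to the existing
 * conninfo handling.
 */
static const char *
skip_uri_prefix(const char *s)
{
	/* Neither designator is a prefix of the other, so order does not matter. */
	static const char *const prefixes[] = {"postgresql://", "postgres://"};
	size_t		i;

	assert(s != NULL);

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
	{
		size_t		len = strlen(prefixes[i]);

		if (strncmp(s, prefixes[i], len) == 0)
			return s + len;
	}
	return NULL;
}
```

Keeping the accepted designators in one table means that adding or dropping a spelling later touches a single line, which may matter if the prefix debate is reopened.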
On 12/13/2011 04:54 PM, Greg Smith wrote: > On 12/13/2011 05:45 PM, Alexander Shulgin wrote: >> Before that, why don't also accept "psql://", "pgsql://", "postgre://" >> and anything else? Or wait, aren't we adding to the soup again (or >> rather putting the soup right into libpq?) > > There are multiple URI samples within PostgreSQL drivers in the field, > here are two I know of what I believe to be a larger number of samples > that all match in this regard: > > http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html > http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html > > These two are using "postgres". One of the hopes in adding URI support > was to make it possible for the libpq spec to look similar to the ones > already floating around, so that they'd all converge. Using a different > prefix than the most popular ones have already adopted isn't a good way > to start that. Now, whenever the URI discussion wanders off into copying > the JDBC driver I wonder again why that's relevant. Because the use of Java/JDBC dwarfs both of your examples combined. Don't get me wrong, I love Python (everyone knows this) but in terms of where the work is being done it is still in Java for the most part, by far. That said, I am not really arguing against your other points except to answer your question. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development The PostgreSQL Conference - http://www.postgresqlconference.org/ @cmdpromptinc - @postgresconf - 509-416-6579
On 12/13/2011 08:11 PM, Joshua D. Drake wrote: > Because the use of Java/JDBC dwarfs both of your examples combined. > Don't get me wrong, I love Python (everyone knows this) but in terms > of where the work is being done it is still in Java for the most part, > by far. I was talking about better targeting a new userbase, and I think that one is quite a bit larger than the current PostgreSQL+JDBC one. I just don't see any value in feeding them any Java inspired cruft. As for total size, Peter's comment mentioned having >250,000 installations using URIs already. While they support other platforms now, I suspect the majority of those are still running Heroku's original Ruby product offering. The first link I pointed at was one of the Ruby URI examples. While I do still have more Java-based customers here, there's enough Rails ones mixed in that I wouldn't say JDBC dwarfs them anymore even for me. As for the rest of the world, I direct you toward https://github.com/erh/mongo-jdbc as a sign of the times. "experimental" because there's not much demand for JDBC in web app land anymore. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Tue, Dec 13, 2011 at 07:54:14PM -0500, Greg Smith wrote: > After this bit of tinkering with the code, it feels to me like this > really wants a split() function to break out the two sides of a > string across a delimiter, eating it in the process. Adding the > level of paranoia I'd like around every bit of code I see that does > that type of operation right now would take a while. Refactoring in > terms of split and perhaps a few similarly higher-level string > parsing operations, targeted for this job, might make it easier to > focus on fortifying those library routines instead. For example, > instead of the gunk I just added that moves past either type of > protocol prefix, I'd like to just say "split(buf,"://",&left,&right) > and then move on with processing the right side. FWIW, python calls this operation "partition", as in: left, delim, right = buf.partition("://") Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
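The split() operation proposed above, and its Python counterpart partition, could look roughly like this in C. This is only a sketch: the return convention (1 when the delimiter was found, 0 otherwise) and the choice to leave the buffer untouched on failure are assumptions, since the thread gives only the call shape split(buf, "://", &left, &right).

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the proposed split(): find the first occurrence of delim in
 * buf, overwrite its first byte with a terminator ("eating" the
 * delimiter), and report both sides.  Returns 1 if delim was found and 0
 * otherwise; on 0 the buffer and the output pointers are left untouched.
 * The return convention is an assumption, not taken from the thread.
 */
static int
split(char *buf, const char *delim, char **left, char **right)
{
	char	   *hit;

	assert(buf != NULL && delim != NULL && left != NULL && right != NULL);
	assert(*delim != '\0');

	hit = strstr(buf, delim);
	if (hit == NULL)
		return 0;

	*hit = '\0';
	*left = buf;
	*right = hit + strlen(delim);
	return 1;
}
```

Because the function writes a terminator into buf, callers need to pass a writable copy of the URI rather than the caller-owned const string.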
Excerpts from Greg Smith's message of Wed Dec 14 02:54:14 +0200 2011: > > Initial quick review of your patch: you suggested this as the general form: > > psql -d postgresql://user@pw:host:port/dbname?param1=value1&param2=value2... > > That's presumably supposed to be: > > psql -d postgresql://user:pw@host:port/dbname?param1=value1&param2=value2... Yes, that was clearly a typo, so "user:pw@host:port". > If we had to pick one URI prefix, it should be "postgres". But given > the general name dysfunction around this project, I can't see how anyone > would complain if we squat on "postgresql" too. That'd be true if we've started afresh in the absence of any existing URI implementations. IMO, what makes a connection URI useful is: a) it keeps all the connection parameters in a single string, so you can easily send it to other people to use, and b) it works everywhere, so the people who've got the URI can use it and expect to get the same results as you do. (Well, not without some quirks, like effects of locally-set environment variables or presence of .pgpass or service files, or different nameserver opinions about which hostname resolves to which IP address, but that is pretty much the case with any sort of URIs.) This is not in objection to what you say, but rather an important thing to keep in mind for the purpose of this discussion. Whatever decision we make here, the libpq-binding connectors are going to be compatible with each other automatically if they just pass the URI to libpq. However, should we stick to using "postgresql://" URI prefix exclusively, these might need to massage the URI a bit before passing further (like replacing "postgres://" with "postgresql://", also accepting the latter should be reasonable.) With proper recommendations from our side, the new client code will use the longer prefix, thus achieving compatibility with the only(?) driver not based on libpq (that is, JDBC) in the long run. 
> Attached patch modifies > yours to prove we can trivially support both, in hopes of detonating > this argument before it rages on further. Tested like this: > > $ psql -d postgres://gsmith@localhost:5432/gsmith > > And that works too now. I doubt either of us like what I did to the > handoff between conninfo_uri_parse and conninfo_uri_parse_options to > achieve that, but this feature is still young. Yes, the caller could just do the pointer arithmetics itself, since the exact URI prefix is known at the time, then pass it to conninfo_uri_parse. > After this bit of tinkering with the code, it feels to me like this > really wants a split() function to break out the two sides of a string > across a delimiter, eating it in the process. Adding the level of > paranoia I'd like around every bit of code I see that does that type of > operation right now would take a while. Refactoring in terms of split > and perhaps a few similarly higher-level string parsing operations, > targeted for this job, might make it easier to focus on fortifying those > library routines instead. For example, instead of the gunk I just added > that moves past either type of protocol prefix, I'd like to just say > "split(buf,"://",&left,&right) and then move on with processing the > right side. A search with cscope over my repo clone doesn't give any results for "split", so I assume you're talking about a new function with a signature similar to the following: split(char *buf, const char *delim, char **left, char **right) Note, there should be no need for parameter "left", since that will be pointing to the start of "buf". Also, we might just return "right" as a function's value instead of using out-parameter, with NULL meaning delimiter was not found in the buffer. 
Now, if you look carefully at the patch's code, there are numerous places where it accepts either of two delimiting characters and needs to examine one before zeroing it out, so it'll need something more like this: char *need_a_good_name_for_this(char *buf, const char *possible_delims, char *actual_delim) where it will store a copy of encountered delimiting char in *actual_delim before modifying the buffer. > I agree with your comment that we need to add some sort of regression > tests for this. Given how the parsing is done right now, we'd want to > come up with some interesting invalid strings too. Making sure this > fails gracefully (and not in a buffer overflow way) might even use > something like fuzz testing too. Around here we've just been building > some Python scripting to do that sort of thing, tests that aren't > practical to do with pg_regress. I'd appreciate if you could point me to any specific example of such existing tests to take some inspiration from. -- Regards, Alex
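The multi-delimiter helper described in the message above might be sketched as follows. The name split_at_any is a hypothetical stand-in for need_a_good_name_for_this, and returning NULL with '\0' stored in *actual_delim when no delimiter is present is an assumed convention, not something the message specifies.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the helper discussed above: cut buf at the first character
 * that appears in possible_delims.  The delimiter that was found is
 * copied to *actual_delim before it is overwritten with a terminator,
 * and a pointer to the text after it is returned.  When no delimiter is
 * present, *actual_delim is set to '\0', buf is left unchanged and NULL
 * is returned (assumed convention).
 */
static char *
split_at_any(char *buf, const char *possible_delims, char *actual_delim)
{
	char	   *hit;

	assert(buf != NULL && possible_delims != NULL && actual_delim != NULL);

	/* strcspn() gives the length of the leading run with no delimiter in it. */
	hit = buf + strcspn(buf, possible_delims);
	*actual_delim = *hit;
	if (*hit == '\0')
		return NULL;

	*hit = '\0';
	return hit + 1;
}
```

A parser built on this would call it once per URI component, for example with ":/?" after the host and "/?" after the port, and branch on the stored delimiter instead of saving and restoring characters by hand.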
This submission has turned into a bit of a mess. I did the closest thing to a review the day after it was submitted; follow-up review attempts had issues applying the patch. And it's been stuck there. The patch is still fine, I just tested it out to pick this back up myself again. I think this one is a good advocacy feature, and most of the hard work is done already. Smooth some edge cases and this will be ready to go. First thing: the URI prefix. It is possible to connect using a URI in Python+SQL Alchemy, which was mentioned before as not too relevant due to their also requiring a driver name. As documented at http://docs.sqlalchemy.org/en/latest/core/engines.html and demonstrated at http://packages.python.org/Flask-SQLAlchemy/config.html , it is possible to leave off the driver part of the connection string. That assumes the default driver, such that postgresql:// does the same as postgresql+psycopg2:// , sensibly. That means we absolutely have an installed base of URI speaking developers split between postgresql:// (Python) and postgres:// (Ruby). Given that, there really isn't a useful path forward that helps out all those developers without supporting both prefixes. That's where this left off before, I just wanted to emphasize how clear that need seems now. Next thing, also mentioned at that Flask page. SQLite has standardized the idea that sqlite:////absolute/path/to/foo.db is a URI pointing to a file. Given that, I wonder if Alex's syntax for specifying a socket file name might adopt that syntax, rather than requiring the hex encoding: postgresql://%2Fvar%2Fpgsql%2Ftmp/mydb It's not a big deal, but it would smooth another rough edge toward making the Postgres URI implementation look as close as possible to others. So far I've found only one syntax that I expected this to handle that it rejects: psql -d postgresql://gsmith@localhost It's picky about needing that third slash, but that shouldn't be hard to fix. 
I started collecting up all the variants that do work as an initial shell script regression test, so that changes don't break something that already works. Here are all the variations that already work, setup so that a series of "1" outputs is passing:

psql -d postgresql://gsmith@localhost:5432/gsmith -At -c "SELECT 1"
psql -d postgresql://gsmith@localhost/gsmith -At -c "SELECT 1"
psql -d postgresql://localhost:5432/gsmith -At -c "SELECT 1"
psql -d postgresql://localhost/gsmith -At -c "SELECT 1"
psql -d postgresql://gsmith@localhost:5432/ -At -c "SELECT 1"
psql -d postgresql://gsmith@localhost/ -At -c "SELECT 1"
psql -d postgresql://localhost:5432/ -At -c "SELECT 1"
psql -d postgresql://localhost/gsmith -At -c "SELECT 1"
psql -d postgresql://localhost/ -At -c "SELECT 1"
psql -d postgresql:/// -At -c "SELECT 1"
psql -d postgresql://%6Cocalhost/ -At -c "SELECT 1"
psql -d postgresql://localhost/gsmith?user=gsmith -At -c "SELECT 1"
psql -d postgresql://localhost/gsmith?user=gsmith&port=5432 -At -c "SELECT 1"
psql -d postgresql://localhost/gsmith?user=gsmith\&port=5432 -At -c "SELECT 1"

Replace all the "gsmith" with $USER to make this usable for others. My eyes are starting to cross when I look at URI now, so that's enough for today. If Alex wants to rev this soon, great; if not I have a good idea what I'd like to do with this next, regardless of that. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Greg Smith <greg@2ndQuadrant.com> writes: Thank you for the review, Greg! > Given that, there really isn't a useful path forward that helps out > all those developers without supporting both prefixes. That's where > this left off before, I just wanted to emphasize how clear that need > seems now. OK, I've used the code from your earlier review to support the short prefix. I sincerely hope we don't make the situation any worse by being flexible about the prefix... > Next thing, also mentioned at that Flask page. SQLite has > standardized the idea that sqlite:////absolute/path/to/foo.db is a URI > pointing to a file. Given that, I wonder if Alex's syntax for > specifying a socket file name might adopt that syntax, rather than > requiring the hex encoding: postgresql://%2Fvar%2Fpgsql%2Ftmp/mydb > It's not a big deal, but it would smooth another rough edge toward > making the Postgres URI implementation look as close as possible to > others. Yeah, this is really appealing, however how do you tell if the part after the last slash is a socket directory name or a dbname? E.g: psql postgres:///path/to/different/socket/dir (default dbname) psql postgres:///path/to/different/socket/dir/other (dbname=other ?) If we treat the whole URI string as the path to the socket dir (which I find the most intuitive way to do it,) the only way to specify a non-default dbname is to use query parameter: psql postgres:///path/to/different/socket/dir?dbname=other or pass another -d flag to psql *after* the URI: psql [-d] postgres:///path/to/different/socket/dir -d other Reasonable? > So far I've found only one syntax that I expected this to handle that > it rejects: > > psql -d postgresql://gsmith@localhost > > It's picky about needing that third slash, but that shouldn't be hard > to fix. Yeah, good that you've spotted it. 
If my reading of the URI RFC (2396) is correct, the question mark and query parameters may follow the hostname, w/o that slash too, like this: psql -d postgresql://localhost?user=gsmith So this made me relax some checks and rewrite the code a bit. > I started collecting up all the variants that do work as an initial > shell script regression test, so that changes don't break something > that already works. Here are all the variations that already work, > setup so that a series of "1" outputs is passing: > [snip] Yes, the original code was just a bit too picky about URI component separators. Attached also is a simplified test shell script. I have also added a warning message for when a query parameter is not recognized and being ignored. Not sure if plain fprintf to stderr is accepted practice for libpq, please correct if you have better idea. -- Regards, Alex *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *************** *** 282,287 **** static const PQEnvironmentOption EnvironmentOptions[] = --- 282,290 ---- } }; + /* The connection URI must start with one the following designators: */ + static const char uri_designator[] = "postgresql://"; + static const char short_uri_designator[] = "postgres://"; static bool connectOptions1(PGconn *conn, const char *conninfo); static bool connectOptions2(PGconn *conn); *************** *** 297,303 **** static PQconninfoOption *conninfo_parse(const char *conninfo, static PQconninfoOption *conninfo_array_parse(const char *const * keywords, const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); ! 
static char *conninfo_getval(PQconninfoOption *connOptions, const char *keyword); static void defaultNoticeReceiver(void *arg, const PGresult *res); static void defaultNoticeProcessor(void *arg, const char *message); --- 300,325 ---- static PQconninfoOption *conninfo_array_parse(const char *const * keywords, const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); ! static PQconninfoOption *conninfo_uri_parse(const char *uri, ! PQExpBuffer errorMessage); ! static bool conninfo_uri_parse_options(PQconninfoOption *options, ! const char *uri, char *buf, ! PQExpBuffer errorMessage); ! static bool conninfo_uri_parse_params(char *params, ! PQconninfoOption *connOptions, ! PQExpBuffer errorMessage); ! static char *conninfo_uri_decode(const char *str, PQExpBuffer errorMessage); ! static bool get_hexdigit(char digit, int *value); ! static const char *conninfo_getval(PQconninfoOption *connOptions, ! const char *keyword); ! static PQconninfoOption *conninfo_storeval(PQconninfoOption *connOptions, ! const char *keyword, const char *value, ! PQExpBuffer errorMessage, bool ignoreMissing); ! static PQconninfoOption *conninfo_store_uri_encoded_value( ! PQconninfoOption *connOptions, ! const char *keyword, const char *encoded_value, ! PQExpBuffer errorMessage, bool ignoreMissing); ! static PQconninfoOption *conninfo_find(PQconninfoOption *connOptions, const char *keyword); static void defaultNoticeReceiver(void *arg, const PGresult *res); static void defaultNoticeProcessor(void *arg, const char *message); *************** *** 580,586 **** PQconnectStart(const char *conninfo) static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions) { ! char *tmp; /* * Move option values into conn structure --- 602,608 ---- static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions) { ! 
const char *tmp; /* * Move option values into conn structure *************** *** 3739,3745 **** ldapServiceLookup(const char *purl, PQconninfoOption *options, static int parseServiceInfo(PQconninfoOption *options, PQExpBuffer errorMessage) { ! char *service = conninfo_getval(options, "service"); char serviceFile[MAXPGPATH]; char *env; bool group_found = false; --- 3761,3767 ---- static int parseServiceInfo(PQconninfoOption *options, PQExpBuffer errorMessage) { ! const char *service = conninfo_getval(options, "service"); char serviceFile[MAXPGPATH]; char *env; bool group_found = false; *************** *** 4129,4161 **** conninfo_parse(const char *conninfo, PQExpBuffer errorMessage, } /* ! * Now we have the name and the value. Search for the param record. */ ! for (option = options; option->keyword != NULL; option++) ! { ! if (strcmp(option->keyword, pname) == 0) ! break; ! } ! if (option->keyword == NULL) ! { ! printfPQExpBuffer(errorMessage, ! libpq_gettext("invalid connection option \"%s\"\n"), ! pname); ! PQconninfoFree(options); ! free(buf); ! return NULL; ! } ! ! /* ! * Store the value ! */ ! if (option->val) ! free(option->val); ! option->val = strdup(pval); ! if (!option->val) { - printfPQExpBuffer(errorMessage, - libpq_gettext("out of memory\n")); PQconninfoFree(options); free(buf); return NULL; --- 4151,4160 ---- } /* ! * Now that we have the name and the value, store the record. */ ! if (!conninfo_storeval(options, pname, pval, errorMessage, false)) { PQconninfoFree(options); free(buf); return NULL; *************** *** 4276,4283 **** conninfo_array_parse(const char *const * keywords, const char *const * values, /* first find "dbname" if any */ if (strcmp(pname, "dbname") == 0) { /* next look for "=" in the value */ ! if (pvalue && strchr(pvalue, '=')) { /* * Must be a conninfo string, so parse it, but do not use --- 4275,4297 ---- /* first find "dbname" if any */ if (strcmp(pname, "dbname") == 0) { + /* + * Look for URI prefix. 
This has to come before the check for "=", + * since URI might as well contain "=" if extra parameters are + * given. + */ + if (pvalue && ( + (strncmp(pvalue, uri_designator, + sizeof(uri_designator) - 1) == 0) || + (strncmp(pvalue, short_uri_designator, + sizeof(short_uri_designator) - 1) == 0))) + { + str_options = conninfo_uri_parse(pvalue, errorMessage); + if (str_options == NULL) + return NULL; + } /* next look for "=" in the value */ ! else if (pvalue && strchr(pvalue, '=')) { /* * Must be a conninfo string, so parse it, but do not use *************** *** 4452,4467 **** conninfo_array_parse(const char *const * keywords, const char *const * values, return options; } static char * conninfo_getval(PQconninfoOption *connOptions, const char *keyword) { PQconninfoOption *option; for (option = connOptions; option->keyword != NULL; option++) { if (strcmp(option->keyword, keyword) == 0) ! return option->val; } return NULL; --- 4466,4955 ---- return options; } + /* + * Connection URI parser routine + * + * If successful, a malloc'd PQconninfoOption array is returned. + * If not successful, NULL is returned and an error message is + * left in errorMessage. + * + * This is only a wrapper for conninfo_uri_parse_options, which does all of the + * heavy lifting, while this function handles proper (de-)allocation of memory + * buffers. 
+ */ + static PQconninfoOption * + conninfo_uri_parse(const char *uri, PQExpBuffer errorMessage) + { + PQconninfoOption *options; + char *buf; + + resetPQExpBuffer(errorMessage); + + /* Make a working copy of PQconninfoOptions */ + options = malloc(sizeof(PQconninfoOptions)); + if (options == NULL) + { + printfPQExpBuffer(errorMessage, libpq_gettext("out of memory\n")); + return NULL; + } + memcpy(options, PQconninfoOptions, sizeof(PQconninfoOptions)); + + /* Make a writable copy of the URI buffer */ + buf = strdup(uri); + if (buf == NULL) + { + printfPQExpBuffer(errorMessage, libpq_gettext("out of memory\n")); + free(options); + return NULL; + } + + if (!conninfo_uri_parse_options(options, uri, buf, errorMessage)) + { + free(options); + options = NULL; + } + + free(buf); + return options; + } + + /* + * Connection URI parser: actual parser routine + * + * If successful, returns true while the options array is filled with parsed + * options from the passed URI + * If not successful, returns false and fills errorMessage accordingly. + * + * Parses the connection URI string in 'buf' (while destructively modifying + * it,) according to the URI syntax below. The 'uri' parameter is only used + * for error reporting and 'buf' should initially contain a writable copy of + * 'uri'. + * + * The general form for connection URI is the following: + * + * postgresql://user:pw@host:port/database + * + * To specify a IPv6 host address, enclose the address in square brackets: + * + * postgresql://[::1]/database + * + * Connection parameters may follow the base URI using this syntax: + * + * postgresql://host/database?param1=value1¶m2=value2&... 
+  */
+ static bool
+ conninfo_uri_parse_options(PQconninfoOption *options, const char* uri,
+                            char *buf, PQExpBuffer errorMessage)
+ {
+     const char *host;
+
+     char *start = buf;
+     char *p;
+     char lastc;
+
+     /* Assume URI prefix is already verified by the caller */
+     if (strncmp(start, uri_designator, sizeof(uri_designator) - 1) == 0)
+         start += sizeof(uri_designator) - 1;
+     else
+         start += sizeof(short_uri_designator) - 1;
+     p = start;
+
+     /* Look ahead for possible user credentials designator */
+     while (*p && *p != '@' && *p != '/')
+         ++p;
+     if (*p != '@')
+     {
+         /* Reset to start of URI and parse as hostname/addr instead */
+         p = start;
+     }
+     else
+     {
+         char *user = start;
+
+         p = user;
+         while (*p != ':' && *p != '@')
+             ++p;
+
+         /* Save last char and cut off at end of user name */
+         lastc = *p;
+         *p = '\0';
+
+         if (!conninfo_store_uri_encoded_value(options, "user", user,
+                                               errorMessage, false))
+             return false;
+
+         if (lastc == ':')
+         {
+             const char *password = p + 1;
+
+             while (*p != '@')
+                 ++p;
+             *p = '\0';
+
+             if (!conninfo_store_uri_encoded_value(options, "password", password,
+                                                   errorMessage, false))
+                 return false;
+         }
+
+         /* Advance past end of parsed user name or password token */
+         ++p;
+     }
+
+     /* Look for IPv6 address */
+     if (*p == '[')
+     {
+         host = ++p;
+         while (*p && *p != ']')
+             ++p;
+         if (!*p)
+         {
+             printfPQExpBuffer(errorMessage,
+                               libpq_gettext("end of string reached when looking for matching ']' in IPv6 host address in URI: %s\n"),
+                               uri);
+             return false;
+         }
+         if (p == host)
+         {
+             printfPQExpBuffer(errorMessage,
+                               libpq_gettext("IPv6 host address may not be empty in URI: %s\n"),
+                               uri);
+             return false;
+         }
+
+         /* Cut off the bracket and advance */
+         *(p++) = '\0';
+
+         /*
+          * The address must be followed by a port specifier or a slash or a
+          * query.
+          */
+         if (*p && *p != ':' && *p != '/' && *p != '?')
+         {
+             printfPQExpBuffer(errorMessage,
+                               libpq_gettext("unexpected '%c' at position %td in URI (expecting ':' or '/'): %s\n"),
+                               *p, p - buf + 1, uri);
+             return false;
+         }
+     }
+     else /* no IPv6 address */
+     {
+         host = p;
+         /*
+          * Look for port specifier (colon) or end of host specifier (slash), or
+          * query (question mark).
+          */
+         while (*p && *p != ':' && *p != '/' && *p != '?')
+             ++p;
+     }
+
+     /* Save the hostname terminator before we null it */
+     lastc = *p;
+     *p = '\0';
+
+     if (!conninfo_store_uri_encoded_value(options, "host", host, errorMessage,
+                                           false))
+         return false;
+
+     if (lastc == ':')
+     {
+         const char *port = ++p; /* advance past host terminator */
+
+         while (*p && *p != '/' && *p != '?')
+             ++p;
+
+         lastc = *p;
+         *p = '\0';
+
+         if (!conninfo_store_uri_encoded_value(options, "port", port,
+                                               errorMessage, false))
+             return false;
+     }
+
+     if (lastc && lastc != '?')
+     {
+         const char *dbname = ++p; /* advance past host terminator */
+
+         /* Look for query parameters */
+         while (*p && *p != '?')
+             ++p;
+
+         lastc = *p;
+         *p = '\0';
+
+         if (!conninfo_store_uri_encoded_value(options, "dbname", dbname,
+                                               errorMessage, false))
+             return false;
+     }
+
+     if (lastc)
+     {
+         ++p; /* advance past terminator */
+
+         if (!conninfo_uri_parse_params(p, options, errorMessage))
+             return false;
+     }
+
+     return true;
+ }
+
+ /*
+  * Connection URI parameters parser routine
+  *
+  * If successful, returns true while connOptions is filled with parsed
+  * parameters.
+  * If not successful, returns false and fills errorMessage accordingly.
+  *
+  * Destructively modifies 'params' buffer.
+  */
+ static bool
+ conninfo_uri_parse_params(char *params,
+                           PQconninfoOption *connOptions,
+                           PQExpBuffer errorMessage)
+ {
+     char *token;
+     char *savep;
+
+     while ((token = strtok_r(params, "&", &savep)))
+     {
+         const char *keyword;
+         const char *value;
+         char *p = strchr(token, '=');
+
+         if (p == NULL)
+         {
+             printfPQExpBuffer(errorMessage,
+                               libpq_gettext("missing key/value separator '=' in URI query parameter: %s\n"),
+                               token);
+             return false;
+         }
+
+         /* Cut off keyword and advance to value */
+         *(p++) = '\0';
+
+         keyword = token;
+         value = p;
+
+         /* Special keyword handling for improved JDBC compatibility */
+         if (strcmp(keyword, "ssl") == 0 &&
+             strcmp(value, "true") == 0)
+         {
+             keyword = "sslmode";
+             value = "require";
+         }
+
+         /*
+          * Store the value if corresponding option was found -- ignore
+          * otherwise.
+          *
+          * In theory, the keyword might be also percent-encoded, but hardly
+          * that's ever needed by anyone.
+          */
+         if (!conninfo_store_uri_encoded_value(connOptions, keyword, value,
+                                               errorMessage, true))
+         {
+             fprintf(stderr,
+                     libpq_gettext("WARNING: ignoring unrecognized URI query parameter: %s\n"),
+                     keyword);
+         }
+
+         params = NULL; /* proceed to the next token with strtok_r */
+     }
+
+     return true;
+ }
+
+ /*
+  * Connection URI decoder routine
+  *
+  * If successful, returns the malloc'd decoded string.
+  * If not successful, returns NULL and fills errorMessage accordingly.
+  *
+  * The string is decoded by replacing any percent-encoded tokens with
+  * corresponding characters, while preserving any non-encoded characters.  A
+  * percent-encoded token is a character triplet: a percent sign, followed by a
+  * pair of hexadecimal digits (0-9A-F), where lower- and upper-case letters are
+  * treated identically.
+  */
  static char *
+ conninfo_uri_decode(const char *str, PQExpBuffer errorMessage)
+ {
+     char *buf = malloc(strlen(str) + 1);
+     char *p = buf;
+     const char *q = str;
+
+     if (buf == NULL)
+     {
+         printfPQExpBuffer(errorMessage, libpq_gettext("out of memory\n"));
+         return NULL;
+     }
+
+     for (;;)
+     {
+         if (*q != '%')
+         {
+             /* copy and check for NUL terminator */
+             if (!(*(p++) = *(q++)))
+                 break;
+         }
+         else
+         {
+             int hi;
+             int lo;
+
+             ++q; /* skip the percent sign itself */
+
+             if (!(get_hexdigit(*q++, &hi) && get_hexdigit(*q++, &lo)))
+             {
+                 printfPQExpBuffer(errorMessage,
+                                   libpq_gettext("invalid percent-encoded token (a pair of hexadecimal digits expected after every '%%' symbol, use '%%25' to encode percent symbol itself): %s\n"),
+                                   str);
+                 free(buf);
+                 return NULL;
+             }
+
+             *(p++) = (hi << 4) | lo;
+         }
+     }
+
+     return buf;
+ }
+
+ /*
+  * Convert hexadecimal digit character to it's integer value.
+  *
+  * If successful, returns true and value is filled with digit's base 16 value.
+  * If not successful, returns false.
+  *
+  * Lower- and upper-case letters in the range A-F are treated identically.
+  */
+ static bool
+ get_hexdigit(char digit, int *value)
+ {
+     if ('0' <= digit && digit <= '9')
+     {
+         *value = digit - '0';
+     }
+     else if ('A' <= digit && digit <= 'F')
+     {
+         *value = digit - 'A' + 10;
+     }
+     else if ('a' <= digit && digit <= 'f')
+     {
+         *value = digit - 'a' + 10;
+     }
+     else
+     {
+         return false;
+     }
+
+     return true;
+ }
+
+ /*
+  * Find an option value corresponding to the keyword in the connOptions array.
+  *
+  * If successful, returns a pointer to the corresponding option's value.
+  * If not successful, returns NULL.
+  */
+ static const char *
  conninfo_getval(PQconninfoOption *connOptions,
                  const char *keyword)
  {
+     PQconninfoOption *option = conninfo_find(connOptions, keyword);
+
+     return option ? option->val : NULL;
+ }
+
+ /*
+  * Store a (new) value for an option corresponding to the keyword in
+  * connOptions array.
+  *
+  * If successful, returns a pointer to the corresponding PQconninfoOption,
+  * which value is replaced with a strdup'd copy of the passed value string.
+  * The existing value for the option is free'd before replacing, if any.
+  *
+  * If not successful, returns NULL and fills errorMessage accordingly.
+  */
+ static PQconninfoOption *
+ conninfo_storeval(PQconninfoOption *connOptions,
+                   const char *keyword, const char *value,
+                   PQExpBuffer errorMessage, bool ignoreMissing)
+ {
+     PQconninfoOption *option = conninfo_find(connOptions, keyword);
+     char *value_copy;
+
+     if (option == NULL)
+     {
+         if (!ignoreMissing)
+             printfPQExpBuffer(errorMessage,
+                               libpq_gettext("invalid connection option \"%s\"\n"),
+                               keyword);
+         return NULL;
+     }
+
+     value_copy = strdup(value);
+     if (value_copy == NULL)
+     {
+         printfPQExpBuffer(errorMessage, libpq_gettext("out of memory\n"));
+         return NULL;
+     }
+
+     if (option->val)
+         free(option->val);
+     option->val = value_copy;
+
+     return option;
+ }
+
+ /*
+  * Store a (new) possibly URI-encoded value for an option corresponding to the
+  * keyword in connOptions array.
+  *
+  * If successful, returns a pointer to the corresponding PQconninfoOption,
+  * which value is replaced with a URI-decoded copy of the passed value string.
+  * The existing value for the option is free'd before replacing, if any.
+  *
+  * If not successful, returns NULL and fills errorMessage accordingly.  If the
+  * corresponding option was not found, but ignoreMissing is true: doesn't fill
+  * errorMessage.
+  *
+  * See also conninfo_storeval.
+  */
+ static PQconninfoOption *
+ conninfo_store_uri_encoded_value(PQconninfoOption *connOptions,
+                                  const char *keyword, const char *encoded_value,
+                                  PQExpBuffer errorMessage, bool ignoreMissing)
+ {
+     PQconninfoOption *option;
+     char *decoded_value = conninfo_uri_decode(encoded_value, errorMessage);
+
+     if (decoded_value == NULL)
+         return NULL;
+
+     option = conninfo_storeval(connOptions, keyword, decoded_value,
+                                errorMessage, ignoreMissing);
+     free(decoded_value);
+
+     return option;
+ }
+
+ /*
+  * Find a PQconninfoOption option corresponding to the keyword in the
+  * connOptions array.
+  *
+  * If successful, returns a pointer to the corresponding PQconninfoOption
+  * structure.
+  * If not successful, returns NULL.
+  */
+ static PQconninfoOption *
+ conninfo_find(PQconninfoOption *connOptions, const char *keyword)
+ {
      PQconninfoOption *option;

      for (option = connOptions; option->keyword != NULL; option++)
      {
          if (strcmp(option->keyword, keyword) == 0)
!             return option;
      }

      return NULL;

#!/bin/sh
while read uri; do psql -d ${uri} -At -c "SELECT 1"; done <<EOF
postgresql://${USER}@localhost:5432/${USER}
postgresql://${USER}@localhost/${USER}
postgresql://localhost:5432/${USER}
postgresql://localhost/${USER}
postgresql://${USER}@localhost:5432/
postgresql://${USER}@localhost/
postgresql://localhost:5432/
postgresql://localhost:5432
postgresql://localhost/${USER}
postgresql://localhost/
postgresql://localhost
postgresql:///
postgresql://
postgresql://%6Cocalhost/
postgresql://localhost/${USER}?user=${USER}
postgresql://localhost/${USER}?user=${USER}&port=5432
postgresql://localhost/${USER}?user=${USER}&port=5432
postgresql://localhost:5432?user=${USER}
postgresql://localhost?user=${USER}
postgresql://localhost?uzer=
postgresql://localhost?
postgresql://[::1]:5432/${USER}
postgresql://[::1]/${USER}
postgresql://[::1]/
postgresql://[::1]
postgres://
EOF

* Alex Shulgin:

> Yeah, this is really appealing, however how do you tell if the part
> after the last slash is a socket directory name or a dbname? E.g:
>
> psql postgres:///path/to/different/socket/dir (default dbname)
> psql postgres:///path/to/different/socket/dir/other (dbname=other ?)

The HTTP precedent is to probe the file system until you find something.
Most HTTP servers have something similar to the PATH_INFO variable which
captures trailing path segments.

It's ugly, but it's standard practice, and seems better than a separate
-d parameter (which sort of defeats the purpose of URIs).

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

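The probing approach described above could be sketched in C roughly as follows. This is an illustration only, not code from the patch: `split_socket_path`, `dir_probe_fn` and `fake_is_dir` are hypothetical names, and the predicate is a stand-in for a real file system check (for instance `stat()` plus `S_ISDIR`) so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: nonzero if 'path' names an existing directory. */
typedef int (*dir_probe_fn)(const char *path);

/* Stand-in for a stat()-based check; it knows exactly one directory. */
static int
fake_is_dir(const char *path)
{
    return strcmp(path, "/path/to/different/socket/dir") == 0;
}

/*
 * Find the longest leading part of 'path' accepted by 'is_dir'.
 * On success the directory is copied into 'dir', '*rest' points at the
 * trailing remainder inside 'path' (the would-be dbname, possibly empty),
 * and 0 is returned.  Returns -1 if nothing matches or 'dir' is too small.
 */
static int
split_socket_path(const char *path, dir_probe_fn is_dir,
                  char *dir, size_t dirsz, const char **rest)
{
    size_t len;

    assert(path != NULL && is_dir != NULL && dir != NULL && rest != NULL);

    len = strlen(path);
    if (len == 0 || len >= dirsz)
        return -1;
    memcpy(dir, path, len + 1);

    for (;;)
    {
        char *slash;

        if (is_dir(dir))
        {
            *rest = path + strlen(dir);
            if (**rest == '/')
                ++*rest;        /* skip the separating slash */
            return 0;
        }

        /* Drop the last path segment and probe again */
        slash = strrchr(dir, '/');
        if (slash == NULL)
            return -1;          /* relative path, nothing left to try */
        if (slash == dir)
        {
            if (dir[1] == '\0')
                return -1;      /* even "/" was rejected */
            dir[1] = '\0';      /* fall back to the root directory */
        }
        else
            *slash = '\0';
    }
}
```

With the stand-in predicate, "/path/to/different/socket/dir/other" splits into the directory "/path/to/different/socket/dir" and the remainder "other", while the bare directory path leaves an empty remainder, which would correspond to the default dbname. The cost of this design is one file system probe per path segment, which is part of why it is called ugly above.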
Florian Weimer <fweimer@bfk.de> writes:

> * Alex Shulgin:
>
>> Yeah, this is really appealing, however how do you tell if the part
>> after the last slash is a socket directory name or a dbname? E.g:
>>
>> psql postgres:///path/to/different/socket/dir (default dbname)
>> psql postgres:///path/to/different/socket/dir/other (dbname=other ?)
>
> The HTTP precedent is to probe the file system until you find something.
> Most HTTP servers have something similar to the PATH_INFO variable which
> captures trailing path segments.
>
> It's ugly, but it's standard practice, and seems better than a separate
> -d parameter (which sort of defeats the purpose of URIs).

Hm, do you see anything wrong with "?dbname=other" if you don't like a
separate -d?

--
Alex

* Alex Shulgin:

>> It's ugly, but it's standard practice, and seems better than a separate
>> -d parameter (which sort of defeats the purpose of URIs).
>
> Hm, do you see anything wrong with "?dbname=other" if you don't
> like a separate -d?

It's not nice URI syntax, but it's better than an out-of-band mechanism.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

On Friday 24 February 2012 14:18:44, Florian Weimer wrote:

> * Alex Shulgin:
> >> It's ugly, but it's standard practice, and seems better than a separate
> >> -d parameter (which sort of defeats the purpose of URIs).
> >
> > Hm, do you see anything wrong with "?dbname=other" if you don't
> > like a separate -d?
>
> It's not nice URI syntax, but it's better than an out-of-band mechanism.

I haven't followed all the mails about this feature, but I don't find it
a nice syntax either.

"?dbname=other" looks like dbname is an argument, but dbname is a
requirement for a PostgreSQL connection.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training

On 02/25/2012 09:37 PM, Cédric Villemain wrote:

> I haven't followed all the mails about this feature, but I don't find it
> a nice syntax either.
>
> "?dbname=other" looks like dbname is an argument, but dbname is a
> requirement for a PostgreSQL connection.

Ugh, not really. AFAIK, dbname is a connection option which defaults to
$USER, unless overridden on the command line or in the environment (or
via a service file.)

--
Alex

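The defaulting rule Alex describes can be illustrated with a small C sketch. It is not libpq code: `effective_dbname` is a hypothetical helper, the service file is left out entirely, and the relative order of the command line and the environment shown here is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the database name to connect to (illustrative only).
 *
 * explicit_dbname: value given on the command line or in the URI, or NULL
 * env_dbname:      value taken from the environment, or NULL
 * user:            the effective user name, used as the final fallback
 *
 * Empty strings are treated the same as "not given".
 */
static const char *
effective_dbname(const char *explicit_dbname, const char *env_dbname,
                 const char *user)
{
    assert(user != NULL);

    if (explicit_dbname != NULL && *explicit_dbname != '\0')
        return explicit_dbname;
    if (env_dbname != NULL && *env_dbname != '\0')
        return env_dbname;
    return user;            /* dbname defaults to the user name */
}
```

A caller would typically pass `getenv("PGDATABASE")` as the second argument. The point of the sketch is only that dbname is optional: a URI without a database part still resolves to something usable.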
On 02/24/2012 03:18 PM, Florian Weimer wrote:

> * Alex Shulgin:
>
>>> It's ugly, but it's standard practice, and seems better than a separate
>>> -d parameter (which sort of defeats the purpose of URIs).
>>
>> Hm, do you see anything wrong with "?dbname=other" if you don't
>> like a separate -d?
>
> It's not nice URI syntax, but it's better than an out-of-band mechanism.

Attached is v5 of the patch, adding support for local Unix socket
directory specification w/o the need to percent-encode path separators.
The path to the directory must start with a forward slash, like so:

postgres:///path/to/socket/dir

To specify a non-default dbname, use URI query parameters:

postgres:///path/to/socket/dir?dbname=other

Username/password should also be specified in the query parameters in
this case, as opposed to the "user:pw@host" syntax supported by host URIs.

--
Alex

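To make the v5 syntax concrete, here is a minimal C sketch of how the socket directory form might be told apart from a host URI. It is not taken from the attached patch: `parse_socket_dir_uri` is a hypothetical helper, and treating a bare "postgres:///" as "no directory given" is an assumption based on the earlier description of that form as the all-defaults connection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both designators are discussed in the thread */
static const char long_prefix[] = "postgresql://";
static const char short_prefix[] = "postgres://";

/*
 * Returns 1 if 'uri' uses the socket directory form, copying the directory
 * into 'dir' and pointing '*query' at the text after '?' (NULL if absent).
 * Returns 0 for any other URI and -1 if 'dir' is too small.
 */
static int
parse_socket_dir_uri(const char *uri, char *dir, size_t dirsz,
                     const char **query)
{
    const char *p;
    size_t n;

    assert(uri != NULL && dir != NULL && query != NULL);

    if (strncmp(uri, long_prefix, sizeof(long_prefix) - 1) == 0)
        p = uri + sizeof(long_prefix) - 1;
    else if (strncmp(uri, short_prefix, sizeof(short_prefix) - 1) == 0)
        p = uri + sizeof(short_prefix) - 1;
    else
        return 0;

    /* A host URI has a host name, '[' or '@' part here, not a slash */
    if (*p != '/')
        return 0;

    n = strcspn(p, "?");    /* the directory runs up to the query, if any */
    if (n == 1)
        return 0;           /* assumption: bare "///" means all defaults */
    if (n >= dirsz)
        return -1;

    memcpy(dir, p, n);
    dir[n] = '\0';
    *query = (p[n] == '?') ? p + n + 1 : NULL;
    return 1;
}
```

For "postgres:///path/to/socket/dir?dbname=other" this yields the directory "/path/to/socket/dir" and the query string "dbname=other", which would then go through the ordinary parameter parsing. Because the whole path is the directory, there is no room left for "user:pw@" or a dbname segment, which is why those move to the query parameters.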
Attachment
On Wed, 2012-02-22 at 12:26 -0500, Greg Smith wrote:

> I started collecting up all the variants that do work as an
> initial shell script regression test, so that changes don't break
> something that already works. Here are all the variations that
> already work, setup so that a series of "1" outputs is passing:

Let's please add something like this to the patch. Otherwise, I foresee
a lot of potential to break corner cases in the future.

On 03/06/2012 01:09 AM, Peter Eisentraut wrote:

> On Wed, 2012-02-22 at 12:26 -0500, Greg Smith wrote:
>> I started collecting up all the variants that do work as an
>> initial shell script regression test, so that changes don't break
>> something that already works. Here are all the variations that
>> already work, setup so that a series of "1" outputs is passing:
>
> Let's please add something like this to the patch. Otherwise, I foresee
> a lot of potential to break corner cases in the future.

I've included a (separate) test shell script based on Greg's cases in
one of the updates. What would be the preferred place to plug it in?
Override installcheck in libpq Makefile?

--
Alex

On Tue, 2012-03-06 at 10:11 +0200, Alexander Shulgin wrote:

> On 03/06/2012 01:09 AM, Peter Eisentraut wrote:
> >
> > On Wed, 2012-02-22 at 12:26 -0500, Greg Smith wrote:
> >> I started collecting up all the variants that do work as an
> >> initial shell script regression test, so that changes don't break
> >> something that already works. Here are all the variations that
> >> already work, setup so that a series of "1" outputs is passing:
> >
> > Let's please add something like this to the patch. Otherwise, I foresee
> > a lot of potential to break corner cases in the future.
>
> I've included a (separate) test shell script based on Greg's cases in
> one of the updates. What would be the preferred place to plug it in?
> Override installcheck in libpq Makefile?

I think that would be the right place.

Peter Eisentraut <peter_e@gmx.net> writes:

>> I've included a (separate) test shell script based on Greg's cases in
>> one of the updates. What would be the preferred place to plug it in?
>> Override installcheck in libpq Makefile?
>
> I think that would be the right place.

I figured that adding this right into src/interfaces/libpq would pollute
the source dir, so I've used src/test instead.

The attached v6 adds a src/test/uri directory complete with the test
script, an expected output file and a Makefile which responds to
installcheck. A README file is also included.

--
Alex

Attachment
On Wed, 2012-03-07 at 18:31 +0200, Alex Shulgin wrote:

> I figured that adding this right into src/interfaces/libpq would pollute
> the source dir, so I've used src/test instead.

I would prefer src/interfaces/libpq/test, to keep it close to the code.

On 03/07/2012 08:03 PM, Peter Eisentraut wrote:

> On Wed, 2012-03-07 at 18:31 +0200, Alex Shulgin wrote:
>> I figured that adding this right into src/interfaces/libpq would pollute
>> the source dir, so I've used src/test instead.
>
> I would prefer src/interfaces/libpq/test, to keep it close to the code.

Hm, actually that makes more sense and is not unprecedented (I see ecpg
has its own 'test' subdir.) Apparently I was under the false impression
that all regression tests are concentrated under $(topdir)/src/test.

I'll post an updated patch shortly (unless someone would like to argue
for keeping the tests where they are now.)

On 03/07/2012 09:16 PM, Alexander Shulgin wrote:

>> I would prefer src/interfaces/libpq/test, to keep it close to the code.
>
> Hm, actually that makes more sense and is not unprecedented (I see ecpg
> has its own 'test' subdir.) Apparently I was under the false impression
> that all regression tests are concentrated under $(topdir)/src/test.
>
> I'll post an updated patch shortly (unless someone would like to argue
> for keeping the tests where they are now.)

And here it is attached (v7.) The test code now lives under libpq/test.

A colleague of mine also pointed out that expanded PGUSER/PGPORT vars
slipped into the expected.out file in the previous version, so that was
not really useful for testing.

The new version addresses the above issue by expanding shell vars in a
separate step. The test lines moved to a separate file, 'regress.in',
since we are expanding the variables manually now (no need to use a
heredoc.)

After moving the test lines to a separate file I noticed that it was
identical to the expected output file. So I thought it would be nice to
add some failing URIs as well (improves code coverage.) I did that and
one test highlighted a minor bug, which I've also fixed. For that, I
decided to move previously extracted parts of the code back into the
main parser routine, as it was getting too ugly to pass all the required
local vars to them. I hope this won't confuse those who had a chance to
review the previous version too much.

--
Regards,
Alex

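The idea of mixing passing and failing URIs in one test list can be sketched in C as a small table-driven check. Everything here is hypothetical: the failing URIs are made up for illustration and are not the contents of 'regress.in', and `percent_encoding_ok` checks only one rule, the one stated in the patch's error message (every '%' must be followed by two hexadecimal digits).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if every '%' in 's' is followed by two hex digits, else 0. */
static int
percent_encoding_ok(const char *s)
{
    assert(s != NULL);

    for (; *s; ++s)
    {
        if (*s != '%')
            continue;
        /* The && short-circuits, so s[2] is never read past the NUL */
        if (!(isxdigit((unsigned char) s[1]) &&
              isxdigit((unsigned char) s[2])))
            return 0;
        s += 2;             /* skip the two digits just checked */
    }
    return 1;
}

struct uri_case
{
    const char *uri;
    int         expect_ok;
};

/* Illustrative cases only; the failing ones are invented for this sketch */
static const struct uri_case cases[] = {
    {"postgresql://%6Cocalhost/", 1},
    {"postgresql://localhost?user=alex", 1},
    {"postgresql://%zz/", 0},
    {"postgresql://host/db%2", 0},
    {"postgresql://%", 0},
};

/* Counts the cases whose outcome differs from the expectation. */
static size_t
count_mismatches(void)
{
    size_t i;
    size_t bad = 0;

    for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
        if (percent_encoding_ok(cases[i].uri) != cases[i].expect_ok)
            bad++;
    return bad;
}
```

Keeping the expectation next to each input avoids the problem mentioned above, where the input file and the expected output were identical and so exercised only the success paths.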