Thread: Inconsistency in libpq connection parameters, and extension thereof

Inconsistency in libpq connection parameters, and extension thereof

From
Daniel Farina
Date:
Hello list,

I have been playing with the URI connection strings in the bleeding
edge 9.2 and noticed an inconsistency with the old connection string
behavior:

$ psql 'host=/var/run/postgresql dbname=postgres arbitrary=property'
psql: invalid connection option "arbitrary"

(psql exits with an error code immediately)

vs.

$ psql postgres://%2Fvar%2Frun%2Fpostgresql/?arbitrary=property
WARNING: ignoring unrecognized URI query parameter: arbitrary

(subsequent successful connection)

Both of these appear to be checked entirely client-side in libpq.

I view both of these approaches as problematic, because what I really
want for a more ambitious set of projects is to pass arbitrary
properties in the startup packet from libpq, as so that FEBE proxies
and potentially Postgres backend hooks/extensions can have a chance to
process the extra startup properties.

The clear downside of deferring some error messages until
post-connection is that typos would be harder to catch, so it seems
that warning should be retained for un-processed properties, but
instead be issued by the server.  That might also make
forward-compatibility of libpq more complete, although I know that is
not an express goal (consider cases like when application_name was
added as a valid connection parameter, though).

This could also be useful for Postgres extensions.

Please send objections on the basis of principle.

If objections on the basis of principle are not heard or are
addressable, I'll follow-up with a summary of places where we might
need changes and ask for objections on mechanism next.

-- 
fdr


Daniel Farina <daniel@heroku.com> writes:
> I have been playing with the URI connection strings in the bleeding
> edge 9.2 and noticed an inconsistency with the old connection string
> behavior:

> $ psql 'host=/var/run/postgresql dbname=postgres arbitrary=property'
> psql: invalid connection option "arbitrary"

> vs.

> $ psql postgres://%2Fvar%2Frun%2Fpostgresql/?arbitrary=property
> WARNING: ignoring unrecognized URI query parameter: arbitrary

Um.  We oughta fix that.  I'm not necessarily wedded to the old
throw-an-error definition, but there seems no good reason for these
two syntaxes to act inconsistently.

> I view both of these approaches as problematic, because what I really
> want for a more ambitious set of projects is to pass arbitrary
> properties in the startup packet from libpq, as so that FEBE proxies
> and potentially Postgres backend hooks/extensions can have a chance to
> process the extra startup properties.

The connection string does not really seem like the right place for that.

> The clear downside of deferring some error messages until
> post-connection is that typos would be harder to catch,

Or, indeed, impossible to catch.  What if the typo causes libpq to fail
to contact the server at all?  Good luck getting any assistance from
libpq in spotting your typo, if that happens and we've adopted this
sort of approach.

We already have a mechanism (PGOPTIONS, aka options=foo) for passing
through settings that will not be interpreted by libpq.  I think that
stuff that is meant to be handled at the server end should be confined
to that.
        regards, tom lane


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Daniel Farina
Date:
On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <daniel@heroku.com> writes:
>> I have been playing with the URI connection strings in the bleeding
>> edge 9.2 and noticed an inconsistency with the old connection string
>> behavior:
>
>> $ psql 'host=/var/run/postgresql dbname=postgres arbitrary=property'
>> psql: invalid connection option "arbitrary"
>
>> vs.
>
>> $ psql postgres://%2Fvar%2Frun%2Fpostgresql/?arbitrary=property
>> WARNING: ignoring unrecognized URI query parameter: arbitrary
>
> Um.  We oughta fix that.  I'm not necessarily wedded to the old
> throw-an-error definition, but there seems no good reason for these
> two syntaxes to act inconsistently.

I agree with that.  The URIs may have been done this way as a
concession to some small fragmentation that may have taken place
before URIs were standardized, but perhaps the author can speak to
that (he has been put on the To: list for this mail).

>> The clear downside of deferring some error messages until
>> post-connection is that typos would be harder to catch,
>
> Or, indeed, impossible to catch.  What if the typo causes libpq to fail
> to contact the server at all?  Good luck getting any assistance from
> libpq in spotting your typo, if that happens and we've adopted this
> sort of approach.
>
> We already have a mechanism (PGOPTIONS, aka options=foo) for passing
> through settings that will not be interpreted by libpq.  I think that
> stuff that is meant to be handled at the server end should be confined
> to that.

Interesting. Okay, I did not know about that, and didn't want to go so
far as to suggest a new thing to handle such options, but since there
is an older thing, that's handy.  However, it seems like as-is it may
be too ugly to have is nested in an additional layer of quoting when
not being specified via environment variable. Does that seem like a
generally agreeable statement?

If that is the case, is there a convention we can use to separate the
parts of the connection string (in both representations) into the
parts sent to the server and the part that the client needs?  We
already abuse this a little bit because URI syntax (in general, not
just our rendition of it) leaves little room for extension for
parameters on the client side.  Consider ?sslmode=require.

In both representations, the net effect of a typo would be that
instead of magically reading some properties on the client side,
they'd be sent to the server.  How often is this going  to be so wrong
that one cannot send a response from the server indicating to the user
their error?  On casual inspection it doesn't seem like prohibitively
often, but I haven't mulled over that for very long.

On that topic, the description of it in the libpq source is a little
bit mystifying, as I did see it but passed over it:
{"options", "PGOPTIONS", DefaultOption, NULL,"Backend-Debug-Options", "D", 40},

If they're just options, what does this have to do with debugging? I
presumed some more specific purpose.

To end my email with a request for a pony, I think it'd be reasonable
to send these kinds of options again even while remaining connected.
One might be able appropriate "SET" statements to that purpose, but I
am not sure.  This is mostly to the purpose of routing software, but I
could imagine other uses...

--
fdr


Daniel Farina <daniel@heroku.com> writes:
> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We already have a mechanism (PGOPTIONS, aka options=foo) for passing
>> through settings that will not be interpreted by libpq. �I think that
>> stuff that is meant to be handled at the server end should be confined
>> to that.

> Interesting. Okay, I did not know about that, and didn't want to go so
> far as to suggest a new thing to handle such options, but since there
> is an older thing, that's handy.  However, it seems like as-is it may
> be too ugly to have is nested in an additional layer of quoting when
> not being specified via environment variable. Does that seem like a
> generally agreeable statement?

Yeah, I was wondering the same thing.  The current syntax for options
certainly carries plenty of historical baggage.  The main point here
IMO is that libpq should have some way of telling parameters-for-the-
server from things that are meant to be its own parameters.

> If that is the case, is there a convention we can use to separate the
> parts of the connection string (in both representations) into the
> parts sent to the server and the part that the client needs?

Rather than imagining this as two parts of the connection string,
I think it would be nicer if we could have a convention that modifies
the names of server-side parameters.  In the connection string syntax
we could perhaps do something like
'host=localhost dbname=foo x:param=value1 x:anotherparam=value2'

This preserves the benefits of order-independence of the items, and
seems at least a little bit more able to recognize typos than your
original concept.  I don't know if the specific notation "x:" would
translate well into URI notation, but if not that, maybe there's
something else that would.

> In both representations, the net effect of a typo would be that
> instead of magically reading some properties on the client side,
> they'd be sent to the server.  How often is this going  to be so wrong
> that one cannot send a response from the server indicating to the user
> their error?

If you misspell "host" for instance, you've got a problem.  More
generally, if you misspell "user" or "dbname" you're likely to get
an authentication-related error that would not be at all helpful at
leading your mind to the problem.  (And it's not like these things
are so consistently named that nobody ever gets 'em wrong...)

I would not expect the server to start examining optional parameters
until the connection has been successfully authenticated, so any
typo in the basic connection properties is going to lead to unhelpful
behavior if it causes us to think the item is an optional parameter
rather than a misspelled one.
        regards, tom lane


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Daniel Farina
Date:
On Tue, Jun 5, 2012 at 8:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The main point here
> IMO is that libpq should have some way of telling parameters-for-the-
> server from things that are meant to be its own parameters.

I agree with this.

>> If that is the case, is there a convention we can use to separate the
>> parts of the connection string (in both representations) into the
>> parts sent to the server and the part that the client needs?
>
> Rather than imagining this as two parts of the connection string,
> I think it would be nicer if we could have a convention that modifies
> the names of server-side parameters.  In the connection string syntax
> we could perhaps do something like
>
>        'host=localhost dbname=foo x:param=value1 x:anotherparam=value2'
>
> This preserves the benefits of order-independence of the items, and
> seems at least a little bit more able to recognize typos than your
> original concept.  I don't know if the specific notation "x:" would
> translate well into URI notation, but if not that, maybe there's
> something else that would.

How about "x-param=foo&x-anotherparam=bar"?  I think this could work
in the connection syntax as well: "host=localhost x-param=value1".

It doesn't scan as visibly as ":", but probably will cause less
escaping pain across the board.  It also looks a lot like a
lower-cased HTTP header or email-header, so people might make the
right association right away.

>> In both representations, the net effect of a typo would be that
>> instead of magically reading some properties on the client side,
>> they'd be sent to the server.  How often is this going  to be so wrong
>> that one cannot send a response from the server indicating to the user
>> their error?
>
> If you misspell "host" for instance, you've got a problem.  More
> generally, if you misspell "user" or "dbname" you're likely to get
> an authentication-related error that would not be at all helpful at
> leading your mind to the problem.  (And it's not like these things
> are so consistently named that nobody ever gets 'em wrong...)

Well, I wonder if there's a way to mitigate that, but I think as long
as we're satisfied thus far with a prefix there's no need to talk
about this particular thorn, as it can be considered sidestepped.

If we go so far as to namespace with something like "x:" or "x-" and
to have such a rigorous concept of extension in protocol related
matters, we're in a good place to have a nice cohesive expansion (at
some later time) into more "x-..." headers exchanged between host in
client mid-communication, beyond just startup.  At some later date.

Would these hypothetical extension-pairs be using the "options" device
at startup time, or something else (possibly brand new)?  I'm not
ideally informed as to contemplate on if reuse of an existing thing
makes sense.

--
fdr


Daniel Farina <daniel@heroku.com> writes:
> Would these hypothetical extension-pairs be using the "options" device
> at startup time, or something else (possibly brand new)?

I'd argue for just translating them into "options", at least in the
near term.  If they use some new mechanism then they would only work
with new servers, and it's generally not good for libpq to assume much
about what version of server it's working with, especially not when
sending an initial connection packet (when, by definition, it can't
know that for sure).

As far as changing such settings later in the session is concerned,
isn't that what SET is for?
        regards, tom lane


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Daniel Farina
Date:
On Tue, Jun 5, 2012 at 9:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <daniel@heroku.com> writes:
>> Would these hypothetical extension-pairs be using the "options" device
>> at startup time, or something else (possibly brand new)?
>
> I'd argue for just translating them into "options", at least in the
> near term.  If they use some new mechanism then they would only work
> with new servers, and it's generally not good for libpq to assume much
> about what version of server it's working with, especially not when
> sending an initial connection packet (when, by definition, it can't
> know that for sure).
>
> As far as changing such settings later in the session is concerned,
> isn't that what SET is for?

Yeah; I mentioned that upthread.  The only semantic difference I can
think of -- and this is a persnickety detail -- is that startup-time
options are evaluated all together.  This is not unlike email or HTTP
headers as well.

The result is that if one expects to get several extension headers in
one shot that life is made more difficult.  This can make validation
more tricky, but perhaps it could be fixed with a multi-set command,
should one not already exist.

Another issue is that it becomes more painful for proxies to
efficiently and easily pick out session value adjustments, as is
useful when composing them, so part of me would prefer to have a
protocol-level-only construct, but yet another part of me would really
like to just type "SET" into psql and try stuff.  I guess it could be
a backslashified command, though...I wasn't looking to design this at
this time, but as long as you are willing to think and write about it,
I'm happy for the opportunity.

In the interest of unblocking a bit of exploratory work while
discussing this, perhaps we think a convention around "x-" or whatnot
and packing things into the PGOPTIONS makes sense?

--
fdr


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Magnus Hagander
Date:
On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina <daniel@heroku.com> wrote:
> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Daniel Farina <daniel@heroku.com> writes:
> If that is the case, is there a convention we can use to separate the
> parts of the connection string (in both representations) into the
> parts sent to the server and the part that the client needs?  We
> already abuse this a little bit because URI syntax (in general, not
> just our rendition of it) leaves little room for extension for
> parameters on the client side.  Consider ?sslmode=require.
>
> In both representations, the net effect of a typo would be that
> instead of magically reading some properties on the client side,
> they'd be sent to the server.  How often is this going  to be so wrong
> that one cannot send a response from the server indicating to the user
> their error?  On casual inspection it doesn't seem like prohibitively
> often, but I haven't mulled over that for very long.

I think that's an excellent example of this being a bad idea. If you
mis-spell sslmode=require, that should absolutely result in an error
on the client side. Otherwise, you might end up sending your password
(or other details that are not as sensitive, but still sensitive) over
an unencrypted connection. If you wait for the error from the server,
it's too late.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Daniel Farina
Date:
On Wed, Jun 6, 2012 at 1:09 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina <daniel@heroku.com> wrote:
>> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Daniel Farina <daniel@heroku.com> writes:
>> If that is the case, is there a convention we can use to separate the
>> parts of the connection string (in both representations) into the
>> parts sent to the server and the part that the client needs?  We
>> already abuse this a little bit because URI syntax (in general, not
>> just our rendition of it) leaves little room for extension for
>> parameters on the client side.  Consider ?sslmode=require.
>>
>> In both representations, the net effect of a typo would be that
>> instead of magically reading some properties on the client side,
>> they'd be sent to the server.  How often is this going  to be so wrong
>> that one cannot send a response from the server indicating to the user
>> their error?  On casual inspection it doesn't seem like prohibitively
>> often, but I haven't mulled over that for very long.
>
> I think that's an excellent example of this being a bad idea. If you
> mis-spell sslmode=require, that should absolutely result in an error
> on the client side. Otherwise, you might end up sending your password
> (or other details that are not as sensitive, but still sensitive) over
> an unencrypted connection. If you wait for the error from the server,
> it's too late.

That is an excellent point.  Is there enough time in the day to gripe
about how sslmode=require is not the default?

Well, this seems pretty obviated by the prefix-naming convention, but
it's an iron clad example of how the older idea was a bad one.

--
fdr


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Magnus Hagander
Date:
On Wed, Jun 6, 2012 at 6:58 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Wed, Jun 6, 2012 at 1:09 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina <daniel@heroku.com> wrote:
>>> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Daniel Farina <daniel@heroku.com> writes:
>>> If that is the case, is there a convention we can use to separate the
>>> parts of the connection string (in both representations) into the
>>> parts sent to the server and the part that the client needs?  We
>>> already abuse this a little bit because URI syntax (in general, not
>>> just our rendition of it) leaves little room for extension for
>>> parameters on the client side.  Consider ?sslmode=require.
>>>
>>> In both representations, the net effect of a typo would be that
>>> instead of magically reading some properties on the client side,
>>> they'd be sent to the server.  How often is this going  to be so wrong
>>> that one cannot send a response from the server indicating to the user
>>> their error?  On casual inspection it doesn't seem like prohibitively
>>> often, but I haven't mulled over that for very long.
>>
>> I think that's an excellent example of this being a bad idea. If you
>> mis-spell sslmode=require, that should absolutely result in an error
>> on the client side. Otherwise, you might end up sending your password
>> (or other details that are not as sensitive, but still sensitive) over
>> an unencrypted connection. If you wait for the error from the server,
>> it's too late.
>
> That is an excellent point.  Is there enough time in the day to gripe
> about how sslmode=require is not the default?

Well, you'll get me first in line to back that the current default is stupid.

But I'm not sure sslmode=require is a proper default either. Because
then the connection will fail completely to the vast majority of
servers, which simply don't have SSL support.


> Well, this seems pretty obviated by the prefix-naming convention, but
> it's an iron clad example of how the older idea was a bad one.

Yeah, a prefix based solution would fix this, since we can keep throwing errors.

However, not throwing errors on the URL syntax should be considered a
bug, I think.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Robert Haas
Date:
On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander <magnus@hagander.net> wrote:
> However, not throwing errors on the URL syntax should be considered a
> bug, I think.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Daniel Farina
Date:
On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> However, not throwing errors on the URL syntax should be considered a
>> bug, I think.
>
> +1.

+1

Here's a patch that just makes the thing an error.  Of course we could
revert it if it makes the URI feature otherwise unusable...but I don't
see a huge and terrible blocker ATM.  A major question mark for me any
extra stuff in JDBC URLs.

Nevertheless, here it is.

--
fdr

Attachment

Re: Inconsistency in libpq connection parameters, and extension thereof

From
Robert Haas
Date:
On Wed, Jun 6, 2012 at 6:04 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> However, not throwing errors on the URL syntax should be considered a
>>> bug, I think.
>>
>> +1.
>
> +1
>
> Here's a patch that just makes the thing an error.  Of course we could
> revert it if it makes the URI feature otherwise unusable...but I don't
> see a huge and terrible blocker ATM.  A major question mark for me any
> extra stuff in JDBC URLs.

It looks like the answer is "yes".

http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

...but I'm inclined to think we should make this change anyway.  If
JDBC used libpq, then it might be nice to let JDBC parse out bits of
the URL and then pass the whole thing, unmodified, through to libpq,
without having libpq spit up.  But it doesn't.  And even if someone
were inclined to try to do something of that type, the warnings we're
omitting now would presumably discourage them.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Magnus Hagander
Date:
On Fri, Jun 8, 2012 at 1:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 6, 2012 at 6:04 PM, Daniel Farina <daniel@heroku.com> wrote:
>> On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> However, not throwing errors on the URL syntax should be considered a
>>>> bug, I think.
>>>
>>> +1.
>>
>> +1
>>
>> Here's a patch that just makes the thing an error.  Of course we could
>> revert it if it makes the URI feature otherwise unusable...but I don't
>> see a huge and terrible blocker ATM.  A major question mark for me any
>> extra stuff in JDBC URLs.
>
> It looks like the answer is "yes".
>
> http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
>
> ...but I'm inclined to think we should make this change anyway.  If
> JDBC used libpq, then it might be nice to let JDBC parse out bits of
> the URL and then pass the whole thing, unmodified, through to libpq,
> without having libpq spit up.  But it doesn't.  And even if someone
> were inclined to try to do something of that type, the warnings we're
> omitting now would presumably discourage them.
>
> Thoughts?

I think we *have* to make the change for regular parameters, for
security reasons.

What we do with "prefixed parameters" can be debated... But we'll have
to pass those to the server anyway for validation, so it might be an
uninteresting case.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Inconsistency in libpq connection parameters, and extension thereof

From
Robert Haas
Date:
On Fri, Jun 8, 2012 at 7:53 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Jun 8, 2012 at 1:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jun 6, 2012 at 6:04 PM, Daniel Farina <daniel@heroku.com> wrote:
>>> On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>> However, not throwing errors on the URL syntax should be considered a
>>>>> bug, I think.
>>>>
>>>> +1.
>>>
>>> +1
>>>
>>> Here's a patch that just makes the thing an error.  Of course we could
>>> revert it if it makes the URI feature otherwise unusable...but I don't
>>> see a huge and terrible blocker ATM.  A major question mark for me any
>>> extra stuff in JDBC URLs.
>>
>> It looks like the answer is "yes".
>>
>> http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
>>
>> ...but I'm inclined to think we should make this change anyway.  If
>> JDBC used libpq, then it might be nice to let JDBC parse out bits of
>> the URL and then pass the whole thing, unmodified, through to libpq,
>> without having libpq spit up.  But it doesn't.  And even if someone
>> were inclined to try to do something of that type, the warnings we're
>> omitting now would presumably discourage them.
>>
>> Thoughts?
>
> I think we *have* to make the change for regular parameters, for
> security reasons.
>
> What we do with "prefixed parameters" can be debated... But we'll have
> to pass those to the server anyway for validation, so it might be an
> uninteresting case.

OK, committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Daniel Farina <daniel@heroku.com> writes:

> On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Um.  We oughta fix that.  I'm not necessarily wedded to the old
>> throw-an-error definition, but there seems no good reason for these
>> two syntaxes to act inconsistently.
>
> I agree with that.  The URIs may have been done this way as a
> concession to some small fragmentation that may have taken place
> before URIs were standardized, but perhaps the author can speak to
> that (he has been put on the To: list for this mail).

Sorry for the silence.

The original intent was to not error out on any extra parameters from
JDBC or other existing URI implementations.  The example of a possible
typo in sslmode=require clearly demonstrates that this was not
a well-thought decision.

Anyway, I can see you've already sorted this out.

--
Alex