Thread: "unix_socket_directories" should be GUC_LIST_INPUT?

"unix_socket_directories" should be GUC_LIST_INPUT?

From
Ian Lawrence Barwick
Date:
Hi

Not that I've ever had to do this (or would want to do it on a production
system), but this error message seems incorrect:

    postgres=# ALTER SYSTEM SET unix_socket_directories  =
'/tmp/sock1','/tmp/sock2';
    ERROR:  SET unix_socket_directories takes only one argument

Trivial patch attached.

Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Michael Paquier
Date:
On Fri, Oct 23, 2020 at 11:34:06AM +0900, Ian Lawrence Barwick wrote:
> Not that I've ever had to do this (or would want to do it on a production
> system), but this error message seems incorrect:
>
>     postgres=# ALTER SYSTEM SET unix_socket_directories  =
> '/tmp/sock1','/tmp/sock2';
>     ERROR:  SET unix_socket_directories takes only one argument
>
> Trivial patch attached.

I have never seen that case, but I think that you are right.  Still,
that's not the end of it, see by yourself what the following command
generates with only your patch, which is fancy:
ALTER SYSTEM SET unix_socket_directories = '/tmp/sock1','/tmp/, sock2';

We need an extra GUC_LIST_QUOTE on top of what you are proposing.
--
Michael

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Ian Lawrence Barwick
Date:
2020年10月23日(金) 12:12 Michael Paquier <michael@paquier.xyz>:
>
> On Fri, Oct 23, 2020 at 11:34:06AM +0900, Ian Lawrence Barwick wrote:
> > Not that I've ever had to do this (or would want to do it on a production
> > system), but this error message seems incorrect:
> >
> >     postgres=# ALTER SYSTEM SET unix_socket_directories  =
> > '/tmp/sock1','/tmp/sock2';
> >     ERROR:  SET unix_socket_directories takes only one argument
> >
> > Trivial patch attached.
>
> I have never seen that case, but I think that you are right.  Still,
> that's not the end of it, see by yourself what the following command
> generates with only your patch, which is fancy:
> ALTER SYSTEM SET unix_socket_directories = '/tmp/sock1','/tmp/, sock2';
>
> We need an extra GUC_LIST_QUOTE on top of what you are proposing.

Ah yes, good point.

Updated version attached.

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Michael Paquier
Date:
On Fri, Oct 23, 2020 at 12:23:28PM +0900, Ian Lawrence Barwick wrote:
> Updated version attached.

LGTM.  Looking at c9b0cbe and the relevant thread it looks like this
point was not really covered, so my guess is that this was just
forgotten:
https://www.postgresql.org/message-id/4FCF6040.5030408@redhat.com

I'll look again at that in the next couple of days and double-check
the relevant areas of the code, just in case.  It is Friday afternoon
here, and I suspect that my mind is missing something obvious.
--
Michael

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> I'll look again at that in the next couple of days and double-check
> the relevant areas of the code, just in case.  It is Friday afternoon
> here, and I suspect that my mind is missing something obvious.

Indeed.  The patch fails to update pg_dump.c's
variable_is_guc_list_quote(), which exposes the real problem here:
changing an existing variable's GUC_LIST_QUOTE property is an API break.

Getting pg_dump to cope with such a situation would be a research project.
The easy part of it would be to make variable_is_guc_list_quote() be
version-aware; the hard part would be figuring out what to emit so that
SET clauses will load correctly regardless of which PG version they will
be loaded into.

I suspect you're right that this variable should have been marked as a
list to start with, but I'm afraid changing it at this point would be
way more trouble than it's worth.

            regards, tom lane



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Ian Lawrence Barwick
Date:
2020年10月23日(金) 12:56 Tom Lane <tgl@sss.pgh.pa.us>:
>
> Michael Paquier <michael@paquier.xyz> writes:
> > I'll look again at that in the next couple of days and double-check
> > the relevant areas of the code, just in case.  It is Friday afternoon
> > here, and I suspect that my mind is missing something obvious.
>
> Indeed.  The patch fails to update pg_dump.c's
> variable_is_guc_list_quote(), which exposes the real problem here:
> changing an existing variable's GUC_LIST_QUOTE property is an API break.

Aha, noted.

> Getting pg_dump to cope with such a situation would be a research project.
> The easy part of it would be to make variable_is_guc_list_quote() be
> version-aware; the hard part would be figuring out what to emit so that
> SET clauses will load correctly regardless of which PG version they will
> be loaded into.
>
> I suspect you're right that this variable should have been marked as a
> list to start with, but I'm afraid changing it at this point would be
> way more trouble than it's worth.

The use-case is admittedly extremely marginal, and presumably hasn't attracted
any other reports until now. I only noticed as I was poking around in
the area and
it looked inconsistent.

How about adding a comment along the lines of

/*
 * GUC_LIST_INPUT not set here as the use-case is marginal and modifying it
 * would require an API change.
 */

to clarify why it's like that and prevent someone else trying to "fix"
the same issue
in a few year's time?

Regards

Ian Barwick


--
EnterpriseDB: https://www.enterprisedb.com



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Tom Lane
Date:
Ian Lawrence Barwick <barwick@gmail.com> writes:
> How about adding a comment along the lines of

A comment seems reasonable, but I'd probably write it more like

/*
 * unix_socket_directories should have been marked GUC_LIST_INPUT |
 * GUC_LIST_QUOTE, but it's too late to change it without creating
 * big compatibility problems for pg_dump.
 */

            regards, tom lane



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Tom Lane
Date:
I wrote:
>  * unix_socket_directories should have been marked GUC_LIST_INPUT |
>  * GUC_LIST_QUOTE, but it's too late to change it without creating
>  * big compatibility problems for pg_dump.

Although ... just to argue against myself for a moment, how likely
is it that pg_dump is going to be faced with the need to dump a
value for unix_socket_directories?

Generally speaking, the value of that variable is sufficiently
embedded into builds that you aren't going to mess with it.
It's close to being part of the FE/BE protocol, since whatever
build of libpq you use is going to know about one or another
of those directories, and the only reason to have more than one
is if you have other clients that hard-wire some other directory.

Even ignoring that point, since it's PGC_POSTMASTER, you certainly
aren't going to have cases like function SET clauses or ALTER
USER/DATABASE SET commands to dump.  Unless pg_dumpall worries
about postgresql.auto.conf, which I don't think it does, the actual
use-case for it to dump a value for unix_socket_directories is nil
--- and even having the variable's value set in postgresql.auto.conf
seems like a seriously niche use-case.

So maybe we could get away with just changing it.  It'd be good to
verify though that this doesn't break existing string values for
the variable, assuming they contain no unlikely characters that'd
need quoting.

            regards, tom lane



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Michael Paquier
Date:
On Fri, Oct 23, 2020 at 12:49:57AM -0400, Tom Lane wrote:
> Although ... just to argue against myself for a moment, how likely
> is it that pg_dump is going to be faced with the need to dump a
> value for unix_socket_directories?

I am trying to think about some scenarios here, but honestly I
cannot..

> So maybe we could get away with just changing it.  It'd be good to
> verify though that this doesn't break existing string values for
> the variable, assuming they contain no unlikely characters that'd
> need quoting.

Yeah, that's the kind of things I wanted to check anyway before
considering doing the switch.
--
Michael

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Peter Eisentraut
Date:
On 2020-10-23 10:02, Michael Paquier wrote:
>> So maybe we could get away with just changing it.  It'd be good to
>> verify though that this doesn't break existing string values for
>> the variable, assuming they contain no unlikely characters that'd
>> need quoting.
> 
> Yeah, that's the kind of things I wanted to check anyway before
> considering doing the switch.

If we're going to change it I think we need an updated patch that covers 
pg_dump.  (Even if we argue that pg_dump would not normally dump this 
variable, keeping it up to date with respect to GUC_LIST_QUOTE seems 
proper.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> If we're going to change it I think we need an updated patch that covers 
> pg_dump.  (Even if we argue that pg_dump would not normally dump this 
> variable, keeping it up to date with respect to GUC_LIST_QUOTE seems 
> proper.)

Right, I was definitely assuming that that would happen.

            regards, tom lane



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Robert Haas
Date:
On Tue, Oct 27, 2020 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > If we're going to change it I think we need an updated patch that covers
> > pg_dump.  (Even if we argue that pg_dump would not normally dump this
> > variable, keeping it up to date with respect to GUC_LIST_QUOTE seems
> > proper.)
>
> Right, I was definitely assuming that that would happen.

If we change this, is it going to be a compatibility break for the
contents of postgresql.conf files?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If we change this, is it going to be a compatibility break for the
> contents of postgresql.conf files?

I think not, at least not for the sorts of values you'd ordinarily
find in that variable, say '/tmp, /var/run/postgresql'.  Possibly
the behavior would change for pathnames containing spaces or the
like, but it is probably kinda broken for such cases today anyway.

In any case, Michael had promised to test this aspect before committing.

            regards, tom lane



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Michael Paquier
Date:
On Tue, Oct 27, 2020 at 12:23:22PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If we change this, is it going to be a compatibility break for the
>> contents of postgresql.conf files?
>
> I think not, at least not for the sorts of values you'd ordinarily
> find in that variable, say '/tmp, /var/run/postgresql'.  Possibly
> the behavior would change for pathnames containing spaces or the
> like, but it is probably kinda broken for such cases today anyway.
>
> In any case, Michael had promised to test this aspect before committing.

Paths with spaces or commas would be fine as long as we associate
GUC_LIST_QUOTE with GUC_LIST_INPUT so as commas within quoted entries
are handled consistently.  postmaster.c uses SplitDirectoriesString()
with a comma do decide how to split things.  This discards leading and
trailing whitespaces, requires a double-quote to have a matching
double-quote where trailing/leading whitespaces are allowed, and
nothing to escape quotes.  Those patterns fail:
"/tmp/repo1,"/tmp/repo2
"/tmp/repo1,/tmp/repo2
These are split as a single entry:
"/tmp/repo1,/tmp/repo2"
"/tmp/ repo1,/tmp/ repo2"
"/tmp/ repo1 , /tmp/ repo2 "
These are split as two entries:
"/tmp/repo1,",/tmp/repo2
/tmp /repo1 , /tmp/ repo2
"/tmp/""sock1", "/tmp/, sock2" (here first path has one double quote)

If we use GUC_LIST_INPUT | GUC_LIST_QUOTE, paths are handled the same
way as the original, but we would run into problems if not using
GUC_LIST_QUOTE as mentioned upthread.

Anyway, we have a compatibility problem once we use ALTER SYSTEM.
Just take the following command:
alter system set unix_socket_directories = '/tmp/sock1, /tmp/sock2';

On HEAD, this would be treated and parsed as two items.  However, with
the patch, this becomes one item as this is considered as one single
element of the list of paths, as that's written to
postgresql.auto.conf as '"/tmp/sock1, /tmp/sock2"'.

This last argument would be IMO a reason enough to not do the switch.
Even if I have never seen cases where ALTER SYSTEM was used with
unix_socket_directories, we cannot say either that nobody relies on
the existing behavior (perhaps some failover solutions care?).  So at
least we should add a comment as proposed in
https://postgr.es/m/122596.1603427777@sss.pgh.pa.us.

Thoughts?
--
Michael

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Anyway, we have a compatibility problem once we use ALTER SYSTEM.
> Just take the following command: 
> alter system set unix_socket_directories = '/tmp/sock1, /tmp/sock2';

> On HEAD, this would be treated and parsed as two items.  However, with
> the patch, this becomes one item as this is considered as one single
> element of the list of paths, as that's written to
> postgresql.auto.conf as '"/tmp/sock1, /tmp/sock2"'.

> This last argument would be IMO a reason enough to not do the switch.

I do not think that that's a fatal objection.  I doubt anyone has
applications that are automatically issuing that sort of command and
would be broken by a change.  I think backwards compatibility is
sufficiently met if the behavior remains the same for existing
postgresql.conf entries, which AFAICT it would.

Arguably, the whole point of doing something here is to make ALTER
SYSTEM handle this variable more sensibly.  In that context,
'/tmp/sock1, /tmp/sock2' *should* be taken as one item IMO.
We can't change the behavior without, um, changing the behavior.

            regards, tom lane



Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Michael Paquier
Date:
On Wed, Nov 04, 2020 at 10:47:43AM -0500, Tom Lane wrote:
> I do not think that that's a fatal objection.  I doubt anyone has
> applications that are automatically issuing that sort of command and
> would be broken by a change.  I think backwards compatibility is
> sufficiently met if the behavior remains the same for existing
> postgresql.conf entries, which AFAICT it would.

OK.  As far as I know, we parse this variable the same way, so this
case would be satisfied.

> Arguably, the whole point of doing something here is to make ALTER
> SYSTEM handle this variable more sensibly.  In that context,
> '/tmp/sock1, /tmp/sock2' *should* be taken as one item IMO.
> We can't change the behavior without, um, changing the behavior.

No arguments against this point either.  If you consider all that, the
switch can be done with the attached, with the change for pg_dump
included.  I have reorganized the list in variable_is_guc_list_quote()
alphabetically while on it.

Robert, is your previous question answered?
--
Michael

Attachment

Re: "unix_socket_directories" should be GUC_LIST_INPUT?

From
Michael Paquier
Date:
On Thu, Nov 05, 2020 at 09:16:10AM +0900, Michael Paquier wrote:
> No arguments against this point either.  If you consider all that, the
> switch can be done with the attached, with the change for pg_dump
> included.  I have reorganized the list in variable_is_guc_list_quote()
> alphabetically while on it.

Hearing nothing, applied on HEAD.
--
Michael

Attachment