Thread: Expand the use of check_canonical_path() for more GUCs

Expand the use of check_canonical_path() for more GUCs

From
Michael Paquier
Date:
Hi all,

While digging into my backlog, I have found this message from Peter E
mentioning about $subject:
https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9ab68@2ndquadrant.com

It seems to me that it would be a good idea to make those checks more
consistent, and attached is a patch.

Thoughts?
--
Michael

Attachment

Re: Expand the use of check_canonical_path() for more GUCs

From
Peter Eisentraut
Date:
On 2020-05-19 09:09, Michael Paquier wrote:
> While digging into my backlog, I have found this message from Peter E
> mentioning about $subject:
> https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9ab68@2ndquadrant.com
> 
> It seems to me that it would be a good idea to make those checks more
> consistent, and attached is a patch.

That thread didn't resolve why check_canonical_path() is necessary 
there.  Maybe the existing uses could be removed?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expand the use of check_canonical_path() for more GUCs

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> While digging into my backlog, I have found this message from Peter E
> mentioning about $subject:
> https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9ab68@2ndquadrant.com

> It seems to me that it would be a good idea to make those checks more
> consistent, and attached is a patch.

Hm, I'm pretty certain that data_directory does not need this because
canonicalization is done elsewhere; the most that you could accomplish
there is to cause problems.  Dunno about the rest.

            regards, tom lane



Re: Expand the use of check_canonical_path() for more GUCs

From
Michael Paquier
Date:
On Tue, May 19, 2020 at 09:32:15AM -0400, Tom Lane wrote:
> Hm, I'm pretty certain that data_directory does not need this because
> canonicalization is done elsewhere; the most that you could accomplish
> there is to cause problems.  Dunno about the rest.

Hmm.  I missed that this is getting done in SelectConfigFiles() first
by the postmaster so that's not necessary, which also does the work
for hba_file and ident_file.  config_file does not need that either as
AbsoluteConfigLocation() does the same work via ParseConfigFile().  So
perhaps we could add a comment or such about that?  Attached is an
idea.

The rest is made of PromoteTriggerFile, pg_krb_server_keyfile,
ssl_cert_file, ssl_key_file, ssl_ca_file, ssl_crl_file and
ssl_dh_params_file where loaded values are taken as-is, so applying
canonicalization would be helpful there, no?
--
Michael

Attachment

Re: Expand the use of check_canonical_path() for more GUCs

From
Michael Paquier
Date:
On Tue, May 19, 2020 at 01:02:12PM +0200, Peter Eisentraut wrote:
> That thread didn't resolve why check_canonical_path() is necessary there.
> Maybe the existing uses could be removed?

This would impact log_directory, external_pid_file,
stats_temp_directory, where it is still useful to show to the user
cleaned up names, no?  See for example 2594cf0.
--
Michael

Attachment

Re: Expand the use of check_canonical_path() for more GUCs

From
Peter Eisentraut
Date:
On 2020-05-20 09:13, Michael Paquier wrote:
> On Tue, May 19, 2020 at 01:02:12PM +0200, Peter Eisentraut wrote:
>> That thread didn't resolve why check_canonical_path() is necessary there.
>> Maybe the existing uses could be removed?
> 
> This would impact log_directory, external_pid_file,
> stats_temp_directory, where it is still useful to show to the user
> cleaned up names, no?  See for example 2594cf0.

I don't understand why we need to alter the file names specified by the 
user.  They presumably wrote them that way for a reason and they 
probably like them that way.

There are specific situations where we need to do that to know whether a 
path is in the data directory or the same as some other one etc.  But 
unless there is a reason like that, I think we should just leave them.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expand the use of check_canonical_path() for more GUCs

From
Kyotaro Horiguchi
Date:
At Wed, 20 May 2020 10:05:29 +0200, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in 
> On 2020-05-20 09:13, Michael Paquier wrote:
> > On Tue, May 19, 2020 at 01:02:12PM +0200, Peter Eisentraut wrote:
> >> That thread didn't resolve why check_canonical_path() is necessary
> >> there.
> >> Maybe the existing uses could be removed?
> > This would impact log_directory, external_pid_file,
> > stats_temp_directory, where it is still useful to show to the user
> > cleaned up names, no?  See for example 2594cf0.
> 
> I don't understand why we need to alter the file names specified by
> the user.  They presumably wrote them that way for a reason and they
> probably like them that way.
> 
> There are specific situations where we need to do that to know whether
> a path is in the data directory or the same as some other one etc.
> But unless there is a reason like that, I think we should just leave
> them.

I completely agree to Peter here.  I would be surprised that I see
system views show different strings from what I wrote in config
files. I also think that it ought to be documented if we store tweaked
string for a user input.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Expand the use of check_canonical_path() for more GUCs

From
Mark Dilger
Date:

> On May 20, 2020, at 12:03 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 19, 2020 at 09:32:15AM -0400, Tom Lane wrote:
>> Hm, I'm pretty certain that data_directory does not need this because
>> canonicalization is done elsewhere; the most that you could accomplish
>> there is to cause problems.  Dunno about the rest.
>
> Hmm.  I missed that this is getting done in SelectConfigFiles() first
> by the postmaster so that's not necessary, which also does the work
> for hba_file and ident_file.  config_file does not need that either as
> AbsoluteConfigLocation() does the same work via ParseConfigFile().  So
> perhaps we could add a comment or such about that?  Attached is an
> idea.
>
> The rest is made of PromoteTriggerFile, pg_krb_server_keyfile,
> ssl_cert_file, ssl_key_file, ssl_ca_file, ssl_crl_file and
> ssl_dh_params_file where loaded values are taken as-is, so applying
> canonicalization would be helpful there, no?

Before this patch, there are three GUCs that get check_canonical_path treatment:

   log_directory
   external_pid_file
   stats_temp_directory

After the patch, these also get the treatment (though Peter seems to be objecting to the change):

   promote_trigger_file
   krb_server_keyfile
   ssl_cert_file
   ssl_key_file
   ssl_ca_file
   ssl_crl_file
   ssl_dh_params_file

and these still don't, with comments about how they are already canonicalized when the config file is loaded:

   data_directory
   config_file
   hba_file
   ident_file

A little poking around shows that in SelectConfigFiles(), these four directories were set by SetConfigOption().  I
don'tsee a problem with the code, but the way this stuff is spread around makes it easy for somebody adding a new GUC
filepath to do it wrong.  I don't have much opinion about Peter's preference that paths be left alone, but I'd prefer
somecomments in guc.c explaining it all.  The only cleanup that occurs to me is to reorder ConfigureNamesString[] to
haveall the path options back-to-back, with the four that are set by SelectConfigFiles() at the top with comments about
whythey are special, and the rest after that with comments about why they need or do not need canonicalization. 


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Expand the use of check_canonical_path() for more GUCs

From
Robert Haas
Date:
On Tue, May 19, 2020 at 7:02 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> That thread didn't resolve why check_canonical_path() is necessary
> there.  Maybe the existing uses could be removed?

This first sentence of this reply seems worthy of particualr
attention. We have to know what problem this is intended to fix before
we try to decide in which cases it's needed. Otherwise, whether we add
it everywhere or remove it everywhere, we'll only know that it's
consistent, not that it's correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Expand the use of check_canonical_path() for more GUCs

From
Alvaro Herrera
Date:
On 2020-May-28, Mark Dilger wrote:

> A little poking around shows that in SelectConfigFiles(), these four
> directories were set by SetConfigOption().  I don't see a problem with
> the code, but the way this stuff is spread around makes it easy for
> somebody adding a new GUC file path to do it wrong.  I don't have much
> opinion about Peter's preference that paths be left alone, but I'd
> prefer some comments in guc.c explaining it all.  The only cleanup
> that occurs to me is to reorder ConfigureNamesString[] to have all the
> path options back-to-back, with the four that are set by
> SelectConfigFiles() at the top with comments about why they are
> special, and the rest after that with comments about why they need or
> do not need canonicalization.

No need for reorganization, I think, just have a comment on top of each
entry that doesn't use canonicalization such as "no canonicalization,
as explained in ..." where that refers to a single largish comment that
explains what canonicalization is, why you use it, and why you don't.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expand the use of check_canonical_path() for more GUCs

From
Peter Eisentraut
Date:
On 2020-05-29 19:24, Robert Haas wrote:
> On Tue, May 19, 2020 at 7:02 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> That thread didn't resolve why check_canonical_path() is necessary
>> there.  Maybe the existing uses could be removed?
> 
> This first sentence of this reply seems worthy of particualr
> attention. We have to know what problem this is intended to fix before
> we try to decide in which cases it's needed. Otherwise, whether we add
> it everywhere or remove it everywhere, we'll only know that it's
> consistent, not that it's correct.

The archeology reveals that these calls where originally added to 
canonicalize the data_directory and config_file settings (7b0f060d54), 
but that was then moved out of guc.c to be done early during postmaster 
startup (337ffcddba).  The remaining calls of check_canonical_path() in 
guc.c appear to be leftovers from a previous regime.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expand the use of check_canonical_path() for more GUCs

From
Robert Haas
Date:
On Tue, Jun 2, 2020 at 5:04 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> The archeology reveals that these calls where originally added to
> canonicalize the data_directory and config_file settings (7b0f060d54),
> but that was then moved out of guc.c to be done early during postmaster
> startup (337ffcddba).  The remaining calls of check_canonical_path() in
> guc.c appear to be leftovers from a previous regime.

Thanks for looking into it. Sounds like it can just be ripped out,
then, unless someone knows of a reason to do otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Expand the use of check_canonical_path() for more GUCs

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 2, 2020 at 5:04 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> The archeology reveals that these calls where originally added to
>> canonicalize the data_directory and config_file settings (7b0f060d54),
>> but that was then moved out of guc.c to be done early during postmaster
>> startup (337ffcddba).  The remaining calls of check_canonical_path() in
>> guc.c appear to be leftovers from a previous regime.

> Thanks for looking into it. Sounds like it can just be ripped out,
> then, unless someone knows of a reason to do otherwise.

In the abstract, I agree with Peter's point that we shouldn't alter
user-given strings without need.  However, I think there's strong
reason for canonicalizing the data directory and config file locations.
We access those both before and after chdir'ing into the datadir, so
we'd better have absolute paths to them --- and at least for the
datadir, it's documented that you can initially give it as a path
relative to wherever you started the postmaster from.  If the other
files are only accessed after the chdir happens then we could likely
do without canonicalizing them.  But ... do we know which directory
the user (thought he) specified them with reference to?  Forced
canonicalization does have the advantage that it's clear to all
onlookers how we are interpreting the paths.

            regards, tom lane



Re: Expand the use of check_canonical_path() for more GUCs

From
Michael Paquier
Date:
On Wed, Jun 03, 2020 at 02:45:50PM -0400, Tom Lane wrote:
> In the abstract, I agree with Peter's point that we shouldn't alter
> user-given strings without need.  However, I think there's strong
> reason for canonicalizing the data directory and config file locations.
> We access those both before and after chdir'ing into the datadir, so
> we'd better have absolute paths to them --- and at least for the
> datadir, it's documented that you can initially give it as a path
> relative to wherever you started the postmaster from.  If the other
> files are only accessed after the chdir happens then we could likely
> do without canonicalizing them.  But ... do we know which directory
> the user (thought he) specified them with reference to?  Forced
> canonicalization does have the advantage that it's clear to all
> onlookers how we are interpreting the paths.

Even with the last point...  It looks like there is little love for
this patch.  So it seems to me that this brings the discussion down to
two points: shouldn't we document why canonicalization is not done for
data_directory, config_file, hba_file and ident_file with some
comments in guc.c?  Then, why do we apply it to external_pid_file,
Log_directory and stats_temp_directory knowing that we chdir to PGDATA
in the postmaster before they get used (as far as I can see)?
--
Michael

Attachment

Re: Expand the use of check_canonical_path() for more GUCs

From
Peter Eisentraut
Date:
On 2020-06-03 20:45, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 2, 2020 at 5:04 AM Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> The archeology reveals that these calls where originally added to
>>> canonicalize the data_directory and config_file settings (7b0f060d54),
>>> but that was then moved out of guc.c to be done early during postmaster
>>> startup (337ffcddba).  The remaining calls of check_canonical_path() in
>>> guc.c appear to be leftovers from a previous regime.
> 
>> Thanks for looking into it. Sounds like it can just be ripped out,
>> then, unless someone knows of a reason to do otherwise.
> 
> In the abstract, I agree with Peter's point that we shouldn't alter
> user-given strings without need.  However, I think there's strong
> reason for canonicalizing the data directory and config file locations.

It is not proposed to change that.  It is only debated whether the same 
canonicalization should be applied to other GUCs that represent paths.

> We access those both before and after chdir'ing into the datadir, so
> we'd better have absolute paths to them --- and at least for the
> datadir, it's documented that you can initially give it as a path
> relative to wherever you started the postmaster from.  If the other
> files are only accessed after the chdir happens then we could likely
> do without canonicalizing them.  But ... do we know which directory
> the user (thought he) specified them with reference to?  Forced
> canonicalization does have the advantage that it's clear to all
> onlookers how we are interpreting the paths.

This (and some other messages in this thread) appears to assume that 
canonicalize_path() turns relative paths into absolute paths, but 
AFAICT, it does not do that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expand the use of check_canonical_path() for more GUCs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> This (and some other messages in this thread) appears to assume that 
> canonicalize_path() turns relative paths into absolute paths, but 
> AFAICT, it does not do that.

Ah, fair point --- I'd been assuming that we were applying
canonicalize_path as cleanup for an absolute-ification operation,
but you are right that check_canonical_path does not do that.

Digging around, though, I notice a different motivation.
In assign_pgstat_temp_directory we have

    /* check_canonical_path already canonicalized newval for us */
    ...
    tname = guc_malloc(ERROR, strlen(newval) + 12); /* /global.tmp */
    sprintf(tname, "%s/global.tmp", newval);
    fname = guc_malloc(ERROR, strlen(newval) + 13); /* /global.stat */
    sprintf(fname, "%s/global.stat", newval);

and I believe what the comment is on about is that these path derivation
operations are unreliable if newval isn't in canonical form.  I seem
to remember for example that in some Windows configurations, mixing
slashes and backslashes doesn't work.

So the real point here is that we could use the user's string unmodified
as long as we only use it exactly as-is, but cases where we derive other
pathnames from it require more work.

Of course, we could leave the GUC string alone and only canonicalize while
forming derived paths, but that seems mighty error-prone.  In any case,
just ripping out the check_canonical_path usages without any other mop-up
will break things.

            regards, tom lane



Re: Expand the use of check_canonical_path() for more GUCs

From
Daniel Gustafsson
Date:
Re-reading this thread it seems to me that the conclusion is to mark the patch
Returned with Feedback in this commitfest, and possibly expand documentation or
comments on path canonicalization in the code at some point.

Does that seem fair?

cheers ./daniel


Re: Expand the use of check_canonical_path() for more GUCs

From
Michael Paquier
Date:
On Thu, Jul 09, 2020 at 02:19:06PM +0200, Daniel Gustafsson wrote:
> Re-reading this thread it seems to me that the conclusion is to mark the patch
> Returned with Feedback in this commitfest, and possibly expand documentation or
> comments on path canonicalization in the code at some point.
>
> Does that seem fair?

Yes, that's fair as there is visibly a consensus that we could perhaps
remove some of the canonicalization, but not expand it.  There is
nothing preventing to live with the current things in place either, so
done.
--
Michael

Attachment