Thread: Expand the use of check_canonical_path() for more GUCs
Hi all, While digging into my backlog, I have found this message from Peter E mentioning about $subject: https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9ab68@2ndquadrant.com It seems to me that it would be a good idea to make those checks more consistent, and attached is a patch. Thoughts? -- Michael
Attachment
On 2020-05-19 09:09, Michael Paquier wrote: > While digging into my backlog, I have found this message from Peter E > mentioning about $subject: > https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9ab68@2ndquadrant.com > > It seems to me that it would be a good idea to make those checks more > consistent, and attached is a patch. That thread didn't resolve why check_canonical_path() is necessary there. Maybe the existing uses could be removed? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Michael Paquier <michael@paquier.xyz> writes: > While digging into my backlog, I have found this message from Peter E > mentioning about $subject: > https://www.postgresql.org/message-id/e6aac026-174c-9952-689f-6bee76f9ab68@2ndquadrant.com > It seems to me that it would be a good idea to make those checks more > consistent, and attached is a patch. Hm, I'm pretty certain that data_directory does not need this because canonicalization is done elsewhere; the most that you could accomplish there is to cause problems. Dunno about the rest. regards, tom lane
On Tue, May 19, 2020 at 09:32:15AM -0400, Tom Lane wrote: > Hm, I'm pretty certain that data_directory does not need this because > canonicalization is done elsewhere; the most that you could accomplish > there is to cause problems. Dunno about the rest. Hmm. I missed that this is getting done in SelectConfigFiles() first by the postmaster so that's not necessary, which also does the work for hba_file and ident_file. config_file does not need that either as AbsoluteConfigLocation() does the same work via ParseConfigFile(). So perhaps we could add a comment or such about that? Attached is an idea. The rest is made of PromoteTriggerFile, pg_krb_server_keyfile, ssl_cert_file, ssl_key_file, ssl_ca_file, ssl_crl_file and ssl_dh_params_file where loaded values are taken as-is, so applying canonicalization would be helpful there, no? -- Michael
Attachment
On Tue, May 19, 2020 at 01:02:12PM +0200, Peter Eisentraut wrote: > That thread didn't resolve why check_canonical_path() is necessary there. > Maybe the existing uses could be removed? This would impact log_directory, external_pid_file, stats_temp_directory, where it is still useful to show to the user cleaned up names, no? See for example 2594cf0. -- Michael
Attachment
On 2020-05-20 09:13, Michael Paquier wrote: > On Tue, May 19, 2020 at 01:02:12PM +0200, Peter Eisentraut wrote: >> That thread didn't resolve why check_canonical_path() is necessary there. >> Maybe the existing uses could be removed? > > This would impact log_directory, external_pid_file, > stats_temp_directory, where it is still useful to show to the user > cleaned up names, no? See for example 2594cf0. I don't understand why we need to alter the file names specified by the user. They presumably wrote them that way for a reason and they probably like them that way. There are specific situations where we need to do that to know whether a path is in the data directory or the same as some other one etc. But unless there is a reason like that, I think we should just leave them. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Wed, 20 May 2020 10:05:29 +0200, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in > On 2020-05-20 09:13, Michael Paquier wrote: > > On Tue, May 19, 2020 at 01:02:12PM +0200, Peter Eisentraut wrote: > >> That thread didn't resolve why check_canonical_path() is necessary > >> there. > >> Maybe the existing uses could be removed? > > This would impact log_directory, external_pid_file, > > stats_temp_directory, where it is still useful to show to the user > > cleaned up names, no? See for example 2594cf0. > > I don't understand why we need to alter the file names specified by > the user. They presumably wrote them that way for a reason and they > probably like them that way. > > There are specific situations where we need to do that to know whether > a path is in the data directory or the same as some other one etc. > But unless there is a reason like that, I think we should just leave > them. I completely agree to Peter here. I would be surprised that I see system views show different strings from what I wrote in config files. I also think that it ought to be documented if we store tweaked string for a user input. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
> On May 20, 2020, at 12:03 AM, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, May 19, 2020 at 09:32:15AM -0400, Tom Lane wrote: >> Hm, I'm pretty certain that data_directory does not need this because >> canonicalization is done elsewhere; the most that you could accomplish >> there is to cause problems. Dunno about the rest. > > Hmm. I missed that this is getting done in SelectConfigFiles() first > by the postmaster so that's not necessary, which also does the work > for hba_file and ident_file. config_file does not need that either as > AbsoluteConfigLocation() does the same work via ParseConfigFile(). So > perhaps we could add a comment or such about that? Attached is an > idea. > > The rest is made of PromoteTriggerFile, pg_krb_server_keyfile, > ssl_cert_file, ssl_key_file, ssl_ca_file, ssl_crl_file and > ssl_dh_params_file where loaded values are taken as-is, so applying > canonicalization would be helpful there, no? Before this patch, there are three GUCs that get check_canonical_path treatment: log_directory external_pid_file stats_temp_directory After the patch, these also get the treatment (though Peter seems to be objecting to the change): promote_trigger_file krb_server_keyfile ssl_cert_file ssl_key_file ssl_ca_file ssl_crl_file ssl_dh_params_file and these still don't, with comments about how they are already canonicalized when the config file is loaded: data_directory config_file hba_file ident_file A little poking around shows that in SelectConfigFiles(), these four directories were set by SetConfigOption(). I don'tsee a problem with the code, but the way this stuff is spread around makes it easy for somebody adding a new GUC filepath to do it wrong. I don't have much opinion about Peter's preference that paths be left alone, but I'd prefer somecomments in guc.c explaining it all. The only cleanup that occurs to me is to reorder ConfigureNamesString[] to haveall the path options back-to-back, with the four that are set by SelectConfigFiles() at the top with comments about whythey are special, and the rest after that with comments about why they need or do not need canonicalization. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 19, 2020 at 7:02 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > That thread didn't resolve why check_canonical_path() is necessary > there. Maybe the existing uses could be removed? This first sentence of this reply seems worthy of particualr attention. We have to know what problem this is intended to fix before we try to decide in which cases it's needed. Otherwise, whether we add it everywhere or remove it everywhere, we'll only know that it's consistent, not that it's correct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2020-May-28, Mark Dilger wrote: > A little poking around shows that in SelectConfigFiles(), these four > directories were set by SetConfigOption(). I don't see a problem with > the code, but the way this stuff is spread around makes it easy for > somebody adding a new GUC file path to do it wrong. I don't have much > opinion about Peter's preference that paths be left alone, but I'd > prefer some comments in guc.c explaining it all. The only cleanup > that occurs to me is to reorder ConfigureNamesString[] to have all the > path options back-to-back, with the four that are set by > SelectConfigFiles() at the top with comments about why they are > special, and the rest after that with comments about why they need or > do not need canonicalization. No need for reorganization, I think, just have a comment on top of each entry that doesn't use canonicalization such as "no canonicalization, as explained in ..." where that refers to a single largish comment that explains what canonicalization is, why you use it, and why you don't. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-05-29 19:24, Robert Haas wrote: > On Tue, May 19, 2020 at 7:02 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> That thread didn't resolve why check_canonical_path() is necessary >> there. Maybe the existing uses could be removed? > > This first sentence of this reply seems worthy of particualr > attention. We have to know what problem this is intended to fix before > we try to decide in which cases it's needed. Otherwise, whether we add > it everywhere or remove it everywhere, we'll only know that it's > consistent, not that it's correct. The archeology reveals that these calls where originally added to canonicalize the data_directory and config_file settings (7b0f060d54), but that was then moved out of guc.c to be done early during postmaster startup (337ffcddba). The remaining calls of check_canonical_path() in guc.c appear to be leftovers from a previous regime. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 2, 2020 at 5:04 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > The archeology reveals that these calls where originally added to > canonicalize the data_directory and config_file settings (7b0f060d54), > but that was then moved out of guc.c to be done early during postmaster > startup (337ffcddba). The remaining calls of check_canonical_path() in > guc.c appear to be leftovers from a previous regime. Thanks for looking into it. Sounds like it can just be ripped out, then, unless someone knows of a reason to do otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jun 2, 2020 at 5:04 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> The archeology reveals that these calls where originally added to >> canonicalize the data_directory and config_file settings (7b0f060d54), >> but that was then moved out of guc.c to be done early during postmaster >> startup (337ffcddba). The remaining calls of check_canonical_path() in >> guc.c appear to be leftovers from a previous regime. > Thanks for looking into it. Sounds like it can just be ripped out, > then, unless someone knows of a reason to do otherwise. In the abstract, I agree with Peter's point that we shouldn't alter user-given strings without need. However, I think there's strong reason for canonicalizing the data directory and config file locations. We access those both before and after chdir'ing into the datadir, so we'd better have absolute paths to them --- and at least for the datadir, it's documented that you can initially give it as a path relative to wherever you started the postmaster from. If the other files are only accessed after the chdir happens then we could likely do without canonicalizing them. But ... do we know which directory the user (thought he) specified them with reference to? Forced canonicalization does have the advantage that it's clear to all onlookers how we are interpreting the paths. regards, tom lane
On Wed, Jun 03, 2020 at 02:45:50PM -0400, Tom Lane wrote: > In the abstract, I agree with Peter's point that we shouldn't alter > user-given strings without need. However, I think there's strong > reason for canonicalizing the data directory and config file locations. > We access those both before and after chdir'ing into the datadir, so > we'd better have absolute paths to them --- and at least for the > datadir, it's documented that you can initially give it as a path > relative to wherever you started the postmaster from. If the other > files are only accessed after the chdir happens then we could likely > do without canonicalizing them. But ... do we know which directory > the user (thought he) specified them with reference to? Forced > canonicalization does have the advantage that it's clear to all > onlookers how we are interpreting the paths. Even with the last point... It looks like there is little love for this patch. So it seems to me that this brings the discussion down to two points: shouldn't we document why canonicalization is not done for data_directory, config_file, hba_file and ident_file with some comments in guc.c? Then, why do we apply it to external_pid_file, Log_directory and stats_temp_directory knowing that we chdir to PGDATA in the postmaster before they get used (as far as I can see)? -- Michael
Attachment
On 2020-06-03 20:45, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jun 2, 2020 at 5:04 AM Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> The archeology reveals that these calls where originally added to >>> canonicalize the data_directory and config_file settings (7b0f060d54), >>> but that was then moved out of guc.c to be done early during postmaster >>> startup (337ffcddba). The remaining calls of check_canonical_path() in >>> guc.c appear to be leftovers from a previous regime. > >> Thanks for looking into it. Sounds like it can just be ripped out, >> then, unless someone knows of a reason to do otherwise. > > In the abstract, I agree with Peter's point that we shouldn't alter > user-given strings without need. However, I think there's strong > reason for canonicalizing the data directory and config file locations. It is not proposed to change that. It is only debated whether the same canonicalization should be applied to other GUCs that represent paths. > We access those both before and after chdir'ing into the datadir, so > we'd better have absolute paths to them --- and at least for the > datadir, it's documented that you can initially give it as a path > relative to wherever you started the postmaster from. If the other > files are only accessed after the chdir happens then we could likely > do without canonicalizing them. But ... do we know which directory > the user (thought he) specified them with reference to? Forced > canonicalization does have the advantage that it's clear to all > onlookers how we are interpreting the paths. This (and some other messages in this thread) appears to assume that canonicalize_path() turns relative paths into absolute paths, but AFAICT, it does not do that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > This (and some other messages in this thread) appears to assume that > canonicalize_path() turns relative paths into absolute paths, but > AFAICT, it does not do that. Ah, fair point --- I'd been assuming that we were applying canonicalize_path as cleanup for an absolute-ification operation, but you are right that check_canonical_path does not do that. Digging around, though, I notice a different motivation. In assign_pgstat_temp_directory we have /* check_canonical_path already canonicalized newval for us */ ... tname = guc_malloc(ERROR, strlen(newval) + 12); /* /global.tmp */ sprintf(tname, "%s/global.tmp", newval); fname = guc_malloc(ERROR, strlen(newval) + 13); /* /global.stat */ sprintf(fname, "%s/global.stat", newval); and I believe what the comment is on about is that these path derivation operations are unreliable if newval isn't in canonical form. I seem to remember for example that in some Windows configurations, mixing slashes and backslashes doesn't work. So the real point here is that we could use the user's string unmodified as long as we only use it exactly as-is, but cases where we derive other pathnames from it require more work. Of course, we could leave the GUC string alone and only canonicalize while forming derived paths, but that seems mighty error-prone. In any case, just ripping out the check_canonical_path usages without any other mop-up will break things. regards, tom lane
Re-reading this thread it seems to me that the conclusion is to mark the patch Returned with Feedback in this commitfest, and possibly expand documentation or comments on path canonicalization in the code at some point. Does that seem fair? cheers ./daniel
On Thu, Jul 09, 2020 at 02:19:06PM +0200, Daniel Gustafsson wrote: > Re-reading this thread it seems to me that the conclusion is to mark the patch > Returned with Feedback in this commitfest, and possibly expand documentation or > comments on path canonicalization in the code at some point. > > Does that seem fair? Yes, that's fair as there is visibly a consensus that we could perhaps remove some of the canonicalization, but not expand it. There is nothing preventing to live with the current things in place either, so done. -- Michael