Thread: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
While working on a patch, I noticed this pre-existing behavior, which seems to be new since v11, maybe due to changes to SRF. |postgres=# SELECT pg_ls_dir('.') LIMIT 1; |WARNING: 1 temporary files and directories not closed at end-of-transaction |pg_ls_dir | pg_dynshmem |postgres=# SELECT pg_ls_waldir() LIMIT 1; |WARNING: 1 temporary files and directories not closed at end-of-transaction |-[ RECORD 1 ]+------------------------------------------------------------- |pg_ls_waldir | (00000001000031920000007B,16777216,"2020-03-08 03:50:34-07") Note, that doesn't happen with "SELECT * FROM". I'm not sure what the solution is to that, but my patch was going to make it worse rather than better for pg_ls_tmpdir. -- Justin
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > While working on a patch, I noticed this pre-existing behavior, which seems to > be new since v11, maybe due to changes to SRF. > |postgres=# SELECT pg_ls_dir('.') LIMIT 1; > |WARNING: 1 temporary files and directories not closed at end-of-transaction Hmm, actually it looks to me like pg_ls_dir has been broken forever. The reason the warning didn't show up before v11 is that CleanupTempFiles didn't bleat about leaked "allocated" directories before that (cf 9cb7db3f0). I guess we ought to change that function to use returns-a-tuplestore protocol instead of thinking it can hold a directory open across calls. It's not hard to think of use-cases where the existing behavior would cause issues worse than a nanny-ish WARNING, especially on platforms with tight "ulimit -n" limits. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > > While working on a patch, I noticed this pre-existing behavior, which seems to > > be new since v11, maybe due to changes to SRF. > > > |postgres=# SELECT pg_ls_dir('.') LIMIT 1; > > |WARNING: 1 temporary files and directories not closed at end-of-transaction > > Hmm, actually it looks to me like pg_ls_dir has been broken forever. > The reason the warning didn't show up before v11 is that CleanupTempFiles > didn't bleat about leaked "allocated" directories before that > (cf 9cb7db3f0). > > I guess we ought to change that function to use returns-a-tuplestore > protocol instead of thinking it can hold a directory open across calls. > It's not hard to think of use-cases where the existing behavior would > cause issues worse than a nanny-ish WARNING, especially on platforms > with tight "ulimit -n" limits. Thanks for the analysis. Do you mean it should enumerate all files during the initial SRF call, or use something other than the SRF_* macros ? -- Justin
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote: >> I guess we ought to change that function to use returns-a-tuplestore >> protocol instead of thinking it can hold a directory open across calls. >> It's not hard to think of use-cases where the existing behavior would >> cause issues worse than a nanny-ish WARNING, especially on platforms >> with tight "ulimit -n" limits. > Thanks for the analysis. > Do you mean it should enumerate all files during the initial SRF call, or use > something other than the SRF_* macros ? It has to enumerate all the files during the first call. I suppose it could do that and then still hand back the results one-at-a-time, but there seems little point compared to filling a tuplestore immediately. So probably the SRF_ macros are useless here. Another possible solution is to register an exprstate-shutdown hook to ensure the resource is cleaned up, but I'm not very happy with that because it does nothing to prevent the hazard of overrunning the available resources if you have several of these active at once. I've just finished scanning the source code and concluding that all of these functions are similarly broken: pg_ls_dir pg_ls_dir_files pg_tablespace_databases pg_logdir_ls_internal pg_timezone_names pgrowlocks The first five risk leaking an open directory, the last risks leaking an active tablescan and open relation. I don't see anything in the documentation (either funcapi.h or xfunc.sgml) warning that the function might not get run to completion, either ... regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
I wrote: > I've just finished scanning the source code and concluding that all > of these functions are similarly broken: > pg_ls_dir > pg_ls_dir_files > pg_tablespace_databases > pg_logdir_ls_internal > pg_timezone_names > pgrowlocks BTW, another thing I noticed while looking around is that some of the functions using SRF_RETURN_DONE() think they should clean up memory beforehand. This is a waste of code/cycles, as long as the memory was properly allocated in funcctx->multi_call_memory_ctx, because funcapi.c takes care of deleting that context. We should probably document that *any* manual cleanup before SRF_RETURN_DONE() is an antipattern. If you have to have cleanup, it needs to be done via RegisterExprContextCallback instead. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Sun, Mar 08, 2020 at 03:40:09PM -0400, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > > On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote: > >> I guess we ought to change that function to use returns-a-tuplestore > >> protocol instead of thinking it can hold a directory open across calls. > > Thanks for the analysis. > > Do you mean it should enumerate all files during the initial SRF call, or use > > something other than the SRF_* macros ? > It has to enumerate all the files during the first call. I suppose it > I've just finished scanning the source code and concluding that all > of these functions are similarly broken: > pg_ls_dir_files I patched this one to see what it looks like and to allow /hopefully/ moving forward one way or another with the pg_ls_tmpfile() patch set (or at least avoid trying to do anything there which is too inconsistent with this fix). > I don't see anything in the documentation (either funcapi.h or > xfunc.sgml) warning that the function might not get run to completion, > either ... Also, at first glance, these seem to be passing constant "randomAccess=true" rather than (bool) (rsinfo->allowedModes&SFRM_Materialize_Random) $ git grep -wl SFRM_Materialize |xargs grep -l 'tuplestore_begin_heap(true' contrib/dblink/dblink.c contrib/pageinspect/brinfuncs.c contrib/pg_stat_statements/pg_stat_statements.c src/backend/access/transam/xlogfuncs.c src/backend/commands/event_trigger.c src/backend/commands/extension.c src/backend/foreign/foreign.c src/backend/replication/logical/launcher.c src/backend/replication/logical/logicalfuncs.c src/backend/replication/logical/origin.c src/backend/replication/slotfuncs.c src/backend/replication/walsender.c src/backend/storage/ipc/shmem.c src/backend/utils/adt/pgstatfuncs.c src/backend/utils/misc/guc.c src/backend/utils/misc/pg_config.c -- Justin
Attachment
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: >>> On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote: >>>> I guess we ought to change that function to use returns-a-tuplestore >>>> protocol instead of thinking it can hold a directory open across calls. > I patched this one to see what it looks like and to allow /hopefully/ moving > forward one way or another with the pg_ls_tmpfile() patch set (or at least > avoid trying to do anything there which is too inconsistent with this fix). I reviewed this, added some test cases, and pushed it, so that we can see if the buildfarm finds anything wrong. (I'm not expecting that, because this should all be pretty portable, but you never know.) Assuming not, we need to fix the other functions similarly, and then do something about revising the documentation to warn against this coding style. Do you want to have a go at that? > Also, at first glance, these seem to be passing constant "randomAccess=true" > rather than (bool) (rsinfo->allowedModes&SFRM_Materialize_Random) Hm. Not a bug, but possibly a performance issue, if the tuplestore gets big enough for that to matter. (I think it doesn't matter until we start spilling to temp files.) regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Wed, Mar 11, 2020 at 03:32:38PM -0400, Tom Lane wrote: > > I patched this one to see what it looks like and to allow /hopefully/ moving > > forward one way or another with the pg_ls_tmpfile() patch set (or at least > > avoid trying to do anything there which is too inconsistent with this fix). > > I reviewed this, added some test cases, and pushed it, so that we can see Thanks, tests were on my periphery.. | In passing, fix bogus error report for stat() failure: it was | whining about the directory when it should be fingering the | individual file. Doubtless a copy-and-paste error. Thanks again ; that was my 0001 patch on the other thread. No rebase conflict even ;) https://www.postgresql.org/message-id/20191228101650.GG12890%40telsasoft.com > Do you want to have a go at that? First draft attached. Note that I handled pg_ls_dir, even though I'm proposing on the other thread to collapse/merge/meld it with pg_ls_dir_files [0]. Possibly that's a bad idea with tuplestore, due to returning a scalar vs a row and needing to conditionally call CreateTemplateTupleDesc vs get_call_result_type. I'll rebase that patch later today. I didn't write test cases yet. Also didn't look for functions not on your list. I noticed this doesn't actually do anything, but kept it for now...except in pg_ls_dir error case: src/include/utils/tuplestore.h:/* tuplestore_donestoring() used to be required, but is no longer used */ src/include/utils/tuplestore.h:#define tuplestore_donestoring(state) ((void) 0) I found a few documentation bits that I think aren't relevant, but could possibly be misread to encourage the bad coding practice. This is about *sql* functions: |37.5.8. SQL Functions Returning Sets |When an SQL function is declared as returning SETOF sometype, the function's |final query is executed TO COMPLETION, and each row it outputs is returned as |an element of the result set. |... |Set-returning functions in the select list are always evaluated as though they |are on the inside of a nested-loop join with the rest of the FROM clause, so |that the function(s) are run TO COMPLETION before the next row from the FROM |clause is considered. -- Justin [0] https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com v9-0008-generalize-pg_ls_dir_files-and-retire-pg_ls_dir.patch
Attachment
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Sun, Mar 08, 2020 at 04:30:44PM -0400, Tom Lane wrote: > BTW, another thing I noticed while looking around is that some of > the functions using SRF_RETURN_DONE() think they should clean up > memory beforehand. This is a waste of code/cycles, as long as the > memory was properly allocated in funcctx->multi_call_memory_ctx, > because funcapi.c takes care of deleting that context. > > We should probably document that *any* manual cleanup before > SRF_RETURN_DONE() is an antipattern. If you have to have cleanup, > it needs to be done via RegisterExprContextCallback instead. This part appears to be already in place since e4186762ffaa4188e16702e8f4f299ea70988b96: |The memory context that is current when the SRF is called is a transient |context that will be cleared between calls. This means that you do not need to |call pfree on everything you allocated using palloc; it will go away anyway. |However, if you want to allocate any data structures to live across calls, you |need to put them somewhere else. The memory context referenced by |multi_call_memory_ctx is a suitable location for any data that needs to survive |until the SRF is finished running. In most cases, this means that you should |switch into multi_call_memory_ctx while doing the first-call setup. -- Justin
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Alvaro Herrera
Date:
Nitpick: please see c4dcd9144ba6. > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Wed, 11 Mar 2020 10:09:18 -0500 > Subject: [PATCH] SRF: avoid leaking resources if not run to completion > > Change to return a tuplestore populated immediately and returned in full. > > Discussion: https://www.postgresql.org/message-id/20200308173103.GC1357%40telsasoft.com I wonder if this isn't saying that the whole value-per-call protocol is bogus, in that it seems impossible to write a useful function with it. Maybe we should add one final call with a special flag "function shutdown" or something, so that these resources can be released if the SRF isn't run to completion? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I wonder if this isn't saying that the whole value-per-call protocol is > bogus, in that it seems impossible to write a useful function with it. Only if you have a *very* narrow definition of "useful function". If you look through SRF_RETURN_DONE callers, only a small minority are trying to do resource cleanup beforehand. > Maybe we should add one final call with a special flag "function > shutdown" or something, so that these resources can be released if the > SRF isn't run to completion? We already have an appropriate mechanism for cleaning up resources, ie RegisterExprContextCallback. I do not think what you're suggesting could be made to work without an incompatible break in the API for SRFs, so it's not an improvement over telling people more forcefully about how to do resource cleanup correctly. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Thu, Mar 12, 2020 at 07:11:56AM -0500, Justin Pryzby wrote: > > Do you want to have a go at that? > > First draft attached. Note that I handled pg_ls_dir, even though I'm proposing > on the other thread to collapse/merge/meld it with pg_ls_dir_files [0]. > Possibly that's a bad idea with tuplestore, due to returning a scalar vs a row > and needing to conditionally call CreateTemplateTupleDesc vs > get_call_result_type. I'll rebase that patch later today. > > I didn't write test cases yet. Also didn't look for functions not on your > list. > > I noticed this doesn't actually do anything, but kept it for now...except in > pg_ls_dir error case: > > src/include/utils/tuplestore.h:/* tuplestore_donestoring() used to be required, but is no longer used */ > src/include/utils/tuplestore.h:#define tuplestore_donestoring(state) ((void) 0) v2 attached - I will add to next CF in case you want to defer it until later. -- Justin
Attachment
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > v2 attached - I will add to next CF in case you want to defer it until later. Thanks, reviewed and pushed. Since this is a bug fix (at least in part) I didn't want to wait. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Mon, Mar 16, 2020 at 09:38:50PM -0400, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > > v2 attached - I will add to next CF in case you want to defer it until later. > > Thanks, reviewed and pushed. Since this is a bug fix (at least in part) > I didn't want to wait. Thanks for fixing my test case and pushing. -- Justin
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > Thanks for fixing my test case and pushing. The buildfarm just showed up another instability in the test cases we added: =========================== regression.diffs ================ diff -U3 /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/test/regress/expected/misc_functions.out /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/misc_functions.out --- /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/test/regress/expected/misc_functions.out 2020-03-1708:14:50.292037956 +0100 +++ /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/misc_functions.out 2020-03-2813:55:12.490024822 +0100 @@ -169,11 +169,7 @@ select (w).size = :segsize as ok from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1; - ok ----- - t -(1 row) - +ERROR: could not stat file "pg_wal/000000010000000000000078": No such file or directory select count(*) >= 0 as ok from pg_ls_archive_statusdir(); ok ---- It's pretty obvious what happened here: concurrent activity renamed or removed the WAL segment between when we saw it in the directory and when we tried to stat() it. This seems like it would be just as much of a hazard for field usage as it is for regression testing, so I propose that we fix these directory-scanning functions to silently ignore ENOENT failures from stat(). Are there any for which we should not do that? regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote: > The buildfarm just showed up another instability in the test cases > we added: Yea, as you said, this is an issue with the *testcase*. The function behavior didn't change, we just weren't previously exercising it. > select (w).size = :segsize as ok > from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1; > - ok > ----- > - t > -(1 row) > - > +ERROR: could not stat file "pg_wal/000000010000000000000078": No such file or directory > select count(*) >= 0 as ok from pg_ls_archive_statusdir(); > ok > ---- > > It's pretty obvious what happened here: concurrent activity renamed or > removed the WAL segment between when we saw it in the directory and > when we tried to stat() it. > > This seems like it would be just as much of a hazard for field usage > as it is for regression testing, That's clearly true for pg_ls_waldir(), which call pg_ls_dir_files, and includes some metadata columns. > so I propose that we fix these directory-scanning functions to silently > ignore ENOENT failures from stat(). Are there any for which we should not do > that? I think it's reasonable to ignore transient ENOENT for tmpdir, logdir, and probably archive_statusdir. That doesn't currently affect pg_ls_dir(), which lists file but not metadata for an arbitrary dir, so doesn't call stat(). Note that dangling links in the other functions currently cause (wrong [0]) error. I guess it should be documented if broken link is will be ignored due to ENOENT. Maybe we should lstat() the file to determine if it's a dangling link; if lstat() fails, then skip it. Currently, we use stat(), which shows metdata of a link's *target*. Maybe we'd change that. Note that I have a patch which generalizes pg_ls_dir_files and makes pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same unless I add another flag to do otherwise (but behaving the same has its merits). It already uses lstat() to show links to dirs as isdir=no, which was needed to avoid recursing into links-to-dirs in the new helper function pg_ls_dir_recurse(). https://commitfest.postgresql.org/26/2377/ [0] Which you fixed in 085b6b667 and I previously fixed at: https://www.postgresql.org/message-id/attachment/106478/v2-0001-BUG-in-errmsg.patch |$ sudo ln -s foo /var/log/postgresql/bar |ts=# SELECT * FROM pg_ls_logdir() ORDER BY 3; |ERROR: could not stat directory "/var/log/postgresql": No such file or directory
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote: >> so I propose that we fix these directory-scanning functions to silently >> ignore ENOENT failures from stat(). Are there any for which we should not do >> that? > Maybe we should lstat() the file to determine if it's a dangling link; if > lstat() fails, then skip it. Currently, we use stat(), which shows metdata of > a link's *target*. Maybe we'd change that. Hm, good point that ENOENT could refer to a symlink's target. Still, I'm not sure it's worth going out of our way to disambiguate that, given that these directories aren't really supposed to contain symlinks. (And on the third hand, if they aren't supposed to, then maybe these functions needn't look through any symlinks? In which case just substituting lstat for stat would resolve the ambiguity.) > Note that I have a patch which generalizes pg_ls_dir_files and makes > pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same > unless I add another flag to do otherwise (but behaving the same has its > merits). It already uses lstat() to show links to dirs as isdir=no, which was > needed to avoid recursing into links-to-dirs in the new helper function > pg_ls_dir_recurse(). https://commitfest.postgresql.org/26/2377/ I think we need a back-patchable fix for the ENOENT failure, seeing that we back-patched the new regression test; intermittent buildfarm failures are no fun in any branch. So new functions aren't too relevant here, although it's fair to look ahead at whether the same behavior will be appropriate for them. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
I wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: >> Maybe we should lstat() the file to determine if it's a dangling link; if >> lstat() fails, then skip it. Currently, we use stat(), which shows metdata of >> a link's *target*. Maybe we'd change that. > Hm, good point that ENOENT could refer to a symlink's target. Still, > I'm not sure it's worth going out of our way to disambiguate that, > given that these directories aren't really supposed to contain symlinks. > (And on the third hand, if they aren't supposed to, then maybe these > functions needn't look through any symlinks? In which case just > substituting lstat for stat would resolve the ambiguity.) After looking at the callers of pg_ls_dir_files, and noticing that it's already defined to ignore anything that's not a regular file, I think switching to lstat makes sense. I also grepped the other uses of ReadDir[Extended], and didn't see any other ones that seemed desperately in need of changing. So the attached seems like a sufficient fix. regards, tom lane diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c index 01185f2..8429a12 100644 --- a/src/backend/utils/adt/genfile.c +++ b/src/backend/utils/adt/genfile.c @@ -596,10 +596,15 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok) /* Get the file info */ snprintf(path, sizeof(path), "%s/%s", dir, de->d_name); - if (stat(path, &attrib) < 0) + if (lstat(path, &attrib) < 0) + { + /* Ignore concurrently-deleted files, else complain */ + if (errno == ENOENT) + continue; ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat file \"%s\": %m", path))); + } /* Ignore anything but regular files */ if (!S_ISREG(attrib.st_mode))
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote: > I wrote: > > Justin Pryzby <pryzby@telsasoft.com> writes: > >> Maybe we should lstat() the file to determine if it's a dangling link; if > >> lstat() fails, then skip it. Currently, we use stat(), which shows metdata of > >> a link's *target*. Maybe we'd change that. > > > Hm, good point that ENOENT could refer to a symlink's target. Still, > > I'm not sure it's worth going out of our way to disambiguate that, > > given that these directories aren't really supposed to contain symlinks. > > (And on the third hand, if they aren't supposed to, then maybe these > > functions needn't look through any symlinks? In which case just > > substituting lstat for stat would resolve the ambiguity.) > > After looking at the callers of pg_ls_dir_files, and noticing that > it's already defined to ignore anything that's not a regular file, > I think switching to lstat makes sense. Yea, only pg_ls_dir() shows special file types (and currently the others even hide dirs). The essence of your patch is to ignore ENOENT, but you also changed to use lstat(), which seems unrelated. That means we'll now hide (non-broken) symlinks. Is that intentional/needed ? I guess maybe you're trying to fix the bug (?) that symlinks aren't skipped? If so, I guess it should be a separate commit, or the commit message should say so. I think the doc update is already handled by: 8b6d94cf6c8319bfd6bebf8b863a5db586c19c3b (we didn't used to say we skipped specials, and now we say we do, and we'll to follow through RSN and actually do it, too). > diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c > index 01185f2..8429a12 100644 > --- a/src/backend/utils/adt/genfile.c > +++ b/src/backend/utils/adt/genfile.c > @@ -596,10 +596,15 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok) > > /* Get the file info */ > snprintf(path, sizeof(path), "%s/%s", dir, de->d_name); > - if (stat(path, &attrib) < 0) > + if (lstat(path, &attrib) < 0) > + { > + /* Ignore concurrently-deleted files, else complain */ > + if (errno == ENOENT) > + continue; > ereport(ERROR, > (errcode_for_file_access(), > errmsg("could not stat file \"%s\": %m", path))); > + } > > /* Ignore anything but regular files */ > if (!S_ISREG(attrib.st_mode))
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote: >> After looking at the callers of pg_ls_dir_files, and noticing that >> it's already defined to ignore anything that's not a regular file, >> I think switching to lstat makes sense. > Yea, only pg_ls_dir() shows special file types (and currently the others even > hide dirs). > The essence of your patch is to ignore ENOENT, but you also changed to use > lstat(), which seems unrelated. That means we'll now hide (non-broken) > symlinks. Is that intentional/needed ? Well, the following comment says "ignore anything but regular files", so I'm supposing that that is the behavior that we actually want here and failed to implement correctly. There might be scope for additional directory-reading functions, but I'd think you'd want more information (such as the file type) returned from anything that doesn't act this way. In practice, since these directories shouldn't contain symlinks, it's likely moot. The only place in PG data directories where we actually expect symlinks is pg_tablespace ... and that contains symlinks to directories, so that this function would ignore them anyway. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Sun, Mar 29, 2020 at 01:22:04PM -0400, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > > On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote: > >> After looking at the callers of pg_ls_dir_files, and noticing that > >> it's already defined to ignore anything that's not a regular file, > >> I think switching to lstat makes sense. > > > Yea, only pg_ls_dir() shows special file types (and currently the others even > > hide dirs). > > > The essence of your patch is to ignore ENOENT, but you also changed to use > > lstat(), which seems unrelated. That means we'll now hide (non-broken) > > symlinks. Is that intentional/needed ? > > Well, the following comment says "ignore anything but regular files", > so I'm supposing that that is the behavior that we actually want here > and failed to implement correctly. There might be scope for > additional directory-reading functions, but I'd think you'd want > more information (such as the file type) returned from anything > that doesn't act this way. Maybe pg_stat_file() deserves similar attention ? Right now, it'll fail on a broken link. If we changed it to lstat(), then it'd work, but it'd also show metadata for the *link* rather than its target. Patch proposed as v14-0001 patch here may be relevant: https://www.postgresql.org/message-id/20200317190401.GY26184%40telsasoft.com - indicating if it is a directory. Typical usages include: + indicating if it is a directory (or a symbolic link to a directory). ... > In practice, since these directories shouldn't contain symlinks, > it's likely moot. The only place in PG data directories where > we actually expect symlinks is pg_tablespace ... and that contains > symlinks to directories, so that this function would ignore them > anyway. I wouldn't hesitate to make symlinks, at least in log. It's surprising when files are hidden, but I won't argue about the best behavior here. I'm thinking of distributions or local configurations that use /var/log/postgresql. I didn't remember or didn't realize, but it looks like debian's packages use logging_collector=off and then launch postmaster with 2> /var/log/postgres/... It seems reasonable to do something like: log/huge-querylog.csv => /zfs/compressed/... -- Justin
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Fabien COELHO
Date:
Hello Justin, >> Well, the following comment says "ignore anything but regular files", >> so I'm supposing that that is the behavior that we actually want here >> and failed to implement correctly. There might be scope for >> additional directory-reading functions, but I'd think you'd want >> more information (such as the file type) returned from anything >> that doesn't act this way. > > Maybe pg_stat_file() deserves similar attention ? Right now, it'll fail on a > broken link. If we changed it to lstat(), then it'd work, but it'd also show > metadata for the *link* rather than its target. Yep. I think this traditional answer is the rational answer. As I wrote about an earlier version of the patch, ISTM that instead of reinventing, extending, adapting various ls variants (with/without metadata, which show only files, which shows target of links, which shows directory, etc.) we would just need *one* postgres "ls" implementation which would be like "ls -la arg" (returns file type, dates), and then everything else is a wrapper around that with appropriate filtering that can be done at the SQL level, like you started with recurse. It would reduce the amount of C code and I find the SQL-level approach quite elegant. -- Fabien.
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: > As I wrote about an earlier version of the patch, ISTM that instead of > reinventing, extending, adapting various ls variants (with/without > metadata, which show only files, which shows target of links, which shows > directory, etc.) we would just need *one* postgres "ls" implementation > which would be like "ls -la arg" (returns file type, dates), and then > everything else is a wrapper around that with appropriate filtering that > can be done at the SQL level, like you started with recurse. Yeah, I agree that some new function that can represent symlinks explicitly in its output is the place to deal with this, for people who want to deal with it. In the meantime, there's still the question of what pg_ls_dir_files should do exactly. Are we content to have it ignore symlinks? I remain inclined to think that's the right thing given its current brief. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Fabien COELHO
Date:
Hello, >> As I wrote about an earlier version of the patch, ISTM that instead of >> reinventing, extending, adapting various ls variants (with/without >> metadata, which show only files, which shows target of links, which shows >> directory, etc.) we would just need *one* postgres "ls" implementation >> which would be like "ls -la arg" (returns file type, dates), and then >> everything else is a wrapper around that with appropriate filtering that >> can be done at the SQL level, like you started with recurse. > > Yeah, I agree that some new function that can represent symlinks > explicitly in its output is the place to deal with this, for > people who want to deal with it. > > In the meantime, there's still the question of what pg_ls_dir_files > should do exactly. Are we content to have it ignore symlinks? > I remain inclined to think that's the right thing given its current > brief. My 0.02€: I agree that it is enough to reproduce the current behavior of various existing pg_ls* functions, but on the other hand outputing a column type char like ls (-, d, l…) looks like really no big deal. I'd say that the only reason not to do it may be to pass this before feature freeze. -- Fabien.
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction
From
Justin Pryzby
Date:
On Tue, Mar 31, 2020 at 07:36:03AM +0200, Fabien COELHO wrote: > > > As I wrote about an earlier version of the patch, ISTM that instead of > > > reinventing, extending, adapting various ls variants (with/without > > > metadata, which show only files, which shows target of links, which shows > > > directory, etc.) we would just need *one* postgres "ls" implementation > > > which would be like "ls -la arg" (returns file type, dates), and then > > > everything else is a wrapper around that with appropriate filtering that > > > can be done at the SQL level, like you started with recurse. > > > > Yeah, I agree that some new function that can represent symlinks > > explicitly in its output is the place to deal with this, for > > people who want to deal with it. > > > > In the meantime, there's still the question of what pg_ls_dir_files > > should do exactly. Are we content to have it ignore symlinks? > > I remain inclined to think that's the right thing given its current > > brief. > > My 0.02€: > > I agree that it is enough to reproduce the current behavior of various > existing pg_ls* functions, but on the other hand outputing a column type > char like ls (-, d, l…) looks like really no big deal. I'd say that the only > reason not to do it may be to pass this before feature freeze. Remember, there's two threads here, and this one is about the bug in stable releases ($SUBJECT), and now the instability in the test that was added with its fix. I suggest to leave stat() alone in your patch for stable releases. I think it's okay if we change behavior so that a broken symlink is skipped instead of erroring (as a side effect of skipping ENOENT with stat()). But not okay if we change pg_ls_logdir() to hide symlinks in back braches. -- Justin
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > I suggest to leave stat() alone in your patch for stable releases. I think > it's okay if we change behavior so that a broken symlink is skipped instead of > erroring (as a side effect of skipping ENOENT with stat()). But not okay if we > change pg_ls_logdir() to hide symlinks in back braches. Meh. I'm not really convinced, but in the absence of anyone expressing support for my position, I'll do it that way. I don't think it's worth doing both a stat and lstat to tell the difference between file-is-gone and file-is-a-broken-symlink. regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Alexander Lakhin
Date:
Hello hackers, 31.03.2020 19:41, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: >> I suggest to leave stat() alone in your patch for stable releases. I think >> it's okay if we change behavior so that a broken symlink is skipped instead of >> erroring (as a side effect of skipping ENOENT with stat()). But not okay if we >> change pg_ls_logdir() to hide symlinks in back braches. > Meh. I'm not really convinced, but in the absence of anyone expressing > support for my position, I'll do it that way. I don't think it's worth > doing both a stat and lstat to tell the difference between file-is-gone > and file-is-a-broken-symlink. As we've discovered in Bug #[16161], stat() for "concurrently-deleted file" can also return ERROR_ACCESS_DENIED on Windows. It seems that pg_upgradeCheck failures seen on https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=fairywren&br=REL_13_STABLE caused by the same issue. Shouldn't pg_ls_dir_files() retry stat() on ERROR_ACCESS_DENIED just like the pgwin32_open() does to ignore files in "delete pending" state? [16161] https://www.postgresql.org/message-id/16161-7a985d2f1bbe8f71%40postgresql.org Best regards, Alexander
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Alexander Lakhin <exclusion@gmail.com> writes: > Shouldn't pg_ls_dir_files() retry stat() on ERROR_ACCESS_DENIED just > like the pgwin32_open() does to ignore files in "delete pending" state? That would soon lead us to changing every stat() caller in the system to have Windows-specific looping logic. No thanks. If we need to do this, let's put in a Windows wrapper layer comparable to pgwin32_open() for open(). regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Alexander Lakhin
Date:
13.11.2020 23:16, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >> Shouldn't pg_ls_dir_files() retry stat() on ERROR_ACCESS_DENIED just >> like the pgwin32_open() does to ignore files in "delete pending" state? > That would soon lead us to changing every stat() caller in the system > to have Windows-specific looping logic. No thanks. If we need to do > this, let's put in a Windows wrapper layer comparable to pgwin32_open() > for open(). Maybe pgwin32_safestat() should perform this... For now it checks GetLastError() for ERROR_DELETE_PENDING, but as we've found out previously this error code in fact is not returned by the OS. And if the fix is not going to be quick, probably it should be discussed in another thread... Best regards, Alexander
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
From
Tom Lane
Date:
Alexander Lakhin <exclusion@gmail.com> writes: > 13.11.2020 23:16, Tom Lane wrote: >> That would soon lead us to changing every stat() caller in the system >> to have Windows-specific looping logic. No thanks. If we need to do >> this, let's put in a Windows wrapper layer comparable to pgwin32_open() >> for open(). > Maybe pgwin32_safestat() should perform this... Uh ... now that you mention it, that's gone since bed90759f. There is code in win32stat.c that purports to cope with this case, though I've not tested it personally. regards, tom lane