Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
Msg-id 20200328183904.GI20103@telsasoft.com
In response to Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote:
> The buildfarm just showed up another instability in the test cases
> we added:

Yea, as you said, this is an issue with the *testcase*.  The function behavior
didn't change, we just weren't previously exercising it.

>  select (w).size = :segsize as ok
>  from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1;
> - ok 
> -----
> - t
> -(1 row)
> -
> +ERROR:  could not stat file "pg_wal/000000010000000000000078": No such file or directory
>  select count(*) >= 0 as ok from pg_ls_archive_statusdir();
>   ok 
>  ----
> 
> It's pretty obvious what happened here: concurrent activity renamed or
> removed the WAL segment between when we saw it in the directory and
> when we tried to stat() it.
> 
> This seems like it would be just as much of a hazard for field usage
> as it is for regression testing,

That's clearly true for pg_ls_waldir(), which calls pg_ls_dir_files() and
includes some metadata columns.

> so I propose that we fix these directory-scanning functions to silently
> ignore ENOENT failures from stat().  Are there any for which we should not do
> that?

I think it's reasonable to ignore transient ENOENT for tmpdir, logdir, and
probably archive_statusdir.  That doesn't currently affect pg_ls_dir(), which
lists files but not metadata for an arbitrary dir, so it doesn't call stat().

Note that dangling links in the other functions currently cause an error (with
the wrong message [0]).  I guess it should be documented if a broken link will
be ignored due to ENOENT.

Maybe we should lstat() the file to determine if it's a dangling link; if
lstat() fails, then skip it.  Currently, we use stat(), which shows metadata of
a link's *target*.  Maybe we'd change that.

Note that I have a patch which generalizes pg_ls_dir_files and makes
pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same
unless I add another flag to do otherwise (but behaving the same has its
merits).  It already uses lstat() to show links to dirs as isdir=no, which was
needed to avoid recursing into links-to-dirs in the new helper function
pg_ls_dir_recurse().  https://commitfest.postgresql.org/26/2377/

[0] Which you fixed in 085b6b667 and I previously fixed at:
https://www.postgresql.org/message-id/attachment/106478/v2-0001-BUG-in-errmsg.patch
|$ sudo ln -s foo /var/log/postgresql/bar
|ts=# SELECT * FROM pg_ls_logdir() ORDER BY 3;
|ERROR:  could not stat directory "/var/log/postgresql": No such file or directory


