Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData() - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
Date
Msg-id 21575.1511831661@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-bugs
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. twophase.c does this:
>>     cldir = AllocateDir(TWOPHASE_DIR);
>>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>>     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
>> which is flat out wrong because LWLockAcquire might well clobber errno.

> I have not checked if it actually updates errno or not, but relying on
> the fact that it may do it sucks.

I think it might only change errno in some code paths, for example
LWLockAcquire -> wait needed -> PGSemaphoreLock -> sem_wait gets
interrupted -> EINTR.  Which is the worst of all worlds, because
the report would usually be right and only occasionally mislead you.

>> There might be some other problems I missed in a quick scan.

> I have spotted more problems.

Ugh.

> I think that this requires its own new thread with a more extended
> analysis on -hackers to attract attention,

Agreed.

One thing I was thinking about is that there are places such as
DeleteAllExportedSnapshotFiles and RelationCacheInitFileRemove that
want the error message to come out only at LOG level, not ERROR,
because they're running in the postmaster.  But not only are they
issuing a mostly-duplicative ereport call, but they're really not
protecting themselves fully, because you'd probably want failures
inside ReadDir to be reported at LOG level as well.  So what this
leads me to think is that we should export ReadDirExtended (currently
that's only accessible inside fd.c) and then the use-case would be
solvable with
dir = AllocateDir(path);while ((dirent = ReadDirExtended(dir, path, LOG)) != NULL)    process dirent;FreeDir(dir);

This line of thinking also says that FreeDir needs to be tweaked to
do nothing if dir == NULL, assuming that somebody should have logged
the directory-open failure already.

The other thing that really could use discussion, as you mentioned
upthread, is how useful are custom error messages for AllocateDir
failures.  I'm not real sure for instance thatcould not open write-ahead log directory "pg_wal": ...
is a useful improvement over a generic messagecould not open directory "pg_wal": ...
Experts will know what pg_wal is anyway, and non-experts may not
gain much from being told it's a write-ahead log directory.

I'm prepared to listen to arguments that these custom messages
(or at least some of them) are worth keeping, but I think someone
ought to make that case, rather than letting them stand just because
they're there now.
        regards, tom lane


pgsql-bugs by date:

Previous
From: Amit Langote
Date:
Subject: Re: BUG #14928: Unchecked SearchSysCacheCopy1() return value
Next
From: Tom Lane
Date:
Subject: Re: BUG #14931: Unchecked attnum value in ATExecAlterColumnType()