Thread: BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()

BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()

From
bianpan2016@163.com
Date:
The following bug has been logged on the website:

Bug reference:      14929
Logged by:          Pan Bian
Email address:      bianpan2016@163.com
PostgreSQL version: 10.1
Operating system:   Linux
Description:

File: src/backend/access/transam/twophase.c
Function: restoreTwoPhaseData
Line: 1738

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

For your convenience, I copy and paste related codes as follows:

1732 void
1733 restoreTwoPhaseData(void)
1734 {
1735     DIR        *cldir;
1736     struct dirent *clde;
1737 
1738     cldir = AllocateDir(TWOPHASE_DIR);
1739     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
1740     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
1741     {
         ...
1758     LWLockRelease(TwoPhaseStateLock);
1759     FreeDir(cldir);
1760 }

Thank you!

Pan Bian


Re: BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()

From
Amit Langote
Date:
On 2017/11/27 18:31, bianpan2016@163.com wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      14929
> Logged by:          Pan Bian
> Email address:      bianpan2016@163.com
> PostgreSQL version: 10.1
> Operating system:   Linux
> Description:        
> 
> File: src/backend/access/transam/twophase.c
> Function: restoreTwoPhaseData
> Line: 1738
> 
> AllocateDir() will return a NULL pointer if it fails to open the specified
> directory. However, in function restoreTwoPhaseData(), its return value is
> not checked. This may result in a NULL pointer dereference when trying to
> free it (see line 1759).
> 
> For your convenience, I copy and paste related codes as follows:
> 
> 1732 void
> 1733 restoreTwoPhaseData(void)
> 1734 {
> 1735     DIR        *cldir;
> 1736     struct dirent *clde;
> 1737 
> 1738     cldir = AllocateDir(TWOPHASE_DIR);
> 1739     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
> 1740     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
> 1741     {
>          ...
> 1758     LWLockRelease(TwoPhaseStateLock);
> 1759     FreeDir(cldir);
> 1760 }

Thanks for the report.

It seems like a good idea to check cldir for NULL before freeing.  Please
find attached a patch to implement the same.

Thanks,
Amit

Attachment

Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 6:31 PM,  <bianpan2016@163.com> wrote:
> AllocateDir() will return a NULL pointer if it fails to open the specified
> directory. However, in function restoreTwoPhaseData(), its return value is
> not checked. This may result in a NULL pointer dereference when trying to
> free it (see line 1759).

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.
-- 
Michael


Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 7:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> It seems like a good idea to check cldir for NULL before freeing.  Please
> find attached a patch to implement the same.

There are other places where this could be added like
CheckPointLogicalRewriteHeap() if you'd like to make the code more
consistent, but my opinion would be on the contrary to drop all the
checks after AllocateDir if ReadDir is directly used. Not worth
bothering at the end.
-- 
Michael


Re: BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()

From
Amit Langote
Date:
On 2017/11/27 19:53, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 6:31 PM,  <bianpan2016@163.com> wrote:
>> AllocateDir() will return a NULL pointer if it fails to open the specified
>> directory. However, in function restoreTwoPhaseData(), its return value is
>> not checked. This may result in a NULL pointer dereference when trying to
>> free it (see line 1759).
> 
> You are missing the fact that ReadDir goes through ReadDirExtended,
> which drops an ERROR log if the folder allocated is NULL.

I noticed that too, but isn't possible that elevel might be such that we
end up returning to restoreTwoPhaseData() after all and hit the line in it
that will then dereference the NULL cldir?  Maybe, that never happens
because, elevel is never less than ERROR in that code path?

Thanks,
Amit



Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 8:09 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/11/27 19:53, Michael Paquier wrote:
>> On Mon, Nov 27, 2017 at 6:31 PM,  <bianpan2016@163.com> wrote:
>>> AllocateDir() will return a NULL pointer if it fails to open the specified
>>> directory. However, in function restoreTwoPhaseData(), its return value is
>>> not checked. This may result in a NULL pointer dereference when trying to
>>> free it (see line 1759).
>>
>> You are missing the fact that ReadDir goes through ReadDirExtended,
>> which drops an ERROR log if the folder allocated is NULL.
>
> I noticed that too, but isn't possible that elevel might be such that we
> end up returning to restoreTwoPhaseData() after all and hit the line in it
> that will then dereference the NULL cldir?  Maybe, that never happens
> because, elevel is never less than ERROR in that code path?

I don't quite follow. ReadDir is called at least once via
ReadDirExtended with elevel = ERROR. So if the input folder is NULL,
the call to ReadDir directly fails so you would not face a pointer
dereference. Something that can turn wrong though is the error message
generated if errno is changed in LWLockAcquire between the calls of
AllocateDir and ReadDir. For that the current code pattern is actually
incorrect.
-- 
Michael


On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 6:31 PM,  <bianpan2016@163.com> wrote:
> > AllocateDir() will return a NULL pointer if it fails to open the specified
> > directory. However, in function restoreTwoPhaseData(), its return value is
> > not checked. This may result in a NULL pointer dereference when trying to
> > free it (see line 1759).
> 
> You are missing the fact that ReadDir goes through ReadDirExtended,
> which drops an ERROR log if the folder allocated is NULL.

You are right. Its my carelessness. ReadDir will not return back on a
NULL dir parameter. The code is bug free. Sorry for the trouble.

Thank you all,
Pan Bian

> -- 
> Michael



PanBian <bianpan2016@163.com> writes:
> On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:
>> You are missing the fact that ReadDir goes through ReadDirExtended,
>> which drops an ERROR log if the folder allocated is NULL.

> You are right. Its my carelessness. ReadDir will not return back on a
> NULL dir parameter. The code is bug free. Sorry for the trouble.

There are some issues here, though:

1. twophase.c does this:
   cldir = AllocateDir(TWOPHASE_DIR);   LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);   while ((clde = ReadDir(cldir,
TWOPHASE_DIR))!= NULL)   { 

which is flat out wrong because LWLockAcquire might well clobber errno.
I don't see any good reason why we couldn't just swap the order of
those two calls.

2. basebackup.c and some other places do things like
       dir = AllocateDir("pg_wal");       if (!dir)           ereport(ERROR,                   (errmsg("could not open
directory\"%s\": %m", "pg_wal")));       while ((de = ReadDir(dir, "pg_wal")) != NULL) 

Not only is this a waste of code, because the error message is no better
than what ReadDir would provide, but it's wrong because it omits
errcode_for_file_access(), causing the SQLSTATE to be reported as XX000.
There are other places that are even lazier and use elog(), failing
translatability as well as the errcode test.

There might be some other problems I missed in a quick scan.

So there's definitely room for a cleanup patch here, but the originally
proposed change isn't it.
        regards, tom lane


Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

From
Michael Paquier
Date:
On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PanBian <bianpan2016@163.com> writes:
>> On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:
>>> You are missing the fact that ReadDir goes through ReadDirExtended,
>>> which drops an ERROR log if the folder allocated is NULL.
>
>> You are right. Its my carelessness. ReadDir will not return back on a
>> NULL dir parameter. The code is bug free. Sorry for the trouble.
>
> There are some issues here, though:
>
> 1. twophase.c does this:
>
>     cldir = AllocateDir(TWOPHASE_DIR);
>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
>     {
>
> which is flat out wrong because LWLockAcquire might well clobber errno.
> I don't see any good reason why we couldn't just swap the order of
> those two calls.

I have not checked if it actually updates errno or not, but relying on
the fact that it may do it sucks.

> 2. basebackup.c and some other places do things like
>
>         dir = AllocateDir("pg_wal");
>         if (!dir)
>             ereport(ERROR,
>                     (errmsg("could not open directory \"%s\": %m", "pg_wal")));
>         while ((de = ReadDir(dir, "pg_wal")) != NULL)
>
> Not only is this a waste of code, because the error message is no better
> than what ReadDir would provide, but it's wrong because it omits
> errcode_for_file_access(), causing the SQLSTATE to be reported as XX000.
> There are other places that are even lazier and use elog(), failing
> translatability as well as the errcode test.

I agree with using ereport() everywhere, a path may have been created
by initdb, but anything used after a base backup should be reported to
the user.

> There might be some other problems I missed in a quick scan.
>
> So there's definitely room for a cleanup patch here, but the originally
> proposed change isn't it.

I have spotted more problems. In pg_available_extensions,
AllocateDir() does nothing in the event of an error but forgets to
reset errno. In perform_base_backup, CheckXLogRemoved() is called
without saving errno, so the error message generated after may be
wrong.

I think that this requires its own new thread with a more extended
analysis on -hackers to attract attention, this goes way beyond the
original complain about a pointer dereference. And there is a
collection of small issues. I'll try to look at that after I am done
with my CF duties except if someone beats me or volunteers for it.
-- 
Michael


Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. twophase.c does this:
>>     cldir = AllocateDir(TWOPHASE_DIR);
>>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>>     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
>> which is flat out wrong because LWLockAcquire might well clobber errno.

> I have not checked if it actually updates errno or not, but relying on
> the fact that it may do it sucks.

I think it might only change errno in some code paths, for example
LWLockAcquire -> wait needed -> PGSemaphoreLock -> sem_wait gets
interrupted -> EINTR.  Which is the worst of all worlds, because
the report would usually be right and only occasionally mislead you.

>> There might be some other problems I missed in a quick scan.

> I have spotted more problems.

Ugh.

> I think that this requires its own new thread with a more extended
> analysis on -hackers to attract attention,

Agreed.

One thing I was thinking about is that there are places such as
DeleteAllExportedSnapshotFiles and RelationCacheInitFileRemove that
want the error message to come out only at LOG level, not ERROR,
because they're running in the postmaster.  But not only are they
issuing a mostly-duplicative ereport call, but they're really not
protecting themselves fully, because you'd probably want failures
inside ReadDir to be reported at LOG level as well.  So what this
leads me to think is that we should export ReadDirExtended (currently
that's only accessible inside fd.c) and then the use-case would be
solvable with
dir = AllocateDir(path);while ((dirent = ReadDirExtended(dir, path, LOG)) != NULL)    process dirent;FreeDir(dir);

This line of thinking also says that FreeDir needs to be tweaked to
do nothing if dir == NULL, assuming that somebody should have logged
the directory-open failure already.

The other thing that really could use discussion, as you mentioned
upthread, is how useful are custom error messages for AllocateDir
failures.  I'm not real sure for instance thatcould not open write-ahead log directory "pg_wal": ...
is a useful improvement over a generic messagecould not open directory "pg_wal": ...
Experts will know what pg_wal is anyway, and non-experts may not
gain much from being told it's a write-ahead log directory.

I'm prepared to listen to arguments that these custom messages
(or at least some of them) are worth keeping, but I think someone
ought to make that case, rather than letting them stand just because
they're there now.
        regards, tom lane