Thread: BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()
BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()
From
bianpan2016@163.com
Date:
The following bug has been logged on the website: Bug reference: 14929 Logged by: Pan Bian Email address: bianpan2016@163.com PostgreSQL version: 10.1 Operating system: Linux Description: File: src/backend/access/transam/twophase.c Function: restoreTwoPhaseData Line: 1738 AllocateDir() will return a NULL pointer if it fails to open the specified directory. However, in function restoreTwoPhaseData(), its return value is not checked. This may result in a NULL pointer dereference when trying to free it (see line 1759). For your convenience, I copy and paste related codes as follows: 1732 void 1733 restoreTwoPhaseData(void) 1734 { 1735 DIR *cldir; 1736 struct dirent *clde; 1737 1738 cldir = AllocateDir(TWOPHASE_DIR); 1739 LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); 1740 while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL) 1741 { ... 1758 LWLockRelease(TwoPhaseStateLock); 1759 FreeDir(cldir); 1760 } Thank you! Pan Bian
Re: BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()
From
Amit Langote
Date:
On 2017/11/27 18:31, bianpan2016@163.com wrote: > The following bug has been logged on the website: > > Bug reference: 14929 > Logged by: Pan Bian > Email address: bianpan2016@163.com > PostgreSQL version: 10.1 > Operating system: Linux > Description: > > File: src/backend/access/transam/twophase.c > Function: restoreTwoPhaseData > Line: 1738 > > AllocateDir() will return a NULL pointer if it fails to open the specified > directory. However, in function restoreTwoPhaseData(), its return value is > not checked. This may result in a NULL pointer dereference when trying to > free it (see line 1759). > > For your convenience, I copy and paste related codes as follows: > > 1732 void > 1733 restoreTwoPhaseData(void) > 1734 { > 1735 DIR *cldir; > 1736 struct dirent *clde; > 1737 > 1738 cldir = AllocateDir(TWOPHASE_DIR); > 1739 LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); > 1740 while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL) > 1741 { > ... > 1758 LWLockRelease(TwoPhaseStateLock); > 1759 FreeDir(cldir); > 1760 } Thanks for the report. It seems like a good idea to check cldir for NULL before freeing. Please find attached a patch to implement the same. Thanks, Amit
Attachment
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote: > AllocateDir() will return a NULL pointer if it fails to open the specified > directory. However, in function restoreTwoPhaseData(), its return value is > not checked. This may result in a NULL pointer dereference when trying to > free it (see line 1759). You are missing the fact that ReadDir goes through ReadDirExtended, which drops an ERROR log if the folder allocated is NULL. -- Michael
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 7:35 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > It seems like a good idea to check cldir for NULL before freeing. Please > find attached a patch to implement the same. There are other places where this could be added like CheckPointLogicalRewriteHeap() if you'd like to make the code more consistent, but my opinion would be on the contrary to drop all the checks after AllocateDir if ReadDir is directly used. Not worth bothering at the end. -- Michael
Re: BUG #14929: Unchecked AllocateDir() return value inrestoreTwoPhaseData()
From
Amit Langote
Date:
On 2017/11/27 19:53, Michael Paquier wrote: > On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote: >> AllocateDir() will return a NULL pointer if it fails to open the specified >> directory. However, in function restoreTwoPhaseData(), its return value is >> not checked. This may result in a NULL pointer dereference when trying to >> free it (see line 1759). > > You are missing the fact that ReadDir goes through ReadDirExtended, > which drops an ERROR log if the folder allocated is NULL. I noticed that too, but isn't possible that elevel might be such that we end up returning to restoreTwoPhaseData() after all and hit the line in it that will then dereference the NULL cldir? Maybe, that never happens because, elevel is never less than ERROR in that code path? Thanks, Amit
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 8:09 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/11/27 19:53, Michael Paquier wrote: >> On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote: >>> AllocateDir() will return a NULL pointer if it fails to open the specified >>> directory. However, in function restoreTwoPhaseData(), its return value is >>> not checked. This may result in a NULL pointer dereference when trying to >>> free it (see line 1759). >> >> You are missing the fact that ReadDir goes through ReadDirExtended, >> which drops an ERROR log if the folder allocated is NULL. > > I noticed that too, but isn't possible that elevel might be such that we > end up returning to restoreTwoPhaseData() after all and hit the line in it > that will then dereference the NULL cldir? Maybe, that never happens > because, elevel is never less than ERROR in that code path? I don't quite follow. ReadDir is called at least once via ReadDirExtended with elevel = ERROR. So if the input folder is NULL, the call to ReadDir directly fails so you would not face a pointer dereference. Something that can turn wrong though is the error message generated if errno is changed in LWLockAcquire between the calls of AllocateDir and ReadDir. For that the current code pattern is actually incorrect. -- Michael
On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote: > On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote: > > AllocateDir() will return a NULL pointer if it fails to open the specified > > directory. However, in function restoreTwoPhaseData(), its return value is > > not checked. This may result in a NULL pointer dereference when trying to > > free it (see line 1759). > > You are missing the fact that ReadDir goes through ReadDirExtended, > which drops an ERROR log if the folder allocated is NULL. You are right. Its my carelessness. ReadDir will not return back on a NULL dir parameter. The code is bug free. Sorry for the trouble. Thank you all, Pan Bian > -- > Michael
PanBian <bianpan2016@163.com> writes: > On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote: >> You are missing the fact that ReadDir goes through ReadDirExtended, >> which drops an ERROR log if the folder allocated is NULL. > You are right. Its my carelessness. ReadDir will not return back on a > NULL dir parameter. The code is bug free. Sorry for the trouble. There are some issues here, though: 1. twophase.c does this: cldir = AllocateDir(TWOPHASE_DIR); LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); while ((clde = ReadDir(cldir, TWOPHASE_DIR))!= NULL) { which is flat out wrong because LWLockAcquire might well clobber errno. I don't see any good reason why we couldn't just swap the order of those two calls. 2. basebackup.c and some other places do things like dir = AllocateDir("pg_wal"); if (!dir) ereport(ERROR, (errmsg("could not open directory\"%s\": %m", "pg_wal"))); while ((de = ReadDir(dir, "pg_wal")) != NULL) Not only is this a waste of code, because the error message is no better than what ReadDir would provide, but it's wrong because it omits errcode_for_file_access(), causing the SQLSTATE to be reported as XX000. There are other places that are even lazier and use elog(), failing translatability as well as the errcode test. There might be some other problems I missed in a quick scan. So there's definitely room for a cleanup patch here, but the originally proposed change isn't it. regards, tom lane
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
From
Michael Paquier
Date:
On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > PanBian <bianpan2016@163.com> writes: >> On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote: >>> You are missing the fact that ReadDir goes through ReadDirExtended, >>> which drops an ERROR log if the folder allocated is NULL. > >> You are right. Its my carelessness. ReadDir will not return back on a >> NULL dir parameter. The code is bug free. Sorry for the trouble. > > There are some issues here, though: > > 1. twophase.c does this: > > cldir = AllocateDir(TWOPHASE_DIR); > LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); > while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL) > { > > which is flat out wrong because LWLockAcquire might well clobber errno. > I don't see any good reason why we couldn't just swap the order of > those two calls. I have not checked if it actually updates errno or not, but relying on the fact that it may do it sucks. > 2. basebackup.c and some other places do things like > > dir = AllocateDir("pg_wal"); > if (!dir) > ereport(ERROR, > (errmsg("could not open directory \"%s\": %m", "pg_wal"))); > while ((de = ReadDir(dir, "pg_wal")) != NULL) > > Not only is this a waste of code, because the error message is no better > than what ReadDir would provide, but it's wrong because it omits > errcode_for_file_access(), causing the SQLSTATE to be reported as XX000. > There are other places that are even lazier and use elog(), failing > translatability as well as the errcode test. I agree with using ereport() everywhere, a path may have been created by initdb, but anything used after a base backup should be reported to the user. > There might be some other problems I missed in a quick scan. > > So there's definitely room for a cleanup patch here, but the originally > proposed change isn't it. I have spotted more problems. In pg_available_extensions, AllocateDir() does nothing in the event of an error but forgets to reset errno. In perform_base_backup, CheckXLogRemoved() is called without saving errno, so the error message generated after may be wrong. I think that this requires its own new thread with a more extended analysis on -hackers to attract attention, this goes way beyond the original complain about a pointer dereference. And there is a collection of small issues. I'll try to look at that after I am done with my CF duties except if someone beats me or volunteers for it. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. twophase.c does this: >> cldir = AllocateDir(TWOPHASE_DIR); >> LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); >> while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL) >> which is flat out wrong because LWLockAcquire might well clobber errno. > I have not checked if it actually updates errno or not, but relying on > the fact that it may do it sucks. I think it might only change errno in some code paths, for example LWLockAcquire -> wait needed -> PGSemaphoreLock -> sem_wait gets interrupted -> EINTR. Which is the worst of all worlds, because the report would usually be right and only occasionally mislead you. >> There might be some other problems I missed in a quick scan. > I have spotted more problems. Ugh. > I think that this requires its own new thread with a more extended > analysis on -hackers to attract attention, Agreed. One thing I was thinking about is that there are places such as DeleteAllExportedSnapshotFiles and RelationCacheInitFileRemove that want the error message to come out only at LOG level, not ERROR, because they're running in the postmaster. But not only are they issuing a mostly-duplicative ereport call, but they're really not protecting themselves fully, because you'd probably want failures inside ReadDir to be reported at LOG level as well. So what this leads me to think is that we should export ReadDirExtended (currently that's only accessible inside fd.c) and then the use-case would be solvable with dir = AllocateDir(path);while ((dirent = ReadDirExtended(dir, path, LOG)) != NULL) process dirent;FreeDir(dir); This line of thinking also says that FreeDir needs to be tweaked to do nothing if dir == NULL, assuming that somebody should have logged the directory-open failure already. The other thing that really could use discussion, as you mentioned upthread, is how useful are custom error messages for AllocateDir failures. I'm not real sure for instance thatcould not open write-ahead log directory "pg_wal": ... is a useful improvement over a generic messagecould not open directory "pg_wal": ... Experts will know what pg_wal is anyway, and non-experts may not gain much from being told it's a write-ahead log directory. I'm prepared to listen to arguments that these custom messages (or at least some of them) are worth keeping, but I think someone ought to make that case, rather than letting them stand just because they're there now. regards, tom lane