Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
| From | Dilip Kumar |
| --- | --- |
| Subject | Re: POC: Cleaning up orphaned files using undo logs |
| Date | |
| Msg-id | CAFiTN-u8VvrH5VuMy+u6V1W1qASm+mf43VxkcWNfc+sqB8r-Mg@mail.gmail.com |
| In response to | Re: POC: Cleaning up orphaned files using undo logs (Andres Freund <andres@anarazel.de>) |
| Responses | Re: POC: Cleaning up orphaned files using undo logs |
| List | pgsql-hackers |
On Wed, Aug 14, 2019 at 10:35 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-08-14 14:48:07 +0530, Dilip Kumar wrote:
> > On Wed, Aug 14, 2019 at 12:27 PM Andres Freund <andres@anarazel.de> wrote:
>
> I don't think we can normally pin the undo buffers properly at that
> stage. Without knowing the correct contents of the table page - which we
> can't know without holding some form of lock preventing modifications -
> we can't know how big our undo records are going to be. And we can't
> just have buffers that don't exist on disk in shared memory, and we
> don't want to allocate undo that we then don't need. So I think what
> we'd have to do at that stage is to "pre-allocate" buffers for the
> maximum amount of undo needed, but mark the associated bufferdesc as
> not yet valid. These buffers would have a pincount > 0, but
> BM_TAG_VALID would not be set.
>
> So at the start of a function that will need to insert undo we'd need
> to pre-reserve the maximum number of buffers we could potentially
> need. That reservation stage would
>
> a) pin the page with the current end of the undo
> b) if needed, pin the page of older undo that we need to update (e.g.
>    to update the next pointer)
> c) perform a clock sweep etc. to acquire (find or create) enough clean
>    buffers to hold the maximum amount of undo needed. These buffers
>    would be marked as !BM_TAG_VALID | BUF_REFCOUNT_ONE.
>
> I assume that we'd make a) cheap by keeping it pinned for undo logs
> that a backend is actively attached to. b) should only be needed once
> in a transaction, so it's not too bad. c) we'd probably need to
> amortize across multiple undo insertions, by keeping the unused
> buffers pinned until the end of the transaction.
>
> I assume that having the infrastructure for c) might also make some
> code already in postgres easier. There are obviously some issues
> around guaranteeing that the maximum number of such buffers isn't too
> high.

I have analyzed this further.  I think there is a problem if the
record(s) will not fit into the current undo log and we have to switch
to a new log.  Before knowing the actual record length, we cannot be
sure whether the undo log will switch, or which undo log we will get.
And without knowing the logno (rnode), how are we going to pin the
buffers?  Am I missing something?  Thomas, do you think we can get
around this problem?

Apart from this, while analyzing other code I have noticed that the
current PG code has a few places where we try to read a new buffer
while already holding a buffer lock.

1. In gistplacetopage:

    ...
    for (; ptr; ptr = ptr->next)
    {
        /* Allocate new page */
        ptr->buffer = gistNewBuffer(rel);
        GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
        ptr->page = BufferGetPage(ptr->buffer);
        ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
    }

2. During a page split, we find a new buffer while holding the lock on
the current buffer.

That doesn't mean we can't do better; I am just pointing at existing
code that already has such issues.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
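To make steps a) to c) above concrete, here is a minimal C sketch of
what such a reservation phase could look like. It is only an
illustration: UndoBufferReservation, UndoPinLogTail,
UndoPinPreviousRecordPage, UndoGetUntaggedVictimBuffer, MAX_UNDO_PAGES
and UNDO_SIZE_TO_PAGES are hypothetical names, not functions from the
patch set or from core PostgreSQL; only Buffer, Size, InvalidBuffer,
BM_TAG_VALID and BUF_REFCOUNT_ONE are existing PostgreSQL identifiers,
and today's buffer manager does not actually hand out
pinned-but-untagged buffers.

    #include "postgres.h"
    #include "storage/buf.h"

    /* Hypothetical cap on the pages one undo insertion can touch. */
    #define MAX_UNDO_PAGES 4

    typedef struct UndoBufferReservation
    {
        Buffer  end_of_undo;   /* (a) page at the current end of the undo log */
        Buffer  prev_undo;     /* (b) older undo page whose next pointer we
                                * may have to update, or InvalidBuffer */
        Buffer  spare[MAX_UNDO_PAGES]; /* (c) pinned, still-untagged buffers */
        int     nspare;
    } UndoBufferReservation;

    /*
     * Reserve, before taking the table page lock, every buffer that
     * inserting up to max_undo_size bytes of undo could possibly need.
     * The spare buffers are acquired via the clock sweep but left with
     * BM_TAG_VALID unset and a pin count of one; their tags are filled
     * in later, once the real record size is known.
     */
    static void
    UndoReserveBuffers(UndoBufferReservation *res, Size max_undo_size,
                       bool need_prev_link)
    {
        int     npages = UNDO_SIZE_TO_PAGES(max_undo_size); /* hypothetical */

        /* (a) cheap if the backend keeps its attached log's tail pinned */
        res->end_of_undo = UndoPinLogTail();                /* hypothetical */

        /* (b) needed at most once per transaction */
        res->prev_undo = need_prev_link
            ? UndoPinPreviousRecordPage()                   /* hypothetical */
            : InvalidBuffer;

        /*
         * (c) victim buffers with no tag yet; keeping them pinned until
         * end of transaction amortizes the cost across insertions.
         */
        for (res->nspare = 0; res->nspare < npages; res->nspare++)
            res->spare[res->nspare] = UndoGetUntaggedVictimBuffer(); /* hypothetical */
    }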
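The log-switch concern can likewise be written out as an ordering
dependency. The fragment below is again only illustrative, with
UndoRecordExpectedSize, UndoLogAllocate and UndoRecPtr as hypothetical
stand-ins rather than the patch set's actual API; LockBuffer and
BUFFER_LOCK_EXCLUSIVE are real PostgreSQL names.

    Buffer      table_buf;   /* table page being modified */
    Page        table_page;
    Size        size;
    UndoRecPtr  urp;         /* assumed undo record pointer type */

    LockBuffer(table_buf, BUFFER_LOCK_EXCLUSIVE);  /* must come first ... */
    size = UndoRecordExpectedSize(table_page);     /* ... to learn the length */
    urp = UndoLogAllocate(size);                   /* only now do we learn
                                                    * whether the log switches,
                                                    * i.e. which logno (rnode)
                                                    * holds the undo pages */

    /*
     * ... but that logno is exactly what the pre-reservation needed in
     * order to pin the undo buffers *before* LockBuffer() was taken.
     * Hence the chicken-and-egg.
     */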