Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: POC: Cleaning up orphaned files using undo logs
Msg-id CAFiTN-u8VvrH5VuMy+u6V1W1qASm+mf43VxkcWNfc+sqB8r-Mg@mail.gmail.com
In response to Re: POC: Cleaning up orphaned files using undo logs  (Andres Freund <andres@anarazel.de>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
On Wed, Aug 14, 2019 at 10:35 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-08-14 14:48:07 +0530, Dilip Kumar wrote:
> > On Wed, Aug 14, 2019 at 12:27 PM Andres Freund <andres@anarazel.de> wrote:

> I don't think we can normally pin the undo buffers properly at that
> stage. Without knowing the correct contents of the table page - which we
> can't know without holding some form of lock preventing modifications -
> we can't know how big our undo records are going to be. And we can't
> just have buffers that don't exist on disk in shared memory, and we
> don't want to allocate undo that we then don't need. So I think what
> we'd have to do at that stage, is to "pre-allocate" buffers for the
> maximum amount of UNDO needed, but mark the associated bufferdesc as not
> yet valid. These buffers would have a pincount > 0, but BM_TAG_VALID
> would not be set.
>
> So at the start of a function that will need to insert undo we'd need to
> pre-reserve the maximum number of buffers we could potentially
> need. That reservation stage would
>
> a) pin the page with the current end of the undo
> b) if needed pin the page of older undo that we need to update (e.g. to
>    update the next pointer)
> c) perform clock sweep etc to acquire (find or create) enough clean
>    buffers to hold the maximum amount of undo needed. These buffers
>    would be marked as !BM_TAG_VALID | BUF_REFCOUNT_ONE.
>
> I assume that we'd make a) cheap by keeping it pinned for undo logs that
> a backend is actively attached to. b) should only be needed once in a
> transaction, so it's not too bad. c) we'd probably need to amortize
> across multiple undo insertions, by keeping the unused buffers pinned
> until the end of the transaction.
>
> I assume that having the infrastructure for c) might also make some
> code already in postgres easier. There's obviously some issues around
> guaranteeing that the maximum number of such buffers isn't too high.


I have analyzed this further, and I think there is a problem if the
record(s) will not fit into the current undo log and we have to switch
to a new log.  Before knowing the actual record length we cannot be
sure whether the undo log will switch, nor which undo log we will get.
And without knowing the logno (rnode), how are we going to pin the
buffers?  Am I missing something?

Thomas do you think we can get around this problem?

Apart from this, while analyzing other code I have noticed that the
current PG code has a few occurrences where we read a new buffer while
already holding a buffer lock.
1.
In gistplacetopage
{
...
	for (; ptr; ptr = ptr->next)
	{
		/* Allocate new page */
		ptr->buffer = gistNewBuffer(rel);
		GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
		ptr->page = BufferGetPage(ptr->buffer);
		ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
	}
2. During a page split we find a new buffer while holding the lock on
the current buffer.

That doesn't mean we can't do better; I am just pointing to existing
code that already has such issues.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


