Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id 20210905202800.ji4fnfs3xzhvo7l6@alap3.anarazel.de
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On 2021-09-05 14:22:51 +0530, Dilip Kumar wrote:
> On Sat, Sep 4, 2021 at 3:24 AM Andres Freund <andres@anarazel.de> wrote:
> 
> > Hi,
> >
> > On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> > > Yeah, we can surely lock the relation as described by Robert, but IMHO,
> > > while creating the database we are already holding the exclusive lock on
> > > the database and there is no one else allowed to be connected to the
> > > database, so do we actually need to bother about the lock for the
> > > correctness?
> >
> > The problem is that checkpointer, bgwriter, buffer reclaim don't care about
> > the database of the buffer they're working on... The exclusive lock on the
> > database doesn't change anything about that.
> 
> 
> But these directly operate on the buffers and In my patch, whether we are
> reading the pg_class for identifying the relfilenode or we are copying the
> relation block by block we are always holding the lock on the buffer.

I don't think a buffer lock is really sufficient. See e.g. code like:

static void
InvalidateBuffer(BufferDesc *buf)
{
...
    /*
     * We assume the only reason for it to be pinned is that someone else is
     * flushing the page out.  Wait for them to finish.  (This could be an
     * infinite loop if the refcount is messed up... it would be nice to time
     * out after awhile, but there seems no way to be sure how many loops may
     * be needed.  Note that if the other guy has pinned the buffer but not
     * yet done StartBufferIO, WaitIO will fall through and we'll effectively
     * be busy-looping here.)
     */
    if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
    {
        UnlockBufHdr(buf, buf_state);
        LWLockRelease(oldPartitionLock);
        /* safety check: should definitely not be our *own* pin */
        if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
            elog(ERROR, "buffer is pinned in InvalidateBuffer");
        WaitIO(buf);
        goto retry;
    }

IOW, currently we assume that you're only allowed to pin a block in a relation
while you hold a lock on the relation. It might be a good idea to change that,
but it's not as trivial as one might think - consider e.g. dropping a
relation's buffers while holding an exclusive lock: If there's potential
concurrent reads of that buffer we'd be in trouble.


> 3. While copying the relation whether to use the bufmgr or directly use the
> smgr?
> 
> If we use the bufmgr then maybe we can avoid flushing some of the buffers
> to the disk and save some I/O but in general we copy from the template
> database so there might not be a lot of dirty buffers and we might not save
> anything

I would assume the big benefit would be that the *target* database does not
have to be written out / shared buffer is immediately populated.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: use-regular-expressions-to-simplify-less_greater-and-not_equals.patch
Next
From: Andres Freund
Date:
Subject: Re: stat() vs ERROR_DELETE_PENDING, round N + 1