Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | CAFiTN-t1tYG3pgTmw63J5LqpcQfcf+D7XPmMz6nXpaYJLPV5Jw@mail.gmail.com Whole thread Raw |
In response to | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
|
List | pgsql-hackers |
On Sat, Sep 4, 2021 at 3:24 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> Yeah, we can surely lock the relation as described by Robert, but IMHO,
> while creating the database we are already holding the exclusive lock on
> the database and there is no one else allowed to be connected to the
> database, so do we actually need to bother about the lock for the
> correctness?
The problem is that checkpointer, bgwriter, buffer reclaim don't care about
the database of the buffer they're working on... The exclusive lock on the
database doesn't change anything about that.
But these directly operate on the buffers and In my patch, whether we are reading the pg_class for identifying the relfilenode or we are copying the relation block by block we are always holding the lock on the buffer.
Perhaps you can justify it's safe
because there can't be any dirty buffers or such though.
> I think we already have such a code in multiple places where we bypass the
> shared buffers for copying the relation
> e.g. index_copy_data(), heapam_relation_copy_data().
That's not at all comparable. We hold an exclusive lock on the relation at
that point, and we don't have a separate implementation of reading tuples from
the table or something like that.
Okay, but my example was against the point Robert raised that he feels that bypassing the shared buffer anywhere is hackish. But yeah, I agree his point might be that even if we are using it in existing code we can not justify it.
For moving forward I think the main open concerns we have as of now are
1. Special purpose code of scanning pg_class, so that we can solve it by scanning the source database directory, I think Robert doesn't like this approach because we are directly scanning to directory and bypassing the shared buffers? But this is not any worse than what we have now right? I mean now also we are scanning the directory directly, so only change will be instead of copying files directly we will read file and copy block by block.
2. Another problem is, while copying the relation we are accessing the relation buffers but we are not holding the relation lock, but we are already holding the buffer so I am not sure do we really have a problem here w.r.t checkpointer, bgwriter? But if we have the problem then also we can create the lock tag and acquire the relation lock.
3. While copying the relation whether to use the bufmgr or directly use the smgr?
If we use the bufmgr then maybe we can avoid flushing some of the buffers to the disk and save some I/O but in general we copy from the template database so there might not be a lot of dirty buffers and we might not save anything, OTOH, if we directly use the smgr for copying the relation data we can reuse some existing code RelationCopyStorage() and the patch will be simpler. Other than just code simplicity or IO there is also a concern by Robert that he doesn't like to bypass the bufmgr, and that will be applicable to the point #1 as well as #3.
Thoughts?
pgsql-hackers by date: