Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CAFiTN-vPp8czS_dBFoVCZ+nd2B6rUGTSyXOZwuMSPnMBKYExfQ@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On Wed, Mar 23, 2022 at 5:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 23, 2022 at 4:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > Okay this is an interesting point.  So one option is that in case of
> > failure while using the wal log strategy we do not remove the database
> > directory, because an abort transaction will take care of removing the
> > relation file.  But then in failure case we will leave the orphaned
> > database directory with version file and the relmap file.  Another
> > option is to do the redundant cleanup as we are doing now.  Any other
> > options?
>
> I think our overriding goal should be to get everything using one
> mechanism. It doesn't look straightforward to get everything to go
> through the PendingRelDelete mechanism, because as you say, it can't
> handle non-relation files or directories. However, what if we opt out
> of that mechanism? We could do that either by not using
> RelationCreateStorage() in the first place and directly calling
> smgrcreate(), or by using RelationPreserveStorage() afterwards to yank
> the file back out of the list.

I think directly using smgrcreate() is a better idea instead of first
registering and then unregistering it.   I have made that change in
the attached patch.  After this change now we can merge creating the
MAIN_FORKNUM also in the loop below where we are creating other
fork[1] with one extra condition but I think current code is in more
sync with the other code where we are doing the similar things so I
have not merged it in the loop.  Please let me know if you think
otherwise.

[1]
+    /*
+     * Create and copy all forks of the relation.  We are not using
+     * RelationCreateStorage() as it is registering the cleanup for the
+     * underlying relation storage on the transaction abort.  But during create
+     * database failure, we have a separate cleanup mechanism for the whole
+     * database directory. Therefore, we don't need to register cleanup for
+     * each individual relation storage.
+     */
+    smgrcreate(RelationGetSmgr(dst_rel), MAIN_FORKNUM, false);
+    if (permanent)
+        log_smgrcreate(&dst_rnode, MAIN_FORKNUM);
+
+    /* copy main fork. */
+    RelationCopyStorageUsingBuffer(src_rel, dst_rel, MAIN_FORKNUM, permanent);
+
+    /* copy those extra forks that exist */
+    for (ForkNumber forkNum = MAIN_FORKNUM + 1;
+         forkNum <= MAX_FORKNUM; forkNum++)
+    {
+        if (smgrexists(RelationGetSmgr(src_rel), forkNum))
+        {
+            smgrcreate(RelationGetSmgr(dst_rel), forkNum, false);
+

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: make MaxBackends available in _PG_init
Next
From: Robert Haas
Date:
Subject: Re: refactoring basebackup.c (zstd workers)