Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CA+TgmobqqvB_AotL2Bbi5xzF53wa7ERCSVy=o=xa2xWdoDGfsw@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
On Fri, Mar 11, 2022 at 5:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Changes, 1) it take Robert's patch as first refactoring patch 2)
> Rebase other new relmapper apis on top of that in 0002 3) Some code
> refactoring in main patch 0005 and also one problem fix, earlier in
> wal log method I have removed ForceSyncCommit(), but IMHO that is
> equally valid whether we use file_copy or wal_log because in both
> cases we are creating the disk files.  4) Support strategy in createdb
> tool and add test case as part of 0006.

I think there's something wrong with what this patch is doing with the
XLOG records. It adds XLOG_DBASE_CREATEDIR, but the only new
XLogInsert() calls in the patch are passing XLOG_DBASE_CREATE, and no
existing references are adjusted. Similarly with xl_dbase_create_rec
and xl_dbase_createdir_rec. Why would we introduce a new record type
and not use it?

Let's not call the functions for the different strategies
CopyDatabase() and CopyDatabaseWithWal() but rather something that
matches up to the strategy names e.g. create_database_using_wal_log()
and create_database_using_file_copy(). There's something a little
funny about the names wal_log and file_copy ... they're not quite
parallel gramatically. But it's probably OK.

The changes to createdb_failure_params make me a little nervous. I
think we'd be in real trouble if we failed before completing both
DropDatabaseBuffers() and ForgetDatabaseSyncRequests(). However, it
looks to me like those are both intended to be no-fail operations, so
I don't see an actual hazard here. But, hmm, what about on the
recovery side? Suppose that we start copying the database block by
block and then either (a) the standby is promoted before the copy is
finished or (b) the copy fails. Now the standby has data in
shared_buffers for a database that does not exist. If that's not bad,
then why does createdb_failure_params need to DropDatabaseBuffers()?
But I bet it is bad. I wonder if we should be using
RelationCopyStorage() rather than this new function
RelationCopyStorageUsingBuffer(). That would avoid having the buffers
in shared_buffers, dodging the problem. But then we have a problem
with checkpoint interlocking: we could begin replay from a checkpoint
and find that the pages that were supposed to get copied prior to the
checkpoint were actually not copied, because the checkpoint record
could be written after we've logged a page being copied and before we
actually write the page. Or, we could crash after writing a whole lot
of pages and a checkpoint record, but before RelationCopyStorage()
fsyncs the destination fork. It doesn't seem advisable to hold off
checkpoints for the time it takes to copy an entire relation fork, so
the solution is apparently to keep the data in shared buffers after
all. But that brings us right back to square one. Have you thought
through this whole problem carefully? It seems like a total mess to me
at the moment, but maybe I'm missing something.

There seems to be no reason to specify specific values for the members
of enum CreateDBStrategy.

I think the naming of some of the new functions might need work, in
particular GetRelInfoFromTuple, GetRelListFromPage, and
GetDatabaseRelationList. The names seem kind of generic for what
they're doing. Maybe ScanSourceDatabasePgClass,
ScanSourceDatabasePgClassPage, ScanSourceDatabasePgClassTuple?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Proposal for internal Numeric to Uint64 conversion function.
Next
From: Robert Haas
Date:
Subject: Re: pg_walinspect - a new extension to get raw WAL data and WAL stats