Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id 20210616215804.nhc7bpkuoscfrv52@alap3.anarazel.de
Whole thread Raw
In response to [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
List pgsql-hackers
Hi,

On 2021-06-15 16:50:24 +0530, Dilip Kumar wrote:
> The patch modifies both CREATE DATABASE and ALTER DATABASE..SET
> TABLESPACE to be fully WAL-logged.

Generally quite a bit in favor of this - the current approach is very
heavyweight, slow and I think we have a few open corner bugs related to
it.


> Design Idea:
> -----------------
> First, create the target database directory along with the version
> file and WAL-log this operation.

What happens if you crash / promote at this point?


> Create the "relation map file" in the target database and copy the
> content from the source database. For this, we can use some modified
> versions of the write_relmap_file() and WAL-log the relmap create
> operation along with the file content.  Now, read the relmap file to
> find the relfilenode for pg_class and then we read pg_class block by
> block and decode the tuples.

This doesn't seem like a great approach - you're not going to be able to
use much of the normal infrastructure around processing tuples. So it
seems like it'd end up with quite a bit of special case code that needs
to maintained in parallel.


> Now read the source relfilenode block by block using
> ReadBufferWithoutRelCache() and copy the block to the target
> relfilenode using smgrextend() and WAL-log them using log_newpage().
> For the source database, we can not directly use the smgrread(),
> because there could be some dirty buffers so we will have to read them
> through the buffer manager interface, otherwise, we will have to flush
> all the dirty buffers.

I think we might need a bit more batching for the WAL logging. There are
cases of template database considerably bigger than the default and the
overhead of logging each write separately seems likely to be noticable.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: postgres_fdw batching vs. (re)creating the tuple slots
Next
From: John Naylor
Date:
Subject: Re: a path towards replacing GEQO with something better