Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | aea01367-d988-fcfa-5224-bbc91839d5d6@iki.fi Whole thread Raw |
In response to | [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
List | pgsql-hackers |
On 15/06/2021 14:20, Dilip Kumar wrote: > Design Idea: > ----------------- > First, create the target database directory along with the version > file and WAL-log this operation. Create the "relation map file" in > the target database and copy the content from the source database. For > this, we can use some modified versions of the write_relmap_file() and > WAL-log the relmap create operation along with the file content. Now, > read the relmap file to find the relfilenode for pg_class and then we > read pg_class block by block and decode the tuples. For reading the > pg_class blocks, we can use ReadBufferWithoutRelCache() so that we > don't need the relcache. Nothing prevents us from checking visibility > for tuples in another database because CLOG is global to the cluster. > And nothing prevents us from deforming those tuples because the column > definitions for pg_class have to be the same in every database. Then > we can get the relfilenode of every file we need to copy, and prepare > a list of all such relfilenode. I guess that would work, but you could also walk the database directory like copydir() does. How you find the relations to copy is orthogonal to whether you WAL-log them or use checkpoints. And whether you use the buffer cache is also orthogonal to the rest of the proposal; you could issue FlushDatabaseBuffers() instead of a checkpoint. > Next, for each relfilenode in the > source database, create a respective relfilenode in the target > database (for all forks) using smgrcreate, which is already a > WAL-logged operation. Now read the source relfilenode block by block > using ReadBufferWithoutRelCache() and copy the block to the target > relfilenode using smgrextend() and WAL-log them using log_newpage(). > For the source database, we can not directly use the smgrread(), > because there could be some dirty buffers so we will have to read them > through the buffer manager interface, otherwise, we will have to flush > all the dirty buffers. Yeah, WAL-logging the contents of the source database would certainly be less weird than the current system. As Julien also pointed out, the question is, are there people using on "CREATE DATABASE foo TEMPLATE bar" to copy a large source database, on the premise that it's fast because it skips WAL-logging? In principle, we could have both mechanisms, and use the new WAL-logged system if the database is small, and the old system with checkpoints if it's large. But I don't like idea of having to maintain both. - Heikki
pgsql-hackers by date: