Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Msg-id 429fe6fc1aa1d3a91804af032bbf1b5d.squirrel@2.emaily.eu
In response to Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
List pgsql-hackers
On 27 October 2014, 10:47, Heikki Linnakangas wrote:
> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>> should be fine with performing only a 'partial checkpoint' on the
>> template database - calling FlushDatabaseBuffers and processing unlink
>> requests, as suggested by the comment in createdb().
>
> Hmm. You could replace the first checkpoint with that, but I don't think
> that's enough for the second. To get any significant performance
> benefit, you need to get rid of both checkpoints, because doing two
> checkpoints one after another is almost as fast as doing a single
> checkpoint; the second checkpoint has very little work to do because the
> first checkpoint already flushed out everything.
>
> The second checkpoint, after copying but before commit, is done because
> (from the comments in createdb function):
>
>>  * #1: When PITR is off, we don't XLOG the contents of newly created
>>  * indexes; therefore the drop-and-recreate-whole-directory behavior
>>  * of DBASE_CREATE replay would lose such indexes.
>>
>>  * #2: Since we have to recopy the source database during DBASE_CREATE
>>  * replay, we run the risk of copying changes in it that were
>>  * committed after the original CREATE DATABASE command but before the
>>  * system crash that led to the replay.  This is at least unexpected
>>  * and at worst could lead to inconsistencies, eg duplicate table
>>  * names.
>
> Doing only FlushDatabaseBuffers would not prevent these issues - you
> need a full checkpoint. These issues are better explained here:
> http://www.postgresql.org/message-id/28884.1119727671@sss.pgh.pa.us

Thinking about this a bit more, do we really need a full checkpoint, that
is, a checkpoint of all the databases in the cluster? Why isn't
checkpointing only the source database enough?

I mean, when we use database A as a template, why do we need to checkpoint
B, C, D and F too? (Apologies if this is obvious; I'm well outside my
comfort zone in this part of the code.)

regards
Tomas
