On 10/27/2014 03:21 PM, Tomas Vondra wrote:
> On 27 October 2014, 10:47, Heikki Linnakangas wrote:
>> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>>> should be fine with performing only a 'partial checkpoint' on the
>>> template database - calling FlushDatabaseBuffers and processing unlink
>>> requests, as suggested by the comment in createdb().
>>
>> Hmm. You could replace the first checkpoint with that, but I don't think
>> that's enough for the second. To get any significant performance
>> benefit, you need to get rid of both checkpoints, because doing two
>> checkpoints one after another is almost as fast as doing a single
>> checkpoint; the second checkpoint has very little work to do because the
>> first checkpoint already flushed out everything.
>>
>> The second checkpoint, after copying but before commit, is done because
>> (from the comments in the createdb function):
>>
>>> * #1: When PITR is off, we don't XLOG the contents of newly created
>>> * indexes; therefore the drop-and-recreate-whole-directory behavior
>>> * of DBASE_CREATE replay would lose such indexes.
>>>
>>> * #2: Since we have to recopy the source database during DBASE_CREATE
>>> * replay, we run the risk of copying changes in it that were
>>> * committed after the original CREATE DATABASE command but before the
>>> * system crash that led to the replay. This is at least unexpected
>>> * and at worst could lead to inconsistencies, eg duplicate table
>>> * names.
>>
>> Doing only FlushDatabaseBuffers would not prevent these issues - you
>> need a full checkpoint. These issues are better explained here:
>> http://www.postgresql.org/message-id/28884.1119727671@sss.pgh.pa.us
>
> Thinking about this a bit more, do we really need a full checkpoint? That
> is, a checkpoint of all the databases in the cluster? Why isn't
> checkpointing the source database enough?
>
> I mean, when we use database A as a template, why do we need to checkpoint
> B, C, D and F too? (Apologies if this is somehow obvious, I'm way out of
> my comfort zone in this part of the code.)

A full checkpoint ensures that you always begin recovery *after* the
DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
crash recovery (except when you crash before the transaction commits, in
which case it doesn't matter if the new database's directory is borked).
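
As a rough illustration, here is a toy Python model of that argument. It is
not PostgreSQL code: the WAL is just a list of strings, the function names
are made up for this sketch, and the redo point is simplified to "just past
the last checkpoint record" (in reality it is where the checkpoint started).
It only shows that a checkpoint written after the DBASE_CREATE record keeps
that record out of crash recovery, while a plain buffer flush, which leaves
no checkpoint record, does not.

```python
# Toy model of crash recovery; NOT PostgreSQL source code.
# All names and record labels here are hypothetical.

def last_checkpoint_redo(wal):
    """Index just past the last CHECKPOINT record (0 if there is none).

    Simplification: the real redo pointer is where the checkpoint began.
    """
    redo = 0
    for i, rec in enumerate(wal):
        if rec == "CHECKPOINT":
            redo = i + 1
    return redo


def records_replayed(wal):
    """Crash recovery replays everything from the redo point onwards."""
    return wal[last_checkpoint_redo(wal):]


# Current behaviour: checkpoint, copy + DBASE_CREATE, checkpoint, commit.
# A later change to the template database follows, then a crash.
with_checkpoint = [
    "CHECKPOINT", "DBASE_CREATE", "CHECKPOINT", "COMMIT",
    "CHANGE template",
]

# Assumed variant: the second checkpoint is replaced by a buffer flush,
# which writes no checkpoint record and so does not move the redo point.
flush_only = [
    "CHECKPOINT", "DBASE_CREATE", "COMMIT",
    "CHANGE template",
]

replay_a = records_replayed(with_checkpoint)
replay_b = records_replayed(flush_only)

print("with checkpoint:", replay_a)
print("flush only:     ", replay_b)
```

In the second case recovery replays DBASE_CREATE, i.e. it recopies the
template directory as it exists at crash time, which is exactly what issues
#1 and #2 above are about.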

- Heikki