Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date
Msg-id 544E4863.2080407@vmware.com
In response to Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT  ("Tomas Vondra" <tv@fuzzy.cz>)
Responses Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
List pgsql-hackers
On 10/27/2014 03:21 PM, Tomas Vondra wrote:
> On 27 October 2014, 10:47, Heikki Linnakangas wrote:
>> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>>> should be fine with performing only a 'partial checkpoint' on the
>>> template database - calling FlushDatabaseBuffers and processing unlink
>>> requests, as suggested by the comment in createdb().
>>
>> Hmm. You could replace the first checkpoint with that, but I don't think
>> that's enough for the second. To get any significant performance
>> benefit, you need to get rid of both checkpoints, because doing two
>> checkpoints one after another is almost as fast as doing a single
>> checkpoint; the second checkpoint has very little work to do because the
>> first checkpoint already flushed out everything.
>>
>> The second checkpoint, after copying but before commit, is done because
>> (from the comments in createdb function):
>>
>>>   * #1: When PITR is off, we don't XLOG the contents of newly created
>>>   * indexes; therefore the drop-and-recreate-whole-directory behavior
>>>   * of DBASE_CREATE replay would lose such indexes.
>>>
>>>   * #2: Since we have to recopy the source database during DBASE_CREATE
>>>   * replay, we run the risk of copying changes in it that were
>>>   * committed after the original CREATE DATABASE command but before the
>>>   * system crash that led to the replay.  This is at least unexpected
>>>   * and at worst could lead to inconsistencies, eg duplicate table
>>>   * names.
>>
>> Doing only FlushDatabaseBuffers would not prevent these issues - you
>> need a full checkpoint. These issues are better explained here:
>> http://www.postgresql.org/message-id/28884.1119727671@sss.pgh.pa.us
>
> Thinking about this a bit more, do we really need a full checkpoint? That
> is, a checkpoint of all the databases in the cluster? Why is checkpointing
> the source database not enough?
>
> I mean, when we use database A as a template, why do we need to checkpoint
> B, C, D and F too? (Apologies if this is somehow obvious, I'm way out of
> my comfort zone in this part of the code.)

A full checkpoint ensures that you always begin recovery *after* the
DBASE_CREATE record, because crash recovery starts from the most recent
checkpoint. That is, you never replay a DBASE_CREATE record during
crash recovery (except when you crash before the transaction commits, in
which case it doesn't matter if the new database's directory is borked).

- Heikki
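
The argument above can be illustrated with a toy model. This is only a sketch under loud assumptions, not PostgreSQL code: WalRecord, recovery_start, replayed_kinds and make_wal are hypothetical names, LSNs are plain integers, and a checkpoint is modelled as a single point in the log (real recovery starts at the checkpoint's redo pointer). It shows which records crash recovery would replay with and without a checkpoint taken after the DBASE_CREATE record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int   # position in the log (toy integer, not a real LSN)
    kind: str  # "CHECKPOINT", "DBASE_CREATE", "COMMIT", ...


def recovery_start(wal):
    """LSN of the last checkpoint in the log, or 0 if there is none.

    Simplification: real recovery starts at the checkpoint's redo
    pointer; here a checkpoint is modelled as a single point.
    """
    start = 0
    for rec in wal:
        if rec.kind == "CHECKPOINT":
            start = rec.lsn
    return start


def replayed_kinds(wal):
    """Kinds of the records that crash recovery would replay."""
    start = recovery_start(wal)
    return [rec.kind for rec in wal if rec.lsn > start]


def make_wal(*kinds):
    """Build a toy log, numbering records from LSN 1 (hypothetical helper)."""
    return [WalRecord(lsn, kind) for lsn, kind in enumerate(kinds, start=1)]


# Checkpoint after the copy, before commit: DBASE_CREATE is never replayed.
with_checkpoint = make_wal("CHECKPOINT", "DBASE_CREATE", "CHECKPOINT", "COMMIT")
# No checkpoint after the copy: a crash after commit replays DBASE_CREATE.
without_checkpoint = make_wal("CHECKPOINT", "DBASE_CREATE", "COMMIT")
# Crash before commit: DBASE_CREATE is replayed, but nothing committed.
crash_before_commit = make_wal("CHECKPOINT", "DBASE_CREATE")

print(replayed_kinds(with_checkpoint))      # ['COMMIT']
print(replayed_kinds(without_checkpoint))   # ['DBASE_CREATE', 'COMMIT']
print(replayed_kinds(crash_before_commit))  # ['DBASE_CREATE']
```

In the third case the record is replayed but no COMMIT follows it, which corresponds to the exception mentioned above: the new database never became visible, so the state of its directory does not matter.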



