Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT |
Date | |
Msg-id | 544EC007.40702@fuzzy.cz Whole thread Raw |
In response to | Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT |
List | pgsql-hackers |
On 27.10.2014 17:24, Heikki Linnakangas wrote: > On 10/27/2014 03:46 PM, Tom Lane wrote: >> Heikki Linnakangas <hlinnakangas@vmware.com> writes: >>> On 10/27/2014 03:21 PM, Tomas Vondra wrote: >>>> Thinking about this a bit more, do we really need a full checkpoint? >>>> That >>>> is a checkpoint of all the databases in the cluster? Why >>>> checkpointing the >>>> source database is not enough? >> >>> A full checkpoint ensures that you always begin recovery *after* the >>> DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during >>> crash recovery (except when you crash before the transaction commits, in >>> which case it doesn't matter if the new database's directory is borked). >> >> Yeah. After re-reading the 2005 thread, I wonder if we shouldn't just >> bite the bullet and redesign CREATE DATABASE as you suggest, ie, WAL-log >> all the copied files instead of doing a "cp -r"-equivalent directory >> copy. >> That would fix a number of existing replay hazards as well as making it >> safe to do what Tomas wants. In the small scale this would cause more >> I/O >> (2 copies of the template database's data) but in production situations >> we might well come out ahead by avoiding a forced checkpoint of the rest >> of the cluster. Also I guess we could skip WAL-logging if WAL archiving >> is off, similarly to the existing optimization for CREATE INDEX etc. > > That would be a nasty surprise for anyone who's using CREATE DATABASE > as a fast way to clone a large database. But I would be OK with that, > at least if we can skip the WAL-logging with wal_level=minimal. That's true. Sadly, I can't think of a solution that would address both use cases at the same time :-( The only thing I can think of is having two CREATE DATABASE "flavors". One keeping the current approach (suitable for fast cloning) and one with the WAL logging (minimizing the CREATE DATABASE duration the impact on other backends). It will probably make the code significantly more complex, which is not exactly desirable, I guess. Also, if we keep the current code (even if only as a special case) it won't eliminate the existing replay hazards (which was one of the Tom's arguments for biting the bullet). I'm also thinking that for wal_level=archive and large databases, this won't really eliminate the checkpoint as it will likely generate enough WAL to hit checkpoint_segments and trigger a checkpoint anyway. No? That being said, our CREATE DATABASE docs currently say this Although it is possible to copy a database other than template1 by specifying its name as the template, this is not(yet) intended as a general-purpose "COPY DATABASE" facility. The principal limitation is that no other sessions canbe connected to the template database while it is being copied. CREATE DATABASE will fail if any other connectionexists when it starts; otherwise, new connections to the template database are locked out until CREATE DATABASEcompletes. See Section 21.3 for more information. I think that this limitation pretty much means no one should use CREATE DATABASE for cloning live databases in production environment (because of the locking). It also seems to me the "general-purpose COPY DATABASE" described in the docs is what we're describing in this thread. regards Tomas
pgsql-hackers by date: