Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT - Mailing list pgsql-hackers
From: Andrew Dunstan
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Msg-id: 544ECDD7.1040500@dunslane.net
In response to: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
On 10/27/2014 05:58 PM, Tomas Vondra wrote:
> On 27.10.2014 17:24, Heikki Linnakangas wrote:
>> On 10/27/2014 03:46 PM, Tom Lane wrote:
>>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>>> On 10/27/2014 03:21 PM, Tomas Vondra wrote:
>>>>> Thinking about this a bit more, do we really need a full checkpoint?
>>>>> That is, a checkpoint of all the databases in the cluster? Why is
>>>>> checkpointing the source database not enough?
>>>> A full checkpoint ensures that you always begin recovery *after* the
>>>> DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
>>>> crash recovery (except when you crash before the transaction commits, in
>>>> which case it doesn't matter if the new database's directory is borked).
>>> Yeah. After re-reading the 2005 thread, I wonder if we shouldn't just
>>> bite the bullet and redesign CREATE DATABASE as you suggest, ie, WAL-log
>>> all the copied files instead of doing a "cp -r"-equivalent directory
>>> copy. That would fix a number of existing replay hazards as well as
>>> making it safe to do what Tomas wants. In the small scale this would
>>> cause more I/O (2 copies of the template database's data), but in
>>> production situations we might well come out ahead by avoiding a forced
>>> checkpoint of the rest of the cluster. Also I guess we could skip
>>> WAL-logging if WAL archiving is off, similarly to the existing
>>> optimization for CREATE INDEX etc.
>> That would be a nasty surprise for anyone who's using CREATE DATABASE
>> as a fast way to clone a large database. But I would be OK with that,
>> at least if we can skip the WAL-logging with wal_level=minimal.
> That's true. Sadly, I can't think of a solution that would address both
> use cases at the same time :-(
>
> The only thing I can think of is having two CREATE DATABASE "flavors":
> one keeping the current approach (suitable for fast cloning) and one
> with the WAL logging (minimizing the impact of CREATE DATABASE on
> other backends).
>
> It will probably make the code significantly more complex, which is not
> exactly desirable, I guess. Also, if we keep the current code (even if
> only as a special case) it won't eliminate the existing replay hazards
> (which was one of Tom's arguments for biting the bullet).
>
> I'm also thinking that for wal_level=archive and large databases, this
> won't really eliminate the checkpoint, as it will likely generate enough
> WAL to hit checkpoint_segments and trigger a checkpoint anyway. No?
>
> That being said, our CREATE DATABASE docs currently say this:
>
>     Although it is possible to copy a database other than template1 by
>     specifying its name as the template, this is not (yet) intended as
>     a general-purpose "COPY DATABASE" facility. The principal
>     limitation is that no other sessions can be connected to the
>     template database while it is being copied. CREATE DATABASE will
>     fail if any other connection exists when it starts; otherwise, new
>     connections to the template database are locked out until CREATE
>     DATABASE completes. See Section 21.3 for more information.
>
> I think this limitation pretty much means no one should use CREATE
> DATABASE for cloning live databases in a production environment
> (because of the locking).
>
> It also seems to me the "general-purpose COPY DATABASE" described in
> the docs is what we're describing in this thread.

Notwithstanding what the docs say, I have seen CREATE DATABASE used
plenty of times, and quite effectively, to clone databases. I don't
think making it do twice the I/O in the general case is going to go
down well.

cheers

andrew
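For readers following the thread, the cloning pattern being debated is the template form of CREATE DATABASE quoted from the docs above. A minimal sketch (the database names here are made up for illustration; the locking behavior is as the quoted documentation describes):

```sql
-- Clone an existing database by using it as the template.
-- CREATE DATABASE copies the template's files at the directory level,
-- so this fails if any other session is connected to "proddb", and new
-- connections to "proddb" are locked out until the copy completes.
CREATE DATABASE proddb_clone TEMPLATE proddb;
```

This file-level copy is exactly what makes the operation fast today, and what the proposed WAL-logging of each copied file would roughly double in I/O cost.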