Thread: CLONE DATABASE (with copy on write?)
Hello all! Our company has some headaches in our application development and deployment process. The chief problem is, "creating stages", which to this audience is, cloning a database efficiently, making and testing a few changes, perhaps recording the differences between databases, and then dropping the database. I'm eternally grateful for someone who pointed out that we should be using "CREATE DATABASE ... WITH TEMPLATE". However, this has two big disadvantages. First, it only works if you can kick the users off the clone. Secondly, it still takes time, uses disk space, etc. We have some big databases. I was also thinking about using ZFS with PostgreSQL to do some sort of copy-on-write. However, this would require me to spawn a whole *new* PostgreSQL instance. In both of these cases, you lose your cache... So, I was wondering... could PostgreSQL grow the ability to "CLONE" a database by re-using existing file system blocks, sharing them across databases? This would perhaps be fast, keep the shared memory cache relevant for both the old copy and the clone, and remove WAL overhead. Then, if the block has to be edited, it'd be cloned in memory, and the clone would be flushed. I'm just imagining a world where "deploying" a new version of our software that makes very small catalog changes and tweaks a few rows would be... quick. Quick to try, check, test, and even deploy on live servers. Best, Clark
I'm in a similar position, cloning a database multiple times to provide development and qa databases for multiple groups. A feature desired by Clark would greatly help many people with their development and qa databases. On the other hand, I imagine, a feature like this would not be easy to develop, and it is useful for many but not all the users. So I doubt we can have it any time soon. On Sat, 2011-11-12 at 16:40 -0500, Clark C. Evans wrote: > Hello all! > > Our company has some headaches in our application development > and deployment process. The chief problem is, "creating stages", > which to this audience is, cloning a database efficiently, > making and testing a few changes, perhaps recording the > differences between databases, and then dropping the database. > > I'm eternally grateful for someone who pointed out that we > should be using "CREATE DATABASE ... WITH TEMPLATE". However, > this has two big disadvantages. First, it only works if you > can kick the users off the clone. Secondly, it still takes > time, uses disk space, etc. We have some big databases. > > I was also thinking about using ZFS with PostgreSQL to do > some sort of copy-on-write. However, this would require me > to spawn a whole *new* PostgreSQL instance. In both of these > cases, you lose your cache... > > So, I was wondering... could PostgreSQL grow the ability to > "CLONE" a database by re-using existing file system blocks, > sharing them across databases? This would perhaps be fast, > keep the shared memory cache relevant for both the old copy > and the clone, and remove WAL overhead. Then, if the block > has to be edited, it'd be cloned in memory, and the clone > would be flushed. > > I'm just imagining a world where "deploying" a new version > of our software that makes very small catalog changes and > tweaks a few rows would be... quick. Quick to try, check, > test, and even deploy on live servers. > > Best, > > Clark >
On Sat, Nov 12, 2011 at 9:40 PM, Clark C. Evans <cce@clarkevans.com> wrote: > Our company has some headaches in our application development > and deployment process. The chief problem is, "creating stages", > which to this audience is, cloning a database efficiently, > making and testing a few changes, perhaps recording the > differences between databases, and then dropping the database. > > I'm eternally grateful for someone who pointed out that we > should be using "CREATE DATABASE ... WITH TEMPLATE". However, > this has two big disadvantages. First, it only works if you > can kick the users off the clone. Secondly, it still takes > time, uses disk space, etc. We have some big databases. An interesting proposal. Thanks for taking the time to raise this. The existing situation is that you need to either 1) quiesce the database so it can be copied locally 2) take a hot backup to create a clone on another server (1) currently involves disconnection. Would a command to quiesce sessions without disconnection be useful? We could get sessions to sleep until woken after the copy. With large databases we would still need to copy while sessions sleep to ensure a consistent database after the copy. Is (2) a problem for you? In what way? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
NVM the implementation, but ability to clone the database without disconnects would be very good for backups and testing. We also create loads of templates, so that would make it more practical.
On Sunday, November 13, 2011 7:33 AM, "Simon Riggs" <simon@2ndQuadrant.com> wrote: > On Sat, Nov 12, 2011 at 9:40 PM, Clark C. Evans <cce@clarkevans.com> > > [We] should be using "CREATE DATABASE ... WITH TEMPLATE". > > However, this has two big disadvantages. First, it only works > > if you can kick the users off the clone. Secondly, it still > > takes time, uses disk space, etc. We have some big databases. > > An interesting proposal. Thanks for taking the time to raise this. Thank you for responding Simon. > The existing situation is that you need to either: > 1) quiesce the database so it can be copied locally > 2) take a hot backup to create a clone on another server > > (1) currently involves disconnection. Would a command to quiesce > sessions without disconnection be useful? We could get sessions to > sleep until woken after the copy. With large databases we would still > need to copy while sessions sleep to ensure a consistent database > after the copy. Could their be a way to put the database in "read only" mode, where it rejects all attempts to change database state with an appropriate application level error message? We could then update our application to behave appropriately while the copy is being performed. Something like this could be broadly useful in other contexts as well, for example, having a replica that you brought up for reporting purposes. Even so, the CREATE DATABASE... WITH TEMPLATE still has a set of additional issues with it. It ties up the hard drive with activity and then extra space while it duplicates data. Further, it causes the shared memory cache to be split between the original and the replica, this causes both databases to be much slower. Finally, it creates a ton of WAL traffic (perhaps we could suspend this?) > Is (2) a problem for you? In what way? Due to our configuration, yes. Being able to CLONE the database in the same cluster is much preferred. Our user configuration, deliberately, does not involve hot backups. Hot backups to another server won't work for us since our servers are encrypted and isolated behind client firewalls. Data that leaves the box has to be encrypted where the decrypt key is only available upon hardware failure, etc. Our upstream pipe isn't huge... which is why the WAL traffic for backups is also problematic. Perhaps we could create two PostgreSQL clusters on each server. One of them would be production, the other would be for staging. This involves some logistics... the advantage of this approach is that we could limit resource usage on the slave and turn off backups on it, reducing our disk usage and WAL traffic. We'd keep shared memory on the slave to a minimum. This solution still chews up 2x disk space and doubles the disk activity. Could "WITH TEMPLATE" reach into another cluster's storage? Best, Clark
On Sun, Nov 13, 2011 at 3:07 PM, Clark C. Evans <cce@clarkevans.com> wrote: > Could their be a way to put the database in "read only" mode, > where it rejects all attempts to change database state with an > appropriate application level error message? We could then > update our application to behave appropriately while the copy > is being performed. Something like this could be broadly > useful in other contexts as well, for example, having a replica > that you brought up for reporting purposes. > > Even so, the CREATE DATABASE... WITH TEMPLATE still has a set of > additional issues with it. It ties up the hard drive with activity > and then extra space while it duplicates data. Further, it causes > the shared memory cache to be split between the original and the > replica, this causes both databases to be much slower. Finally, > it creates a ton of WAL traffic (perhaps we could suspend this?) It would be possible to suspend writes to a particular database and then copy the database without writing WAL. It's probably possible to wait for all write transactions to complete first. Yes, it would use up disk space and shared_buffers to cache the new db. Allowing writes to continue while we copy is more complex. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
"Clark C. Evans" <cce@clarkevans.com> writes: > Even so, the CREATE DATABASE... WITH TEMPLATE still has a set of > additional issues with it. It ties up the hard drive with activity > and then extra space while it duplicates data. Further, it causes > the shared memory cache to be split between the original and the > replica, this causes both databases to be much slower. Finally, > it creates a ton of WAL traffic (perhaps we could suspend this?) That last claim is false ... regards, tom lane
After this discussion and the spin-off discussion on the hacker list, I wanted to summarize my understanding. So, what I asked for is relatively inexpensive way to make copies of an existing database for staging, upgrade tests, and other activities. There are two approaches to this sort of replication (besides dump/restore). 1. "COPY DATABASE ... WITH TEMPLATE" does a disk level copy of an existing database in the same cluster. This approach is an order of magnitude faster than a dump/load cycle. Not only isn't there a dump/load and intermediate result, but write ahead log segments are efficiently handled (Tom, thank you this correction). This approach has three downsides: (a) users must be booted off the system, (b) you duplicate storage, and (c) shared memory of the cluster is split and cache state has to be re-learned on the replica. Simon suggested that it may be possible to find a solution for the exclusive access requirement; a way to quiesce sessions without disconnection. 2. Use WALS to have a hot backup of the cluster; you setup a cluster replica and then detach it. This approach solves the quiesce problem via replication, so you don't have to boot users off the system. It also doesn't muck with the shared memory cache state of your production source database since you're making a copy to another PostgreSQL instance. However, it has a few disadvantages: (a) you have to copy the entire cluster, (b) you must create and maintain another PostgreSQL instance. In a hackers thread, Thom proposed "detach/attach" feature so that you could move a database from one cluster to another. This would be particularly useful, but it looks quite infeasible since you'd have to touch every disk block to rewrite the transaction IDs. This feature was requested by Yang Zhang on April 6th as well. Best, Clark
"Clark C. Evans" <cce@clarkevans.com> writes: > Hello all! > > Our company has some headaches in our application development > and deployment process. The chief problem is, "creating stages", > which to this audience is, cloning a database efficiently, > making and testing a few changes, perhaps recording the > differences between databases, and then dropping the database. > > I'm eternally grateful for someone who pointed out that we > should be using "CREATE DATABASE ... WITH TEMPLATE". However, > this has two big disadvantages. First, it only works if you > can kick the users off the clone. Secondly, it still takes > time, uses disk space, etc. We have some big databases. > > I was also thinking about using ZFS with PostgreSQL to do > some sort of copy-on-write. However, this would require me > to spawn a whole *new* PostgreSQL instance. In both of these > cases, you lose your cache... > > So, I was wondering... could PostgreSQL grow the ability to > "CLONE" a database by re-using existing file system blocks, > sharing them across databases? This would perhaps be fast, > keep the shared memory cache relevant for both the old copy > and the clone, and remove WAL overhead. Then, if the block > has to be edited, it'd be cloned in memory, and the clone > would be flushed. > > I'm just imagining a world where "deploying" a new version > of our software that makes very small catalog changes and > tweaks a few rows would be... quick. Quick to try, check, > test, and even deploy on live servers. Clark; What you are describing can be done easily at the cluster level but without any way to copy the live buffer cache state using any number of storage array and/or file system solutions which implement copy-on-write snapshots. If being done on a shared storage array, you have the option to expose the new snap to another host and run the clone instance there. NOt possible if on local storage with just FS support for snapping. Taking the snap itself involves putting the source cluster in backup mode same as you would normally for making an image for later PITR. The only difference is the snapping takes just a few seconds. Firing up the clone involves recovering same also as for PITR. Yeah; very cool if Pg itself could support such a thing at the level of individual DB but I shudder to think what all might ve involved. The last company I worked at had a hierarchy of snaps in some cases a few layers deep. We did whatever needed to the clones and then dropped the snaps when done with them. Physical disk use by the clones is determined by what amount of churn the disk blocks experience after the snapping, usually very little and thus, not much physical storage was required. "Thin Provisioning" is the buzzword that applies to this technology which can be found on various EMCs, 3PARs and probably NetAp devices and others. HTH > Best, > > Clark > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net p: 305.321.1144