Thread: CLONE DATABASE (with copy on write?)

CLONE DATABASE (with copy on write?)

From

"Clark C. Evans"

Date:

12 November 2011, 17:47:42

Hello all!

Our company has some headaches in our application development
and deployment process.  The chief problem is, "creating stages",
which to this audience is, cloning a database efficiently,
making and testing a few changes, perhaps recording the
differences between databases, and then dropping the database.

I'm eternally grateful for someone who pointed out that we
should be using "CREATE DATABASE ... WITH TEMPLATE".  However,
this has two big disadvantages.  First, it only works if you
can kick the users off the clone.  Secondly, it still takes
time, uses disk space, etc.  We have some big databases.

I was also thinking about using ZFS with PostgreSQL to do
some sort of copy-on-write.  However, this would require me
to spawn a whole *new* PostgreSQL instance.  In both of these
cases, you lose your cache...

So, I was wondering... could PostgreSQL grow the ability to
"CLONE" a database by re-using existing file system blocks,
sharing them across databases?  This would perhaps be fast,
keep the shared memory cache relevant for both the old copy
and the clone, and remove WAL overhead.  Then, if the block
has to be edited, it'd be cloned in memory, and the clone
would be flushed.

I'm just imagining a world where "deploying" a new version
of our software that makes very small catalog changes and
tweaks a few rows would be... quick.  Quick to try, check,
test, and even deploy on live servers.

Best,

Clark

Re: CLONE DATABASE (with copy on write?)

From

Igor Polishchuk

Date:

12 November 2011, 22:49:57

I'm in a similar position, cloning a database multiple times to provide
development and qa databases for multiple groups. A feature desired by
Clark would greatly help many people with their development and qa
databases.
On the other hand, I imagine,  a feature like this would not be easy to
develop, and it is useful for many but not all the users.
So I doubt we can have it any time soon.


On Sat, 2011-11-12 at 16:40 -0500, Clark C. Evans wrote:
> Hello all!
>
> Our company has some headaches in our application development
> and deployment process.  The chief problem is, "creating stages",
> which to this audience is, cloning a database efficiently,
> making and testing a few changes, perhaps recording the
> differences between databases, and then dropping the database.
>
> I'm eternally grateful for someone who pointed out that we
> should be using "CREATE DATABASE ... WITH TEMPLATE".  However,
> this has two big disadvantages.  First, it only works if you
> can kick the users off the clone.  Secondly, it still takes
> time, uses disk space, etc.  We have some big databases.
>
> I was also thinking about using ZFS with PostgreSQL to do
> some sort of copy-on-write.  However, this would require me
> to spawn a whole *new* PostgreSQL instance.  In both of these
> cases, you lose your cache...
>
> So, I was wondering... could PostgreSQL grow the ability to
> "CLONE" a database by re-using existing file system blocks,
> sharing them across databases?  This would perhaps be fast,
> keep the shared memory cache relevant for both the old copy
> and the clone, and remove WAL overhead.  Then, if the block
> has to be edited, it'd be cloned in memory, and the clone
> would be flushed.
>
> I'm just imagining a world where "deploying" a new version
> of our software that makes very small catalog changes and
> tweaks a few rows would be... quick.  Quick to try, check,
> test, and even deploy on live servers.
>
> Best,
>
> Clark
>

Re: CLONE DATABASE (with copy on write?)

From

Simon Riggs

Date:

13 November 2011, 03:33:55

On Sat, Nov 12, 2011 at 9:40 PM, Clark C. Evans <cce@clarkevans.com> wrote:

> Our company has some headaches in our application development
> and deployment process.  The chief problem is, "creating stages",
> which to this audience is, cloning a database efficiently,
> making and testing a few changes, perhaps recording the
> differences between databases, and then dropping the database.
>
> I'm eternally grateful for someone who pointed out that we
> should be using "CREATE DATABASE ... WITH TEMPLATE".  However,
> this has two big disadvantages.  First, it only works if you
> can kick the users off the clone.  Secondly, it still takes
> time, uses disk space, etc.  We have some big databases.

An interesting proposal. Thanks for taking the time to raise this.

The existing situation is that you need to either

1) quiesce the database so it can be copied locally

2) take a hot backup to create a clone on another server


(1) currently involves disconnection. Would a command to quiesce
sessions without disconnection be useful? We could get sessions to
sleep until woken after the copy. With large databases we would still
need to copy while sessions sleep to ensure a consistent database
after the copy.

Is (2) a problem for you? In what way?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: CLONE DATABASE (with copy on write?)

From

Gregg Jaskiewicz

Date:

13 November 2011, 09:26:43

NVM the implementation, but ability to clone the database without
disconnects would be very good for backups and testing.
We also create loads of templates, so that would make it more practical.

Re: CLONE DATABASE (with copy on write?)

From

"Clark C. Evans"

Date:

13 November 2011, 11:07:47

On Sunday, November 13, 2011 7:33 AM, "Simon Riggs"
<simon@2ndQuadrant.com> wrote:
> On Sat, Nov 12, 2011 at 9:40 PM, Clark C. Evans <cce@clarkevans.com>
> > [We] should be using "CREATE DATABASE ... WITH TEMPLATE".
> > However, this has two big disadvantages.  First, it only works
> > if you can kick the users off the clone.  Secondly, it still
> > takes time, uses disk space, etc.  We have some big databases.
>
> An interesting proposal. Thanks for taking the time to raise this.

Thank you for responding Simon.

> The existing situation is that you need to either:
> 1) quiesce the database so it can be copied locally
> 2) take a hot backup to create a clone on another server
>
> (1) currently involves disconnection. Would a command to quiesce
> sessions without disconnection be useful? We could get sessions to
> sleep until woken after the copy. With large databases we would still
> need to copy while sessions sleep to ensure a consistent database
> after the copy.

Could their be a way to put the database in "read only" mode,
where it rejects all attempts to change database state with an
appropriate application level error message?  We could then
update our application to behave appropriately while the copy
is being performed.   Something like this could be broadly
useful in other contexts as well, for example, having a replica
that you brought up for reporting purposes.

Even so, the CREATE DATABASE... WITH TEMPLATE still has a set of
additional issues with it.  It ties up the hard drive with activity
and then extra space while it duplicates data.  Further, it causes
the shared memory cache to be split between the original and the
replica, this causes both databases to be much slower.  Finally,
it creates a ton of WAL traffic (perhaps we could suspend this?)

> Is (2) a problem for you? In what way?

Due to our configuration, yes.  Being able to CLONE the
database in the same cluster is much preferred.  Our user
configuration, deliberately, does not involve hot backups.

Hot backups to another server won't work for us since our
servers are encrypted and isolated behind client firewalls.
Data that leaves the box has to be encrypted where the
decrypt key is only available upon hardware failure, etc.
Our upstream pipe isn't huge... which is why the WAL traffic
for backups is also problematic.

Perhaps we could create two PostgreSQL clusters on each server.
One of them would be production, the other would be for staging.
This involves some logistics...  the advantage of this approach
is that we could limit resource usage on the slave and turn off
backups on it, reducing our disk usage and WAL traffic.   We'd
keep shared memory on the slave to a minimum.   This solution
still chews up 2x disk space and doubles the disk activity.

Could "WITH TEMPLATE" reach into another cluster's storage?

Best,

Clark

Re: [SPAM?]: Re: CLONE DATABASE (with copy on write?)

From

Simon Riggs

Date:

13 November 2011, 11:33:59

On Sun, Nov 13, 2011 at 3:07 PM, Clark C. Evans <cce@clarkevans.com> wrote:

> Could their be a way to put the database in "read only" mode,
> where it rejects all attempts to change database state with an
> appropriate application level error message?  We could then
> update our application to behave appropriately while the copy
> is being performed.   Something like this could be broadly
> useful in other contexts as well, for example, having a replica
> that you brought up for reporting purposes.
>
> Even so, the CREATE DATABASE... WITH TEMPLATE still has a set of
> additional issues with it.  It ties up the hard drive with activity
> and then extra space while it duplicates data.  Further, it causes
> the shared memory cache to be split between the original and the
> replica, this causes both databases to be much slower.  Finally,
> it creates a ton of WAL traffic (perhaps we could suspend this?)

It would be possible to suspend writes to a particular database and
then copy the database without writing WAL. It's probably possible to
wait for all write transactions to complete first.

Yes, it would use up disk space and shared_buffers to cache the new db.

Allowing writes to continue while we copy is more complex.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: CLONE DATABASE (with copy on write?)

From

Tom Lane

Date:

13 November 2011, 12:45:17

"Clark C. Evans" <cce@clarkevans.com> writes:
> Even so, the CREATE DATABASE... WITH TEMPLATE still has a set of
> additional issues with it.  It ties up the hard drive with activity
> and then extra space while it duplicates data.  Further, it causes
> the shared memory cache to be split between the original and the
> replica, this causes both databases to be much slower.  Finally,
> it creates a ton of WAL traffic (perhaps we could suspend this?)

That last claim is false ...

            regards, tom lane

Re: CLONE DATABASE (with copy on write?)

From

"Clark C. Evans"

Date:

16 November 2011, 11:23:07

After this discussion and the spin-off discussion on the
hacker list, I wanted to summarize my understanding.

So, what I asked for is relatively inexpensive way to make
copies of an existing database for staging, upgrade tests,
and other activities.  There are two approaches to this
sort of replication (besides dump/restore).

1. "COPY DATABASE ... WITH TEMPLATE" does a disk level
   copy of an existing database in the same cluster.

   This approach is an order of magnitude faster than
   a dump/load cycle.  Not only isn't there a dump/load
   and intermediate result, but write ahead log segments
   are efficiently handled (Tom, thank you this correction).

   This approach has three downsides: (a) users must be
   booted off the system, (b) you duplicate storage,
   and (c) shared memory of the cluster is split and
   cache state has to be re-learned on the replica.

   Simon suggested that it may be possible to find a
   solution for the exclusive access requirement; a way
   to quiesce sessions without disconnection.

2. Use WALS to have a hot backup of the cluster; you
   setup a cluster replica and then detach it.

   This approach solves the quiesce problem via
   replication, so you don't have to boot users off the
   system.  It also doesn't muck with the shared memory
   cache state of your production source database since
   you're making a copy to another PostgreSQL instance.

   However, it has a few disadvantages: (a) you have to
   copy the entire cluster, (b) you must create and
   maintain another PostgreSQL instance.

In a hackers thread, Thom proposed "detach/attach"
feature so that you could move a database from one
cluster to another.  This would be particularly useful,
but it looks quite infeasible since you'd have to touch
every disk block to rewrite the transaction IDs.  This
feature was requested by Yang Zhang on April 6th as well.

Best,

Clark

Re: CLONE DATABASE (with copy on write?)

From

Jerry Sievers

Date:

18 November 2011, 17:14:46

"Clark C. Evans" <cce@clarkevans.com> writes:

> Hello all!
>
> Our company has some headaches in our application development
> and deployment process.  The chief problem is, "creating stages",
> which to this audience is, cloning a database efficiently,
> making and testing a few changes, perhaps recording the
> differences between databases, and then dropping the database.
>
> I'm eternally grateful for someone who pointed out that we
> should be using "CREATE DATABASE ... WITH TEMPLATE".  However,
> this has two big disadvantages.  First, it only works if you
> can kick the users off the clone.  Secondly, it still takes
> time, uses disk space, etc.  We have some big databases.
>
> I was also thinking about using ZFS with PostgreSQL to do
> some sort of copy-on-write.  However, this would require me
> to spawn a whole *new* PostgreSQL instance.  In both of these
> cases, you lose your cache...
>
> So, I was wondering... could PostgreSQL grow the ability to
> "CLONE" a database by re-using existing file system blocks,
> sharing them across databases?  This would perhaps be fast,
> keep the shared memory cache relevant for both the old copy
> and the clone, and remove WAL overhead.  Then, if the block
> has to be edited, it'd be cloned in memory, and the clone
> would be flushed.
>
> I'm just imagining a world where "deploying" a new version
> of our software that makes very small catalog changes and
> tweaks a few rows would be... quick.  Quick to try, check,
> test, and even deploy on live servers.

Clark;

What you are describing can be done easily at the cluster level but
without any way to copy the live buffer cache state using any number
of storage array and/or file system solutions which implement
copy-on-write snapshots.

If being done on a shared storage array, you have the option to expose
the new snap to another host and run the clone instance there.  NOt
possible if on local storage with just FS support for snapping.

Taking the snap itself involves putting the source cluster in backup
mode same as you would  normally for making an image for later PITR.
The only difference is the snapping takes  just a few seconds.  Firing
up the clone involves recovering same also as for PITR.

Yeah; very cool if Pg itself could support such a thing at the level
of individual DB but I shudder to think what all might ve involved.

The last company I worked at had a hierarchy of snaps in some cases a
few layers deep.  We did whatever needed to the clones and then
dropped the snaps when done with them.  Physical disk use by the
clones is determined by what amount of churn the disk blocks
experience after the snapping, usually very little and thus, not much
physical storage was required.

"Thin Provisioning" is the buzzword that applies to this technology
which can be found on various EMCs, 3PARs and probably NetAp devices
and others.

HTH
> Best,
>
> Clark
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 305.321.1144