Thread: Need solution for weekly database "snapshot"

Need solution for weekly database "snapshot"

From: Moshe Jacobson
We have a report in our system that is pulled many times each week, but needs to be based off data from Sunday at noon. So every Sunday at noon, we back up our main database and restore it into a new reporting snapshot database.

The problem is that backing up and restoring the database every week is a very costly operation that can take many hours. It brings the system to its knees.

Unfortunately the report touches a lot of the database, so it would be unreasonable to try to add versioning to the data the report pulls.

Is there an easier way of maintaining a snapshot of our database that can be accessed throughout the week?

Thanks for any suggestion.

--
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

"Quality is not an act, it is a habit." -- Aristotle

Re: Need solution for weekly database "snapshot"

From: Karsten Hilbert
On Mon, Apr 22, 2013 at 11:58:58AM -0400, Moshe Jacobson wrote:

> We have a report in our system that is pulled many times each week, but
> needs to be based off data from Sunday at noon. So every Sunday at noon, we
> back up our main database and restore it into a new reporting snapshot
> database.
>
> The problem is that backing up and restoring the database every week is a
> very costly operation that can take many hours. It brings the system to its
> knees.

Set up a slave.

On Sunday at noon, let the main DB reach a consistent state and "stop"
it, let the slave catch up, then detach the slave, restart the main DB,
take the backup from the slave, and re-attach the slave.

Similar things can be done at the FS level with snapshot-capable filesystems like Btrfs.
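One way to realize that detach/backup/re-attach cycle on a PostgreSQL hot standby is to pause WAL replay instead of physically detaching anything. This is only a sketch: the port, database name, and backup path are assumptions, and the function names are the pre-9.6 spellings.

```shell
# Sunday at noon, on the standby: "detach" by freezing WAL replay.
psql -p 5433 -d maindb -c "SELECT pg_xlog_replay_pause();"

# Back up the frozen standby; the main DB is not touched at all.
pg_dump -p 5433 -Fc maindb > /backups/sunday_noon.dump

# "Re-attach": resume replay so the standby catches up with the primary.
# (Note: WAL received while paused accumulates on the standby's disk.)
psql -p 5433 -d maindb -c "SELECT pg_xlog_replay_resume();"
```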

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Need solution for weekly database "snapshot"

From: Moshe Jacobson
On Mon, Apr 22, 2013 at 12:14 PM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> > We have a report in our system that is pulled many times each week, but
> > needs to be based off data from Sunday at noon. So every Sunday at noon, we
> > back up our main database and restore it into a new reporting snapshot
> > database.
>
> Setup a slave.

Not bad, but the transaction logs would fill up the file system. Besides, it would not be worth it to set up a whole database cluster just for this purpose.

Any alternatives?

-- 
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

"Quality is not an act, it is a habit." -- Aristotle

Re: Need solution for weekly database "snapshot"

From: Christophe Pettus
On Apr 22, 2013, at 10:05 AM, Moshe Jacobson wrote:
> Not bad, but the transaction logs would fill up the file system.

I'm not sure I understand that comment.  Why would the transaction logs be particularly voluminous in this case?

> Besides, it would not be worth it to set up a whole database cluster just for this purpose.

It does seem to meet all of your needs in a very efficient way; setting up a PG cluster is not that complex.

--
-- Christophe Pettus
   xof@thebuild.com



Re: Need solution for weekly database "snapshot"

From: John R Pierce
On 4/22/2013 10:05 AM, Moshe Jacobson wrote:
> > Setup a slave.
>
> Not bad, but the transaction logs would fill up the file system. Besides,
> it would not be worth it to set up a whole database cluster just for this
> purpose.
>
> Any alternatives?

If your file system supports it (ZFS, for instance), you /could/ take a volume snapshot/clone, then mount that and run a separate instance of PostgreSQL on it, using a different port number.  When done with your reports, you'd drop that instance and snapshot.
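A sketch of that ZFS approach; the pool and dataset names, mountpoint, and port are assumptions, and the clone comes up via normal crash recovery since the snapshot is crash-consistent.

```shell
# Sunday at noon: snapshot and clone the dataset holding the data directory.
zfs snapshot tank/pgdata@sunday_noon
zfs clone tank/pgdata@sunday_noon tank/pgdata_report

# Start a second instance on the clone, on its own port.
pg_ctl -D /tank/pgdata_report -o "-p 5433" start

# ...run the week's reports against port 5433...

# When done, drop the instance, then the clone, then the snapshot.
pg_ctl -D /tank/pgdata_report -m fast stop
zfs destroy tank/pgdata_report
zfs destroy tank/pgdata@sunday_noon
```

Since the clone is copy-on-write, it costs almost no extra disk up front; space is consumed only as the live dataset diverges from the snapshot during the week.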

We run a separate server (hardware and database) for our reporting systems, as their workload is heavier than the realtime transaction system's.


-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Need solution for weekly database "snapshot"

From: Moshe Jacobson

On Mon, Apr 22, 2013 at 1:41 PM, Christophe Pettus <xof@thebuild.com> wrote:
> > Not bad, but the transaction logs would fill up the file system.
>
> I'm not sure I understand that comment.  Why would the transaction logs be
> particularly voluminous in this case?

I assumed the logs would be shipped to the slave and would accumulate if replication stopped. Is that not the case?
Is it possible for the slave to pause replication indefinitely and pick up where it left off without requiring huge volumes of transaction logs?

> > Besides, it would not be worth it to set up a whole database cluster just
> > for this purpose.
>
> It does seem to meet all of your needs in a very efficient way; setting up
> a PG cluster is not that complex.

We don't have enough disk space to create a whole new copy of the database cluster. Until now we have been restoring from a pg_dump backup that does not include all of the audit logs.

Thanks!

--
Moshe Jacobson
Nead Werx, Inc. | Manager of Systems Engineering
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

"Quality is not an act, it is a habit." -- Aristotle

Re: Need solution for weekly database "snapshot"

From: Karsten Hilbert
On Mon, Apr 22, 2013 at 03:16:19PM -0400, Moshe Jacobson wrote:

> > > Not bad, but the transaction logs would fill up the file system.
> >
> > I'm not sure I understand that comment.  Why would the transaction logs be
> > particularly voluminous in this case?
> >
>
> I assumed the logs would be shipping to the slave and accumulating if the
> replication stopped. Is that not the case?

That IS the case, but you wouldn't do that. You would only
create the slave as needed, say, on Saturday night or whenever
is practical, and then let it catch up.

THEN on Sunday at noon you detach the slave and *create your
backup from the detached slave* (which does not affect your
main DB). That backup can be used to restore into a static
"snapshot" database. Then you can remove the slave (or let
it replicate until next week).
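That weekly cycle might look roughly like this with streaming replication. The host, replication user, paths, and port are assumptions; the -R flag, which writes recovery.conf for you, exists on 9.3+.

```shell
# Saturday night: build a throwaway standby, possibly on another machine
# so the main server needs no extra disk.
pg_basebackup -h primary.example.com -U replicator \
    -D /srv/weekly_standby -X stream -R
pg_ctl -D /srv/weekly_standby -o "-p 5433" start   # catches up overnight

# Sunday at noon: detach it from the primary...
pg_ctl -D /srv/weekly_standby promote

# ...and load the static reporting "snapshot" database from it.
pg_dump -p 5433 -Fc maindb | pg_restore -d report_snapshot --clean

# Drop the standby afterwards; a promoted standby has to be rebuilt
# before it can replicate again.
```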

> We don't have enough disk space to create a whole new copy
> of the database cluster.

The slave can sit on another machine.

Size or Speed. Pick either.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Need solution for weekly database "snapshot"

From: Adrian Klaver
On 04/22/2013 12:16 PM, Moshe Jacobson wrote:
>
> On Mon, Apr 22, 2013 at 1:41 PM, Christophe Pettus <xof@thebuild.com> wrote:
>
>      > Not bad, but the transaction logs would fill up the file system.
>
>     I'm not sure I understand that comment.  Why would the transaction
>     logs be particularly voluminous in this case?
>
>
> I assumed the logs would be shipping to the slave and accumulating if
> the replication stopped. Is that not the case?
> Is it possible for the slave to pause replication indefinitely and pick
> up where it left off without requiring huge volumes of transaction logs?
>
>> Besides, it would not be worth it to set up a whole database cluster just for this purpose.
>
>     It does seem to meet all of your needs in a very efficient way;
>     setting up a PG cluster is not that complex.
>
>
> We don't have enough disk space to create a whole new copy of the
> database cluster. Until now we have been restoring from a pg_dump backup
> that does not include all of the audit logs.

Tools that allow more selectivity, in no particular order:

Bucardo http://bucardo.org/wiki/Bucardo

Slony http://slony.info/

Londiste http://skytools.projects.pgfoundry.org/doc/londiste.ref.html

>
> Thanks!
>
> --
> Moshe Jacobson
> Nead Werx, Inc. | Manager of Systems Engineering
> 2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
> moshe@neadwerx.com | www.neadwerx.com
>
> "Quality is not an act, it is a habit." -- Aristotle


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Need solution for weekly database "snapshot"

From: Jeff Janes
On Monday, April 22, 2013, Moshe Jacobson wrote:
> We have a report in our system that is pulled many times each week, but
> needs to be based off data from Sunday at noon. So every Sunday at noon, we
> back up our main database and restore it into a new reporting snapshot
> database.
>
> The problem is that backing up and restoring the database every week is a
> very costly operation that can take many hours. It brings the system to its
> knees.

How do you do normal backups, for disaster recovery purposes?  Could you merge that with your snapshotting needs, cutting the heavy operations on your main server from two to one?
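If the disaster-recovery backups are base backups plus archived WAL, one sketch of that merge is a point-in-time restore targeted at Sunday noon. The paths, port, and timestamp here are assumptions, and recovery.conf is the pre-10 mechanism.

```shell
# Unpack the most recent base backup into the reporting area.
tar -xf /backups/base.tar -C /srv/pg_report

# Ask recovery to stop at Sunday noon instead of replaying all WAL.
cat > /srv/pg_report/recovery.conf <<'EOF'
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2013-04-21 12:00:00'
EOF

# Start it on its own port; it recovers to that point and stops there.
pg_ctl -D /srv/pg_report -o "-p 5433" start
```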

Cheers,

Jeff