Basic Question on Point In Time Recovery - Mailing list pgsql-general

From: Robert Inder
Subject: Basic Question on Point In Time Recovery
Msg-id: CAKqjJm9ADC7yTxdiU+YphNtL9zHXBB06+nWeYGUQtAnu9iaCYw@mail.gmail.com
Responses: Re: Basic Question on Point In Time Recovery  (Andrew Sullivan <ajs@crankycanuck.ca>)
           Re: Basic Question on Point In Time Recovery  (Stéphane Schildknecht <stephane.schildknecht@postgres.fr>)
           Re: Basic Question on Point In Time Recovery  (Francisco Olarte <folarte@peoplecall.com>)
           Re: Basic Question on Point In Time Recovery  (Robert Inder <robert@interactive.co.uk>)
List: pgsql-general
We are developing a new software system that is now used by a number
of independent clients for gathering and storing live data as part of
their day-to-day work.

We have a number of clients sharing a single server.  It runs a single
Postgres instance, and each client is a separate user with access
to their own database.  Each client's database will contain "hundreds
of thousands" of records, and will support occasional queries by
a small number of users.  So the system is currently running on
"modest" hardware.

To guard against the server failing, we have a standby server being
updated by WAL files, so if the worst comes to the worst we'll only
lose "a few minutes'" work.  No problems there.

But, at least while the system is under rapid development, we also
want a way to roll a particular client's database back to a
(recent) "known good" state without affecting any other client.

My understanding is that the WAL mechanism is installation-wide
-- it affects all clients alike.
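
As I understand it, recovering from that archive works at the level of
the whole cluster: you restore a base backup and replay WAL up to a
target time, and every database comes back to that same moment.
Something like this (illustrative values only):

    # In the standby/recovery configuration (recovery.conf on the releases
    # we are familiar with): replay archived WAL up to one cluster-wide target.
    restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
    recovery_target_time = '2016-03-10 14:00:00'   # example target time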

So to allow us to restore data for an individual client, we're running
"pg_dump" once an hour on each database in turn.  In the event of a
problem with one client's system, we can restore just that one
database, without affecting any other client.
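
In outline, the hourly job looks something like this (names, paths and
the restore example are illustrative):

    #!/bin/sh
    # Hourly dump, run as the postgres superuser: dump each client database
    # in turn in custom format, so any one of them can be restored on its own.
    STAMP=$(date +%Y%m%d%H)
    for DB in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate AND datname <> 'postgres'"); do
        pg_dump --format=custom --file="/backups/${DB}_${STAMP}.dump" "$DB"
    done

    # Restoring one client from its latest dump leaves the others untouched:
    #   dropdb client_a_db
    #   createdb --owner=client_a client_a_db
    #   pg_restore --dbname=client_a_db /backups/client_a_db_<stamp>.dump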

The problem is that, as the number of clients grows, and with it the
amount of data, pg_dump is becoming more intrusive.  Our perception is
that while pg_dump is running for any one database, performance on all
databases is reduced.  I'm guessing this is because the dump is making
heavy use of the disk.

There is obviously scope for improving performance by using more, or
more powerful, hardware.  That is no doubt going to be necessary at
some point, but it is an expense that our client would like to defer
as long as possible.

So before we go down that route, I'd like to check that we're not
doing something dopey.

Is our current "frequent pg_dump" approach a sensible way to go about
things, or are we missing something?  Is there some other way to
restore one database without affecting the others?

Thanks in advance.

Robert.

--
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
                                           Interactions speak louder than words

