Re: Differential backup - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Differential backup
Date
Msg-id 4BD81BE70200002500030FEA@gw.wicourts.gov
In response to Re: Differential backup  (Hannu Krosing <hannu@2ndquadrant.com>)
List pgsql-hackers

Hannu Krosing <hannu@2ndquadrant.com> wrote:
> On Tue, 2010-04-27 at 17:28 +0200, Csaba Nagy wrote:
>> One use case we would have is to dump only the changes from the
>> last backup of a single table. This table takes 30% of the DB
>> disk space, it is in the order of ~400GB, and it's only inserted,
>> never updated, then after ~1 year the old entries are archived.
>> There's ~10M new entries daily in this table. If the backup would
>> be smart enough to only read the changed blocks (in this case
>> only for newly inserted records), it would be a fairly big win...

That is covered pretty effectively in PITR-style backups with the
hard link and rsync approach cited earlier in the thread.  Those 1GB
table segment files which haven't changed aren't read or written,
and only those portions of the other files which have actually
changed are sent over the wire (although the entire disk file is
written on the receiving end).
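As a rough illustration of the hard-link half of that idea, here is a minimal Python sketch. It is not rsync and not the exact procedure cited earlier in the thread. The function name snapshot and its directory arguments are hypothetical, and "unchanged" is assumed to mean same size and same modification time (to the second), which is modeled on rsync's default quick check. Unchanged files are hard-linked to the previous backup without reading their contents; changed files are copied whole, so rsync's delta transfer over the wire is not modeled here.

```python
import os
import shutil


def snapshot(source_dir, previous_dir, new_dir):
    """Create new_dir as a snapshot of source_dir.

    Files whose size and mtime match the copy in previous_dir are
    hard-linked to it (assumption: that is a good enough "unchanged"
    test). Everything else is copied in full.
    Returns (linked, copied) lists of paths relative to source_dir.
    """
    linked, copied = [], []
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        os.makedirs(os.path.normpath(os.path.join(new_dir, rel_root)),
                    exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            src = os.path.join(source_dir, rel)
            old = os.path.join(previous_dir, rel)
            dst = os.path.join(new_dir, rel)
            src_stat = os.stat(src)
            unchanged = False
            if os.path.isfile(old):
                old_stat = os.stat(old)
                unchanged = (
                    old_stat.st_size == src_stat.st_size
                    and int(old_stat.st_mtime) == int(src_stat.st_mtime)
                )
            if unchanged:
                # Hard link: the file's data is neither read nor written.
                os.link(old, dst)
                linked.append(rel)
            else:
                # copy2 copies the data and preserves the mtime, so the
                # next snapshot can compare against this copy.
                shutil.copy2(src, dst)
                copied.append(rel)
    return linked, copied
```

With a table stored as 1GB segment files and only the last segment growing, a run of this sketch would link every older segment and copy only the tail segment, which is the effect described above.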
> The standard trick for this kind of table is having this table
> partitioned by insertion date

That doesn't always work.  In our situation the supreme court sets
records retention rules which can be quite complex, but usually key
on *final disposition* of a case rather than insertion date; that
is, the earliest date on which the data related to a case is
*allowed* to be deleted isn't known until weeks or years after
insertion.  Additionally, it is the elected clerk of court in each
county who determines when and if data for that county will be
purged once it has reached the minimum retention threshold set by
supreme court rules.
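To make the problem concrete, here is a small Python sketch of a retention check of that shape. All names and the 365-day retention period are hypothetical, chosen only for illustration; real retention rules are more complex. The point it shows is that two rows inserted on the same day can become purgeable on very different dates, or not at all yet, so a partition keyed on insertion date cannot simply be dropped.

```python
from datetime import date, timedelta


def earliest_purge_date(disposition_date, retention_days):
    """Earliest date a case may be purged, or None if not yet disposed.

    retention_days is a hypothetical stand-in for whatever minimum
    retention period the applicable rule sets.
    """
    if disposition_date is None:
        return None  # final disposition unknown, so no purge date yet
    return disposition_date + timedelta(days=retention_days)


def may_purge(today, disposition_date, retention_days, clerk_approved):
    """True only if the minimum retention has passed AND the county's
    clerk of court has chosen to purge."""
    earliest = earliest_purge_date(disposition_date, retention_days)
    return earliest is not None and today >= earliest and clerk_approved


# Two cases inserted on the same day (the insertion date is not even an
# input to the check); only the disposition date matters.
case_a = earliest_purge_date(date(2010, 1, 1), 365)  # disposed early
case_b = earliest_purge_date(None, 365)              # still open
```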
That's not to say that partitioning couldn't help with some backup
strategies; just that it doesn't solve all "insert-only" (with
eventual purge) use cases.  One of the nicest things about
PostgreSQL is the availability of several easy and viable backup
strategies, so that you can tailor one to fit your environment.

-Kevin

