Re: Backup strategies - Mailing list pgsql-performance

From Peter Childs
Subject Re: Backup strategies
Date
Msg-id a2de01dd0810150722i6f9895b0u1731683b8bce8dc2@mail.gmail.com
In response to Re: Backup strategies  (Ivan Voras <ivoras@freebsd.org>)
List pgsql-performance
2008/10/15 Ivan Voras <ivoras@freebsd.org>:
> Matthew Wakeling wrote:
>> On Wed, 15 Oct 2008, Ivan Voras wrote:
>>>> Nope. Even files in data directory change. That's why the documentation
>>>> warns against tools that emit errors for files that change during the
>>>> copy.
>>>
>>> Ok, thanks. This is a bit off-topic, but if it's not how I imagine it,
>>> then how is it implemented?
>>
>> The files may change, but it doesn't matter, because there is enough
>> information in the xlog to correct it all.
>
> I'm thinking about these paragraphs in the documentation:
>
> """
> Be certain that your backup dump includes all of the files underneath
> the database cluster directory (e.g., /usr/local/pgsql/data). If you are
> using tablespaces that do not reside underneath this directory, be
> careful to include them as well (and be sure that your backup dump
> archives symbolic links as links, otherwise the restore will mess up
> your tablespaces).
>
> You can, however, omit from the backup dump the files within the
> pg_xlog/ subdirectory of the cluster directory. This slight complication
> is worthwhile because it reduces the risk of mistakes when restoring.
> This is easy to arrange if pg_xlog/ is a symbolic link pointing to
> someplace outside the cluster directory, which is a common setup anyway
> for performance reasons.
> """
>
> So, pg_start_backup() freezes the data at the time it's called but still
> data and xlog are changed, in a different way that's safe to backup? Why
> not run with pg_start_backup() always enabled?
>

Because nothing would get vacuumed and your data would just grow and grow.

Your data is held at the point in time when you called pg_start_backup(),
so when you restore, your data is back at that point. If you need to go
forward from there, you need the xlog (hence point-in-time backup).

This is all part of PostgreSQL's MVCC feature.

PostgreSQL never deletes anything until nothing can read it anymore, so
if you vacuum during a backup it will only delete stuff that was
finished with before the backup started.

If you don't call pg_start_backup() first, you don't have this promise
that vacuum will not remove something you need. (Oh, I think checkpoints
might come into this as well, but I'm not sure how.)

Or at least that's my understanding...

So if your base backup takes a while, I would advise running vacuum
afterwards. But then, if you're running autovacuum, there is probably very
little need to worry.
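
For what it's worth, here is a rough Python sketch of the sequence being
discussed: call pg_start_backup(), copy the cluster directory while leaving
out the contents of pg_xlog/ and keeping symbolic links as links (as the
documentation quoted above asks), then call pg_stop_backup(). The helper
names (copy_cluster, backup_plan, skip_xlog_contents) and the label
'nightly' are made up for illustration, and the SQL is only returned as
strings; actually running it through psql or a driver is left out.

```python
import shutil
from pathlib import Path

# SQL issued before and after the file copy. The label 'nightly' is a
# hypothetical example; any descriptive string would do.
START_SQL = "SELECT pg_start_backup('nightly');"
STOP_SQL = "SELECT pg_stop_backup();"


def skip_xlog_contents(directory, names):
    """Ignore callback for copytree: skip every entry inside pg_xlog/."""
    if Path(directory).name == "pg_xlog":
        return set(names)
    return set()


def copy_cluster(data_dir, dest_dir):
    """Copy the cluster directory to dest_dir (which must not exist yet).

    Symbolic links are copied as links, and pg_xlog/ is recreated empty.
    A real script would also have to tolerate files changing during the
    copy, which this sketch does not attempt.
    """
    shutil.copytree(data_dir, dest_dir, symlinks=True, ignore=skip_xlog_contents)


def backup_plan(data_dir, dest_dir):
    """Return the ordered steps of the base backup as (kind, payload) pairs."""
    return [
        ("sql", START_SQL),
        ("copy", (data_dir, dest_dir)),
        ("sql", STOP_SQL),
    ]
```

The plan is kept separate from the copy so the ordering is easy to see:
the copy only happens between the two SQL calls.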

Peter Childs
