Re: Proposal: Incremental Backup - Mailing list pgsql-hackers

From Gabriele Bartolini
Subject Re: Proposal: Incremental Backup
Date
Msg-id CAHNtfO6vgvoG9Zip774V7GqhNi7rsWUWEvYMDRFicODJJPpdBA@mail.gmail.com
In response to Re: Proposal: Incremental Backup  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Proposal: Incremental Backup
List pgsql-hackers
Hi guys,
 sorry for jumping into the middle of the conversation. I have been
reading everything said above with much interest. However, the
goal of this patch is to give users another option when
performing backups, especially when large databases are involved.
  I really like the proposal of working on a block-level incremental
backup feature and the idea of considering the LSN. However, I'd suggest
treating block level as a second step, and as a goal to keep in mind while
working on the first step. I believe that file-level incremental
backup will bring a lot of benefits to our community and users anyway.
 I base this statement on our daily experience. We have the honour (and
the duty) of managing what are probably some of the largest Postgres
databases in the world. We currently rely on rsync to copy database
pages. Performing a full backup in 2 days instead of 9 days completely
changes disaster recovery policies in a company. Or even 2 hours
instead of 6.
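
For illustration only, here is a minimal sketch of the file-level idea,
assuming changes are detected by comparing each file's size and
modification time against what the previous full backup recorded (the
same kind of quick check rsync applies by default). The names `snapshot`
and `changed_files` are hypothetical and nothing here is part of the
proposed patch:

```python
import os
import tempfile


def snapshot(root):
    """Map each relative file path under root to its (size, mtime_ns)."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(previous, current):
    """Files that are new, or whose size/mtime differ from the previous backup."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)


# Tiny demonstration on a throwaway directory standing in for a data directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("rel_a", "rel_b", "rel_c"):
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"\0" * 8192)
    full = snapshot(root)              # state recorded by the full backup

    with open(os.path.join(root, "rel_b"), "ab") as f:
        f.write(b"\0" * 8192)          # one file grows after the full backup
    demo_changed = changed_files(full, snapshot(root))
```

Only `rel_b` would be copied by the incremental run in this toy case.
The sketch ignores deleted files, and a real implementation would have
to decide whether size and mtime are a trustworthy signal at all, which
is where the LSN discussion comes in.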

My 2 cents,
Gabriele
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it


2014-08-01 19:05 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> On Fri, Aug 1, 2014 at 1:43 PM, desmodemone <desmodemone@gmail.com> wrote:
>>
>>
>>
>> 2014-08-01 18:20 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
>>
>>> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila <amit.kapila16@gmail.com>
>>> wrote:
>>> >> c) the map is not crash safe by design, because it is needed only for
>>> >> incremental backup, to track which blocks need to be backed up, not for
>>> >> consistency or recovery of the whole cluster, so it's not a heavy cost
>>> >> for the whole cluster to maintain it. We could think of an option (but
>>> >> it's heavy) to write it to a file at every flush to have a crash-safe
>>> >> map, but I don't think it's so useful. I think it's acceptable, and
>>> >> probably it's better to enforce that, to say: "if your db crashes, you
>>> >> need a full backup",
>>> >
>>> > I am not sure if this assumption of yours is right/acceptable; how can
>>> > we say that in such a case users will be okay with taking a full backup?
>>> > In general, taking a full backup is a very heavy operation and we should
>>> > try to avoid such a situation.
>>>
>>>
>>> Besides, the one taking the backup (ie: script) may not be aware of
>>> the need to take a full one.
>>>
>>> It's a bad design to allow broken backups at all, IMNSHO.
>>
>>
>> Hi Claudio,
>>                  thanks for your observation
>> First: the case is after a crash of the database, and that's not something
>> that happens every day or every week. It's something that happens in rare
>> conditions, or at least that is my experience. If it happens very often,
>> there are probably other problems.
>
> Not so much. In this case, the software design isn't software-crash
> safe, it's not that it's not hardware-crash safe.
>
> What I mean, is that an in-memory bitmap will also be out of sync if
> you kill -9 (or if one of the backends is killed by the OOM), or if it
> runs out of disk space too.
>
> Normally, a simple restart fixes it because pg will do crash recovery
> just fine, but now the bitmap is out of sync, and further backups are
> broken. It's not a situation I want to face unless there's a huge
> reason to go for such design.
>
> If you make it so that the commit includes flipping the bitmap, it can
> be done cleverly enough to avoid too much overhead (though it will
> have some), and you now have it so that any to-be-touched block is now
> part of the backup. You just apply all the bitmap changes in batch
> after a checkpoint, before syncing to disk, and before erasing the WAL
> segments. Simple, relatively efficient, and far more robust than an
> in-memory thing.
>
> Still, it *can* double checkpoint I/O in the worst case, and it's not
> an unfathomable case either.
>
>> Second: to avoid the problem of knowing whether the db needs a full backup
>> to rebuild the map, we could write the backup reference in the map header
>> (with an id and an LSN reference, for example), so that if
>> someone/something tries to do an incremental backup after a crash, the map
>> header will not have any full backup listed [because it will be empty],
>> and it will automatically switch to a full one. I think after a crash it's
>> good practice to do a full backup, to see if there are problems with files
>> or filesystems, but if I am wrong I am happy to know :) .
>
> After a crash I do not do a backup, I do a verification of the data
> (VACUUM and some data consistency checks usually), lest you have a
> useless backup. The backup goes after that.
>
> But I'm no DBA guru.
>
>> Remember that I propose a map in RAM to reduce the impact on performance,
>> but we could create an option to leave the choice to the user: if you want a
>> crash-safe map, a map file will also be updated at every flush; if not, the
>> map will be in RAM.
>
> I think the performance impact of a WAL-linked map isn't so big as to
> prefer the possibility of broken backups. I wouldn't even allow it.
>
> It's not free, making it crash safe, but it's not that expensive
> either. If you want to support incremental backups, you really really
> need to make sure those backups are correct and usable, and IMV
> anything short of full crash safety will be too fragile for that
> purpose. I don't want to be in a position of needing the backup and
> finding out it's inconsistent after the fact, and I don't want to
> encourage people to set themselves up for that by adding that "faster
> but unsafe backups" flag.
>
> I'd do it either safe, or not at all.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
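
As a rough illustration of the "backup reference in the map header" idea
from the quoted discussion, the following sketch assumes a map that
records which full backup it is relative to (an id and an LSN) and falls
back to a full backup whenever that header is empty. All names here
(`BlockMap`, `plan_backup`, `complete_full_backup`, the backup id, the
LSN value) are hypothetical and not taken from any patch:

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class BlockMap:
    # Hypothetical header: (backup_id, start_lsn) of the full backup
    # this map is relative to. None means "no trusted reference".
    reference: Optional[Tuple[str, int]] = None
    # Changed blocks, stored as (relation file name, block number).
    dirty: Set[Tuple[str, int]] = field(default_factory=set)

    def mark_dirty(self, relfile, block):
        self.dirty.add((relfile, block))


def plan_backup(block_map):
    """Return ("full", None) if the map has no reference, else the dirty blocks."""
    if block_map.reference is None:
        return ("full", None)
    return ("incremental", sorted(block_map.dirty))


def complete_full_backup(backup_id, start_lsn):
    """A finished full backup starts a fresh, empty map that references it."""
    return BlockMap(reference=(backup_id, start_lsn))


# Normal operation: a full backup, then two blocks are modified.
m = complete_full_backup("base-001", 0x2000028)
m.mark_dirty("16384", 7)
m.mark_dirty("16384", 3)
normal_plan = plan_backup(m)

# After a crash an in-memory map is gone, so its header is empty.
after_crash = BlockMap()
crash_plan = plan_backup(after_crash)
```

This only covers detecting that the map was lost. It does not address
the objection raised in the thread that a forced full backup may itself
be unacceptable, nor the WAL-linked alternative in which bitmap changes
are applied in batch after a checkpoint.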


