Re: [PERFORM] Backup taking long time !!! - Mailing list pgsql-performance

From Vladimir Borodin
Subject Re: [PERFORM] Backup taking long time !!!
Date
Msg-id 7696A57C-C871-4A45-9141-FFD5F75CCB65@simply.name
Whole thread Raw
In response to [PERFORM] Backup taking long time !!!  (Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com>)
Responses Re: [PERFORM] Backup taking long time !!!
Re: [PERFORM] Backup taking long time !!!
List pgsql-performance

On 20 Jan 2017, at 16:40, Stephen Frost <sfrost@snowman.net> wrote:

Vladimir,

Increments in pgbackrest are done at the file level, which is not really efficient. We have implemented parallelism, compression and page-level increments (9.3+) in a barman fork [1], but unfortunately the folks from 2ndquadrant-it are in no hurry to work on it.

We're looking at page-level incremental backup in pgbackrest also.  For
larger systems, we've not heard too much complaining about it being
file-based though, which is why it hasn't been a priority.  Of course,
the OP is on 9.1 too, so.

Well, we forked barman and implemented all of the above precisely because we needed ~2 PB of disk space to store backups for our ~300 TB of data (our recovery window is 7 days), and on a 5 TB database making or restoring a backup took a lot of time.


As for your fork, well, I can't say I really blame the barman folks for
being cautious- that's usually a good thing in your backup software. :)

The reason seems to be not caution but a lack of time to work on it. But yes, it took us half a year to deploy our fork everywhere, and it would have taken much longer if we didn't have a system for checking backup consistency.


I'm curious how you're handling compressed page-level incremental
backups though.  I looked through barman-incr and it wasn't obvious to
me what was going on wrt how the incrementals are stored, are they ending
up as sparse files, or are you actually copying/overwriting the prior
file in the backup repository?

No, we store each file as follows: first you write a map of the changed pages, then the changed pages themselves. The compression is streaming, so you don't need much memory for that, but the downside of this approach is that you read each datafile twice (we rely on the page cache here).

 Apologies, python isn't my first
language, but the lack of any comment anywhere in that file doesn't
really help.

Not a problem. Actually, it would be much easier to understand if it were a series of commits rather than one commit that we amend and force-push after each rebase on vanilla barman. We should add comments.

--
May the force be with you…
