Re: pgBackRest for a 50 TB database - Mailing list pgsql-general

From Stephen Frost
Subject Re: pgBackRest for a 50 TB database
Msg-id ZOtqNu9cI4zcohw7@tamriel.snowman.net
In response to pgBackRest for a 50 TB database  (Abhishek Bhola <abhishek.bhola@japannext.co.jp>)
Responses Re: pgBackRest for a 50 TB database
List pgsql-general
Greetings,

* Abhishek Bhola (abhishek.bhola@japannext.co.jp) wrote:
> I am trying to use pgBackRest for all my Postgres servers. I have tested it
> on a sample database and it works fine. But my concern is for some of the
> bigger DB clusters, the largest one being 50TB and growing by about
> 200-300GB a day.

Glad pgBackRest has been working well for you.

> I plan to mount NAS storage on my DB server to store my backup. The server
> with 50 TB data is using DELL Storage underneath to store this data and has
> 36 18-core CPUs.

How much free CPU capacity does the system have?

> As I understand, pgBackRest recommends having 2 full backups and then
> having incremental or differential backups as per requirement. Does anyone
> have any reference numbers on how much time a backup for such a DB would
> usually take, just for reference. If I take a full backup every Sunday and
> then incremental backups for the rest of the week, I believe the
> incremental backups should not be a problem, but the full backup every
> Sunday might not finish in time.

pgBackRest scales extremely well; what's going to matter here is how
much you can give it in terms of resources.  The primary bottlenecks
will be CPU time for compression, network bandwidth to the NAS, and
storage bandwidth of the NAS and the DB filesystems.  Typically, CPU
time dominates due to the compression, though if you're able to give
pgBackRest a lot of those CPUs then you might get to the point of
running out of network bandwidth or storage bandwidth on your NAS.
We've certainly seen folks pushing upwards of 3TB/hr, so a 50TB backup
should be able to complete in less than a day.  I'd strongly recommend
taking an incremental backup more-or-less immediately after the full
backup to minimize the amount of WAL you'd have to replay on a restore.
I'd also strongly recommend doing serious restore tests of this system
to make sure you understand the process and have an idea of how long
it'll take to restore the actual files with pgBackRest, and then how
long PG will take to come up and replay the WAL generated during the
backup.
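
As a rough back-of-the-envelope sketch of the arithmetic above (not a
benchmark): the 50TB size and the 3TB/hr rate are the figures from this
thread, while the 1TB/hr and 2TB/hr rates and the function name are
purely hypothetical, added only to show how the estimate moves if the
CPU, network, or NAS can't sustain the higher rate.

```python
def estimate_backup_hours(size_tb: float, throughput_tb_per_hr: float) -> float:
    """Return the hours needed to copy size_tb at a sustained throughput."""
    if throughput_tb_per_hr <= 0:
        raise ValueError("throughput must be positive")
    return size_tb / throughput_tb_per_hr


if __name__ == "__main__":
    cluster_size_tb = 50.0  # size quoted in the thread
    # 3.0 TB/hr is the rate mentioned above; 1.0 and 2.0 are assumed,
    # slower rates included only for comparison.
    for rate in (1.0, 2.0, 3.0):
        hours = estimate_backup_hours(cluster_size_tb, rate)
        print(f"{rate:.1f} TB/hr -> {hours:.1f} hours")
```

The real number depends entirely on what throughput your hardware
sustains end to end, which is one more reason to measure it with an
actual test backup and restore.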

> I think converting a diff/incr backup to a full backup has been discussed
> here <https://github.com/pgbackrest/pgbackrest/issues/644>, but not yet
> implemented. If there is a workaround, please let me know. Or if someone is
> simply using pgBackRest for a bigger DB (comparable to 50TB), please share
> your experience with the exact numbers and config/schedule of backups. I
> know the easiest way would be to use it myself and find out, but since it
> is a PROD DB, I wanted to get some ideas before starting.

No, we haven't implemented that yet, though it's starting to come up
higher in our list of things we want to work on.  There are risks to
doing such conversions that have to be considered: it creates long
chains of dependencies on everything having worked, because if there's
a PG or pgBackRest bug, or some way that corruption slipped in, then
that ends up getting propagated down.  If you feel really confident
that your restore testing is good (full restore w/ PG replaying WAL,
running amcheck across the entire restored system, then pg_dump'ing
everything and restoring it into a new PG cluster to re-validate all
constraints, doing additional app-level review and testing...) then
that can certainly help with mitigation of the risks mentioned above.

Overall though, yes, people certainly use pgBackRest for 50TB+ PG
clusters.

Thanks,

Stephen
