Thread: Inherited an 18TB DB & need to backup

Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Hello All,
I have very recently inherited an 18 TB DB that is running version 9.2. Apparently this database has never been backed up, and I have been tasked to set up a periodic backup routine (weekly full & daily incremental) and dump it onto a NAS. What is the best way to go about this? I did some reading and heard that pgbackrest does a good job with such huge sizes. Your expert advice is needed.

--
Cheers,

Suhail 
Cell#  +97150 8194870

Re: Inherited an 18TB DB & need to backup

From
Rory Campbell-Lange
Date:
On 15/05/20, Suhail Bamzena (suhailsalem@gmail.com) wrote:
> Hello All,
> I have very recently inherited an 18 TB DB that is running version 9.2.
> Apparently this database has never been backed up, and I have been tasked to
> set up a periodic backup routine (weekly full & daily incremental) and dump
> it onto a NAS. What is the best way to go about this? I did some reading and
> heard that pgbackrest does a good job with such huge sizes. Your expert
> advice is needed.

Incremental backups suggest the need to back up WAL archives. See
https://www.postgresql.org/docs/9.2/continuous-archiving.html
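In postgresql.conf terms that amounts to something like this on 9.2 (the NAS
mount point here is just a placeholder):

    wal_level = archive        # 9.2 setting; later versions use 'replica'
    archive_mode = on          # needs a server restart
    archive_command = 'test ! -f /mnt/nas/wal/%f && cp %p /mnt/nas/wal/%f'

That archive_command is the example from the docs: %p is the path of the
finished WAL segment, %f its file name, and the test refuses to overwrite a
segment that has already been archived.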

pgbackrest looks very cool but we haven't used it.

A very simple solution could be just to dump the database daily with
pg_dump, if you have the space and machine capacity to do it. Depending
on what you are storing, you can achieve good compression with this, and
it is a great way of having a simple file from which to restore a
database.

Our ~200GB cluster resolves to under 10GB of pg_dump files, although
18TB is a whole different order of size.
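For an immediate first backup it could be as simple as (database name and
target path are placeholders):

    pg_dump -Fc -Z5 -f /mnt/nas/dumps/mydb_$(date +%F).dump mydb

The custom format (-Fc) is compressed and can be fed straight to pg_restore.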

Rory



Re: Inherited an 18TB DB & need to backup

From
Christoph Berg
Date:
Re: Rory Campbell-Lange
> On 15/05/20, Suhail Bamzena (suhailsalem@gmail.com) wrote:
> > Hello All,
> > I have very recently inherited an 18 TB DB that is running version 9.2.

Push hard to get that upgraded to a supported version.

Christoph



Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Hi Christoph
That's very high on my agenda, but I need to make sure I can back up this beast to start with.

On Fri, 15 May 2020, 17:08 Christoph Berg, <christoph.berg@credativ.de> wrote:
Re: Rory Campbell-Lange
> [...]

Push hard to get that upgraded to a supported version.

Christoph

Re: Inherited an 18TB DB & need to backup

From
Jan Karremans
Date:
Hi Suhail,

That is not an issue. We have customers backing up Postgres databases up to 80 TB.

Mit freundlichem Gruß, kind regards,

Jan Karremans
Director of Sales Engineering, EMEA
Senior Sales Engineer DACH-Region
EDB Postgres Advanced Server Professional
-- Postgres Everywhere --

Oracle ACE Alumni

- German is not my native language; please forgive any spelling mistakes -

Mobile: +31-(0)6-1638 9607

http://www.enterprisedb.com/

Don't walk behind me; I may not lead.
Don't walk in front of me; I may not follow.
Just walk beside me and be my friend.
+*+ Albert Camus +*+

On 15 May 2020, at 15:23, Suhail Bamzena <suhailsalem@gmail.com> wrote:

Hi Christoph
That's very high on my agenda, but I need to make sure I can back up this beast to start with.

[...]

Re: Inherited an 18TB DB & need to backup

From
"Wolff, Ken L"
Date:

Depending on your storage subsystem, perhaps storage-level snapshots might be an option?  They often seem to be the best choice for VLDBs.
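If the snapshot isn't guaranteed to be atomic across the data directory and
pg_xlog, the usual pattern is to bracket it with the low-level backup API; a
sketch (on 9.2 these are the exclusive-mode functions):

    psql -c "SELECT pg_start_backup('nightly_snapshot', true);"
    # take the storage-level snapshot here (vendor-specific)
    psql -c "SELECT pg_stop_backup();"

A single atomic snapshot of all volumes also works on its own; recovery from
it then just looks like a crash restart.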

 

From: Suhail Bamzena <suhailsalem@gmail.com>
Sent: Friday, May 15, 2020 7:23 AM
To: Christoph Berg <christoph.berg@credativ.de>
Cc: Rory Campbell-Lange <rory@campbell-lange.net>; pgsql-general@lists.postgresql.org; pgeu-general@lists.postgresql.org
Subject: EXTERNAL: Re: Inherited an 18TB DB & need to backup

 

Hi Christoph

That's very high on my agenda, but I need to make sure I can back up this beast to start with.

[...]

Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Thanks Rory, the machine has the capacity to pull through pg_dumps, but like you rightly mentioned, incremental backups mean that we will need to work with the WALs. 18TB is the scary part, and with compression I don't see it being less than 2TB a day...

On Fri, 15 May 2020, 17:02 Rory Campbell-Lange, <rory@campbell-lange.net> wrote:
[...]

Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Thanks Jan. Just to know more: is it using the native pg_dump or another tool like pgbackrest or barman?

On Fri, 15 May 2020, 17:26 Jan Karremans, <jan.karremans@enterprisedb.com> wrote:
Hi Suhail,

That is not an issue. We have customers backing up Postgres databases up to 80 TB.

[...]

Re: Inherited an 18TB DB & need to backup

From
Jan Karremans
Date:
Hi Suhail,

I do not know personally, but I can find out / get you set up with someone who can tell you more.

Mit freundlichem Gruß, kind regards,

Jan Karremans

Op 15 mei 2020, om 15:31 heeft Suhail Bamzena <suhailsalem@gmail.com> het volgende geschreven:

Thanks Jan. Just to know more: is it using the native pg_dump or another tool like pgbackrest or barman?

[...]

Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
I can look into that... not very sure about the storage infra.

On Fri, 15 May 2020, 17:26 Wolff, Ken L, <ken.l.wolff@lmco.com> wrote:

Depending on your storage subsystem, perhaps storage-level snapshots might be an option?  They often seem to be the best choice for VLDBs.

 

[...]

Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Thanks Jan. I would appreciate all the info I can get.

On Fri, 15 May 2020, 17:32 Jan Karremans, <jan.karremans@enterprisedb.com> wrote:
Hi Suhail,

I do not know personally, but I can find out / get you set up with someone who can tell you more.

[...]

Re: Inherited an 18TB DB & need to backup

From
Rory Campbell-Lange
Date:
On 15/05/20, Suhail Bamzena (suhailsalem@gmail.com) wrote:
> Thanks Rory, the machine has the capacity to pull through pg_dumps, but like
> you rightly mentioned, incremental backups mean that we will need to work with
> the WALs. 18TB is the scary part, and with compression I don't see
> it being less than 2TB a day...

I suggest you try one immediately, assuming you can dump it somewhere
and the machine has the resources. Then you will at least have a backup
and a sense of what is possible. (Which will help inform your upgrade
strategy too.)

Rory




Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Yeah Rory, want to pull one ASAP... hopefully by COB tonight.

On Fri, 15 May 2020, 17:35 Rory Campbell-Lange, <rory@campbell-lange.net> wrote:
[...]

Re: Inherited an 18TB DB & need to backup

From
"Ravi Krishna"
Date:
IMO a database of this size should only be backed up to S3.  pgbackrest has support for backup to S3.
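With a recent pgbackrest that's mostly repo configuration, roughly as follows
(bucket, region and keys are placeholders; option names can differ between
releases):

    [global]
    repo1-type=s3
    repo1-path=/pgbackrest
    repo1-s3-bucket=my-pg-backups
    repo1-s3-endpoint=s3.amazonaws.com
    repo1-s3-region=us-east-1
    repo1-s3-key=<access-key>
    repo1-s3-key-secret=<secret-key>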

Re: Inherited an 18TB DB & need to backup

From
Gabriele Bartolini
Date:
Hi Suhail,

We developed Barman (www.pgbarman.org), which also supports older versions of PostgreSQL. It works with very large databases and is open source (GPL).
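For a server as old as 9.2, a minimal barman.conf entry would look roughly
like this (host, user and server name are placeholders; rsync over SSH rather
than streaming is the traditional mode for older servers):

    [beast]
    description = "inherited 18 TB 9.2 cluster"
    ssh_command = ssh postgres@dbserver
    conninfo = host=dbserver user=postgres dbname=postgres
    backup_method = rsync
    archiver = on
    retention_policy = RECOVERY WINDOW OF 4 WEEKS

Then "barman backup beast" takes a full base backup and "barman list-backup
beast" shows what is restorable; the server additionally needs an
archive_command that ships WAL to Barman.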

This is an old article in which I could publicly mention the size of the database:


More articles about Barman from our blog: https://www.2ndquadrant.com/en/blog/tag/barman/

Good luck.

Cheers,
Gabriele

On Fri, 15 May 2020 at 15:49, Ravi Krishna <srkrishna1@comcast.net> wrote:
IMO a database of this size should only be backed up to S3.  pgbackrest has support for backup to S3.

Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Thanks Gabriele... I will review this in detail.

On Fri, 15 May 2020, 18:41 Gabriele Bartolini, <gabriele.bartolini@gmail.com> wrote:
[...]

Re: Inherited an 18TB DB & need to backup

From
"Andreas 'ads' Scherbaum"
Date:


On Fri, May 15, 2020 at 3:49 PM Ravi Krishna <srkrishna1@comcast.net> wrote:
IMO a database of this size should only be backed up to S3.  pgbackrest has support for backup to S3.

Why should the backup land in S3, and not somewhere local?
Any good reason why one should pay for the additional storage and transfer costs?

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

Re: Inherited an 18TB DB & need to backup

From
"Ravi Krishna"
Date:

Why should the backup land in S3, and not somewhere local?
Any good reason why one should pay for the additional storage and transfer costs?

Good question. The key point in my statement was "db of this size".

The problem with local backup is that space is not infinite. If your business requires you to
store backups for, say, 7 years, storing them locally will be a problem.  In one large financial
company I used to work for, full backups were used to store old data
(except the last 30 days, where WAL logs were used for real PITR).  We used to store full backups
for about 60 days and then send older backups to off-site storage.  Nothing is free.

I remember a case where the business asked us to restore a db from a date two years
prior, as they had to look at old data. It took us close to 96 hours to give the users the required database.

S3 storage is ridiculously cheap.  Off-site storage companies like Iron Mountain should find their client base
ditching them big time.

Re: Inherited an 18TB DB & need to backup

From
"Andreas 'ads' Scherbaum"
Date:


On Fri, May 15, 2020 at 7:52 PM Ravi Krishna <srkrishna1@comcast.net> wrote:

[...]

If your database is running somewhere in the cloud, then yes, that might make
sense. If your database runs in your own data center, then usually you also have
disk space available there. Plus, a transfer out of your data center will take time.

There is no "per se" recommendation to move data to S3. And there might be
additional requirements like data protection laws, encryption requirements etc.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

Re: Inherited an 18TB DB & need to backup

From
Scottix
Date:
Also, when you get into multi-TB data storage, the bill gets a little harder to digest in S3.

On Fri, May 15, 2020 at 11:49 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:


[...]
--
T: @Thaumion
IG: Thaumion
Scottix@Gmail.com

Re: Inherited an 18TB DB & need to backup

From
Christophe Pettus
Date:

> On May 15, 2020, at 12:01, Scottix <scottix@gmail.com> wrote:
>
> Also, when you get into multi-TB data storage, the bill gets a little harder to digest in S3.

Indeed.  Right now, just buying off of Amazon, a 12TB Seagate IronWolf drive is $0.03/GB.  S3 infrequent access is
$0.0125/GB/month, so the drive pays for itself (storage only) in 2.4 months.  Even with an annualized failure rate of
1.4% (per Backblaze), and with the required host, rack, etc., etc., it can be significantly more economical to run your own
backup server.  If you regularly do restores off of the backups (for example, to prime staging environments or developer
systems), the outbound transfer can add up fast, too.
--
-- Christophe Pettus
   xof@thebuild.com




Re: Inherited an 18TB DB & need to backup

From
Ron
Date:
For a database that size, I'd install pgbackrest, since it features parallel backups and compression.  With it, I'd do monthly full backups with daily differential backups.

(If it's mostly historical data, I'd split the database into multiple instances, so that older data rarely needs to be backed up. The application, of course, would have to be modified.)
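A sketch of that setup (stanza name, paths and retention are placeholders to
adapt; also check that your pgbackrest version still supports 9.2, since newer
releases have dropped the oldest server versions):

    # /etc/pgbackrest.conf
    [global]
    repo1-path=/mnt/nas/pgbackrest
    repo1-retention-full=2      # keep two fulls plus their differentials
    process-max=8               # parallelism matters a lot at 18TB
    start-fast=y

    [beast]
    pg1-path=/var/lib/pgsql/9.2/data

    # one-time setup, then schedule the backups from cron:
    pgbackrest --stanza=beast stanza-create
    pgbackrest --stanza=beast check
    pgbackrest --stanza=beast --type=full backup    # e.g. monthly
    pgbackrest --stanza=beast --type=diff backup    # e.g. daily

WAL gets handed over via archive_command = 'pgbackrest --stanza=beast archive-push %p' in postgresql.conf.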

On 5/15/20 8:26 AM, Suhail Bamzena wrote:
Thanks Rory, the machine has the capacity to pull through pg_dumps, but like you rightly mentioned, incremental backups mean that we will need to work with the WALs. 18TB is the scary part, and with compression I don't see it being less than 2TB a day...

[...]

--
Angular momentum makes the world go 'round.

Re: Inherited an 18TB DB & need to backup

From
Michael Nolan
Date:


On Fri, May 15, 2020 at 12:51 PM Ravi Krishna <srkrishna1@comcast.net> wrote:

Why should the backup land in S3, and not somewhere local?
Any good reason why one should pay for the additional storage and transfer costs?

Good question. The key point in my statement was "db of this size".

The problem with local backup is that space is not infinite. 

Another problem is that storage devices fail.  S3 storage lakes _should_ be checking your data integrity on a regular basis and possibly maintaining copies of it in multiple locations so you're not vulnerable to a site disaster.
--
Mike Nolan

Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Thanks Ron... pgbackrest and barman seem to be good options.

On Sat, 16 May 2020, 02:26 Ron, <ronljohnsonjr@gmail.com> wrote:
For a database that size, I'd install pgbackrest, since it features parallel backups and compression.  With it, I'd do monthly full backups with daily differential backups.

[...]

Re: Inherited an 18TB DB & need to backup

From
Ron
Date:
On 5/15/20 6:01 PM, Michael Nolan wrote:

[...]

Another problem is that storage devices fail.  S3 storage lakes _should_ be checking your data integrity on a regular basis and possibly maintaining copies of it in multiple locations so you're not vulnerable to a site disaster.

Tape FTW!!

--
Angular momentum makes the world go 'round.

Re: Inherited an 18TB DB & need to backup

From
Rob Sargent
Date:

>> Another problem is that storage devices fail.  S3 storage lakes _should_ be checking your data integrity on a regular basis and possibly maintaining copies of it in multiple locations so you're not vulnerable to a site disaster.
>
> Tape FTW!!

Or WTF Tape??   :)



Re: Inherited an 18TB DB & need to backup

From
"Peter J. Holzer"
Date:
On 2020-05-15 14:02:46 +0100, Rory Campbell-Lange wrote:
> On 15/05/20, Suhail Bamzena (suhailsalem@gmail.com) wrote:
> > I have very recently inherited an 18 TB DB that is running version 9.2.
> > Apparently this database has never been backed up
[...]
> A very simple solution could be just to dump the database daily with
> pg_dump, if you have the space and machine capacity to do it. Depending
> on what you are storing, you can achieve good compression with this, and
> it is a great way of having a simple file from which to restore a
> database.
>
> Our ~200GB cluster resolves to under 10GB of pg_dump files, although
> 18TB is a whole different order of size.

I love pg_dump (especially the -Fd format), but for a database of that
size it might be too slow. Ours is about 1TB, and «pg_dump --compress=5 -Fd»
takes a bit over 2 hours. Extrapolating to 18 TB that would be 40 hours
...

And restoring the database takes even more time because it only restores
the tables and has to rebuild the indexes.

Still - for a first backup, just firing off pg_dump might be the way to
go. Better to have a backup in two days than still none after two weeks
because you are still evaluating the fancier alternatives.
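If you have a newer pg_dump binary around, the directory format can also be
written in parallel, which should help a lot at this size (a sketch; -j needs
pg_dump 9.3 or later, though as far as I know it can dump from a 9.2 server,
since that is where synchronized snapshots appeared):

    pg_dump -Fd -j 8 --compress=5 -f /backup/mydb.dir mydb
    pg_restore -j 8 -d mydb /backup/mydb.dir    # parallel restore; index builds run in parallel too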

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Hi Peter
Thanks for the info, and thanks to the entire forum for the inputs... I did fire up a pg_dump last night, pairing it with gzip and split into 1TB chunks. Will let you all know how it goes.
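In command form that pipeline is roughly (database name and target path are
placeholders):

    pg_dump mydb | gzip | split -b 1T - /mnt/nas/mydb.sql.gz.part_
    # and to restore later:
    cat /mnt/nas/mydb.sql.gz.part_* | gunzip | psql mydb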



On Sat, 16 May 2020, 18:12 Peter J. Holzer, <hjp-pgsql@hjp.at> wrote:
[...]

Re: Inherited an 18TB DB & need to backup

From
Ron
Date:
On 5/16/20 7:18 AM, Rob Sargent wrote:
>>> Another problem is that storage devices fail.  S3 storage lakes _should_ be checking your data integrity on a regular basis and possibly maintaining copies of it in multiple locations so you're not vulnerable to a site disaster.
>> Tape FTW!!
> Or WTF Tape??   :)

Tape is durable, long-lasting, high-density, under your control, can be 
taken off-site (don't underestimate the bandwidth of a station wagon full of 
tapes hurtling down the highway!) and -- with the proper software -- is 
multi-threaded.

-- 
Angular momentum makes the world go 'round.



Re: Inherited an 18TB DB & need to backup

From
Gavin Flower
Date:
On 17/05/2020 08:12, Ron wrote:
> On 5/16/20 7:18 AM, Rob Sargent wrote:
>>>> Another problem is that storage devices fail.  S3 storage lakes _should_
>>>> be checking your data integrity on a regular basis and possibly
>>>> maintaining copies of it in multiple locations so you're not
>>>> vulnerable to a site disaster.
>>> Tape FTW!!
>> Or WTF Tape??   :)
>
> Tape is durable, long-lasting, high-density, under your control, can 
> be taken off-site (don't underestimate the bandwidth of a station 
> wagon full of tapes hurtling down the highway!) and -- with the proper 
> software -- is multi-threaded.
>
Don't you mean multi-spooled??? :-)

Fascinating problem.  If the dump & load programs are designed to take a
parameter for N drives for effective parallel operation, and N > 2, then
things will run a lot faster.

I can think of several ways the data can be dumped in parallel, with
various trade-offs.  Would love to know how it's implemented in practice.


Cheers,
Gavin




Re: Inherited an 18TB DB & need to backup

From
Ron
Date:
On 5/16/20 3:30 PM, Gavin Flower wrote:
On 17/05/2020 08:12, Ron wrote:
[...]

Don't you mean multi-spooled??? :-)

That's a superset of multi-threaded IO.

Fascinating problem.  If the dump & load programs are designed to take a parameter for N drives for effective parallel operation, and N > 2, then things will run a lot faster.

I can think of several ways the data can be dumped in parallel, with various trade-offs.  Would love to know how it's implemented in practice.

An OS with asynchronous, queued, non-blocking IO, and a programming language with callbacks.  OpenVMS has had it since at least the early 1990s, and probably the mid-1980s.  I remember backing up an Rdb/VMS database to 10 tape drives at the same time.  Typically, though, we "only" used six tape drives for that database, because we simultaneously backed up multiple databases.

--
Angular momentum makes the world go 'round.

Re: Inherited an 18TB DB & need to backup

From
James Sewell
Date:


On Fri, 15 May 2020 at 17:09, Suhail Bamzena <suhailsalem@gmail.com> wrote:
Hello All,
I have very recently inherited an 18 TB DB that is running version 9.2. Apparently this database has never been backed up, and I have been tasked to set up a periodic backup routine (weekly full & daily incremental) and dump it onto a NAS. What is the best way to go about this? I did some reading and heard that pgbackrest does a good job with such huge sizes. Your expert advice is needed.


Just chiming in for pgbackrest - we are backing up multiple databases in the 50TB - 150TB range across multiple customers and it works very well (and most importantly at that scale, very efficiently).

I've found the team really responsive to issues / bugs / errors reported via GitHub when we hit them, which is always a bonus. 

James Sewell,
Chief Architect, Jirotech
Suite 46, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Inherited an 18TB DB & need to backup

From
Steve Litt
Date:
On Fri, 15 May 2020 15:08:05 +0200
Christoph Berg <christoph.berg@credativ.de> wrote:

> Re: Rory Campbell-Lange
> > On 15/05/20, Suhail Bamzena (suhailsalem@gmail.com) wrote:  
> > > Hello All,
> > > I have very recently inherited an 18 TB DB that is running
> > > version 9.2.  
> 
> Push hard to get that upgraded to a supported version.
> 
> Christoph

He can't change the version without getting at least one verified
backup. Can you imagine if installing the new version trashed
the database?

SteveT

Steve Litt 
May 2020 featured book: Troubleshooting Techniques
     of the Successful Technologist
http://www.troubleshooters.com/techniques



Re: Inherited an 18TB DB & need to backup

From
Suhail Bamzena
Date:
Thanks James... that is reassuring, will be working on it this week.

On Mon, 18 May 2020, 04:33 James Sewell, <james.sewell@jirotech.com> wrote:


On Fri, 15 May 2020 at 17:09, Suhail Bamzena <suhailsalem@gmail.com> wrote:
[...]