Thread: a back up question
Are there rules of thumb for deciding when you can dump a whole database and when you’d be better off dumping groups of tables? I have a database that has around 100 tables, some of them quite large, and right now the data directory is well over 100GB. My hunch is that I should divide and conquer, but I don’t have a clear sense of what counts as “too big” these days. Nor do I have a clear sense of whether the constraints have to do with overall size, the number of tables, or machine memory (my machine has 32GB of memory).
Is 10GB a good practical limit to keep in mind?
From: "David G. Johnston" <david.g.johnston@gmail.com>
Date: Tuesday, December 5, 2017 at 3:59 PM
To: Martin Mueller <martinmueller@northwestern.edu>
Cc: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Subject: Re: a back up question
On Tue, Dec 5, 2017 at 2:52 PM, Martin Mueller <martinmueller@northwestern.edu> wrote:
> Are there rules of thumb for deciding when you can dump a whole database and when you’d be better off dumping groups of tables? I have a database that has around 100 tables, some of them quite large, and right now the data directory is well over 100GB. My hunch is that I should divide and conquer, but I don’t have a clear sense of what counts as “too big” these days. Nor do I have a clear sense of whether the constraints have to do with overall size, the number of tables, or machine memory (my machine has 32GB of memory).
> Is 10GB a good practical limit to keep in mind?
I'd say the rule of thumb is: if you have to "divide-and-conquer", you should use non-pg_dump-based backup solutions. "Too big" is usually measured in units of time, not memory.
Any ability to partition your backups into discrete chunks is going to be very specific to your personal setup. Restoring such a monster without constraint violations is something I'd be VERY worried about.
David J.
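For example, a file-level base backup of the entire cluster, the kind of non-pg_dump solution David means, might look like this (a sketch; the target directory is only a placeholder):

    # physical, whole-cluster backup: compressed tar output, with progress reporting
    pg_basebackup -D /backups/base -Ft -z -P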
Carl Karsten wrote:
> Nothing wrong with lots of tables and data.
>
> Don't impose any constraints on your problem you don't need to.
>
> Like what are you backing up to? $400 for a 1T ssd or $80 for a 2T usb3
> spinny disk.
>
> If you are backing up while the db is being updated, you need to make sure
> updates are queued until the backup is done. don't mess with that
> process. personally I would assume the db is always being updated and
> expect that.
A backup generated by pg_dump never includes writes that are in flight
while the backup is being taken. That would make the backup absolutely
worthless!
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Time is not really a problem for me, as long as we are talking about hours rather than days. On a roughly comparable machine I’ve made backups of databases under 10 GB, and it was a matter of minutes. But I know that there are scale problems: sometimes programs just hang once the data grow beyond a certain size. Is that likely in Postgres if you go from ~10 GB to ~100 GB? There isn’t any interdependence among my tables beyond queries I construct on the fly, because I use the database in a single-user environment.
Another factor is restore time: restores have to create indexes, and creating indexes on multi-million-row tables can take a while. (Hint: be sure to set maintenance_work_mem to 1GB before doing this!)
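For example (a sketch; the database name and dump file are placeholders), the setting can be raised just for the restore session via libpq's PGOPTIONS:

    # bump maintenance_work_mem for this restore only, then restore with 4 parallel jobs
    PGOPTIONS='-c maintenance_work_mem=1GB' pg_restore -j 4 -d mydb /backups/mydb.dump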
--
john r pierce, recycling bits in santa cruz
Carl Karsten wrote:
> On Tue, Dec 5, 2017 at 4:15 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:
> > A backup generated by pg_dump never includes writes that are in flight
> > while the backup is being taken. That would make the backup absolutely
> > worthless!
>
> Hmm, I kinda glossed over my point: if you come up with your own process
> to chop up the backup into little pieces, you risk letting writes in, and
> then yeah, worthless.

Ah, sure.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 05, 2017 at 09:52:28PM +0000, Martin Mueller wrote:
> Are there rules of thumb for deciding when you can dump a
> whole database and when you’d be better off dumping groups of
> tables?

It seems to me we'd have to define the objective of "dumping" first?

Regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 12/6/17, 4:39 AM, "karsten.hilbert@gmx.net" <karsten.hilbert@gmx.net> wrote:
> On Tue, Dec 05, 2017 at 09:52:28PM +0000, Martin Mueller wrote:
> > Are there rules of thumb for deciding when you can dump a
> > whole database and when you’d be better off dumping groups of
> > tables?
>
> It seems to me we'd have to define the objective of "dumping" first?

The objective is to create a backup from which I can restore any or all tables in the event of a crash. In my case, I use Postgres for my own scholarly purposes. Publications of whatever kind are not directly made public via the database. I am my only customer, and a service interruption, while a nuisance to me, does not create a crisis for others. I don’t want to lose my work, but a service interruption of a day or a week is no big deal.
On Wed, Dec 06, 2017 at 12:52:53PM +0000, Martin Mueller wrote:
>> Are there rules of thumb for deciding when you can dump a
>> whole database and when you’d be better off dumping groups of
>> tables?

>> It seems to me we'd have to define the objective of "dumping" first?

> The objective is to create a backup from which I can
> restore any or all tables in the event of a crash.

I see. "Any or all" speaks in recommendation of non-plain output formats _if_ using pg_dump.

> In my case, I use Postgres for my own scholarly purposes.
> Publications of whatever kind are not directly made public
> via the database. I am my only customer, and a service
> interruption, while a nuisance to me, does not create a
> crisis for others. I don’t want to lose my work, but a
> service interruption of a day or a week is no big deal.

In that case I would stick to pg_dump, perhaps with directory format and then tarred and compressed, until you notice actual problems (unbearable slowdown of the machine during backup, running out of disk space).

My 2 cents,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
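Concretely, the directory-format approach Karsten describes might look like this (a sketch; database name, paths, and job count are placeholders); directory format also allows a parallel dump and selective restores later:

    # parallel dump in directory format, then tar and compress it
    pg_dump -Fd -j 4 -f /backups/mydb.dir mydb
    tar czf /backups/mydb.dir.tar.gz /backups/mydb.dir

    # later: restore a single table from the unpacked dump directory
    pg_restore -d mydb -t sometable /backups/mydb.dir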
John, all,

* John R Pierce (pierce@hogranch.com) wrote:
> On 12/5/2017 2:09 PM, Martin Mueller wrote:
> > Time is not really a problem for me, if we talk about hours rather
> > than days. On a roughly comparable machine I’ve made backups of
> > databases less than 10 GB, and it was a matter of minutes. But I
> > know that there are scale problems. Sometimes programs just hang
> > if the data are beyond some size. Is that likely in Postgres if
> > you go from ~ 10 GB to ~100 GB? There isn’t any interdependence
> > among my tables beyond queries I construct on the fly, because I
> > use the database in a single user environment
>
> another factor is restore time. restores have to create
> indexes. creating indexes on multi-million-row tables can take
> awhile. (hint, be sure to set maintenance_work_mem to 1GB before
> doing this!)

I'm sure you're aware of this John, but for others following along, just to be clear: indexes have to be recreated when restoring from a *logical* (eg: pg_dump based) backup. Indexes don't have to be recreated for *physical* (eg: file-based) backups.

Neither pg_dump nor the various physical-backup utilities should hang or have issues with larger data sets.

Thanks!

Stephen
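As a rough illustration of the logical/physical distinction (a sketch; names and paths are placeholders):

    # logical backup: SQL-level archive; restoring it re-creates indexes
    pg_dump -Fc -f /backups/mydb.dump mydb

    # physical backup: binary copy of the whole cluster; index files are copied as-is
    pg_basebackup -D /backups/base -Ft -z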