Thread: pg_basebackup inconsistent performance
We have pg_basebackup running on two of our DB servers that are replicated. We are running PostgreSQL 11, and the backup times are inconsistent; we are unsure why. A backup of 7 TB over a dedicated 10G port starts out at 5-6 hours, creeps up to 8-9 hours, and then all of a sudden takes 12-16 hours. There seems to be no rhyme or reason for the extended backup times. The commands we use for backups are as follows:
On server 2 (secondary), starts at 4PM
pg_basebackup --pgdata=/opt/postgres/pgbackup/`echo $DATE` --format=plain --write-recovery-conf --no-sync --wal-method=stream --checkpoint=fast --label=`hostname`-`echo $DATE` --no-verify-checksums --host=<server1> --username=replication --port=5432
On server 1 (Primary), starts at Midnight
pg_basebackup --pgdata=/opt/postgres/pgbackup/`echo $DATE` --format=plain --write-recovery-conf --no-sync --wal-method=stream --checkpoint=fast --label=`hostname`-`echo $DATE` --no-verify-checksums --host=<server2> --username=replication --port=5432
I’m not sure why or how we are running into the weeds. I am the SysAdmin and am not familiar with the inner workings of the DB. I can pass any commands that need to be run along to our DBA.
From the OS perspective, we are not seeing any problems with CPU, memory or disk. We are running RHEL 7.7.
Thanks!
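A quick arithmetic check on the figures in the original post. This is a rough sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits) and ignores the WAL that --wal-method=stream sends alongside the data files. By this estimate, even the 5-6 hour runs average only about 2.6-3.1 Gbit/s, while a 10 Gbit/s link could in principle move 7 TB in a little over 1.5 hours. That suggests the link's nominal speed is not what limits the runs. The helper names avg_rate and line_rate_hours are made up for this example.

```shell
#!/usr/bin/env bash
# Average transfer rate for a backup of a given size and duration,
# compared with the nominal line rate of the link.
# Assumes decimal units: 1 TB = 10^12 bytes, 1 Gbit = 10^9 bits.
set -euo pipefail

# avg_rate TB HOURS -> prints "<MB/s> <Gbit/s>", rounded to 2 decimals
avg_rate() {
  awk -v tb="$1" -v h="$2" 'BEGIN {
    bytes = tb * 1e12
    secs  = h * 3600
    printf "%.2f %.2f\n", bytes / secs / 1e6, bytes * 8 / secs / 1e9
  }'
}

# line_rate_hours TB GBPS -> hours needed at full line rate (no overhead)
line_rate_hours() {
  awk -v tb="$1" -v g="$2" 'BEGIN { printf "%.2f\n", tb * 1e12 * 8 / (g * 1e9) / 3600 }'
}

# Durations taken from the reported 5-6, 8-9 and 12-16 hour runs.
for h in 5 6 8 9 12 16; do
  read -r mbps gbps < <(avg_rate 7 "$h")
  echo "7 TB in ${h} h: ${mbps} MB/s (${gbps} Gbit/s)"
done
echo "7 TB at a full 10 Gbit/s: $(line_rate_hours 7 10) h"
```

By the same arithmetic, the 16-hour runs average just under 1 Gbit/s.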
On 5/6/20 5:44 AM, Jasen Lentz wrote:
> We start out at 5-6 hours over a dedicated 10G port for 7TB. It creeps up to 8-9 hours then all of a sudden takes 12-16 hours.

I'm guessing the above happens from one run to another, correct?

Where are the machines you are backing up from/to relative to each other on the network?

Is there increased activity on the database servers, e.g. inserts, updates, etc., during the extended backups?

--
Adrian Klaver
adrian.klaver@aklaver.com
> I'm guessing the above happens from one run to another, correct?

Yes.

> Where are the machines you are backing up from/to relative to each other on the network?

Direct Ethernet connection between 10G network interfaces.

> Is there increased activity on the database servers, e.g. inserts, updates, etc., during the extended backups?

Not according to sar reports.

Jasen M. Lentz, M.Ed
Lead Systems Administrator
Sesco Enterprises, LLC
4977 State Route 30 East (Mailing Address Only)
Greensburg, PA 15601
W: (724) 837-1991 x207
C: (412) 848-5612

-----Original Message-----
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Wednesday, May 6, 2020 10:28 AM
To: Jasen Lentz <jlentz@sescollc.com>; pgsql-general@lists.postgresql.org
Subject: Re: pg_basebackup inconsistent performance
Greetings,

* Jasen Lentz (jlentz@sescollc.com) wrote:
> Where are the machines you are backing up from/to relative to each other on the network?
> Direct Ethernet connection between 10G network interfaces

Is the backup server shared with other systems?

> Is there increased activity on the database servers, e.g. inserts, updates, etc., during the extended backups?
> Not according to sar reports

And there's no increased activity on the backup server either?

Have you looked at network traffic for the duration? And/or disk I/O on each system? If you ran a backup once and then again immediately after, and that second run is the 'fast' case, then you may be seeing better performance because a lot of the data was already in the filesystem cache.

pg_basebackup being single-threaded probably doesn't help here either; you might want to consider one of the parallel-backup options.

Thanks,

Stephen
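The questions about network traffic, disk I/O and the filesystem cache are easier to answer when the statistics cover exactly one backup run. One way to do that, sketched here under the assumption that bash and the sysstat package (sar, iostat) are installed, is to start collectors alongside pg_basebackup and log the elapsed time. The function names monitored_backup and fmt_hms, the LOGDIR location and the 60-second interval are hypothetical choices for this example. `sar -r` reports memory use including a kbcached column, which gives a rough view of how much the page cache holds during the run.

```shell
#!/usr/bin/env bash
# Sketch: wrap one pg_basebackup run with sysstat collectors so network,
# disk and memory/page-cache behaviour can be compared between fast and
# slow runs. Assumes bash and sysstat (sar, iostat) are installed.
set -euo pipefail

LOGDIR=${LOGDIR:-/var/tmp/basebackup-stats}  # hypothetical log location
INTERVAL=60                                  # seconds between samples

# fmt_hms SECONDS -> H:MM:SS
fmt_hms() {
  printf '%d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# monitored_backup [pg_basebackup options...]
monitored_backup() {
  local stamp start rc=0
  local -a pids=()
  stamp=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$LOGDIR"

  # With an interval but no count, sar and iostat report until killed:
  # per-interface network traffic, extended per-device disk stats in MB,
  # and memory usage including kbcached.
  sar -n DEV "$INTERVAL" > "$LOGDIR/net-$stamp.log" &
  pids+=("$!")
  iostat -xm "$INTERVAL" > "$LOGDIR/io-$stamp.log" &
  pids+=("$!")
  sar -r "$INTERVAL" > "$LOGDIR/mem-$stamp.log" &
  pids+=("$!")

  start=$(date +%s)
  pg_basebackup "$@" || rc=$?

  # Stop the collectors and record how long this run took.
  kill "${pids[@]}" 2>/dev/null || true
  echo "$stamp exit=$rc elapsed=$(fmt_hms $(( $(date +%s) - start )))" \
    | tee -a "$LOGDIR/summary.log"
  return "$rc"
}
```

Usage would be to call monitored_backup with the existing pg_basebackup options unchanged, then compare the logs from a fast run with those from a slow one. This only samples the machine running the backup; to see disk I/O on the source server as well, the same sar and iostat collectors would need to run there for the same window.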
> Is the backup server shared with other systems?

No, it's a physical system.

> And there's no increased activity on the backup server either?

No.

> Have you looked at network traffic for the duration? And/or disk I/O on each system?
>
> pg_basebackup being single-threaded probably doesn't help here either; you might want to consider one of the parallel-backup options.

Yes, we looked at all the system stats. Nothing changed; the backups just ran extremely long. I was looking into pgBackRest, but haven't gotten it configured yet.

-----Original Message-----
From: Stephen Frost <sfrost@snowman.net>
Sent: Wednesday, May 6, 2020 12:30 PM
To: Jasen Lentz <jlentz@sescollc.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org
Subject: Re: pg_basebackup inconsistent performance
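For the pgBackRest option mentioned in the thread, a minimal sketch of a parallel full backup might look like the following. None of these details come from the thread; they are assumptions. pgBackRest is installed on both servers. The postgres user can SSH between them without a password. The database server archives WAL with archive_command = 'pgbackrest --stanza=main archive-push %p' and has its own pgBackRest config with repo1-host pointing back at the backup server (not shown). The stanza name "main", the repository path, the data directory and the process count are placeholders. The functions render_config, setup_stanza and full_backup are hypothetical helpers. process-max controls how many processes copy files in parallel, which is the main difference from pg_basebackup's single stream.

```shell
#!/usr/bin/env bash
# Sketch: a parallel full backup with pgBackRest instead of pg_basebackup.
# Assumptions (not from the thread): pgBackRest on both servers, passwordless
# SSH for the postgres user, WAL archiving via pgbackrest archive-push on the
# database server. Stanza name, paths and process count are placeholders.
set -euo pipefail

# render_config DB_HOST PROCESS_MAX -> pgBackRest config contents on stdout.
# process-max = number of parallel copy processes;
# start-fast=y plays the role of pg_basebackup's --checkpoint=fast.
render_config() {
  cat <<EOF
[global]
repo1-path=/opt/postgres/pgbackrest
repo1-retention-full=2
process-max=$2
start-fast=y

[main]
pg1-host=$1
pg1-path=/var/lib/pgsql/11/data
EOF
}

# One-time setup on the backup server: create the stanza, then verify
# that the database host is reachable and WAL archiving works.
setup_stanza() {
  pgbackrest --stanza=main stanza-create
  pgbackrest --stanza=main check
}

# Run on each scheduled backup.
full_backup() {
  pgbackrest --stanza=main --type=full backup
}
```

For example, the output of `render_config <server1> 4` would be redirected into pgBackRest's configuration file, followed by setup_stanza once and full_backup for each scheduled run. Whether a given process-max actually shortens the 7 TB run is something to measure rather than assume.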