Thread: pg_basebackup inconsistent performance

pg_basebackup inconsistent performance

From
Jasen Lentz
Date:

We have pg_basebackup running on two of our DB servers that are replicated.  We are running Postgres 11, and we are getting inconsistent performance from the backups and are unsure why.  Backups of 7TB over a dedicated 10G port start out at 5-6 hours.  The time creeps up to 8-9 hours, then all of a sudden jumps to 12-16 hours.  There seems to be no rhyme or reason for the extended backup times.  The commands we use for backups are as follows:

 

On server 2 (secondary), starts at 4PM

pg_basebackup --pgdata=/opt/postgres/pgbackup/`echo $DATE` --format=plain --write-recovery-conf --no-sync --wal-method=stream --checkpoint=fast --label=`hostname`-`echo $DATE` --no-verify-checksums --host=<server1> --username=replication --port=5432

 

On server 1 (Primary), starts at Midnight

pg_basebackup --pgdata=/opt/postgres/pgbackup/`echo $DATE` --format=plain --write-recovery-conf --no-sync --wal-method=stream --checkpoint=fast --label=`hostname`-`echo $DATE` --no-verify-checksums --host=<server2> --username=replication --port=5432

 

I’m not sure why or how we are running into the weeds.  I am the SysAdmin and am not familiar with the inner workings of the DB.  I can pass any commands that need to be run along to our DBA.

 

From the OS perspective, we are not seeing any problems with CPU, memory, or disk.  We are running RHEL 7.7.

 

Thanks!
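
For scale, the figures quoted above can be turned into average transfer rates. This is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB); the helper name `backup_rate_mb_s` is illustrative and not from the thread.

```shell
# Average throughput in MB/s for a backup of SIZE_TB terabytes that took HOURS hours.
# Assumes decimal units: 1 TB = 1,000,000 MB.
backup_rate_mb_s() {
    awk -v tb="$1" -v h="$2" 'BEGIN { printf "%.0f\n", tb * 1000000 / (h * 3600) }'
}

# Rates for the 7TB data set at the best, middle, and worst reported durations.
for hours in 5 9 16; do
    echo "$hours h: $(backup_rate_mb_s 7 "$hours") MB/s"
done
```

By this arithmetic, 7TB in 5 hours averages about 389 MB/s and 7TB in 16 hours about 122 MB/s. Both are well below the roughly 1250 MB/s that a 10 Gbit/s link can carry in theory, which suggests the link speed alone is unlikely to be what limits these runs.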

 

 

Re: pg_basebackup inconsistent performance

From
Adrian Klaver
Date:
On 5/6/20 5:44 AM, Jasen Lentz wrote:
> We have pg_basebackup running on two of our DB servers that are 
> replicated.  We are running postgres 11, and it seems we are getting 
> inconsistent performance from the backups and unsure as of why.  We 
> start out at 5-6 hours over a dedicated 10G port for 7TB.  It creeps up 
> to 8-9 hours then all of a sudden takes 12-16 hours.  There seems to be 

I'm guessing the above happens from one run to another, correct?

Where are the machines you are backing up from/to relative to each other on
the network?

Is there increased activity on the database servers, e.g. inserts,
updates, etc., during the extended backups?

> no rhyme or reason for the extended backup times.  The command we use 
> for backups is as follows:
> 
> On server 2 (secondary), starts at 4PM
> 
> pg_basebackup --pgdata=/opt/postgres/pgbackup/`echo $DATE` 
> --format=plain --write-recovery-conf --no-sync --wal-method=stream 
> --checkpoint=fast --label=`hostname`-`echo $DATE` --no-verify-checksums 
> --host=<server1> --username=replication --port=5432
> 
> On server 1 (Primary), starts at Midnight
> 
> pg_basebackup --pgdata=/opt/postgres/pgbackup/`echo $DATE` 
> --format=plain --write-recovery-conf --no-sync --wal-method=stream 
> --checkpoint=fast --label=`hostname`-`echo $DATE` --no-verify-checksums 
> --host=<server2> --username=replication --port=5432
> 
> I’m not sure why or how we are running into the weeds.  I am the 
> SysAdmin and am not familiar with the inner workings of the DB.  I can 
> pass any commands that need run along to our DBA.
> 
>  From the OS perspective, we are not seeing any problems with CPU, 
> memory or disk.  We are running on RHEL 7.7
> 
> Thanks!
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: pg_basebackup inconsistent performance

From
Jasen Lentz
Date:
> I'm guessing the above happens from one run to another, correct?
Yes

> Where are the machines you are backing up from/to relative to each other on the network?
Direct ethernet connection between 10G network interfaces

> Is there increased activity on the database servers, e.g. inserts, updates, etc., during the extended backups?
Not according to sar reports



Jasen M. Lentz, M.Ed



Re: pg_basebackup inconsistent performance

From
Stephen Frost
Date:
Greetings,

* Jasen Lentz (jlentz@sescollc.com) wrote:
> Where are the machines you are backing up from/to relative to each other on the network?
> Direct ethernet connection between 10G network interfaces

Is the backup server shared among other systems..?

> Is there increased activity on the database servers e.g. inserts, updates, etc during the extended backups?
> Not according to sar reports

And there's no increased activity on the backup server either?

Have you looked at network traffic for the duration?  And/or disk i/o on
each system?  If you ran a backup once and then again immediately after, and
that's the 'fast' case, then you may be seeing better performance due
to a lot of data being in the filesystem cache.  pg_basebackup being
single-threaded probably doesn't help here either; you might want to
consider one of the parallel-backup options.

Thanks,

Stephen
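
One way to follow up on the questions above is to record how long each run takes while sampling network and disk counters for its whole duration, so that a slow run can be compared with a fast one afterwards. The wrapper below is only a sketch: the name `timed_run`, the log layout, and the 60-second sampling interval are assumptions, and it relies on `sar` from the sysstat package being installed (sampling is skipped if it is not).

```shell
# Run a command, sample network and disk counters with sar while it runs,
# and append the elapsed time to a log so that runs can be compared.
# Usage: timed_run LOGDIR COMMAND [ARGS...]
timed_run() {
    logdir=$1; shift
    mkdir -p "$logdir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    sampler=
    if command -v sar >/dev/null 2>&1; then
        # -n DEV: per-interface network throughput; -d -p: per-device disk I/O.
        # An interval with no count makes sar report until it is stopped.
        sar -n DEV -d -p 60 >"$logdir/sar-$stamp.txt" 2>&1 &
        sampler=$!
    fi
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    if [ -n "$sampler" ]; then
        # Stop the background sampler and reap it quietly.
        kill "$sampler" 2>/dev/null || true
        wait "$sampler" 2>/dev/null || true
    fi
    echo "$stamp elapsed_s=$((end - start)) exit=$status" >>"$logdir/durations.log"
    return "$status"
}
```

It would be called with a log directory of your choosing followed by the existing pg_basebackup command line unchanged. Running the same wrapper on both the sending and the receiving server gives matching samples from each side of the transfer.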

RE: pg_basebackup inconsistent performance

From
Jasen Lentz
Date:
> Is the backup server shared among other systems..?
No, physical system

> And there's no increased activity on the backup server either?
No

> Have you looked at network traffic for the duration?  And/or disk i/o on each system?  If you ran a backup once and
> then immediately after and that's the 'fast' case then you may be seeing performance be better due to a lot of data
> being in the filesystem cache.  pg_basebackup being single-threaded probably doesn't help here either, you might want to
> consider one of the parallel-backup options.

Yes, we looked at all the system stats.  Nothing changed; the backups just run extremely long.

We were looking into pgbackrest but haven't gotten it configured yet.

