Thread: pg_basebackup is taking more time than expected

pg_basebackup is taking more time than expected

From
Raghavendra Rao J S V
Date:
Hi All,

We have a database of size 425GB. Around 40 to 60% of this space is occupied by indexes. The RAM allocated to this machine is 32GB, and the configuration parameters are below.

max_connections = 800            
shared_buffers = 9GB     
effective_cache_size = 18GB   
work_mem = 10MB  
maintenance_work_mem = 1536MB   
checkpoint_segments = 50
wal_keep_segments = 80
checkpoint_completion_target = 0.9
wal_buffers = 16MB
max_prepared_transactions = 0
synchronous_commit = on

In order to take the backup of the database we are using the pg_basebackup utility, but it is taking several hours (around 18 hours). Please guide me on which configuration parameters need to be modified to reduce the time taken by the pg_basebackup utility.

Is there any possibility of excluding the index data while taking the pg_basebackup?




--
Regards,
Raghavendra Rao J S V
Mobile- 8861161425

Re: pg_basebackup is taking more time than expected

From
Stephen Frost
Date:
Greetings,

* Raghavendra Rao J S V (raghavendrajsv@gmail.com) wrote:
> We have database with the size of *425GB*. Out of this 425 GB, Around 40 to
> 60% of space occupied by the indexes. Ram allocated to this machine is
> 32GB  and configuration parameters below.
[...]
> In order to take the backup of the database we are using pg_basebackup
> utility. But it is taking several hours (Around 18hours). Please guide me
> what are the configuration parameters need to modify to reduce the time
> taken by the pg_basebackup utility.

Unfortunately, there aren't a lot of great options with pg_basebackup,
but part of the question is- why is it taking so long?  That looks to be
a rate of less than 60Mb/s, assuming I did my calculations right, and
that's pretty slow.  How are you running pg_basebackup?  If you're doing
compression and the pg_basebackup process is consistently at 100% then
that's just the rate which a single CPU can compress data for you.  If
you're running the pg_basebackup across a WAN, then perhaps the
throughput available is only 60Mb/s.
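For reference, that back-of-the-envelope rate can be checked like this (a quick sketch, assuming 1 GB = 1024 MB and a flat 18-hour run; numbers are the ones from the thread):

```shell
# Rough throughput: 425 GB moved in 18 hours.
gb=425
hours=18
awk -v gb="$gb" -v h="$hours" 'BEGIN {
    mb_s = gb * 1024 / (h * 3600)            # megabytes per second
    printf "%.1f MB/s (~%.0f Mbit/s)\n", mb_s, mb_s * 8
}'
```

which comes out at roughly 6.7 MB/s, i.e. on the order of 54 Mbit/s.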

> Is there any possibility to exclude the index data while taking the
> pg_basebackup?

This isn't currently possible, no, and it would seriously increase the
time required to restore the system.  If you aren't concerned about
restore time at all (though, really, you probably should be) then you
could consider using pg_dump instead, which can be run in parallel and
wouldn't include the indexes.  The format is a logical point-in-time
dump though, so you aren't able to do point-in-time recovery (playing
WAL forward) and reloading the data and rebuilding the indexes will take
quite a while.

Lastly, if the issue is that pg_basebackup is single-threaded, or that
you need multiple TCP connections to get higher throughput, then you
should consider one of the alternative physical (file-based) backup
solutions available for PostgreSQL, e.g.: pgBackRest, barman, or WAL-G.

Thanks!

Stephen


Re: pg_basebackup is taking more time than expected

From
pinker
Date:
It will not solve the problem, but maybe try with the --checkpoint=fast option.
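For example (a sketch only; the connection options and paths mirror the command shown elsewhere in this thread and may differ on your system):

```shell
# --checkpoint=fast asks the server for an immediate checkpoint instead of a
# spread one, so the backup starts sooner; it does not speed up the copy itself.
pg_basebackup -p 5433 -U postgres -D /opt/backup_db \
    --format=tar --gzip --compress=6 --checkpoint=fast
```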





Re: pg_basebackup is taking more time than expected

From
Raghavendra Rao J S V
Date:
Please let me know the below details.

The pg_basebackup utility depends on which parameters?

Is there any possibility to run pg_basebackup in multi-threaded mode?

To improve the speed of the backup using the pg_basebackup utility, we shut everything down and started only the database services; no other services ran during that time. We observed some improvement, but is there anything else we can do to reduce the time taken by pg_basebackup?

We are using the below command to take the backup of the database. Are there any improvements that would reduce the time taken?

$PGHOME/bin/pg_basebackup -p 5433 -U postgres -P -v -x --format=tar --gzip --compress=6 --pgdata=- -D /opt/backup_db






--
Regards,
Raghavendra Rao J S V
Mobile- 8861161425

Re: pg_basebackup is taking more time than expected

From
Stephen Frost
Date:
Greetings,

Please don't top-post on the PG mailing lists.

* Raghavendra Rao J S V (raghavendrajsv@gmail.com) wrote:
> pg_basebackup utility  depends on which are the parameters?

I'm not sure what you're asking here.

> Is there any possibility to run the pg_basebackup in multi thread?

No, not today.  There's been discussion about making it multi-threaded
but I seriously doubt that'll happen for v11 at this point.

> To improve the speed of the backup using the pg_basebackup utility, we
> shut everything down and started only the database services; no other
> services ran during that time. We observed some improvement, but is
> there anything else we can do to reduce the time taken by pg_basebackup?

Sure, reducing the load of the system might make pg_basebackup a little
faster, but seems unlikely to help it a lot in this case, and it means
you have downtime which might not be ideal.

> We are using below command to take the backup of the database. Any
> improvements to reduce the time taken by backup statement.
>
> $PGHOME/bin/pg_basebackup -p 5433 -U postgres -P -v -x --format=tar --gzip
> --compress=6 --pgdata=- -D /opt/backup_db

Might be faster if you didn't compress it, but, of course, then you
wouldn't have a compressed backup.
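One middle ground, if the bottleneck really is the single-threaded gzip inside pg_basebackup, is to stream an uncompressed tar to stdout and compress it externally with a parallel compressor such as pigz. A sketch (pigz must be installed; the output path is an assumption, and -D - only works for a single tablespace):

```shell
# Stream the base backup as a plain tar to stdout and let pigz use
# all CPU cores for the compression step.
pg_basebackup -p 5433 -U postgres --format=tar -D - \
    | pigz -6 > /opt/backup_db/base.tar.gz
```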

This is exactly the kind of issue that led to the development of
pgBackRest.  Larger databases really do need multi-threaded backups and
there weren't any backup tools for PG which were multi-threaded when we
started.  There are a few other options now, which is good, but
pg_basebackup isn't one of them.
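As a hypothetical sketch of what a parallel pgBackRest setup looks like (the stanza name, repository and data-directory paths, and worker count are invented for illustration, not taken from this thread):

```shell
# Minimal pgBackRest configuration with parallel workers (illustrative only).
cat > /etc/pgbackrest.conf <<'EOF'
[global]
repo1-path=/var/lib/pgbackrest
process-max=4
[main]
pg1-path=/var/lib/pgsql/data
EOF

pgbackrest --stanza=main stanza-create    # one-time stanza initialisation
pgbackrest --stanza=main --type=full backup
```

Here process-max is the knob that runs several compression/transfer processes in parallel, which is what pg_basebackup cannot do.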

Thanks!

Stephen


Re: pg_basebackup is taking more time than expected

From
Raghavendra Rao J S V
Date:
Thank you very much for your prompt response.

What I was asking in my previous mail is: does pg_basebackup depend on any of the Postgres configuration parameters, like shared_buffers / maintenance_work_mem etc.? If yes, which are those configuration parameters whose values I need to take care of/increase?


Please let me know what this means:

Please don't top-post on the PG mailing lists.

How can I get clarification on my query?




--
Regards,
Raghavendra Rao J S V
Mobile- 8861161425

Re: pg_basebackup is taking more time than expected

From
Stephen Frost
Date:
Greetings,

* Raghavendra Rao J S V (raghavendrajsv@gmail.com) wrote:
> What I was asking in my previous mail is: does pg_basebackup depend on
> any of the Postgres configuration parameters, like shared_buffers /
> maintenance_work_mem etc.? If yes, which are those configuration
> parameters whose values I need to take care of/increase?

No, pg_basebackup doesn't really depend on any of those values (except
for things like max_wal_senders and whatnot to allow it to run at all,
of course).

> Please let me know what does this means.
>
> *Please don't top-post on the PG mailing lists.*

It means that you should reply in-line, like I'm doing, and remove text
that isn't relevant to your response.

> *How can I get clarification on my query?*

Either I've answered this above, or I don't know what you're asking
here.

Thanks!

Stephen


Re: pg_basebackup is taking more time than expected

From
Raghavendra Rao J S V
Date:
I am looking for help to minimise the time taken by the pg_basebackup utility.

As informed earlier, we are taking the backup of the database with the pg_basebackup utility using the below command.

$PGHOME/bin/pg_basebackup -p 5433 -U postgres -P -v -x --format=tar --gzip --compress=6 --pgdata=- -D /opt/backup_db

According to our previous discussion, pg_basebackup does not depend on any of the postgresql configuration parameters. If I go for the gzip format, we need to compromise on time.

We are planning to proceed with the below steps. Please correct me if I am wrong.

  1. Identify the larger indexes (those above 256MB in size) and drop them. This will reduce the size of the database.
  2. Take the backup of the database.
  3. Recreate the indexes in the environment that was created from the backup.

I am new to the Postgres database. Could you help me construct the queries to drop and create the indexes, please?

Re: pg_basebackup is taking more time than expected

From
"David G. Johnston"
Date:
On Mon, Jan 15, 2018 at 7:57 AM, Raghavendra Rao J S V <raghavendrajsv@gmail.com> wrote:
> I am looking for help to minimise the time taken by the pg_basebackup utility.
>
> As informed earlier, we are taking the backup of the database with the pg_basebackup utility using the below command.
>
> $PGHOME/bin/pg_basebackup -p 5433 -U postgres -P -v -x --format=tar --gzip --compress=6 --pgdata=- -D /opt/backup_db
>
> According to our previous discussion, pg_basebackup does not depend on any of the postgresql configuration parameters. If I go for the gzip format, we need to compromise on time.
>
> We are planning to proceed with the below steps. Please correct me if I am wrong.
>
>   1. Identify the larger indexes (those above 256MB in size) and drop them. This will reduce the size of the database.

I'm with Stephen on this one - going into standalone mode and dropping indexes seems odd...

> I am new to the Postgres database. Could you help me construct the queries to drop and create the indexes, please?


See "CREATE INDEX" and "DROP INDEX" in the SQL Commands listing of the documentation.
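A sketch of a query that lists the large indexes together with the DDL needed to recreate them (the 256MB threshold comes from the plan quoted above; the port is from the earlier mail, and the database name "mydb" is an assumption):

```shell
# List indexes over 256 MB with their size and a ready-made CREATE statement.
psql -p 5433 -U postgres -d mydb <<'SQL'
SELECT c.relname                                AS index_name,
       pg_size_pretty(pg_relation_size(c.oid))  AS index_size,
       pg_get_indexdef(c.oid)                   AS create_statement
FROM pg_class c
WHERE c.relkind = 'i'                            -- indexes only
  AND pg_relation_size(c.oid) > 256 * 1024 * 1024
ORDER BY pg_relation_size(c.oid) DESC;
SQL
```

Note that indexes backing primary keys or unique constraints cannot be removed with DROP INDEX; those require ALTER TABLE ... DROP CONSTRAINT instead.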

You haven't explained the purpose of the backups, but if it's for availability and you are this concerned about speed, you should invest in learning and installing a hot-standby replication configuration.

Additionally, consider whether you can improve the I/O of the setup you are using since, as Stephen said (and I am far from experienced in this area), 18 hours for less than 500GB of data seems extraordinarily long.

David J.

RE: pg_basebackup is taking more time than expected

From
Kellner Thiemo
Date:
> According to our previous discussion, pg_basebackup does not depend on any of the postgresql configuration parameters. If I go for the gzip format, we need to compromise on time.

You do not necessarily compromise on time when compressing. Actually, a speed gain compared to uncompressed is possible: the better the data compresses, the less has to be written to (slow) disk. However, it comes with a CPU load penalty. I suggest you experiment with setups. Maybe a lower --compress level still does well with respect to compression ratio but consumes considerably less CPU time.
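One way to run such an experiment offline, before touching the backup command, is to compare gzip levels on a sample file (the generated number sequence here is only a stand-in; use a slice of real table data for a representative result):

```shell
# Compare compressed sizes at different gzip levels on a sample file.
sample=$(mktemp)
seq 1 200000 > "$sample"          # stand-in for a chunk of real data
for level in 1 6 9; do
    size=$(gzip -"$level" -c "$sample" | wc -c)
    echo "gzip -$level: $size bytes"
done
rm -f "$sample"
```

Timing each iteration (e.g. with time) then shows whether the extra CPU cost of a higher level is worth the smaller write volume on your hardware.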