Thread: Performance for initial copy when using pg_logical to upgrade Postgres

Performance for initial copy when using pg_logical to upgrade Postgres

From
"Westwood, Giles"
Date:
At Orcid we're trying to upgrade our Postgres database (10 to 13) using pg_logical for no downtime. The problem we have is how long the initial copy is taking for the ~500GB database. If it takes say 20days to complete, will we need to have 20days of WAL files to start catching up when it's complete?

I read an earlier thread which pointed me to the tool pglogical_create_subscriber which does a pg_basebackup to start the initial replication but this is only going to be useful for logical clusters on the same version.

I had hoped that the COPY could be parallelized more by "max_sync_workers_per_subscription" which is set to 2. However there's only a single process:-

postgres 1022196  6.0 24.5 588340 491564 ?       Ds   Sep22 193:19 postgres: main: xxx xxxx 10.xx.xx.xx(59144) COPY

One of the best resources I've found of real world examples are thead on gitlabs own gitlab about their Postgres migrations. They discussed one method that might work:-

1. Setup 9.6 secondary via streaming
2. Turn physical secondary into logical secondary
3. Shutdown and upgrade secondary to 10
4. Turn secondary back on.

In which case we would only need the time required to perform the upgrade.
 
--
Giles Westwood
Senior Devops Engineer, ORCID

Re: Performance for initial copy when using pg_logical to upgrade Postgres

From
Justin Pryzby
Date:
On Fri, Sep 24, 2021 at 03:28:50PM +0100, Westwood, Giles wrote:
> At Orcid we're trying to upgrade our Postgres database (10 to 13) using
> pg_logical for no downtime. The problem we have is how long the initial
> copy is taking for the ~500GB database. If it takes say 20days to complete,
> will we need to have 20days of WAL files to start catching up when it's
> complete?

Did you see this thread and its suggestions to 1) set bulk load parameters;
and, 2) drop indexes and FKs ?

https://www.postgresql.org/message-id/flat/4A8EFC4E-A264-457D-A8E7-AE324ED9ABD4@thebuild.com

-- 
Justin



Re: Performance for initial copy when using pg_logical to upgrade Postgres

From
"Westwood, Giles"
Date:




On Fri, Sep 24, 2021 at 3:39 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Sep 24, 2021 at 03:28:50PM +0100, Westwood, Giles wrote:

Did you see this thread and its suggestions to 1) set bulk load parameters;
and, 2) drop indexes and FKs ?

https://www.postgresql.org/message-id/flat/4A8EFC4E-A264-457D-A8E7-AE324ED9ABD4@thebuild.com


I did actually but I wanted to avoid getting my hands dirty with anything schema wise. I've found another person with another similar situation:-


Re: Performance for initial copy when using pg_logical to upgrade Postgres

From
Andrew Dunstan
Date:
On 9/24/21 10:28 AM, Westwood, Giles wrote:
> At Orcid we're trying to upgrade our Postgres database (10 to 13)
> using pg_logical for no downtime. The problem we have is how long the
> initial copy is taking for the ~500GB database. If it takes say 20days
> to complete, will we need to have 20days of WAL files to start
> catching up when it's complete?
>
> I read an earlier thread which pointed me to the tool
> pglogical_create_subscriber which does a pg_basebackup to start the
> initial replication but this is only going to be useful for logical
> clusters on the same version.
>
> I had hoped that the COPY could be parallelized more by
> "max_sync_workers_per_subscription" which is set to 2. However there's
> only a single process:-
>
> postgres 1022196  6.0 24.5 588340 491564 ?       Ds   Sep22 193:19
> postgres: main: xxx xxxx 10.xx.xx.xx(59144) COPY
>
> One of the best resources I've found of real world examples are thead
> on gitlabs own gitlab about their Postgres migrations. They discussed
> one method that might work:-
>
> 1. Setup 9.6 secondary via streaming
> 2. Turn physical secondary into logical secondary
> 3. Shutdown and upgrade secondary to 10
> 4. Turn secondary back on.
>
> In which case we would only need the time required to perform the upgrade.


If you're using the pglogical extension, the best way is often to create
the replica as a physical replica (using pg_basebackup for example), and
then using the extension's utility program pglogical_create_subscriber
to convert the physical replica to a logical replica, which you then
upgrade and switch over to.


Of course, test it out before doing this for real.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




I'm currently doing this with a 2.2 TB database. 

Best way IMO is to (UPDATE pg_index SET indisready = false ... ) for non PK indexes for the largest tables. Then just set it back to indisready = true after its done and run a REINDEX CONCURRENTLY on the indexes that were disabled.

Got about a transfer speed of 100GB per 50 minutes with this method with consistent results.

On Fri, Sep 24, 2021 at 11:49 AM Westwood, Giles <g.westwood@orcid.org> wrote:




On Fri, Sep 24, 2021 at 3:39 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Sep 24, 2021 at 03:28:50PM +0100, Westwood, Giles wrote:

Did you see this thread and its suggestions to 1) set bulk load parameters;
and, 2) drop indexes and FKs ?

https://www.postgresql.org/message-id/flat/4A8EFC4E-A264-457D-A8E7-AE324ED9ABD4@thebuild.com


I did actually but I wanted to avoid getting my hands dirty with anything schema wise. I've found another person with another similar situation:-


Re: Performance for initial copy when using pg_logical to upgrade Postgres

From
"Westwood, Giles"
Date:



On Fri, Sep 24, 2021 at 5:02 PM Tim <timfosho@gmail.com> wrote:
I'm currently doing this with a 2.2 TB database. 

Best way IMO is to (UPDATE pg_index SET indisready = false ... ) for non PK indexes for the largest tables. Then just set it back to indisready = true after its done and run a REINDEX CONCURRENTLY on the indexes that were disabled.

Got about a transfer speed of 100GB per 50 minutes with this method with consistent results.

Thanks Tim, that has worked great. I'm trying to automate the whole process but I can't see a way of seeing when the initial pglogical copy is complete short of checking the disk space.

All I've found is:-

select * from pglogical.local_sync_status;
 sync_kind | sync_subid | sync_nspname | sync_relname | sync_status | sync_statuslsn
-----------+------------+--------------+--------------+-------------+----------------
 d         | 1821676733 |              |              | d           | 0/0
(1 row)

or

xxx=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+--------------------------------
pid              | 3469521
usesysid         | 77668435
usename          | xxx
application_name | xxxx_snap
client_addr      | 10.44.16.83
client_hostname  |
client_port      | 52594
backend_start    | 2021-10-27 12:51:17.618734+00
backend_xmin     | 221892481
state            | startup
sent_lsn         |
write_lsn        |
flush_lsn        |
replay_lsn       |
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async