Re: Fast logical replication jump start with PG 10 - Mailing list pgsql-general

From Olivier Gautherot
Subject Re: Fast logical replication jump start with PG 10
Date
Msg-id CAJ7S9TUZJfcYEqQOxiLEy7d05NHfG7N0AyZYsG-4ngvb9PwLAQ@mail.gmail.com
Whole thread Raw
In response to Re: Fast logical replication jump start with PG 10  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Fast logical replication jump start with PG 10  (Adrian Klaver <adrian.klaver@aklaver.com>)
Re: Fast logical replication jump start with PG 10  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi Adrian!

On Sat, May 26, 2018 at 12:11 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 05/25/2018 06:35 PM, Olivier Gautherot wrote:
Hi Adrian, thanks for your reply. Here is the clarification.

1) It is indeed a pg_upgrade from 9.2 to 10.4. Depending on the test machine, it runs in between 15 and 20 minutes for just over 100GB. I can negotiate this time with our customer. The vacuum process took another 5 to 7 minutes. This this what I was referring to with the 30 minutes (point 3 in your questions)

2) After pg_upgrade, I published the tables on the database (in the sense "CREATE DATABASE") and subscribed to this publication on the second server (logical replication). The data copy processed started immediately and took around 1 hour. I then loaded the indexes, what took > another 2h20m. At that point the active-passive cluster was ready to go.

The index creation was done on the replicated machine I presume, using what command?

The sequence on the replicated machine was (pseudo-code to simplify the syntax):
- pg_dump --section=pre-data -h master_machine master_database | psql -h replication_machine replication_database
# This took seconds, "pre-data" discards the indexes

- psql -h replication_machine -c "CREATE SUBSCRIPTION mysub CONNECTION "..." PUBLICATION mypub;" replication_database
# This took about 1 hour for the initial sync

- pg_dump --section=post-data -h master_machine master_database | psql -h replication_machine replication_database
# This took 2h20m to load the various indexes

This sequence follows the recommendation of section 14.4.3 in https://www.postgresql.org/docs/10/static/populate.html . If I stick to streaming as we do today (e.g. pg_upgrade and then rsync to the replication server), I can be ready in about 1 hour (more acceptable for the customer).

The reasons for the indexes to take so long is the large number of them on big tables (for instance, 7 indexes on a partitioned table, with 3 partitions of 15GB of data in 30M rows). I will skip the reasons that got us there (please no flames, I'm aware of the issue :-) ). I don't have definite execution times for the Production environment (in a datacenter), which tends to be kind of a lottery in terms of execution times compared to testing (on a desktop in the office).
 

Note that the active and the passive databases are on different machines.

4) By "database" I mean the result of "CREATE DATABASE" and we have 1 per server (or "cluster" in your terminology - I tend to use this word for a group of machines). We are currently using a streaming replication

Yeah I understand, it is just that database and cluster have specific meanings in Postgres and it helps to stick to those meanings when discussing replication operations. Lowers the confusion level:)

between the 9.2 servers, so it could be a fall-back option after the upgrade (I wanted to remove part of the indexes on the master to lower the load, reason to use the logical replication... if the execution time is not too excessive).

So the time you showed was with those indexes removed or not?

I did try to synchronize the database with the indexes installed and eventually dropped the replication database after a full week-end of hectic activity (apparently, the initial sync job was not finished...). I will try it again just to make sure but I'm fairly positive that I will get to the same result.



Hope it clarifies the question
Best regards
Olivier


Olivier Gautherot
olivier@gautherot.net <mailto:olivier@gautherot.net>
Cel:+56 98 730 9361
Skype: ogautherot
www.gautherot.net <http://www.gautherot.net>
http://www.linkedin.com/in/ogautherot


On Fri, May 25, 2018 at 7:51 PM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

    On 05/25/2018 02:12 PM, Olivier Gautherot wrote:

        Hi,

        I just sent the question on StackOverflow but realized that this
        audience may be more savvy. So sorry in advance for cross-posting...

        I'm in the process of upgrading a PG from 9.2 to 10.4.
        pg_upgrade worked fine on the master and was rather fast. The
        problem is that the database is replicated and I'm planning to
        switch from streaming to logical. The problem is that it is
        rather slow (30 minutes for the master and over 3 hours for the
        replication, between data transfer and indexes).


    I am not clear on what you did, so can you clarify the following:

    1) pg_upgrade from 9.2 master instance to 10.4 master instance, correct?

    2) What replication are you talking about for the 3 hour value?

    3) What is the 30 minute value referring to?

    4) When you say database are you talking about a Postgres cluster or
    a database in the cluster?

        Is there a way to speed up the replication or should I rather
        stick to streaming replication? As I have only 1 database on the
        server, it would not be a show-stopper.

    See 4) above, but if you are talking about a single database in a
    cluster streaming replication will not work for that.

        Thanks in advance
        Olivier Gautherot
        http://www.linkedin.com/in/ogautherot
        <http://www.linkedin.com/in/ogautherot>

    --     Adrian Klaver
    adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>




--
Adrian Klaver
adrian.klaver@aklaver.com

Olivier

pgsql-general by date:

Previous
From: Jayadevan M
Date:
Subject: Re: PostgreSQL backup issue
Next
From: Alexander Farber
Date:
Subject: UPDATE from CTE syntax error