Thread: Logical replication very slow

Logical replication very slow

From
Boris Sagadin
Date:
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.

Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.

Boris



Re: Logical replication very slow

From
Achilleas Mantzios
Date:
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.

Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.

Boris





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: Logical replication very slow

From
Boris Sagadin
Date:

Doing an initial replica.

postgres 119454 93.5 25.9 34613692 32649656 ?   Rs   07:16  32:45  \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500

I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.

After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.

Boris


On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.

Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.

Boris





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: Logical replication very slow

From
Achilleas Mantzios
Date:
On 25/2/19 9:59 π.μ., Boris Sagadin wrote:

Doing an initial replica.

postgres 119454 93.5 25.9 34613692 32649656 ?   Rs   07:16  32:45  \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500

I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.
Will this work at all? Doesn't unlogged mean no changes are written to the WAL? What if you just copy by dump and then just add and refresh without copy_data ?

After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.

Boris


On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.

Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.

Boris





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: Logical replication very slow

From
Boris Sagadin
Date:
I think it should. I set it to unlogged on target/slave server only. One other table which is much smaller and already replicated receives changes from master.

About settings copy_data to false, nice idea, I'll try that too and compare speed.



On Mon, Feb 25, 2019 at 9:51 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 9:59 π.μ., Boris Sagadin wrote:

Doing an initial replica.

postgres 119454 93.5 25.9 34613692 32649656 ?   Rs   07:16  32:45  \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500

I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.
Will this work at all? Doesn't unlogged mean no changes are written to the WAL? What if you just copy by dump and then just add and refresh without copy_data ?

After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.

Boris


On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.

Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.

Boris





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: Logical replication very slow

From
Achilleas Mantzios
Date:
On 25/2/19 2:06 μ.μ., Boris Sagadin wrote:
I think it should. I set it to unlogged on target/slave server only. One other table which is much smaller and already replicated receives changes from master.
Ah, ok then.

About settings copy_data to false, nice idea, I'll try that too and compare speed.


Please note that ppl here tend to not encourage top posting. Generally It's ok for business emails elsewhere, but here straight quoting is better and more accurate.



On Mon, Feb 25, 2019 at 9:51 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 9:59 π.μ., Boris Sagadin wrote:

Doing an initial replica.

postgres 119454 93.5 25.9 34613692 32649656 ?   Rs   07:16  32:45  \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500

I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.
Will this work at all? Doesn't unlogged mean no changes are written to the WAL? What if you just copy by dump and then just add and refresh without copy_data ?

After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.

Boris


On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.

Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.

Is there a setting I am missing here? Any ideas appreciated.

Boris





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt