Thread: Logical replication very slow
Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.
Is there a setting I am missing here? Any ideas appreciated.
Boris
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:
Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.
Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.Is there a setting I am missing here? Any ideas appreciated.Boris
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Doing an initial replica.
postgres 119454 93.5 25.9 34613692 32649656 ? Rs 07:16 32:45 \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500
I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.
After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.
Boris |
On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 8:52 π.μ., Boris Sagadin wrote:Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.Is there a setting I am missing here? Any ideas appreciated.Boris
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
On 25/2/19 9:59 π.μ., Boris Sagadin wrote:
Will this work at all? Doesn't unlogged mean no changes are written to the WAL? What if you just copy by dump and then just add and refresh withoutDoing an initial replica.postgres 119454 93.5 25.9 34613692 32649656 ? Rs 07:16 32:45 \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.
copy_data ?
After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.
Boris On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:On 25/2/19 8:52 π.μ., Boris Sagadin wrote:Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.Is there a setting I am missing here? Any ideas appreciated.Boris
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
I think it should. I set it to unlogged on target/slave server only. One other table which is much smaller and already replicated receives changes from master.
About settings copy_data to false, nice idea, I'll try that too and compare speed.
On Mon, Feb 25, 2019 at 9:51 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 25/2/19 9:59 π.μ., Boris Sagadin wrote:Will this work at all? Doesn't unlogged mean no changes are written to the WAL? What if you just copy by dump and then just add and refresh withoutDoing an initial replica.postgres 119454 93.5 25.9 34613692 32649656 ? Rs 07:16 32:45 \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.copy_data ?
After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.
Boris On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:On 25/2/19 8:52 π.μ., Boris Sagadin wrote:Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.Is there a setting I am missing here? Any ideas appreciated.Boris
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
On 25/2/19 2:06 μ.μ., Boris Sagadin wrote:
Ah, ok then.I think it should. I set it to unlogged on target/slave server only. One other table which is much smaller and already replicated receives changes from master.
About settings copy_data to false, nice idea, I'll try that too and compare speed.
Please note that ppl here tend to not encourage top posting. Generally It's ok for business emails elsewhere, but here straight quoting is better and more accurate.
On Mon, Feb 25, 2019 at 9:51 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:On 25/2/19 9:59 π.μ., Boris Sagadin wrote:Will this work at all? Doesn't unlogged mean no changes are written to the WAL? What if you just copy by dump and then just add and refresh withoutDoing an initial replica.postgres 119454 93.5 25.9 34613692 32649656 ? Rs 07:16 32:45 \_ postgres: 10/main: bgworker: logical replication worker for subscription 24783 sync 16500I've cancelled the sync, set the tables to unlogged type and started it again. I think it helped, still much slower than binary, but better, about 40MB/s. Will set them back to logged type after initial replica is done.copy_data ?
After the initial replica, there aren't that many updates, so it's OK then. But if a need for a new slave arises, waiting a few days for initial replica to finish, which in binary replication case is just hours, can be a big problem for us.
Boris On Mon, Feb 25, 2019 at 8:08 AM Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:On 25/2/19 8:52 π.μ., Boris Sagadin wrote:Is logical replication subscriber in "streaming" state or in initial snapshot? What's the behavior after the initial snapshot, when it gets into streaming state?Doing an initial replica and trying to find a bottleneck, Ubuntu 16.04, NVMe disks, PgSQL v10.7, AWS. With binary replication, DB is replicated at good speed, around 500MB/s. Trying LR now for a big table (about 1.4TB with 2 indexes) and the speed is only about 2MB/s.Checked disk util with iostat and only about 20% utilized on master, 15% on target, CPU load on master is low. On slave I can see the "logical replication worker" process is taking about 70% CPU time on a single core, machine has 16 cores.Is there a setting I am missing here? Any ideas appreciated.Boris
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt