Thread: Logical Replication Issue

Logical Replication Issue

From
Donzell White
Date:
Hi,
I have set up logical replication from an AWS RDS instance to a postgreSQL database on an EC2 instance. I am getting an “out of memory” error during the initial for one of the tables. I have include a snippet of the error at the bottom of this email. The snippet below repeats every four or eight hours, depending on the work_mem and maintenance_work_mem values. The other 500 plus tables load with no problem. 

Does anyone have experience with this issue. Suggestions on what to try would be greatly appreciated.

Both databases are running PostgreSQL 11.5.
The database is 5 TB in size and has over 500 tables.  The "out of memory" error every eight hours.  The table is 578GB in the RDS database and ate up over 9TB before it ran out of space on the EC2 instance.  The table has a bytea column but there is no data in that column. It also has a text column and the largest text length is 978MB.  

The odd thing is that this is not the largest table in the database. The largest table took 57 hours to initial load and it does have bytea data.  The largest table is about 4 times larger from a size perspective.  I have increases the work_mem and the maintenance_work_mem parameters and tried loading only the problem table.  This resulted in the “out of memory” error occurring every 4 hours instead of every eight hours. Note that the source database is static (no changes).

020-06-17 07:52:28.618 UTC [8410] LOCATION:  LogCheckpointStart, xlog.c:8508
2020-06-17 07:52:52.505 UTC [8410] LOG:  00000: checkpoint complete: wrote 69088 buffers (1.6%); 0 WAL file(s) added, 0 removed, 42 recycled; write=23.548 s, sync=0.271 s, total=23.886 s; sync files=14, longest=0.165 s, average=0.019 s; distance=688856 kB, estimate=701162 kB
2020-06-17 07:52:52.505 UTC [8410] LOCATION:  LogCheckpointEnd, xlog.c:8590
2020-06-17 07:53:15.960 UTC [8410] LOG:  00000: checkpoint starting: xlog
2020-06-17 07:53:15.960 UTC [8410] LOCATION:  LogCheckpointStart, xlog.c:8508
2020-06-17 07:53:23.933 UTC [14390] ERROR:  XX000: could not receive data from WAL stream: ERROR:  out of memory        DETAIL:  Cannot enlarge string buffer containing 1073741802 bytes by 28 more bytes.
2020-06-17 07:53:23.933 UTC [14390] CONTEXT:  COPY product, line 15568244
2020-06-17 07:53:23.933 UTC [14390] LOCATION:  libpqrcv_receive, libpqwalreceiver.c:772
2020-06-17 07:53:24.160 UTC [19873] LOG:  00000: logical replication table synchronization worker for subscription "subscription_test_tables", table “product" has started
2020-06-17 07:53:24.160 UTC [19873] LOCATION:  ApplyWorkerMain, worker.c:1662

Re: Logical Replication Issue

From
Michael Lewis
Date:
Per the release notes, there are some enhancements to logical replication that came after 11.5 like 11.8 particularly related to replication identity full. Do you have a primary key or unique index that is being used for the replication identity?

Re: Logical Replication Issue

From
Donzell White
Date:
My table has a primary key.  In addition, this is an initial load issue where the replication identity should not matter.


-----Original Message-----
From: Michael Lewis <mlewis@entrata.com>
To: Donzell White <whitedv@verizon.net>
Cc: pgsql-general <pgsql-general@postgresql.org>
Sent: Tue, Jun 30, 2020 1:05 pm
Subject: Re: Logical Replication Issue

Per the release notes, there are some enhancements to logical replication that came after 11.5 like 11.8 particularly related to replication identity full. Do you have a primary key or unique index that is being used for the replication identity?