Logical Replication Issue - Mailing list pgsql-general

From Donzell White
Subject Logical Replication Issue
Msg-id 6076B24C-E751-4055-84E3-A067F55974AA@verizon.net
Responses Re: Logical Replication Issue  (Michael Lewis <mlewis@entrata.com>)
List pgsql-general
Hi,
I have set up logical replication from an AWS RDS instance to a PostgreSQL database on an EC2 instance. I am getting an "out of memory" error during the initial load for one of the tables. I have included a snippet of the error at the bottom of this email. The snippet repeats every four or eight hours, depending on the work_mem and maintenance_work_mem values. The other 500-plus tables load with no problem.

Does anyone have experience with this issue? Suggestions on what to try would be greatly appreciated.

Both databases are running PostgreSQL 11.5.
The database is 5 TB in size and has over 500 tables. The "out of memory" error occurs every eight hours. The table is 578 GB in the RDS database and ate up over 9 TB before it ran out of space on the EC2 instance. The table has a bytea column, but there is no data in that column. It also has a text column, and the largest text length is 978 MB.

The odd thing is that this is not the largest table in the database. The largest table took 57 hours for its initial load, and it does have bytea data. The largest table is about four times larger from a size perspective. I have increased the work_mem and maintenance_work_mem parameters and tried loading only the problem table. This resulted in the "out of memory" error occurring every four hours instead of every eight hours. Note that the source database is static (no changes).
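One guess on my part (an assumption, not something I have confirmed against the server): COPY's text format escapes certain bytes, such as backslash, newline, tab, and carriage return, as two-character sequences, so a text value stored near 978 MB could grow past 1 GiB by the time it travels over the wire. A minimal Python sketch of that inflation (ASCII input assumed for simplicity):

```python
# Sketch: PostgreSQL's COPY text format turns backslash, newline,
# tab, and carriage return into two-character escape sequences, so
# the on-the-wire size of a value can exceed its stored size.

COPY_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}

def copy_text_size(value: str) -> int:
    """Bytes the value would occupy in COPY's text format (ASCII input)."""
    return sum(len(COPY_ESCAPES.get(ch, ch)) for ch in value)

sample = "line1\nline2\\end"
# Two escaped characters (the newline and the backslash) each gain one byte.
assert copy_text_size(sample) == len(sample) + 2
```

If the 978 MB value contains many such bytes, the escaped form could plausibly cross the 1 GiB mark even though the stored value does not.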

2020-06-17 07:52:28.618 UTC [8410] LOCATION:  LogCheckpointStart, xlog.c:8508
2020-06-17 07:52:52.505 UTC [8410] LOG:  00000: checkpoint complete: wrote 69088 buffers (1.6%); 0 WAL file(s) added, 0 removed, 42 recycled; write=23.548 s, sync=0.271 s, total=23.886 s; sync files=14, longest=0.165 s, average=0.019 s; distance=688856 kB, estimate=701162 kB
2020-06-17 07:52:52.505 UTC [8410] LOCATION:  LogCheckpointEnd, xlog.c:8590
2020-06-17 07:53:15.960 UTC [8410] LOG:  00000: checkpoint starting: xlog
2020-06-17 07:53:15.960 UTC [8410] LOCATION:  LogCheckpointStart, xlog.c:8508
2020-06-17 07:53:23.933 UTC [14390] ERROR:  XX000: could not receive data from WAL stream: ERROR:  out of memory
	DETAIL:  Cannot enlarge string buffer containing 1073741802 bytes by 28 more bytes.
2020-06-17 07:53:23.933 UTC [14390] CONTEXT:  COPY product, line 15568244
2020-06-17 07:53:23.933 UTC [14390] LOCATION:  libpqrcv_receive, libpqwalreceiver.c:772
2020-06-17 07:53:24.160 UTC [19873] LOG:  00000: logical replication table synchronization worker for subscription "subscription_test_tables", table "product" has started
2020-06-17 07:53:24.160 UTC [19873] LOCATION:  ApplyWorkerMain, worker.c:1662
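For context, the numbers in the DETAIL line above line up with PostgreSQL's hard cap on a single allocation (MaxAllocSize, 0x3fffffff bytes, defined in the PostgreSQL source). A quick arithmetic sketch:

```python
# PostgreSQL caps a single palloc'd buffer at MaxAllocSize
# (2**30 - 1 bytes, just under 1 GiB). The walreceiver's string
# buffer already held 1073741802 bytes and could not grow by 28 more.
MAX_ALLOC_SIZE = 2**30 - 1          # 1073741823, PostgreSQL's allocation cap
buffer_len = 1073741802             # from the DETAIL line in the log above

assert buffer_len + 28 > MAX_ALLOC_SIZE   # hence "Cannot enlarge string buffer"
print(MAX_ALLOC_SIZE - buffer_len)        # → 21 bytes of headroom left
```

So the incoming row is bumping against a fixed 1 GiB per-buffer limit, which is why raising work_mem and maintenance_work_mem only changes how long the load runs before failing, not whether it fails.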
