Re: Logical replication timeout problem - Mailing list pgsql-hackers
From | Fabrice Chapuis |
---|---|
Subject | Re: Logical replication timeout problem |
Date | |
Msg-id | CAA5-nLB2K0YTaYtd-qUwy+QTUA0Q0fZt9_R=mFmb9pOVObgBBA@mail.gmail.com Whole thread Raw |
In response to | Re: Logical replication timeout problem (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Logical replication timeout problem
|
List | pgsql-hackers |
I put the instance with high level debug mode.
I try to do some log interpretation: After having finished writing the modifications generated by the insert in the snap files,
then these files are read (restored). One minute after this work starts, the worker process exit with an error code = 1.
I see that keepalive messages were sent before the work process work leave.
I try to do some log interpretation: After having finished writing the modifications generated by the insert in the snap files,
then these files are read (restored). One minute after this work starts, the worker process exit with an error code = 1.
I see that keepalive messages were sent before the work process work leave.
2021-12-28 10:50:01.894 CET [55792] LOCATION: WalSndKeepalive, walsender.c:3365
...
2021-12-28 10:50:31.854 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')2021-12-28 10:50:31.907 CET [55792] DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
2021-12-28 10:50:31.907 CET [55792] LOCATION: ShowTransactionStateRec, xact.c:5075
2021-12-28 10:50:31.907 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG: 00000: spill 2271 changes in XID 14312 to disk
2021-12-28 10:50:31.907 CET [55792] LOCATION: ReorderBufferSerializeTXN, reorderbuffer.c:2245
2021-12-28 10:50:31.907 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:32.110 CET [55792] DEBUG: 00000: restored 4096/22603999 changes from disk
2021-12-28 10:50:32.110 CET [55792] LOCATION: ReorderBufferIterTXNNext, reorderbuffer.c:1156
2021-12-28 10:50:32.110 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:32.138 CET [55792] DEBUG: 00000: restored 4096/22603999 changes from disk
...
2021-12-28 10:50:35.341 CET [55794] DEBUG: 00000: sending replication keepalive
2021-12-28 10:50:35.341 CET [55794] LOCATION: WalSndKeepalive, walsender.c:3365
2021-12-28 10:50:35.341 CET [55794] LOCATION: WalSndKeepalive, walsender.c:3365
...
2021-12-28 10:51:31.995 CET [55791] ERROR: XX000: terminating logical replication worker due to timeout
2021-12-28 10:51:31.995 CET [55791] LOCATION: LogicalRepApplyLoop, worker.c:1267
Could this function in Apply main loop in worker.c help to find a solution?
rc = WaitLatchOrSocket(MyLatch, | |
WL_SOCKET_READABLE | WL_LATCH_SET | | |
WL_TIMEOUT | WL_POSTMASTER_DEATH, | |
fd, wait_time, | |
WAIT_EVENT_LOGICAL_APPLY_MAIN); |
Thanks for your help
Fabrice
On Thu, Dec 23, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hello Amit,
>
> I was able to reproduce the timeout problem in the lab.
> After loading more than 20 millions of rows in a table which is not replicated (insert command ends without error), errors related to logical replication processes appear in the postgres log.
> Approximately every 5 minutes worker process is restarted. The snap files in the slot directory are still present. The replication system seems to be blocked. Why these snap files are not removed. What do they contain?
>
These contain changes of insert. I think these are not removed for
your case as your long transaction is never finished. As mentioned
earlier, for such cases, it is better to use 'streaming' feature
released as part of PG-14 but anyway here we are trying to debug
timeout problem.
> I will recompile postgres with your patch to debug.
>
Okay, that might help.
--
With Regards,
Amit Kapila.
pgsql-hackers by date: