Re: Logical replication keepalive flood - Mailing list pgsql-hackers
From: Greg Nancarrow
Subject: Re: Logical replication keepalive flood
Date:
Msg-id: CAJcOf-fdJizi7gwvpPa+rZzOuFaHkYpKWC2GyipJxHOwNCjrOw@mail.gmail.com
In response to: Re: Logical replication keepalive flood (Peter Smith <smithpb2250@gmail.com>)
Responses: RE: Logical replication keepalive flood
List: pgsql-hackers
On Thu, Aug 12, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> This data shows the special keepalives are now greatly reduced from
> 1000s to just 10s.
>
> Thoughts?
>

I could easily see the flood of keepalives with the test setup described by the original problem reporter (Abbas Butt). I found that the "v1-0002-WIP-skip-the-keepalive-on-FIRST-loop-iteration.patch" patch reduced the keepalives by about 50% in this case.

I also tried the pub/sub setup with the publication on the pgbench_history table. With this pub/sub setup, I found that the patch dramatically reduced the keepalives sent, similar to what Peter reported. Results (using Kyotaro's keepalive counting patch) are below:

PUB/SUB, publishing the pgbench_history table

(1) Without patch, 10s pgbench run:

2021-09-08 15:21:56.643 AEST [63720] LOG: Total records: 47019
2021-09-08 15:21:56.643 AEST [63720] LOG: 8: 8 / 16 / 8: 8571 / 882048
2021-09-08 15:21:56.643 AEST [63720] LOG: 16: 5 / 10 / 5: 3649 / 764892
2021-09-08 15:21:56.643 AEST [63720] LOG: 24: 6271 / 12561 / 6331: 113989 / 886115
2021-09-08 15:21:56.643 AEST [63720] LOG: 195: 2 / 0 / 112: 72 / 10945
2021-09-08 15:21:56.643 AEST [63720] LOG: 6856: 1 / 0 / 1: 666232176 / 666232176
2021-09-08 15:21:56.643 AEST [63720] LOG: 7477: 2 / 0 / 298: 27 / 3303
2021-09-08 15:21:56.643 AEST [63720] LOG: 8159: 19 / 32 / 6073: 15 / 1869

(2) With patch, 10s pgbench run:

2021-09-08 15:39:14.008 AEST [71431] LOG: Total records: 45858
2021-09-08 15:39:14.008 AEST [71431] LOG: 24: 61 / 18278 / 6168: 108034 / 115228
2021-09-08 15:39:14.008 AEST [71431] LOG: 84: 1 / 1 / 7: 2256 / 295230
2021-09-08 15:39:14.008 AEST [71431] LOG: 110: 1 / 1 / 3: 10629 / 708293
2021-09-08 15:39:14.008 AEST [71431] LOG: 7477: 18 / 18 / 4577: 53 / 7850

Where the columns are "size: sent / notsent / calls: write lag / flush lag":
- size is the requested data length passed to logical_read_xlog_page()
- sent is the number of keepalives sent in the loop in WalSndWaitForWal
- notsent is the number of runs of the loop in WalSndWaitForWal without sending a keepalive
- calls is the number of calls to WalSndWaitForWal
- write lag is the number of bytes MyWalSnd->write is behind sentPtr at the first run of the loop, per call to logical_read_xlog_page()
- flush lag is the same as the above, but for MyWalSnd->flush

However, the problem I found is that, with the patch applied, there is a test failure when running "make check-world":

t/006_logical_decoding.pl ............ 4/14
#   Failed test 'pg_recvlogical acknowledged changes'
#   at t/006_logical_decoding.pl line 117.
#          got: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
#     expected: ''
# Looks like you failed 1 test of 14.
t/006_logical_decoding.pl ............ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/14 subtests

To investigate this, I added some additional logging to pg_recvlogical.c and PostgresNode.pm and re-ran 006_logical_decoding.pl without and with the patch (logs attached).

When the patch is NOT applied, and pg_recvlogical is invoked by the test a 2nd time with the same "--endpos" LSN, it gets a keepalive, detects walEnd >= endpos, and thus returns an empty record. The test is expecting an empty record, so all is OK.
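To make the difference between the two runs easier to see, below is a rough sketch of my understanding of how the client-side --endpos check behaves for keepalive ('k') versus data ('w') messages. This is a paraphrase for illustration only, not the actual pg_recvlogical.c code; CopyMessage, handle_copy_message() and write_to_output() are names made up for this sketch, and I'm assuming the real endpos comparisons work roughly like this:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder stand-ins for this sketch; not the real PostgreSQL definitions. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct
{
    char        type;       /* 'k' = keepalive, 'w' = WAL data */
    XLogRecPtr  lsn;        /* walEnd for 'k', record LSN for 'w' */
    const char *change;     /* decoded change text, only for 'w' */
} CopyMessage;

static void
write_to_output(const char *change)
{
    puts(change);           /* stands in for writing to the output file */
}

/* Returns true when streaming should stop. */
static bool
handle_copy_message(const CopyMessage *msg, XLogRecPtr endpos)
{
    if (msg->type == 'k')
    {
        /* Keepalive: nothing to write.  If the server's send position has
         * already reached --endpos, stop here and the output stays empty. */
        return endpos != InvalidXLogRecPtr && msg->lsn >= endpos;
    }

    if (msg->type == 'w')
    {
        /* Data message: the decoded change has already been received. */
        if (endpos != InvalidXLogRecPtr && msg->lsn > endpos)
            return true;    /* read past endpos: stop without writing */

        write_to_output(msg->change);

        /* A record at exactly endpos is still written before stopping. */
        return endpos != InvalidXLogRecPtr && msg->lsn == endpos;
    }

    return false;           /* other message types are ignored in this sketch */
}

The relevant difference is that stopping on a keepalive leaves the output empty, whereas stopping on a data message only happens after that change has been received.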
When the patch is applied, and pg_recvlogical is invoked by the test a 2nd time with the same "--endpos" LSN, it gets a WAL record with THE SAME LSN (== endpos) as the last WAL record obtained during the 1st invocation, but the record data is actually the first row of some records written after endpos, which it wasn't meant to read. This doesn't seem right to me: how can pg_recvlogical receive two different WAL records with the same LSN? With the patch applied, I was expecting pg_recvlogical to get WAL records with LSN > endpos, but this doesn't happen.

I'm thinking that the patch must have broken walsender in some way, possibly by missing out on calls to ProcessStandbyReplyMessage() because the sending of some keepalives is avoided (see below), so that the MyWalSnd flush and write locations are not kept up to date. The code comments below seem to hint at this. I don't really like the way keepalives are used for this, but it seems to be the existing functionality. Maybe someone else can confirm that this could indeed break walsender?

walsender.c, WalSndWaitForWal():

    /*
     * We only send regular messages to the client for full decoded
     * transactions, but a synchronous replication and walsender shutdown
     * possibly are waiting for a later location. So, before sleeping, we
     * send a ping containing the flush location. If the receiver is
     * otherwise idle, this keepalive will trigger a reply. Processing the
     * reply will update these MyWalSnd locations.
     */
    if (!loop_first_time &&    /* avoid keepalive on first iteration */   <--- added by the patch
        MyWalSnd->flush < sentPtr &&
        MyWalSnd->write < sentPtr &&
        !waiting_for_ping_response)
    {
        WalSndKeepalive(false);

Regards,
Greg Nancarrow
Fujitsu Australia