Home > mailing lists

Logical replication keepalive flood - Mailing list pgsql-hackers

From	Abbas Butt
Subject	Logical replication keepalive flood
Date	June 5, 2021 11:08:00
Msg-id	CALtH27cip5uQNJb4uHjLXtx1R52ELqXVfcP9fhHr=AvFo1dtqw@mail.gmail.com Whole thread Raw
Responses	Re: Logical replication keepalive flood
List	pgsql-hackers

Tree view

Hi,

I have observed the following behavior with PostgreSQL 13.3.

The WAL sender process sends approximately 500 keepalive messages per second to pg_recvlogical.

These keepalive messages are totally un-necessary.

Keepalives should be sent only if there is no network traffic and a certain time (half of wal_sender_timeout) passes.

These keepalive messages not only choke the network but also impact the performance of the receiver,

because the receiver has to process the received message and then decide whether to reply to it or not.

The receiver remains busy doing this activity 500 times a second.

On investigation it is revealed that the following code fragment in function WalSndWaitForWal in file walsender.c is responsible for sending these frequent keepalives:

if (MyWalSnd->flush < sentPtr &&
MyWalSnd->write < sentPtr &&
!waiting_for_ping_response)
WalSndKeepalive(false);

waiting_for_ping_response is normally false, and flush and write will always be less than sentPtr (Receiver's LSNs cannot advance server's LSNs)

Here are the steps to reproduce:

1. Start the database server.

2. Setup pgbench tables.

./pgbench -i -s 50 -h 192.168.5.140 -p 7654 -U abbas postgres

3. Create a logical replication slot.

SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');

4. Start pg_recvlogical.

./pg_recvlogical --slot=my_slot --verbose -d postgres -h 192.168.5.140 -p 7654 -U abbas --start -f -

5. Run pgbench

./pgbench -U abbas -h 192.168.5.140 -p 7654 -c 2 -j 2 -T 1200 -n postgres

6. Observer network traffic to find the keepalive flood.

Alternately modify the above code fragment to see approx 500 keepalive log messages a second

if (MyWalSnd->flush < sentPtr &&
MyWalSnd->write < sentPtr &&
!waiting_for_ping_response)
{
elog(LOG, "[Keepalive] wrt ptr %X/%X snt ptr %X/%X ",
(uint32) (MyWalSnd->write >> 32),
(uint32) MyWalSnd->write,
(uint32) (sentPtr >> 32),
(uint32) sentPtr);
WalSndKeepalive(false);
}

Opinions?

--
Abbas

Senior Architect

Ph: 92.334.5100153

Skype ID: gabbasb

edbpostgres.com

Follow us on Twitter

@EnterpriseDB

pgsql-hackers by date:

From: Julien Rouhaud
Date: 05 June 2021, 07:36:40
Subject: Re: Are we missing (void) when return value of fsm_set_and_search is ignored?

From: David Christensen
Date: 05 June 2021, 12:21:11
Subject: Re: DELETE CASCADE

Logical replication keepalive flood - Mailing list pgsql-hackers

Previous

Next