BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send
Msg-id 17005-3e1030784d5440c4@postgresql.org
Responses Re: BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17005
Logged by:          Rony Kurniawan
Email address:      rony.kurniawan@oracle.com
PostgreSQL version: 11.7
Operating system:   Oracle Linux Server release 7.9
Description:

Hi,

I measured the throughput of reading from a logical replication slot and
found that with a smaller row size (512 bytes) the throughput is 50% lower
than with 1024-byte rows.
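
One way to see why halving the row size could roughly halve byte throughput is a simple cost model with a fixed per-message cost. The following is only a sketch: the function name and the overhead and link-speed figures are hypothetical, chosen for illustration, and are not measurements from this report.

```python
def throughput_bytes_per_sec(row_bytes, per_message_overhead_s, link_bytes_per_s):
    """Bytes per second when every row is sent as its own message.

    Each message is assumed to cost a fixed overhead (system call, packet
    headers, wakeups) plus the time to push its payload over the link.
    """
    time_per_message = per_message_overhead_s + row_bytes / link_bytes_per_s
    return row_bytes / time_per_message


# Hypothetical figures: 20 microseconds of fixed cost per message, 10 Gbit/s link.
OVERHEAD_S = 20e-6
LINK_BYTES_PER_S = 1.25e9

small = throughput_bytes_per_sec(512, OVERHEAD_S, LINK_BYTES_PER_S)
large = throughput_bytes_per_sec(1024, OVERHEAD_S, LINK_BYTES_PER_S)
ratio = small / large  # close to 0.5 when the fixed cost dominates
```

Under these assumptions the number of messages per second barely changes with row size, so the bytes per second scale almost linearly with the payload, which would be consistent with the 50% difference described above.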

tcpdump shows that the Ethernet packets sent by the replication server
contain only one message per packet (see the tcpdump output below).
Maybe this is the intended design to achieve low latency, but it is not
favorable for applications that require high throughput.

Is it possible for PostgreSQL to enable Nagle's algorithm on the streaming
socket for replication?
Or aggregate the messages manually before sending them in one send()?
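
The two options asked about above can be sketched in Python as follows. This is not PostgreSQL code and not how walsender is implemented: `enable_nagle` and `BatchingSender` are hypothetical names, and the 8192-byte buffer limit is an arbitrary assumption. The sketch only shows the general idea of letting the kernel coalesce small writes (by clearing TCP_NODELAY) versus buffering messages in user space and sending them together.

```python
import socket


def enable_nagle(sock: socket.socket) -> None:
    """Clear TCP_NODELAY so the kernel may coalesce small writes (Nagle)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)


class BatchingSender:
    """Hypothetical helper: buffer small messages, flush them in one send."""

    def __init__(self, sock: socket.socket, max_bytes: int = 8192) -> None:
        self.sock = sock
        self.max_bytes = max_bytes  # assumed limit, not a PostgreSQL setting
        self.buf = bytearray()
        self.send_calls = 0         # counts how often data is handed to the socket

    def queue(self, msg: bytes) -> None:
        # Flush first if adding this message would exceed the buffer limit.
        if self.buf and len(self.buf) + len(msg) > self.max_bytes:
            self.flush()
        self.buf += msg

    def flush(self) -> None:
        # Hand everything queued so far to the socket in a single sendall().
        if self.buf:
            self.sock.sendall(bytes(self.buf))
            self.send_calls += 1
            self.buf.clear()
```

With a helper like this, the BEGIN, INSERT, and COMMIT messages of one small transaction could be queued and flushed together instead of being sent one by one. The trade-off is latency: something has to decide when to flush (buffer full, end of transaction, or a timer), which is the same tension between latency and throughput raised above.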

Thank you,
Rony

Test case:
The client and server run on different machines, or the server runs in a
Docker container.
create table public.test (id integer generated always as identity, name varchar(512));
alter table public.test replica identity full;
select * from pg_create_logical_replication_slot('testslot', 'test_decoding');
insert into public.test (name) values (rpad('a', 512, 'a'));
...
insert into public.test (name) values (rpad('a', 512, 'a'));

I used pgbench to insert millions of records into the test table to measure
the throughput, but one insert is enough to show how the server sends the
messages.

client terminal 1:
$ sudo tcpdump -D
1.enp0s3
2.virbr0
3.docker0

$ sudo tcpdump -i 3 -w psql.pcap "tcp port 5432"

client terminal 2:
$ pg_recvlogical --start --slot=testslot -d postgres -h 172.17.0.2 -U postgres -f -

client terminal 1:
$ sudo tcpdump -i 3 -w psql.pcap "tcp port 5432"
ctrl-c
37 packets captured
37 packets received by filter
0 packets dropped by kernel

$ tcpdump --number -nn -A -r psql.pcap
...
   22  16:38:37.217677 IP 172.17.0.1.56140 > 172.17.0.2.5432: ...START_REPLICATION SLOT "testslot" LOGICAL 0/0.
...
   28  16:38:37.218209 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...BEGIN 1888650
...
   30  16:38:37.218332 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...table public.test: INSERT: id[integer]: 1 name[character varying]:'aaa...512...aaa'
   31  16:38:37.218345 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...COMMIT 1888650
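
To check the one-message-per-packet pattern on a longer capture, the text output of tcpdump can be tallied per sender. The following is a minimal sketch: the regular expression assumes the `--number -nn` line layout shown above, `packets_by_source` is a hypothetical helper name, and `SAMPLE` contains only the packet header lines quoted in this report with the payload text left out.

```python
import re
from collections import Counter

# Packet header lines taken from the capture in this report (payload omitted).
SAMPLE = """\
   22  16:38:37.217677 IP 172.17.0.1.56140 > 172.17.0.2.5432:
   28  16:38:37.218209 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...BEGIN
   30  16:38:37.218332 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...table
   31  16:38:37.218345 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...COMMIT
"""

# Matches "<packet number>  <time> IP <src> > <dst>:" at the start of a line.
HEADER = re.compile(
    r"^\s*(?P<num>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+) > (?P<dst>[^\s:]+):"
)


def packets_by_source(text: str) -> Counter:
    """Count packets per source address.port; lines without a header are skipped."""
    counts = Counter()
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            counts[match.group("src")] += 1
    return counts


counts = packets_by_source(SAMPLE)
```

In the sample, the server side (172.17.0.2.5432) accounts for three packets, one each for BEGIN, the INSERT row, and COMMIT, which matches the observation that every message travels in its own packet.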

