On 22 June 2017 at 09:07, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
>> On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
>>
>> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>>
>> That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
>
>> The kernel will usually do some packet aggregation unless we use
>> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
>> is IMO not worth worrying about just yet.
>
> 1)
>     /*
>      * Select socket options: no delay of outgoing data for
>      * TCP sockets, nonblock mode, close-on-exec. Fail if any
>      * of this fails.
>      */
>     if (!IS_AF_UNIX(addr_cur->ai_family))
>     {
>         if (!connectNoDelay(conn))
>         {
>             pqDropConnection(conn, true);
>             conn->addr_cur = addr_cur->ai_next;
>             continue;
>         }
>     }
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
> being sent, because you start sending normal sized tcp packets,
> rather than jumbo ones, even if configured (pretty common these
> days).
>
> 3) Syscall overhead is actually quite significant.

Fair enough, and *headdesk* re not checking NODELAY. I thought I'd
checked for our use of that before, but I must've misremembered.

We could use TCP_CORK, but it's not portable, and it'd be better to just
collect the data up in a buffer and dispatch it in one go.
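
To make the "collect up a buffer" idea concrete, here is a minimal sketch, not libpq's actual code. The names OutBuf, outbuf_init, outbuf_put, outbuf_flush and the 8 kB OUTBUF_SIZE are hypothetical and chosen only for illustration. It assumes a blocking stream socket: small messages are queued in memory and handed to the kernel with a single send() rather than one syscall each.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define OUTBUF_SIZE 8192		/* hypothetical size, not taken from libpq */

typedef struct OutBuf
{
	char		data[OUTBUF_SIZE];	/* queued, not-yet-sent bytes */
	size_t		used;				/* number of valid bytes in data */
	int			fd;					/* blocking stream socket (assumed) */
	int			nsyscalls;			/* successful send() calls, for illustration */
} OutBuf;

static void
outbuf_init(OutBuf *b, int fd)
{
	b->used = 0;
	b->fd = fd;
	b->nsyscalls = 0;
}

/* Send everything queued so far, looping on short writes. 0 on success. */
static int
outbuf_flush(OutBuf *b)
{
	size_t		off = 0;

	while (off < b->used)
	{
		ssize_t		n = send(b->fd, b->data + off, b->used - off, 0);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted, just retry */
			return -1;			/* sketch: no partial-send recovery */
		}
		b->nsyscalls++;
		off += (size_t) n;
	}
	b->used = 0;
	return 0;
}

/* Queue one message; flush first if it would not fit. 0 on success. */
static int
outbuf_put(OutBuf *b, const void *msg, size_t len)
{
	assert(b != NULL && msg != NULL);

	if (len > OUTBUF_SIZE)
		return -1;				/* sketch: oversized messages not handled */
	if (len > OUTBUF_SIZE - b->used && outbuf_flush(b) < 0)
		return -1;

	memcpy(b->data + b->used, msg, len);
	b->used += len;
	return 0;
}
```

A caller would outbuf_put() each small message and call outbuf_flush() once at a natural boundary, for example before waiting on a reply. Unlike TCP_CORK, this relies only on plain send(), so it behaves the same whether or not TCP_NODELAY is set.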

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services