Re: WaitLatchOrSocket optimization - Mailing list pgsql-hackers

From: Konstantin Knizhnik
Subject: Re: WaitLatchOrSocket optimization
Msg-id: c1517e80-bb8f-4d86-cc71-35984a1593f4@postgrespro.ru
In response to: Re: WaitLatchOrSocket optimization  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On 15.03.2018 20:25, Andres Freund wrote:
> Hi,
>
> On 2018-03-15 19:01:40 +0300, Konstantin Knizhnik wrote:
>> Right now function WaitLatchOrSocket is implemented in very inefficient way:
>> for each invocation it creates epoll instance, registers events and then
>> closes this instance.
> Right, everything performance critical should be migrated to using a
> persistent wait event set.
>
>
>> But there are still lot of places in Postgres where WaitLatchOrSocket or
>> WaitLatch are used.
> Most don't matter performancewise though.
Yes, I didn't see a significant difference in performance on workloads not 
involving postgres_fdw.
But grepping for all occurrences of WaitLatch, I found some places which 
seem to be called quite frequently, for example
waiting for a latch in ProcSleep, waiting for more data in the shared memory 
message queue, waiting for parallel workers to attach,
waiting for a synchronous replica, ...
There are about a hundred usages of WaitLatch and I am not sure that 
all of them have no impact on performance.

>> There are two possible ways of fixing this issue:
>> 1. Patch postgres_fdw to store WaitEventSet in connection.
>> 2. Patch WaitLatchOrSocket to cache created wait event sets.
> I'm strongly opposed to 2).  The lifetime of sockets will make this an
> absolute mess, it'll potentially explode the number of open file
> handles, and some OSs don't handle a large number of sets at the same
> time.
I understand your concern.
But you are not completely right here.
First of all, the lifetime of the socket doesn't actually matter: if a socket 
is closed, it is automatically removed from the event set.
Quoting the epoll manual:

    Q6  Will closing a file descriptor cause it to be removed from all epoll
        sets automatically?

    A6  Yes, but be aware of the following point.  A file descriptor is a
        reference to an open file description (see open(2)).  Whenever a file
        descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or
        fork(2), a new file descriptor referring to the same open file
        description is created.  An open file description continues to exist
        until all file descriptors referring to it have been closed.  A file
        descriptor is removed from an epoll set only after all the file
        descriptors referring to the underlying open file description have
        been closed (or before, if the file descriptor is explicitly removed
        using epoll_ctl(2) EPOLL_CTL_DEL).  This means that even after a file
        descriptor that is part of an epoll set has been closed, events may
        be reported for that file descriptor if other file descriptors
        referring to the same underlying file description remain open.

So, only the file descriptors created by epoll_create affect the number of 
descriptors opened by a process.
And this number is bounded by the size of the cache. Right now I have 
hardcoded a cache size of 113, but in most cases even a much smaller cache 
will be enough.
I do not see how an extra, let's say, 13 open file descriptors can cause 
open file descriptor limit exhaustion.
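
To make the idea concrete, here is a minimal sketch of the caching approach. 
This is not the actual patch: the names CachedWaitEventSet, 
GetCachedWaitEventSet and the trivial hash/eviction scheme are invented for 
illustration, and error handling is omitted.

#include "postgres.h"
#include "storage/latch.h"
#include "utils/memutils.h"

/* Hypothetical cache of wait event sets, keyed by (latch, socket, events). */
#define WAIT_EVENT_SET_CACHE_SIZE 113

typedef struct CachedWaitEventSet
{
    Latch        *latch;        /* latch the set was built for */
    pgsocket      sock;         /* socket the set was built for */
    int           wakeEvents;   /* WL_* mask the set was built for */
    WaitEventSet *set;          /* persistent set, or NULL if slot is free */
} CachedWaitEventSet;

static CachedWaitEventSet WaitEventSetCache[WAIT_EVENT_SET_CACHE_SIZE];

/*
 * Return a wait event set for the given latch/socket/events combination,
 * creating and caching it on first use.  The point is only to show that the
 * epoll descriptor can survive across WaitLatchOrSocket() calls; a real
 * implementation needs a better hash and a sane eviction policy.
 */
static WaitEventSet *
GetCachedWaitEventSet(Latch *latch, int wakeEvents, pgsocket sock)
{
    int         slot = (int) (((uint32) sock + (uint32) wakeEvents) %
                              WAIT_EVENT_SET_CACHE_SIZE);
    CachedWaitEventSet *entry = &WaitEventSetCache[slot];

    if (entry->set != NULL &&
        entry->latch == latch &&
        entry->sock == sock &&
        entry->wakeEvents == wakeEvents)
        return entry->set;              /* cache hit: reuse epoll instance */

    if (entry->set != NULL)
        FreeWaitEventSet(entry->set);   /* evict the previous occupant */

    entry->set = CreateWaitEventSet(TopMemoryContext, 3);
    entry->latch = latch;
    entry->sock = sock;
    entry->wakeEvents = wakeEvents;

    if (wakeEvents & WL_LATCH_SET)
        AddWaitEventToSet(entry->set, WL_LATCH_SET,
                          PGINVALID_SOCKET, latch, NULL);
    if (wakeEvents & WL_POSTMASTER_DEATH)
        AddWaitEventToSet(entry->set, WL_POSTMASTER_DEATH,
                          PGINVALID_SOCKET, NULL, NULL);
    if (wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
        AddWaitEventToSet(entry->set,
                          wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE),
                          sock, NULL, NULL);

    return entry->set;
}

WaitLatchOrSocket() would then just call WaitEventSetWait() on the returned 
set instead of creating and freeing a set on every invocation.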

 From my point of view we should either rewrite WaitLatchOrSocket to not 
use epoll at all (use poll instead), or cache the created event set 
descriptors.
The result of pgbench with poll instead of epoll is 140k TPS, which is 
certainly much better than the 38k TPS with current master, but much worse 
than with cached epoll - 225k TPS.
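
For comparison, alternative 1) would keep the set inside postgres_fdw itself. 
Roughly like this; the struct, field and function names below are invented 
for illustration, and error handling and the case of a re-established 
connection (whose socket changes) are omitted:

#include "postgres.h"
#include "libpq-fe.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"
#include "utils/memutils.h"

/* Hypothetical extension of postgres_fdw's per-connection state. */
typedef struct PgFdwConnState
{
    PGconn       *conn;
    WaitEventSet *wait_set;     /* created lazily, reused for every wait */
} PgFdwConnState;

/*
 * Block until the remote socket becomes readable or our latch is set,
 * reusing one WaitEventSet (and hence one epoll instance) per connection.
 */
static void
pgfdw_wait_for_socket(PgFdwConnState *state)
{
    WaitEvent   event;

    if (state->wait_set == NULL)
    {
        state->wait_set = CreateWaitEventSet(TopMemoryContext, 2);
        AddWaitEventToSet(state->wait_set, WL_LATCH_SET,
                          PGINVALID_SOCKET, MyLatch, NULL);
        AddWaitEventToSet(state->wait_set, WL_SOCKET_READABLE,
                          PQsocket(state->conn), NULL, NULL);
    }

    WaitEventSetWait(state->wait_set, -1 /* no timeout */,
                     &event, 1, PG_WAIT_EXTENSION);

    if (event.events & WL_LATCH_SET)
    {
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }
}

This avoids touching the generic WaitLatchOrSocket() at all, at the price of 
patching every caller that matters.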


> Greetings,
>
> Andres Freund

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


