Re: proposal: LISTEN * - Mailing list pgsql-hackers

From Tom Lane
Subject Re: proposal: LISTEN *
Date
Msg-id 4583.1447958731@sss.pgh.pa.us
Whole thread Raw
In response to Re: proposal: LISTEN *  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Marko Tiikkaja wrote:
>> On the test server I'm running on, it doesn't look quite as bad as the
>> profiles we had in production, but s_lock is still the worst offender in the
>> profiles, called from:
>> 
>> - 80.33% LWLockAcquire
>> + 48.34% asyncQueueReadAllNotifications
>> + 23.09% SIGetDataEntries
>> + 16.92% SimpleLruReadPage_ReadOnly
>> + 10.21% TransactionIdIsInProgress
>> + 1.27% asyncQueueAdvanceTail
>> 
>> which roughly looks like what I recall from our actual production profiles.

> So the problem is in the bad scalability of LWLock, not in async.c itself?
> In master, the spinlock there has been replaced with atomics; does that branch
> work better?

FWIW, I can't get results anything like this.  With 50 notifiers and 50
listeners, I only see LWLockAcquire eating about 6% of the system-wide
runtime:

-    6.23%     0.64%         11850       postmaster  postgres                          [.] LWLockAcquire  -
LWLockAcquire    + 33.16% asyncQueueReadAllNotifications     + 20.76% SimpleLruReadPage_ReadOnly     + 18.02%
TransactionIdIsInProgress    + 8.67% VirtualXactLockTableInsert     + 3.98% ProcArrayEndTransaction     + 3.03%
VirtualXactLockTableCleanup    + 1.77% PreCommit_Notify     + 1.68% ProcessCompletedNotifies     + 1.62%
asyncQueueAdvanceTail    + 1.28% ProcessNotifyInterrupt     + 1.04% StartTransaction     + 1.00% LockAcquireExtended
+ 0.96% ProcSleep     + 0.81% LockReleaseAll     + 0.69% TransactionIdSetPageStatus     + 0.54% GetNewTransactionId
 

There isn't any really sore-thumb place that we might fix.  Interestingly,
a big chunk of the runtime is getting eaten by the kernel, breaking down
more or less like this:

-   36.52%     0.05%           836       postmaster  [kernel.kallsyms]                 [k] system_call_fastpath  -
system_call_fastpath    + 37.47% __GI___kill     + 27.26% __poll     + 9.55% __write_nocancel     + 8.87% __GI___semop
  + 7.31% __read_nocancel     + 5.14% __libc_recv     + 3.77% __libc_send     + 0.53% __GI___setitimer
 

The kill() calls are evidently from async.c's SignalBackends() while most
of the other kernel activity is connected to client/server I/O.

Since the clients aren't doing any actual database work, just notify and
receive notifies, this doesn't seem like a representative workload.
So I can't get that excited about how asyncQueueReadAllNotifications and
subsidiary TransactionIdIsInProgress tests cause a large fraction of the
LWLock acquisitions --- that's only true because nothing else very useful
is happening.

BTW, this is HEAD, but I tried 9.4 quickly and didn't really see anything
much different.  Not sure why the discrepancy with Marko's results.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: COPY (INSERT/UPDATE/DELETE .. RETURNING ..)
Next
From: Peter Geoghegan
Date:
Subject: Re: Using quicksort for every external sort run