Re: [PATCH] Improve performance of NOTIFY over many databases (v2) - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: [PATCH] Improve performance of NOTIFY over many databases (v2)
Date
Msg-id CADWG95uOdnW95XduPz8TR_+06zGG1qDCSu0OaW68=VxdYTKchw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Improve performance of NOTIFY over many databases (v2)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Improve performance of NOTIFY over many databases (v2)
List pgsql-hackers
Hoi Tom,

On Mon, 16 Sep 2019 at 00:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I spent some more time thinking about this, and I'm still not too
> satisfied with this patch's approach.  It seems to me the key insights
> we're trying to make use of are:
>
> 1. We don't really need to keep the global tail pointer exactly
> up to date.  It's bad if it falls way behind, but a few pages back
> is fine.

Agreed.

> 2. When sending notifies, only listening backends connected to our
> own database need be awakened immediately.  Backends connected to
> other DBs will need to advance their queue pointer sometime, but
> again it doesn't need to be right away.

Agreed.

> 3. It's bad for multiple processes to all be trying to do
> asyncQueueAdvanceTail concurrently: they'll contend for exclusive
> access to the AsyncQueueLock.  Therefore, having the listeners
> do it is really the wrong thing, and instead we should do it on
> the sending side.

Agreed, but I'd add that listeners in databases that are largely idle
there may never be a sender, and thus need to be advanced up some
other way.

> However, the patch as presented doesn't go all the way on point 3,
> instead having listeners maybe-or-maybe-not do asyncQueueAdvanceTail
> in asyncQueueReadAllNotifications.  I propose that we should go all
> the way and just define tail-advancing as something that happens on
> the sending side, and only once every few pages.  I also think we
> can simplify the handling of other-database listeners by including
> them in the set signaled by SignalBackends, but only if they're
> several pages behind.  So that leads me to the attached patch;
> what do you think?

I think I like the idea of having SignalBackend do the waking up a
slow backend but I'm not enthused by the "lets wake up (at once)
everyone that is behind". That's one of the issues I was explicitly
trying to solve. If there are any significant number of "slow"
backends then we get the "thundering herd" again. If the number of
slow backends exceeds the number of cores then commits across the
system could be held up quite a while (which is what caused me to make
this patch, multiple seconds was not unusual).

The maybe/maybe not in asyncQueueReadAllNotifications is that "if I
was behind, then I probably got woken up, hence I need to wake up
someone else", thus ensuring the cleanup proceeds in an orderly
fashion, leaving gaps where the lock isn't held allowing COMMITs to
proceed.

> BTW, in my hands it seems like point 2 (skip wakening other-database
> listeners) is the only really significant win here, and of course
> that only wins when the notify traffic is spread across a fair number
> of databases.  Which I fear is not the typical use-case.  In single-DB
> use-cases, point 2 helps not at all.  I had a really hard time measuring
> any benefit from point 3 --- I eventually saw a noticeable savings
> when I tried having one notifier and 100 listen-only backends, but
> again that doesn't seem like a typical use-case.  I could not replicate
> your report of lots of time spent in asyncQueueAdvanceTail's lock
> acquisition.  I wonder whether you're using a very large max_connections
> setting and we already fixed most of the problem with that in bca6e6435.
> Still, this patch doesn't seem to make any cases worse, so I don't mind
> if it's just improving unusual use-cases.

I'm not sure if it's an unusual use-case, but it is my use-case :).
Specifically, there are 100+ instances of the same application running
on the same cluster with wildly different usage patterns. Some will be
idle because no-one is logged in, some will be quite busy. Although
there are only 2 listeners per database, that's still a lot of
listeners that can be behind. Though I agree that bca6e6435 will have
mitigated quite a lot (yes, max_connections is quite high). Another
mitigation would be to spread across more smaller database clusters,
which we need to do anyway.

That said, your approach is conceptually simpler which is also worth
something and it gets essentially all the same benefits for more
normal use cases. If the QUEUE_CLEANUP_DELAY were raised a bit then we
could do mitigation of the rest on the client side by having idle
databases send dummy notifies every now and then to trigger clean up
for their database. The flip-side is that slow backends will then have
further to catch up, thus holding the lock longer. It's not worth
making it configurable so we have to guess, but 16 is perhaps a good
compromise.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Peter Eisentraut
Date:
Subject: Re: A problem presentaion about ECPG, DECLARE STATEMENT