Re: LISTEN denial of service with aborted transaction - Mailing list pgsql-hackers
From: Matt Newell
Subject: Re: LISTEN denial of service with aborted transaction
Date:
Msg-id: 4705384.iNuYeV9kCG@obsidian
In response to: Re: LISTEN denial of service with aborted transaction (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: LISTEN denial of service with aborted transaction
List: pgsql-hackers
On Tuesday, September 29, 2015 09:58:35 PM Tom Lane wrote:
> Matt Newell <newellm@blur.com> writes:
> > On Monday, September 28, 2015 07:22:26 PM Tom Lane wrote:
> >> Possibly.  sinval catchup notification works like that, so you could
> >> maybe invent a similar mechanism for notify.  Beware that we've had to
> >> fix performance issues with sinval catchup; you don't want to release a
> >> whole bunch of processes to do work at the same time.
>
> > I'll have to think about this more, but I don't envision it causing such
> > a scenario. The extra processing that will happen at signaling time
> > would have been done anyway when the aborted transaction is finally
> > rolled back; the only extra work is copying the relevant notifications
> > to local memory.
>
> Hm.  An idle-in-transaction listening backend will eventually cause a more
> serious form of denial of service: once the pg_notify SLRU area fills up
> to its maximum of 8GB, no more new messages can be inserted, and every
> transaction that tries to do a NOTIFY will abort.  However, that's a
> documented hazard and we've not really had complaints about it.  In any
> case, trying to fix that by having such a backend absorb messages into
> local memory doesn't seem like a great idea: if it sits for long enough,
> its local memory usage will bloat.  Eventually you'd have a failure
> anyway.
>
> > Regardless, it might not be worthwhile since my patch for #1 seems to
> > address the symptom that bit me.
>
> I looked at this patch for a bit and found it kind of ugly, particularly
> the business about "we allow one process at a time to advance
> firstUncommitted by using advancingFirstUncommitted as a mutex".
> That doesn't sound terribly safe or performant.

It can be implemented without that, but then you'll have multiple backends
trying to do the same work. This might not be an issue at all, but I couldn't
tell at first glance the potential cost of extra calls to
TransactionIdIsInProgress. Since there are no extra locks taken, I figured
ensuring the work is only done once is good for performance. If the cluster
only has one database generating notifies, then there is practically no extra
work with the patch.

As far as safety is concerned, I think the worst case is that behavior
returns to what it is now, where firstUncommitted ends up tracking
QUEUE_TAIL. I originally used a boolean, then realized I could get some extra
safety by using the backendId, so that the mutex would be released
automatically if the backend dies.

> After further thought, I wonder whether instead of having an incoming
> listener initialize its "pos" to QUEUE_TAIL, we couldn't have it
> initialize to the maximum "pos" among existing listeners (with a floor of
> QUEUE_TAIL of course).  If any other listener has advanced over a given
> message X, then X was committed (or aborted) earlier and the new listener
> is not required to return it; so it should be all right to take that
> listener's "pos" as a starting point.

That's a great idea, and I will try it. It does need to be the maximum
position of existing listeners *with matching db oid*, since
asyncQueueReadAllNotifications doesn't check transaction status if the db oid
doesn't match its own. Another advantage of this approach is that a queued
notify from an open transaction in one database won't incur additional cost
on listens coming from other databases, whereas with my patch such a
situation will prevent firstUncommitted from advancing.
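For illustration, here is a toy, self-contained C model of that rule. This is
not async.c code: the ListenerSlot type, the field names, and the plain
integer position are all made up for this sketch (real queue positions are
page/offset pairs that wrap around, and the real scan would happen under
AsyncQueueLock).

#include <stdio.h>

/*
 * Toy model of the proposed LISTEN startup rule: a new listener starts at
 * the maximum read position among existing listeners of the same database,
 * with a floor of the queue tail.  Illustrative only, not PostgreSQL code.
 */
typedef struct
{
	int			pid;			/* 0 means "slot unused" */
	unsigned	dboid;			/* database this listener belongs to */
	long long	pos;			/* how far this listener has read */
} ListenerSlot;

/* Choose the starting position for a new listener in database "dboid". */
static long long
initial_listen_pos(const ListenerSlot *slots, int nslots,
				   unsigned dboid, long long queue_tail)
{
	long long	pos = queue_tail;	/* floor: never start before the tail */

	for (int i = 0; i < nslots; i++)
	{
		/* only listeners in the same database are safe to copy from */
		if (slots[i].pid != 0 && slots[i].dboid == dboid &&
			slots[i].pos > pos)
			pos = slots[i].pos;
	}
	return pos;
}

int
main(void)
{
	ListenerSlot slots[] = {
		{101, 1, 500},			/* same-database listener, has read to 500 */
		{102, 2, 900},			/* listener in another database: ignored */
		{0, 0, 0},				/* unused slot */
	};

	/* New listener in database 1 starts at 500, not at the tail (100). */
	printf("start pos = %lld\n", initial_listen_pos(slots, 3, 1, 100));
	return 0;
}

Running this prints "start pos = 500": the new listener copies the furthest
same-database reader's position instead of starting from the tail at 100.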
> The minimum-code-change way to do that would be to compute the max pos
> the hard way while Exec_ListenPreCommit holds the AsyncQueueLock, which
> it would now have to do in exclusive mode.  That's a little bit annoying
> but it's surely not much worse than what SignalBackends has to do after
> every notification.

Yeah, I think it's going to be a win regardless. We could also skip the whole
thing if QUEUE_HEAD - QUEUE_TAIL is under a fixed amount, which would
probably cover a lot of use cases (see the sketch at the end of this
message).

> Alternatively, we could try to maintain a shared pointer that is always
> equal to the frontmost backend's "pos".  But I think that would require
> more locking during queue reading than exists now; so it's far from clear
> it would be a win, especially in systems where listening sessions last for
> a reasonable amount of time (so that Exec_ListenPreCommit is infrequent).

And that would be even more complicated since it has to be per db oid.

Matt Newell
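A rough sketch of that fast path, building on the toy model earlier in this
message. LISTEN_SCAN_THRESHOLD and the helper name are made up, and the plain
subtraction stands in for comparing QUEUE_HEAD and QUEUE_TAIL page/offset
pairs in the real code:

/*
 * Fast path on top of the toy model above: when little is pending (head
 * close to tail), just start at the tail and skip scanning the other
 * listeners.  Illustrative only; the real check would run while holding
 * AsyncQueueLock.
 */
#define LISTEN_SCAN_THRESHOLD 64	/* made-up cutoff */

static long long
initial_listen_pos_fastpath(const ListenerSlot *slots, int nslots,
							unsigned dboid,
							long long queue_tail, long long queue_head)
{
	/* little to catch up on: starting at the tail is already cheap */
	if (queue_head - queue_tail < LISTEN_SCAN_THRESHOLD)
		return queue_tail;

	return initial_listen_pos(slots, nslots, dboid, queue_tail);
}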