Re: LISTEN denial of service with aborted transaction - Mailing list pgsql-hackers

From Tom Lane
Subject Re: LISTEN denial of service with aborted transaction
Date
Msg-id 15532.1443578315@sss.pgh.pa.us
Whole thread Raw
In response to Re: LISTEN denial of service with aborted transaction  (Matt Newell <newellm@blur.com>)
Responses Re: LISTEN denial of service with aborted transaction  (Matt Newell <newellm@blur.com>)
List pgsql-hackers
Matt Newell <newellm@blur.com> writes:
> On Monday, September 28, 2015 07:22:26 PM Tom Lane wrote:
>> Possibly.  sinval catchup notification works like that, so you could maybe
>> invent a similar mechanism for notify.  Beware that we've had to fix
>> performance issues with sinval catchup; you don't want to release a whole
>> bunch of processes to do work at the same time.

> I'll have to think about this more but i don't envision it causing such as 
> scenario.
> The extra processing that will happen at signaling time would have been done 
> anyway when the aborted transaction is finally rolled back, the only extra 
> work is copying the relevant notifications to local memory. 

Hm.  An idle-in-transaction listening backend will eventually cause a more
serious form of denial of service: once the pg_notify SLRU area fills up
to its maximum of 8GB, no more new messages can be inserted, and every
transaction that tries to do a NOTIFY will abort.  However, that's a
documented hazard and we've not really had complaints about it.  In any
case, trying to fix that by having such a backend absorb messages into
local memory doesn't seem like a great idea: if it sits for long enough,
its local memory usage will bloat.  Eventually you'd have a failure
anyway.

> Regardless it might not be worthwhile since my patch for #1 seems to address 
> the symptom that bit me.

I looked at this patch for a bit and found it kind of ugly, particularly
the business about "we allow one process at a time to advance
firstUncommitted by using advancingFirstUncommitted as a mutex".
That doesn't sound terribly safe or performant.

After further thought, I wonder whether instead of having an incoming
listener initialize its "pos" to QUEUE_TAIL, we couldn't have it
initialize to the maximum "pos" among existing listeners (with a floor of
QUEUE_TAIL of course).  If any other listener has advanced over a given
message X, then X was committed (or aborted) earlier and the new listener
is not required to return it; so it should be all right to take that
listener's "pos" as a starting point.

The minimum-code-change way to do that would be to compute the max pos
the hard way while Exec_ListenPreCommit holds the AsyncQueueLock, which
it would now have to do in exclusive mode.  That's a little bit annoying
but it's surely not much worse than what SignalBackends has to do after
every notification.

Alternatively, we could try to maintain a shared pointer that is always
equal to the frontmost backend's "pos".  But I think that would require
more locking during queue reading than exists now; so it's far from clear
it would be a win, especially in systems where listening sessions last for
a reasonable amount of time (so that Exec_ListenPreCommit is infrequent).
        regards, tom lane



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: PATCH: index-only scans with partial indexes
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH 5/6] pg_basebackup: allow GetConnection() to make non-replication connections.