On Wed, Jun 17, 2015 at 5:31 AM, Brendan Jurd <direvus@gmail.com> wrote:
> Hello hackers,
>
> I present a patch to add a new built-in function
> pg_notify_queue_saturation().
>
> The purpose of the function is to allow users to monitor the health of their
> notification queue. In certain cases, a client connection listening for
> notifications might get stuck inside a transaction, and this would cause the
> queue to keep filling up, until finally it reaches capacity and further
> attempts to NOTIFY error out.
>
> The current documentation under LISTEN explains this possible gotcha, but
> doesn't really suggest a useful way to address it, except to mention that
> warnings will show up in the log once you get to 50% saturation of the
> queue. Unless you happen to be eyeballing the logs when it happens, that's
> not a huge help. The choice of 50% as a threshold is also very much
> arbitrary, and by the time you hit 50% the problem has likely been going on
> for quite a while. If you want your Nagios (or whatever) to, say, alert you
> when the queue goes over 5% or 1%, your options are limited and awkward.
>
> The patch has almost no new code. It makes use of the existing logic for
> the 50% warning. I simply refactored that logic into a separate function
> asyncQueueSaturation, and then added pg_notify_queue_saturation to make that
> available in SQL.
>
> I am not convinced that pg_notify_queue_saturation is the best possible name
> for this function, and am very much open to other suggestions.
>
> The patch includes documentation, a regression test and an isolation test.

*) The documentation should indicate what the range of values means --
it looks like the value is returned on a 0-1 scale.
*) A note regarding the 50% (0.5) threshold raising warnings in the
log might be appropriate here.
*) As you suspect, the name seems a little off to me. 'usage' seems
preferable to 'saturation', I think. Perhaps,
pg_notification_queue_usage()?
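
For the monitoring case you describe, here is roughly what I'd picture on
the nagios side. This is only a sketch under assumptions: the function name
pg_notification_queue_usage() is just my suggestion and is not settled, the
return value is assumed to be a fraction between 0 and 1, and the
classify_usage() helper and the 1%/5% thresholds are made up for
illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical query: the function name is still under discussion.
USAGE_QUERY = "SELECT pg_notification_queue_usage()"


def classify_usage(usage, warn=0.01, crit=0.05):
    """Map a queue usage fraction (assumed to be 0-1) to a Nagios status."""
    # Anything outside the assumed 0-1 range is reported as UNKNOWN.
    if usage is None or not 0.0 <= usage <= 1.0:
        return UNKNOWN
    if usage >= crit:
        return CRITICAL
    if usage >= warn:
        return WARNING
    return OK
```

A check script would run USAGE_QUERY with whatever client it likes, pass
the result to classify_usage(), and exit with the returned code.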

merlin