Martijn van Oosterhout <kleptog@gmail.com> writes:
> On Sun, 24 Nov 2019 at 02:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This seems both undesirable for our own testing, and rather bogus
>> from users' standpoints as well. However, I think a simple fix is
>> available: just make the SQL pg_notification_queue_usage() function
>> advance the queue tail before measuring, as in 0002 below. This will
>> restore the behavior of that function to what it was before 51004c717,
>> and it doesn't seem like it'd cost any performance in any plausible
>> use-cases.
> This was one of those open points in the previous patches where it
> wasn't quite clear what the correct behaviour should be. This fixes
> the issue, but the question in my mind is: do we want to document this
> fact and can users rely on this behaviour? If we go with the argument
> that the delay in cleaning up should be entirely invisible, then I
> guess this patch is the correct one that makes the made changes
> invisible. Arguably not doing this means we'd have to document the
> values are possibly out of date.
> So I think this patch does the right thing.
Thanks for looking! In the light of morning, there's one other
thing bothering me about this patch: it means that the function has
side-effects, even though those effects are at the implementation
level and shouldn't be user-visible. We do already have it marked
"volatile", so that's OK, but I notice that it's not parallel
restricted. The isolation test still passes when I set
force_parallel_mode = regress, so apparently it works to run this
code in a parallel worker, but that seems pretty scary to me;
certainly nothing in async.c was written with that in mind.
I think we'd be well advised to adjust pg_proc.dat to mark
pg_notification_queue_usage() as parallel-restricted, so that
it only executes in the main session process. It's hard to
see any use-case for parallelizing it that would justify even
a small chance of new bugs.
regards, tom lane