Re: proposal: make NOTIFY list de-duplication optional - Mailing list pgsql-hackers

From Filip Rembiałkowski
Subject Re: proposal: make NOTIFY list de-duplication optional
Msg-id CAP_rww=n3sPMGeKh4ERb4BpC46uDC-SMkFyMZMUQfb0TTwZwgw@mail.gmail.com
In response to Re: proposal: make NOTIFY list de-duplication optional  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Feb 6, 2016 at 5:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Brendan Jurd <direvus@gmail.com> writes:
>> On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, I agree that a GUC for this is quite unappetizing.
>
>> How would you feel about a variant for calling NOTIFY?
>
> If we decide that this ought to be user-visible, then an extra NOTIFY
> parameter would be the way to do it.  I'd much rather it "just works"
> though.  In particular, if we do start advertising user control of
> de-duplication, we are likely to start getting bug reports about every
> case where it's inexact, eg the no-checks-across-subxact-boundaries
> business.

Isn't it enough that the docs already say the database server "can
decide to deliver a single notification only"? That wording already
makes de-duplication best-effort, so inexact cases like the
subtransaction-boundary one would not be new bugs.

The ALL keyword would be a clearly separated "do nothing" variant:
no de-duplication attempted at all.

>
>> Optimising the remove-duplicates path is still probably a worthwhile
>> endeavour, but if the user really doesn't care at all about duplication, it
>> seems silly to force them to pay any performance price for a behaviour they
>> didn't want, no?
>
> I would only be impressed with that argument if it could be shown that
> de-duplication was a significant fraction of the total cost of a typical
> NOTIFY cycle.

Even if a typical NOTIFY cycle does not involve 10k or 100k messages,
why penalize the users whose transactions are that big?

> Obviously, you can make the O(N^2) term dominate if you
> try, but I really doubt that it's significant for reasonable numbers of
> notify events per transaction.

Yes, it is hard to observe with fewer than a few thousand messages per
transaction. But big data happens, and then the numbers get really bad.
In my test with 40k messages it was 400 ms versus 9 seconds: 22 times
slower. With 200k messages it was 2 seconds versus 250 seconds: 125
times slower.
And I tested with very short payload strings, so strcmp() had little
work to do.
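For illustration, here is a toy Python sketch (not the server code) of why
the duplicate check gets quadratic: each new notification is compared
against every notification already queued in the same transaction, so with
all-unique payloads the work grows as O(N^2), while an unconditional
append stays O(N). The function names are made up for this sketch.

```python
import timeit

def queue_with_dedup(payloads):
    """Linear-scan de-duplication: each new message is compared against
    everything already pending, so N unique messages cost O(N^2)."""
    pending = []
    for p in payloads:
        if p not in pending:   # O(N) scan per message, like repeated strcmp()
            pending.append(p)
    return pending

def queue_all(payloads):
    """Hypothetical NOTIFY ALL path: queue unconditionally, O(N) total."""
    return list(payloads)

if __name__ == "__main__":
    msgs = [str(i) for i in range(20_000)]  # unique payloads: worst case for the scan
    t_dedup = timeit.timeit(lambda: queue_with_dedup(msgs), number=1)
    t_all = timeit.timeit(lambda: queue_all(msgs), number=1)
    print(f"dedup: {t_dedup:.2f}s  append-only: {t_all:.4f}s")
```

Doubling the message count roughly quadruples the dedup time while the
append-only time merely doubles, which matches the 22x-to-125x gap above.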


