Thread: LISTEN/NOTIFY benchmarks?

LISTEN/NOTIFY benchmarks?

From
prashanth@jibenetworks.com
Date:
Hi,

I'm looking for information on the scalability of the LISTEN/NOTIFY
mechanism.  How well does it scale with respect to:

- hundreds of clients registered for LISTENs

    I guess this translates to hundreds of the corresponding backend
    processes receiving SIGUSR2 signals.  The efficiency of this is
    probably OS-dependent.  Would anyone be in a position to give me
    signal delivery benchmarks for FreeBSD on Unix?

- each client registered for thousands of LISTENs

    From a look at backend/commands/async.c, it would seem that each
    listening backend would get a signal for *every* LISTEN it
    registered for, resulting in thousands of signals to the same
    listening backend, instead of only one.  Would it help if this was
    optimized so that a signal was sent only once?  Again, info on
    relevant signal delivery benchmarks would be useful.

I'm not an expert on signals, not even a novice, so I might be totally
off base, but it seems like the Async Notification implementation does
not scale.  If it does not, does anyone have a solution for the
problem of signalling each event in a possibly very large set of
events to a large number of clients?

Thanks,

--prashanth


  



Re: LISTEN/NOTIFY benchmarks?

From
Tom Lane
Date:
prashanth@jibenetworks.com writes:
> I'm not an expert on signals, not even a novice, so I might be totally
> off base, but it seems like the Async Notification implementation does
> not scale.

Very possibly.  You didn't even mention the problems that would occur if
the pg_listener table didn't get vacuumed often enough.

The pghackers archives contain some discussion about reimplementing
listen/notify using a non-table-based infrastructure.  But AFAIK no one
has picked up that task yet.
        regards, tom lane



Re: LISTEN/NOTIFY benchmarks?

From
Hannu Krosing
Date:
prashanth@jibenetworks.com wrote on Tue, 29.04.2003 at 04:14:
> Hi,
> 
> I'm looking for information on the scalability of the LISTEN/NOTIFY
> mechanism.  How well does it scale with respect to:
> 
> - hundreds of clients registered for LISTENs 
> 
>     I guess this translates to hundreds of the corresponding backend
>     processes receiving SIGUSR2 signals.  The efficiency of this is
>     probably OS-dependent.  Would anyone be in a position to give me
>     signal delivery benchmarks for FreeBSD on Unix?
> 
> - each client registered for thousands of LISTENs
> 
>     From a look at backend/commands/async.c, it would seem that each
>     listening backend would get a signal for *every* LISTEN it
>     registered for, resulting in thousands of signals to the same
>     listening backend, instead of only one.

But as the signals are usually generated async, you have no way to know
if a particular backend has already received a signal.

Or do you mean some mechanism that remembers "signals sent" in some
shared structure that the receiving backend can then clear when it
actually receives the signal ?

That could mean lock contention on that shared structure, unless we
decide that it is cheaper to just consult it without locking it and
accept an occasional delivery of unneeded signals.
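
A minimal sketch of the "signals sent" idea described above, assuming a hypothetical per-backend slot in a shared structure. The names BackendSlot, notify and handle_signal are illustrative only and do not come from the PostgreSQL source; the counter stands in for an actual kill(pid, SIGUSR2). The flag is read and set without a lock, so two racing notifiers could both send, which is the "occasional unneeded signal" trade-off.

```python
class BackendSlot:
    """Hypothetical per-backend entry in a shared structure."""

    def __init__(self):
        self.signal_pending = False  # set by notifiers, cleared by the listener
        self.signals_sent = 0        # counts simulated signal deliveries


def notify(slot):
    """Notifier side: skip the signal if one is already outstanding.

    The check-then-set is deliberately unlocked; a race costs at most
    one redundant signal.
    """
    if slot.signal_pending:
        return False
    slot.signal_pending = True
    slot.signals_sent += 1  # stands in for kill(pid, SIGUSR2)
    return True


def handle_signal(slot):
    """Listener side: clear the flag first, then scan for notifications.

    Clearing before the scan means a notify that arrives mid-scan
    triggers a fresh signal instead of being lost.
    """
    slot.signal_pending = False
    # ... read pending notifications here ...


slot = BackendSlot()
results = [notify(slot) for _ in range(1000)]
```

Here only the first of the 1000 calls sends a signal; the rest see the pending flag and return without one.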

>     Would it help if this was
>     optimized so that a signal was sent only once?  Again, info on
>     relevant signal delivery benchmarks would be useful.  

I still suspect that removing the pg_listener table from the mechanism
would give gains faster. Of course we could rework the signal mechanism
as well while doing it.

> I'm not an expert on signals, not even a novice, so I might be totally
> off base, but it seems like the Async Notification implementation does
> not scale.  If it does not, does anyone have a solution for the
> problem of signalling each event in a possibly very large set of
> events to a large number of clients?

-----------------
Hannu



Re: LISTEN/NOTIFY benchmarks?

From
prashanth@jibenetworks.com
Date:
On Mon, Apr 28, 2003 at 10:19:16PM -0400, Tom Lane wrote:

> prashanth@jibenetworks.com writes:
> > I'm not an expert on signals, not even a novice, so I might be totally
> > off base, but it seems like the Async Notification implementation does
> > not scale.
> 
> Very possibly.  You didn't even mention the problems that would occur if
> the pg_listener table didn't get vacuumed often enough.
> 
> The pghackers archives contain some discussion about reimplementing
> listen/notify using a non-table-based infrastructure.  But AFAIK no one
> has picked up that task yet.

I found some messages in 03/2002 that also brought up the performance
issue.  You had suggested the use of shared-memory, and made reference
to a "SI model".  I did not see any alternative non-table-based
suggestions.  What is the "SI model"? 

Thanks,

--prashanth



Re: LISTEN/NOTIFY benchmarks?

From
prashanth@jibenetworks.com
Date:
On Tue, Apr 29, 2003 at 10:10:47AM +0300, Hannu Krosing wrote:
> prashanth@jibenetworks.com wrote on Tue, 29.04.2003 at 04:14:

> > - each client registered for thousands of LISTENs
> > 
> >     From a look at backend/commands/async.c, it would seem that each
> >     listening backend would get a signal for *every* LISTEN it
> >     registered for, resulting in thousands of signals to the same
> >     listening backend, instead of only one.
> 
> But as the signals are usually generated async, you have no way to know
> if a particular backend has already received a signal.
> 
> Or do you mean some mechanism that remembers "signals sent" in some
> shared structure that the receiving backend can then clear when it
> actually receives the signal ?

No, I meant that a listening backend process would be sent multiple
signals from a notifying process, *in the inner loop* of
backend/commands/async.c:AtCommit_Notify().

If the listening backend had registered tens of thousands of LISTENs,
it would be sent an equivalent number of signals during a single run
of AtCommit_Notify().  I'm not sure what the cost of this is, since
I'm not sure how signal delivery works, but the tens of thousands of
system calls cannot be very cheap.
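
To make the concern concrete, here is a rough sketch that contrasts one signal per matching registration with one signal per listening backend. It is not the code in async.c: the Listener tuple is a hypothetical stand-in for a pg_listener row, and the functions only compute which PIDs would be signalled.

```python
from collections import namedtuple

# Hypothetical stand-in for a pg_listener row: (condition name, listener PID).
Listener = namedtuple("Listener", ["relname", "pid"])


def signals_per_registration(listeners, pending):
    """One signal for every row whose condition name was notified."""
    return [row.pid for row in listeners if row.relname in pending]


def signals_per_backend(listeners, pending):
    """At most one signal per backend, however many of its rows match."""
    return sorted({row.pid for row in listeners if row.relname in pending})


# One backend listening on 10000 conditions, another listening on just one.
listeners = [Listener(f"cond_{i}", 4242) for i in range(10000)]
listeners.append(Listener("cond_0", 4343))

# A single transaction that notifies every condition name.
pending = {f"cond_{i}" for i in range(10000)}

naive = signals_per_registration(listeners, pending)
deduped = signals_per_backend(listeners, pending)
```

With these inputs the first function yields 10001 signals and the second yields two, one per backend. If the transaction notifies only cond_0, both approaches send just two signals, so the gap only opens up when many condition names are notified in one transaction.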

--prashanth



Re: LISTEN/NOTIFY benchmarks?

From
Tom Lane
Date:
prashanth@jibenetworks.com writes:
> I found some messages in 03/2002 that also brought up the performance
> issue.  You had suggested the use of shared-memory, and made reference
> to a "SI model".  I did not see any alternative non-table-based
> suggestions.  What is the "SI model"? 

I meant following the example of the existing shared-cache-invalidation
signaling mechanism --- see
src/backend/storage/ipc/sinvaladt.c
src/backend/storage/ipc/sinval.c
src/include/storage/sinvaladt.h
src/include/storage/sinval.h
        regards, tom lane
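
For readers who do not want to dig through those files, the following is a loose sketch of the general idea only: a fixed-size message ring shared by all backends, with a separate read position per backend. It is not the code in sinvaladt.c. The class and method names are made up for illustration, and the overflow handling is an assumption about how a lagging reader might be treated.

```python
class BroadcastQueue:
    """Fixed-size ring of messages with one read cursor per reader."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.next_write = 0   # total number of messages ever written
        self.cursors = {}     # reader id -> next message number to read

    def register(self, reader):
        """A new reader starts at the current end of the queue."""
        self.cursors[reader] = self.next_write

    def send(self, msg):
        """Append a message, overwriting the oldest slot when full."""
        self.buf[self.next_write % self.capacity] = msg
        self.next_write += 1

    def receive(self, reader):
        """Return (messages, overflowed).

        overflowed is True when the reader fell more than `capacity`
        messages behind, so some messages were overwritten unread.
        """
        start = self.cursors[reader]
        oldest = max(0, self.next_write - self.capacity)
        overflowed = start < oldest
        start = max(start, oldest)
        msgs = [self.buf[n % self.capacity]
                for n in range(start, self.next_write)]
        self.cursors[reader] = self.next_write
        return msgs, overflowed


q = BroadcastQueue(capacity=4)
q.register("backend_a")
q.register("backend_b")
q.send("x")
q.send("y")
first_read = q.receive("backend_a")
```

The sender never blocks on slow readers and never touches a table. A reader that falls too far behind learns that it missed messages and has to recover in some other way.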



Re: LISTEN/NOTIFY benchmarks?

From
Tom Lane
Date:
prashanth@jibenetworks.com writes:
> If the listening backend had registered tens of thousands of LISTENs,
> it would be sent an equivalent number of signals during a single run
> of AtCommit_Notify().

Not unless the notifier had notified all tens of thousands of condition
names in a single transaction.
        regards, tom lane



Re: LISTEN/NOTIFY benchmarks?

From
prashanth@jibenetworks.com
Date:
On Tue, Apr 29, 2003 at 06:21:15PM -0400, Tom Lane wrote:
> prashanth@jibenetworks.com writes:
> > If the listening backend had registered tens of thousands of LISTENs,
> > it would be sent an equivalent number of signals during a single run
> > of AtCommit_Notify().
> 
> Not unless the notifier had notified all tens of thousands of condition
> names in a single transaction.

Unfortunately, that is a possibility in our application.  We are now
working around this non-scalability.

Regardless, it would seem redundant to send more than one SIGUSR2 to the
recipient backend in that loop.

-- prashanth



Re: LISTEN/NOTIFY benchmarks?

From
Sean Chittenden
Date:
> I'm not an expert on signals, not even a novice, so I might be
> totally off base, but it seems like the Async Notification
> implementation does not scale.  If it does not, does anyone have a
> solution for the problem of signalling each event in a possibly
> very large set of events to a large number of clients?

<brainfart_for_the_archives> Hrm.... I should see about porting
kqueue/kevent as a messaging bus for the listen/notify bits to
postgresql... that does scale and it scales well to tens of thousands
of connections a second (easily over 60K, likely closer to 1M is the
limit)....  </brainfart_for_the_archives>

-- 
Sean Chittenden



Re: LISTEN/NOTIFY benchmarks?

From
Gavin Sherry
Date:
On Tue, 29 Apr 2003, Sean Chittenden wrote:

> > I'm not an expert on signals, not even a novice, so I might be
> > totally off base, but it seems like the Async Notification
> > implementation does not scale.  If it does not, does anyone have a
> > solution for the problem of signalling each event in a possibly
> > very large set of events to a large number of clients?
> 
> <brainfart_for_the_archives> Hrm.... I should see about porting
> kqueue/kevent as a messaging bus for the listen/notify bits to
> postgresql... that does scale and it scales well to tens of thousands
> of connections a second (easily over 60K, likely closer to 1M is the
> limit)....  </brainfart_for_the_archives>

Except that it is FreeBSD specific -- being system calls and all -- if I
remember correctly. If you're going to move to a system like that, which
is a good idea, best move to a portable system.
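
As an illustration of what a portable layer over kqueue can look like, here is a small sketch using Python's standard selectors module, whose DefaultSelector picks the most efficient mechanism available on the platform and falls back to poll() or select() elsewhere. The socket pair and the "listener-1" label are assumptions made for the demo; nothing here is PostgreSQL code.

```python
import selectors
import socket


def demo():
    """Wake a registered listener through whichever backend is available."""
    sel = selectors.DefaultSelector()
    notifier, listener = socket.socketpair()
    try:
        # Attach a label so the caller can tell which channel became readable.
        sel.register(listener, selectors.EVENT_READ, data="listener-1")
        notifier.send(b"!")  # the "notify": one byte on the channel
        ready = [key.data for key, _mask in sel.select(timeout=1.0)]
        listener.recv(1)     # drain the wakeup byte
        return ready
    finally:
        sel.close()
        notifier.close()
        listener.close()


woken = demo()
```

The calling code is identical on every platform; only the selector implementation chosen underneath differs.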

Thanks,

Gavin



Re: LISTEN/NOTIFY benchmarks?

From
Sean Chittenden
Date:
> > > I'm not an expert on signals, not even a novice, so I might be
> > > totally off base, but it seems like the Async Notification
> > > implementation does not scale.  If it does not, does anyone have
> > > a solution for the problem of signalling each event in a
> > > possibly very large set of events to a large number of clients?
> > 
> > <brainfart_for_the_archives> Hrm.... I should see about porting
> > kqueue/kevent as a messaging bus for the listen/notify bits to
> > postgresql... that does scale and it scales well to tens of
> > thousands of connections a second (easily over 60K, likely closer
> > to 1M is the limit)....  </brainfart_for_the_archives>
> 
> Except that it is FreeBSD specific -- being system calls and all --
> if I remember correctly. If you're going to move to a system like
> that, which is a good idea, best move to a portable system.

You can #ifdef abstract things so that select() and poll() work if
available.  Though now that I think about it, a queue that existed
completely in userland would be better... an shm implementation that's
abstracted would be ideal, but shm is a precious resource and can't
scale all that big.  A shared mmap() region, however, is much less
scarce and can scale much higher.  mmap() + semaphore as a gate to a
queue would be ideal, IMHO.
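
A rough sketch of the "mmap() + semaphore as a gate to a queue" idea, under several assumptions: fixed-size slots, a small header holding the head and tail indexes inside the mapped region, and thread-level semaphores so that the example runs in a single process. A real multi-process version would need process-shared semaphores and a mapping that every backend inherits or attaches to. All names are illustrative.

```python
import mmap
import struct
import threading

SLOT = 64                      # bytes per slot: 1 length byte + payload
NSLOTS = 8
HEADER = struct.Struct("II")   # head index, tail index, kept in the region


class MmapQueue:
    def __init__(self):
        # Anonymous mapping; on Unix it would be shared with forked children.
        self.mem = mmap.mmap(-1, HEADER.size + SLOT * NSLOTS)
        HEADER.pack_into(self.mem, 0, 0, 0)
        self.items = threading.Semaphore(0)       # gate: counts filled slots
        self.space = threading.Semaphore(NSLOTS)  # counts free slots
        self.lock = threading.Lock()              # protects head and tail

    def put(self, payload):
        if len(payload) >= SLOT:
            raise ValueError("payload too large for slot")
        self.space.acquire()                      # wait for a free slot
        with self.lock:
            head, tail = HEADER.unpack_from(self.mem, 0)
            off = HEADER.size + tail * SLOT
            self.mem[off] = len(payload)
            self.mem[off + 1:off + 1 + len(payload)] = payload
            HEADER.pack_into(self.mem, 0, head, (tail + 1) % NSLOTS)
        self.items.release()                      # open the gate for a reader

    def get(self, timeout=None):
        if not self.items.acquire(timeout=timeout):
            return None                           # nothing arrived in time
        with self.lock:
            head, tail = HEADER.unpack_from(self.mem, 0)
            off = HEADER.size + head * SLOT
            n = self.mem[off]
            payload = bytes(self.mem[off + 1:off + 1 + n])
            HEADER.pack_into(self.mem, 0, (head + 1) % NSLOTS, tail)
        self.space.release()
        return payload


mq = MmapQueue()
mq.put(b"notify:cond_a")
mq.put(b"notify:cond_b")
```

A reader blocked in get() sleeps on the semaphore rather than being interrupted by a signal, and messages come out in the order they were queued.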

I shouldn't be posti^H^H^H^H^Hrambling though, haven't slept in 72hrs.
:-/  *stops reading email*  -sc

-- 
Sean Chittenden



Re: LISTEN/NOTIFY benchmarks?

From
Sailesh Krishnamurthy
Date:
Sorry for the late response to this, but I've been caught up in
merging TCQ to the 7.3.2 code base.

BTW, an announcement for those interested. We'll be doing a
demonstration of TelegraphCQ during the ACM SIGMOD Conference in
June. This year's SIGMOD is held in San Diego as part of the ACM FCRC
(Federated Computer Research Conf) - visit http://www.sigmod.org for
more details. SIGMOD runs from June 8-12 2003. 

All pgsql hackers (and others) are cordially invited :-) 

Do drop us an email if you're planning to show up. 

>>>>> "Sean" == Sean Chittenden <sean@chittenden.org> writes:

    Sean> You can #ifdef abstract things so that select() and poll()
    Sean> work if available.  Though now that I think about it, a
    Sean> queue that existed completely in userland would be
    Sean> better... an shm implementation that's abstracted would be
    Sean> ideal, but shm is a precious resource and can't scale all
    Sean> that big.  A shared mmap() region, however, is much less
    Sean> scarce and can scale much higher.  mmap() + semaphore as a
    Sean> gate to a queue would be ideal, IMHO.


As part of our TelegraphCQ work, we've implemented a generic userland
queue. We support blocking/non-blocking operation at both
enqueue/dequeue time as well as different forms of latching. 
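
For readers unfamiliar with the distinction, the sketch below shows blocking versus non-blocking enqueue and dequeue using Python's standard queue.Queue. It is only an analogy for the semantics described here and says nothing about the actual TelegraphCQ API.

```python
import queue

q = queue.Queue(maxsize=1)

q.put("event-1")                   # blocking enqueue; returns at once, there is room

try:
    q.put("event-2", block=False)  # non-blocking enqueue on a full queue
    enqueued = True
except queue.Full:
    enqueued = False               # caller decides whether to retry or drop

first = q.get()                    # blocking dequeue; an item is available

try:
    q.get(block=False)             # non-blocking dequeue on an empty queue
    got_second = True
except queue.Empty:
    got_second = False
```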

The queue can also live in shared memory, for which we use a new
Shared Memory MemoryContext. This is implemented using libmm - a
memory management library that came out of the Apache project.

Our current released version is based on the 7.2.1 source
base. However, our internal CVS tip is based on 7.3.2 - we had to make
a few changes to the shm allocator - one more function that's part of
a MemoryContext.

(We can afford to be slightly more profligate in our use of shared
memory as we process all concurrently executing streaming queries in a
single monster query plan. New queries are dynamically folded into a
running query plan on the fly. Since streams represent append-only
data we play fast and loose with transaction isolation ...)

The current version of the code is available at: 
    http://telegraph.cs.berkeley.edu/telegraphcq 

If there is interest, we would love to contribute our queue
infrastructure to PostgreSQL. In fact, we'd love to contribute any of
our stuff that the pgsql folks find interesting/useful.

Our motivations are two-fold:
 (1) We'd like to give back to the pgsql community.
 (2) It's in our interest if things like the Queue/ShMem stuff are
     part of pgsql, as it means one less merge hassle in the future.


-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: LISTEN/NOTIFY benchmarks?

From
Sean Chittenden
Date:
>   (2) It's in our interest if things like the Queue/ShMem stuff is
>   part of pgsql as it means one less of a merge hassle in future.

I'd be quite interested in the work as it would remove my dependence
on jabberd as a distributed event/message bus and I could keep
everything inside of PostgreSQL, which is always a good thing.  :) -sc

-- 
Sean Chittenden



Re: LISTEN/NOTIFY benchmarks?

From
Sailesh Krishnamurthy
Date:
>>>>> "Sean" == Sean Chittenden <sean@chittenden.org> writes:

    >> (2) It's in our interest if things like the Queue/ShMem stuff
    >> is part of pgsql as it means one less of a merge hassle in
    >> future.

    Sean> I'd be quite interested in the work as it would remove my
    Sean> dependence on jabberd as a distributed event/message bus and
    Sean> I could keep everything inside of PostgreSQL, which is
    Sean> always a good thing.  :) -sc

Sounds great ! Would it make more sense for us to correspond privately
and see if you can use our code and then submit a patch ? 

Or is it better to have the discussion on HACKERS itself, so that it
lends itself to further googling?

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: LISTEN/NOTIFY benchmarks?

From
Sean Chittenden
Date:
>     >> (2) It's in our interest if things like the Queue/ShMem stuff
>     >> is part of pgsql as it means one less of a merge hassle in
>     >> future.
> 
>     Sean> I'd be quite interested in the work as it would remove my
>     Sean> dependence on jabberd as a distributed event/message bus and
>     Sean> I could keep everything inside of PostgreSQL, which is
>     Sean> always a good thing.  :) -sc
> 
> Sounds great ! Would it make more sense for us to correspond privately
> and see if you can use our code and then submit a patch ? 
> 
> Or is it better to have a discussion on HACKERS itself and lend itself
> to further googling. 

Do you have a URL for the patch?  If not, send it to me privately.  I
can take any non-critical issues off line but I bet others have an
interest in this code as well.

I'm particularly interested in the API atm to see how hard it would be
to integrate.  -sc

-- 
Sean Chittenden


Re: LISTEN/NOTIFY benchmarks?

From
Sailesh Krishnamurthy
Date:
>>>>> "Sean" == Sean Chittenden <sean@chittenden.org> writes:

    Sean> Do you have a URL for the patch?  If not, send it to me
    Sean> privately.  I can take any non-critical issues off line but
    Sean> I bet others have an interest in this code as well.
 

TCQ website: http://telegraph.cs.berkeley.edu/telegraphcq

The code we have on the web is a source distribution based on 7.2 -
not as a patch. 

I think I can produce a patch off of 7.3.2 - it's just a bunch of new
modules, although we had to add a few functions to the changed
semaphore abstractions. 

    Sean> I'm particularly interested in the API atm to see how hard
    Sean> it would be to integrate.  -sc

Since the API hasn't changed significantly internally, maybe the best
bet is for you to download the src distribution from the link above and
look at the directories src/backend/rqueue as well as src/include/rqueue.

If things look promising, I can rustle up code that fits the 7.3.x
codebase. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh