Thread: Sending notifications from the master to the standby

Sending notifications from the master to the standby

From: Joachim Wieland
People have always expressed interest in $subject, so I wondered how
hard it could possibly be and came up with the attached patch.

Notifications that are generated on the master and are forwarded to
the standby can be used as a convenient way to find out which changes
have already made it to the standby. The idea would be that you run a
transaction on the master, add a "NOTIFY changes_made", and listen on
the standby for this event. Once it gets delivered, you know that your
transaction got replayed to the standby.

Note that this feature is only about LISTEN on the standby; it still
doesn't allow sending NOTIFYs out from the standby.

As a reminder, the current implementation of notifications
(LISTEN/NOTIFY) in a few words is:

- a transaction that executes "NOTIFY channel, payload" adds the
notification to backend-local memory
- upon commit, it inserts the notifications along with its transaction
id into a large SLRU mapped ring buffer and signals any listening
backend

- each backend that's listening has a pointer into this ring buffer.
After each transaction, the backend starts reading from this pointer
position to the end of the ring buffer. It delivers all matching
notifications to its frontend if the transaction that has inserted
them is known to have committed.
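
As an illustration only (this is not code from the patch or from PostgreSQL), the queue behaviour summarized above can be modelled in a few lines of Python. The names NotifyQueue, enqueue, mark_committed and poll are hypothetical, a plain list stands in for the SLRU ring buffer, and the choice to stop reading at the first entry of a still-running transaction is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class NotifyQueue:
    """Toy model of the shared notification queue (hypothetical names)."""
    entries: list = field(default_factory=list)    # stands in for the SLRU ring buffer
    committed: set = field(default_factory=set)    # xids known to have committed
    read_pos: dict = field(default_factory=dict)   # listener -> next index to read
    channels: dict = field(default_factory=dict)   # listener -> set of channel names

    def listen(self, listener, channel):
        self.channels.setdefault(listener, set()).add(channel)
        # A new listener starts reading at the current end of the queue.
        self.read_pos.setdefault(listener, len(self.entries))

    def enqueue(self, xid, pending):
        """Pre-commit step: copy the transaction's notifications into the queue."""
        self.entries.extend((xid, channel, payload) for channel, payload in pending)

    def mark_committed(self, xid):
        self.committed.add(xid)

    def poll(self, listener):
        """Read from the listener's pointer towards the end of the queue."""
        delivered = []
        pos = self.read_pos[listener]
        while pos < len(self.entries):
            xid, channel, payload = self.entries[pos]
            if xid not in self.committed:
                break  # assumption: wait here until this transaction finishes
            if channel in self.channels[listener]:
                delivered.append((channel, payload))
            pos += 1
        self.read_pos[listener] = pos
        return delivered
```

A listener that polls before the commit sees nothing; after mark_committed() the same poll returns only the notifications on channels it listens on.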

In the patch I added a new WAL message type, XLOG_NOTIFY that writes
out WAL records when the notifications are written into the pages of
the SLRU ring buffer. Whenever an SLRU page is found to be full, a new
WAL record will be created, that's just a more or less arbitrary form
of batching a bunch of them together but that's easy to do and most
often, I think there won't be more than a few records per
transaction anyway.
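
A minimal sketch of the batching rule described above, assuming fixed hypothetical sizes: entries are written to a page until the next one no longer fits, at which point one record is emitted for what was written to that page. The function name, the byte sizes and the page_used parameter are illustrative assumptions and are not taken from the patch.

```python
def batch_into_wal_records(entry_sizes, page_size, page_used=0):
    """Group notification entries into one record per queue page they fill.

    Returns a list of records, each a list of entry indexes.
    page_used is the space already taken on the first page.
    """
    records = []
    current = []
    free = page_size - page_used
    for index, size in enumerate(entry_sizes):
        if size > page_size:
            raise ValueError("entry larger than a page")
        if size > free:
            # The page is full: emit one record for what was written to it.
            if current:
                records.append(current)
            current = []
            free = page_size
        current.append(index)
        free -= size
    if current:
        # The transaction's last notifications go out in a final record.
        records.append(current)
    return records
```

With four 30-byte entries and a 100-byte page, the first three share one record and the fourth goes into a second one, so a small transaction produces only a few records.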

The recovery process on the standby side adds the notifications into
the standby's SLRU ring buffer and once the last notification has been
added (which might be after a couple more WAL records), it signals the
listening backends.

Theoretically we could also run into a full queue situation on the
standby: Imagine a long-running transaction doesn't advance its
pointer in the ring buffer and no new notifications can be stored in
the buffer. The patch introduces a new type of recovery conflict for
this reason.

One further optimization (that is not included for now) would be to
keep track of how many backends are actually listening on some channel
and if nobody is listening, discard incoming notifications.
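
The optimization could look roughly like the following sketch. It is only an illustration under the assumption that a single counter of listening backends is enough; StandbyNotifyFilter and its methods are hypothetical names, not part of the patch.

```python
class StandbyNotifyFilter:
    """Drop incoming notifications while no backend is listening (toy model)."""

    def __init__(self):
        self.listening_backends = 0   # backends with at least one LISTEN
        self.queue = []               # stands in for the standby's ring buffer

    def backend_started_listening(self):
        self.listening_backends += 1

    def backend_stopped_listening(self):
        self.listening_backends = max(0, self.listening_backends - 1)

    def on_incoming(self, notification):
        """Return True if the notification was stored, False if discarded."""
        if self.listening_backends == 0:
            return False
        self.queue.append(notification)
        return True
```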

Re: Sending notifications from the master to the standby

From: Tom Lane
Joachim Wieland <joe@mcknight.de> writes:
> [ send NOTIFYs to slaves by means of: ]
> In the patch I added a new WAL message type, XLOG_NOTIFY that writes
> out WAL records when the notifications are written into the pages of
> the SLRU ring buffer. Whenever an SLRU page is found to be full, a new
> WAL record will be created, that's just a more or less arbitrary form
> of batching a bunch of them together but that's easy to do and most
> often, I think there won't be more than a few records per
> transaction anyway.

I'm having a hard time wrapping my mind around why you'd do it that way.
ISTM there are two fairly serious problems:

1. Emitting WAL records for NOTIFY traffic results in significantly
more overhead, with no benefit whatever, for existing non-replicated
NOTIFY-using applications.  Those folk are going to see a performance
degradation, and they're going to complain.

2. Batching NOTIFY traffic will result in a delay in receipt, which will
annoy anybody who's trying to make actual use of the notifications on
standby servers.  The worst case here happens if notify traffic on the
master is bursty: the last few messages in a burst might not get to the
slave for a long time, certainly long after the commits that the
messages were supposed to be telling people about.

So this design is non-optimal both for existing uses and for the
proposed new uses, which means nobody will like it.  You could
ameliorate #1 by adding a GUC that determines whether NOTIFY actually
writes WAL, but that's pretty ugly.  In any case ISTM that problem #2
means this design is basically broken.


I wonder whether it'd be practical to not involve WAL per se in this
at all, but to transmit NOTIFY messages by having walsender processes
follow the notify stream (as though they were listeners) and send the
notify traffic as a separate message stream interleaved with the WAL
traffic.  We already have, as of a few days ago, the concept of
additional traffic in the walsender stream besides the WAL data itself,
so adding notify traffic as another message type should be
straightforward.  It might be a bit tricky to get walreceivers to inject
the data into the slave-side ring buffer at the right time, ie, not
until after the commit a given message describes has been replayed;
but I don't immediately see a reason to think that's infeasible.

Going in this direction would mean that slave-side LISTEN only works
when using walsender/walreceiver, and not with old-style log shipping.
But personally I don't see a problem with that.  If you're trying to
LISTEN you probably want pretty up-to-date data anyway.
        regards, tom lane


Re: Sending notifications from the master to the standby

From: Simon Riggs
On Tue, Jan 10, 2012 at 5:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joachim Wieland <joe@mcknight.de> writes:
>> [ send NOTIFYs to slaves by means of: ]

Good idea.

> I wonder whether it'd be practical to not involve WAL per se in this
> at all, but to transmit NOTIFY messages by having walsender processes
> follow the notify stream (as though they were listeners) and send the
> notify traffic as a separate message stream interleaved with the WAL
> traffic.  We already have, as of a few days ago, the concept of
> additional traffic in the walsender stream besides the WAL data itself,
> so adding notify traffic as another message type should be
> straightforward.

Also good idea.

> It might be a bit tricky to get walreceivers to inject
> the data into the slave-side ring buffer at the right time, ie, not
> until after the commit a given message describes has been replayed;
> but I don't immediately see a reason to think that's infeasible.

When a transaction commits, it would use full-size commit records and
set a (new) flag in xl_xact_commit.xinfo to show the commit is paired
with notify traffic.

Get messages in walreceiver.c XLogWalRcvProcessMsg() and put them in a
shared hash table. Messages would need to contain xid of notifying
transaction and other info needed for LISTEN.

When we hit xact.c xact_redo_commit() on standby we'd check for
messages in the hash table if the notify flag is set and execute the
normal notify code as if the NOTIFY had run locally on the standby. We
can sweep the hash table clean of any old messages each time we run
ProcArrayApplyRecoveryInfo().
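
A rough Python model of the flow sketched above, for illustration only. A dict keyed by xid stands in for the shared hash table, has_notify_flag stands in for the proposed commit-record flag, and the sweep rule (drop anything older than the oldest running xid, ignoring xid wraparound) is an assumption of this sketch. All class and method names are hypothetical.

```python
from collections import defaultdict


class StandbyNotifyStore:
    """Toy stand-in for the table the walreceiver would fill."""

    def __init__(self):
        self.pending = defaultdict(list)   # xid -> [(channel, payload), ...]
        self.delivered = []                # what standby listeners would be handed

    def on_walreceiver_message(self, xid, channel, payload):
        # Messages arrive outside the WAL stream and wait for their commit.
        self.pending[xid].append((channel, payload))

    def on_redo_commit(self, xid, has_notify_flag):
        # Only commits flagged as paired with notify traffic look at the table.
        if not has_notify_flag:
            return
        self.delivered.extend(self.pending.pop(xid, []))

    def sweep(self, oldest_running_xid):
        # Assumption: messages from xids older than this can never be needed.
        for xid in [x for x in self.pending if x < oldest_running_xid]:
            del self.pending[xid]
```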

Add new message type to walprotocol.h. Message code 'L' appears to be
available.

Suggest we add something to initial handshake from standby to say
"please send me notify traffic", which we can link to a parameter that
defines size of standby_notify_buffer. We don't want all standbys to
receive such traffic unless they really want it and pg_basebackup
probably doesn't want it either.

If you wanted to get really fancy you could send only some of the
traffic to each standby based on a hash or roundrobin algorithm, so we
can spread the listeners across multiple standbys.

I'll be your reviewer, if you want.

> Going in this direction would mean that slave-side LISTEN only works
> when using walsender/walreceiver, and not with old-style log shipping.
> But personally I don't see a problem with that.  If you're trying to
> LISTEN you probably want pretty up-to-date data anyway.

Which fits the expected use case also.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Sending notifications from the master to the standby

From: Joachim Wieland
On Tue, Jan 10, 2012 at 12:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So this design is non-optimal both for existing uses and for the
> proposed new uses, which means nobody will like it.  You could
> ameliorate #1 by adding a GUC that determines whether NOTIFY actually
> writes WAL, but that's pretty ugly.  In any case ISTM that problem #2
> means this design is basically broken.

I chose to do it this way because it seemed like the most natural way
to do it (which of course doesn't mean it's the best)  :-). I agree
that there should be a way to avoid the replication of the NOTIFYs.
Regarding your second point though, remember that on the master we
write notifications to the queue in pre-commit. And we also don't
interleave notifications of different transactions. So once the commit
record makes it to the standby, all the notifications are already
there, just as on the master. In a burst of notifications, both
solutions should more or less behave the same way but yes, the one
involving the WAL file would be slower as it goes to the file system
and back.

> I wonder whether it'd be practical to not involve WAL per se in this
> at all, but to transmit NOTIFY messages by having walsender processes
> follow the notify stream (as though they were listeners) and send the
> notify traffic as a separate message stream interleaved with the WAL
> traffic.

Agreed, having walsender/receiver work as NOTIFY proxies is kinda smart...


Joachim


Re: Sending notifications from the master to the standby

From: Simon Riggs
On Tue, Jan 10, 2012 at 12:56 PM, Joachim Wieland <joe@mcknight.de> wrote:

> I chose to do it this way because it seemed like the most natural way
> to do it (which of course doesn't mean it's the best)  :-).

If it's any consolation, it's exactly how I would have done it also up
until about 2 months ago, and I remember discussing almost exactly the
design you presented with someone in Rome last year.

Anyway, it's a good feature, so I hope you have time.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Sending notifications from the master to the standby

From: Tom Lane
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Tue, Jan 10, 2012 at 5:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It might be a bit tricky to get walreceivers to inject
>> the data into the slave-side ring buffer at the right time, ie, not
>> until after the commit a given message describes has been replayed;
>> but I don't immediately see a reason to think that's infeasible.

> [ Simon sketches a design for that ]

Seems a bit overcomplicated.  I was just thinking of having walreceiver
note the WAL endpoint at the instant of receipt of a notify message,
and not release the notify message to the slave ring buffer until WAL
replay has advanced that far.  You'd need to lay down ground rules about
how the walsender times the insertion of notify messages relative to
WAL in its output.  But I don't see the need for either explicit markers
in the WAL stream or a hash table.  Indeed, a hash table scares me
because it doesn't clearly guarantee that notifies will be released in
arrival order.
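
For illustration, the release rule described above fits in a small FIFO. This is a sketch under the assumption that WAL positions are plain increasing integers and that the received-WAL endpoint never moves backwards; HeldNotifies, receive and release are hypothetical names.

```python
from collections import deque


class HeldNotifies:
    """Hold notify messages until WAL replay reaches their receipt point."""

    def __init__(self):
        # (WAL endpoint at receipt, message), kept in arrival order.
        self.held = deque()

    def receive(self, message, wal_received_upto):
        self.held.append((wal_received_upto, message))

    def release(self, wal_replayed_upto):
        """Return the messages that may now go into the standby ring buffer."""
        released = []
        while self.held and self.held[0][0] <= wal_replayed_upto:
            released.append(self.held.popleft()[1])
        return released
```

Because the endpoint noted at receipt only grows, releasing from the head of the queue keeps messages in arrival order without any marker in the WAL stream.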

> Suggest we add something to initial handshake from standby to say
> "please send me notify traffic",

+1 on that.
        regards, tom lane


Re: Sending notifications from the master to the standby

From: Simon Riggs
On Tue, Jan 10, 2012 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Tue, Jan 10, 2012 at 5:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> It might be a bit tricky to get walreceivers to inject
>>> the data into the slave-side ring buffer at the right time, ie, not
>>> until after the commit a given message describes has been replayed;
>>> but I don't immediately see a reason to think that's infeasible.
>
>> [ Simon sketches a design for that ]
>
> Seems a bit overcomplicated.  I was just thinking of having walreceiver
> note the WAL endpoint at the instant of receipt of a notify message,
> and not release the notify message to the slave ring buffer until WAL
> replay has advanced that far.  You'd need to lay down ground rules about
> how the walsender times the insertion of notify messages relative to
> WAL in its output.

You have to store the messages somewhere until they're needed. If that
somewhere isn't on the standby, very close to the Startup process, then
it's going to be very slow. Putting a marker in the WAL stream
guarantees arrival order. The hash table was just a place to store
them until they're needed, could be a ring buffer as well.

Inserts into the slave ring buffer already have an xid on them, so the
test will probably already cope with messages inserted but for which
the parent xid has not committed. The only problem is coping with
possible out of sequence messages.

> But I don't see the need for either explicit markers
> in the WAL stream or a hash table.  Indeed, a hash table scares me
> because it doesn't clearly guarantee that notifies will be released in
> arrival order.

The hash table is clearly not the thing providing an arrival order
guarantee, it was just a cache.

You have a few choices: (1) you either send the message while holding
an exclusive lock, or (2) you send them as they come and buffer them,
then reorder them using the WAL log sequence since that matches the
original commit sequence. Or (3) add a sequence number to the messages
sent by WALSender, so that the WALReceiver can buffer them locally and
insert them in the correct order into the normal ring buffer - so in
(3) the message sequence and the WAL sequence match, but the mechanism
is different.
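
To make option (3) concrete, here is a minimal sketch of a receiver-side reorder buffer. It assumes each message carries a unique, gap-free sequence number assigned by the sender; SequenceReorderBuffer and first_seq are hypothetical names used only for this illustration.

```python
import heapq


class SequenceReorderBuffer:
    """Buffer out-of-order messages and hand them on in sequence order."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq   # next sequence number we may release
        self.heap = []              # min-heap of (seq, message)

    def add(self, seq, message):
        """Store one message; return every message that is now in order."""
        heapq.heappush(self.heap, (seq, message))
        ready = []
        while self.heap and self.heap[0][0] == self.next_seq:
            ready.append(heapq.heappop(self.heap)[1])
            self.next_seq += 1
        return ready
```

Option (2) would have the same shape, with the WAL position taking the place of the sequence number.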

(1) is out because the purpose of offloading to the standby is to give
the master more capacity. If we slow it down in order to serve the
standby we're doing things the wrong way around.

I was choosing (2), maybe you prefer (3) or another design entirely.
They look very similar to me and about the same complexity, it's just
copying data and preserving sequence.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Sending notifications from the master to the standby

From: Joachim Wieland
On Tue, Jan 10, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> [ Tom sketches a design ]
> Seems a bit overcomplicated.  I was just thinking of having walreceiver
> note the WAL endpoint at the instant of receipt of a notify message,
> and not release the notify message to the slave ring buffer until WAL
> replay has advanced that far.

How about this: We mark a notify message specially if it is the last
message sent by a transaction and also add a flag to
commit/abort-records, indicating whether or not the transaction has
sent notifys. Now if such a last message is being put into the regular
ring buffer on the standby and the xid is known to have committed or
aborted, signal the backends. Also signal from a commit/abort-record
if the flag is set.

If the notify messages make it to the standby first, we just put
messages of a not-yet-committed transaction into the queue, just as on
the master. Listeners will get signaled when the commit record
arrives. If the commit record arrives first, we signal, but the
listeners won't find anything (at least not the latest notifications).
When the last notify of that transaction finally arrives, the
transaction is known to have committed and the listeners will get
signaled.

What could still happen is that the standby receives notifys, the
commit message and more notifys. Listeners would still eventually get
all the messages but potentially not all of them at once. Is this a
problem? If so, then we could add a special "stop reading"-record into
the queue before we write the notifys, that we subsequently change
into a "continue reading"-record once all notifications are in the
queue. Readers would treat a "stop reading" record just like a
not-yet-committed transaction and ignore a "continue reading" record.
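
The signalling rule from the first two paragraphs above can be traced with a small model. This is only a sketch: StandbyQueue and its method names are hypothetical, a counter stands in for signalling the listening backends, and the "stop reading" / "continue reading" records are not modelled.

```python
class StandbyQueue:
    """Toy model: when would listeners on the standby get signalled?"""

    def __init__(self):
        self.queue = []        # (xid, payload) in arrival order
        self.finished = set()  # xids whose commit/abort record has been seen
        self.signals = 0       # how many times listeners were signalled

    def on_notify(self, xid, payload, is_last):
        self.queue.append((xid, payload))
        # Signal only for the specially marked last message, and only if
        # the transaction is already known to have committed or aborted.
        if is_last and xid in self.finished:
            self.signals += 1

    def on_commit_or_abort(self, xid, sent_notifies):
        self.finished.add(xid)
        # The flag in the commit/abort record triggers a signal as well.
        if sent_notifies:
            self.signals += 1
```

If the notifies arrive first, there is a single signal at the commit record. If the commit record arrives first, listeners are signalled once at the commit (and find nothing yet) and once more when the last notify arrives.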


>> Suggest we add something to initial handshake from standby to say
>> "please send me notify traffic",
>
> +1 on that.

From what you said I imagined this walsender listener as a regular
listener that listens on the union of all sets of channels that
anybody is listening on on the standby, with the LISTEN transaction on
the standby returning from commit once the listener is known to have been
set up on the master.


Joachim


Re: Sending notifications from the master to the standby

From: Tom Lane
Joachim Wieland <joe@mcknight.de> writes:
> On Tue, Jan 10, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> Suggest we add something to initial handshake from standby to say
>>> "please send me notify traffic",

>> +1 on that.

> From what you said I imagined this walsender listener as a regular
> listener that listens on the union of all sets of channels that
> anybody is listening on on the standby, with the LISTEN transaction on
> the standby returning from commit once the listener is known to have been
> set up on the master.

This seems vastly overcomplicated too.  I'd just vote for a simple
yes/no flag, so that receivers that have no interest in notifies don't
have to deal with them.
        regards, tom lane


Re: Sending notifications from the master to the standby

From: Tom Lane
BTW ... it occurs to me to ask whether we really have a solid use-case
for having listeners attached to slave servers.  I have personally never
seen an application for LISTEN/NOTIFY in which the listeners were
entirely read-only.  Even if there are one or two cases out there, it's
not clear to me that supporting it is worth the extra complexity that
seems to be needed.
        regards, tom lane


Re: Sending notifications from the master to the standby

From: Simon Riggs
On Wed, Jan 11, 2012 at 4:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> BTW ... it occurs to me to ask whether we really have a solid use-case
> for having listeners attached to slave servers.  I have personally never
> seen an application for LISTEN/NOTIFY in which the listeners were
> entirely read-only.  Even if there are one or two cases out there, it's
> not clear to me that supporting it is worth the extra complexity that
> seems to be needed.

The idea is to support external caches that re-read the data when it changes.

If we can do that from the standby then we offload from the master.

Yes, there are other applications for LISTEN/NOTIFY and we wouldn't be
able to support them all with this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Sending notifications from the master to the standby

From: Josh Berkus
Tom,

> BTW ... it occurs to me to ask whether we really have a solid use-case
> for having listeners attached to slave servers.  I have personally never
> seen an application for LISTEN/NOTIFY in which the listeners were
> entirely read-only.  Even if there are one or two cases out there, it's
> not clear to me that supporting it is worth the extra complexity that
> seems to be needed.

Actually, I've seen requests for it from my clients and on IRC.  Not
sure how serious those are, but users have brought it up.  Certainly
users intuitively think they should be able to LISTEN on a standby, and
are surprised when they find out they can't.

The basic idea is that if we can replicate LISTENs, then you can use
replication as a simple distributed (and lossy) queueing system.  This
is especially useful if the replica is geographically distant, and there
are a lot of listeners.

The obvious first use case for this is for cache invalidation.  For
example, we have one application where we're using Redis to queue cache
invalidation messages; if LISTEN/NOTIFY were replicated, we could use it
instead and simplify our infrastructure.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Sending notifications from the master to the standby

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
>> BTW ... it occurs to me to ask whether we really have a solid use-case
>> for having listeners attached to slave servers.  I have personally never
>> seen an application for LISTEN/NOTIFY in which the listeners were
>> entirely read-only.  Even if there are one or two cases out there, it's
>> not clear to me that supporting it is worth the extra complexity that
>> seems to be needed.

> The basic idea is that if we can replicate LISTENs, then you can use
> replication as a simple distributed (and lossy) queueing system.

Well, this is exactly what I don't believe.  A queueing system requires
that recipients be able to remove things from the queue.  You can't do
that on a slave server, because you can't make any change in the
database that would be visible to other users.

> The obvious first use case for this is for cache invalidation.

Yeah, upthread Simon pointed out that propagating notifies would be
useful for flushing caches in applications that watch the database in a
read-only fashion.  I grant that such a use-case is technically possible
within the limitations of a slave server; I'm just dubious that it's a
sufficiently attractive use-case to justify the complexity and future
maintenance costs of the sort of designs we are talking about.  Or in
other words: so far, cache invalidation is not the "first" use-case,
it's the ONLY POSSIBLE use-case.  That's not useful enough.
        regards, tom lane


Re: Sending notifications from the master to the standby

From: Josh Berkus
> Yeah, upthread Simon pointed out that propagating notifies would be
> useful for flushing caches in applications that watch the database in a
> read-only fashion.  I grant that such a use-case is technically possible
> within the limitations of a slave server; I'm just dubious that it's a
> sufficiently attractive use-case to justify the complexity and future
> maintenance costs of the sort of designs we are talking about.  Or in
> other words: so far, cache invalidation is not the "first" use-case,
> it's the ONLY POSSIBLE use-case.  That's not useful enough.

Well, cache invalidation is a pretty common task; probably more than 50%
of all database applications need to do it.  Note that we're not just
talking about memcached for web applications here.  For example, one of
the companies quoted for PostgreSQL 9.0 release uses LISTEN/NOTIFY to
inform remote devices (POS systems) that there's new data available for
them. That's a form of cache invalidation.  It's certainly a more common
design pattern than using XML in the database.

However, there's the question of whether or not this patch actually
allows a master-slave replication system to support more Listeners more
efficiently than having them all simply listen to the master.  And what
impact it has on the performance of LISTEN/NOTIFY on standalone systems.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Sending notifications from the master to the standby

From: Peter Geoghegan
On 11 January 2012 23:51, Josh Berkus <josh@agliodbs.com> wrote:
>
>> Yeah, upthread Simon pointed out that propagating notifies would be
>> useful for flushing caches in applications that watch the database in a
>> read-only fashion.  I grant that such a use-case is technically possible
>> within the limitations of a slave server; I'm just dubious that it's a
>> sufficiently attractive use-case to justify the complexity and future
>> maintenance costs of the sort of designs we are talking about.  Or in
>> other words: so far, cache invalidation is not the "first" use-case,
>> it's the ONLY POSSIBLE use-case.  That's not useful enough.
>
> Well, cache invalidation is a pretty common task; probably more than 50%
> of all database applications need to do it.

I agree that it would be nice to support this type of cache
invalidation - without commenting on the implementation, I think that
the concept is very useful, and of immediate benefit to a significant
number of people.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Sending notifications from the master to the standby

From: Simon Riggs
On Wed, Jan 11, 2012 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> The obvious first use case for this is for cache invalidation.
>
> Yeah, upthread Simon pointed out that propagating notifies would be
> useful for flushing caches in applications that watch the database in a
> read-only fashion.  I grant that such a use-case is technically possible
> within the limitations of a slave server; I'm just dubious that it's a
> sufficiently attractive use-case to justify the complexity and future
> maintenance costs of the sort of designs we are talking about.  Or in
> other words: so far, cache invalidation is not the "first" use-case,
> it's the ONLY POSSIBLE use-case.  That's not useful enough.

Many people clearly do think this is useful.

I personally don't think it will be that complex. I'm willing to
review and maintain it if the patch works the way we want it to.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Sending notifications from the master to the standby

From: Josh Berkus
> Many people clearly do think this is useful.

It also comes under the heading of "avoiding surprising behavior".  That
is, users instinctively expect to be able to LISTEN on standbys, and are
surprised when they can't.

> I personally don't think it will be that complex. I'm willing to
> review and maintain it if the patch works the way we want it to.
> 

I think the review needs some performance testing to be valid.

1) How does this patch affect the speed and throughput of LISTEN/NOTIFY
on a standalone server?

2) Can we actually attach more LISTENers to multiple standbys than we
could to a single Master?

Unfortunately, I don't have an application which can LISTEN in a way
which doesn't eclipse any differences in throughput or response time we
would see on the DB side.  Does anyone?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com