Thread: notification: pg_notify ?
Jeff Davis asked on -general why NOTIFY doesn't take an optional argument specifying a message that is passed to the listening backend. This feature is supported by Oracle and other databases and I think it's quite useful, so I've started to implement it. Most of the modifications have been pretty straightforward, except for two issues:

(1) Processing notifies. Currently, the only data that is passed from the notifying backend to the listening one is the PID of the notifier, which is stored in the "notification" column of pg_listener. In order to pass messages from notifier to listener, I could add another column to pg_listener, but IMHO that's a bad idea: there is really no reason for this kind of data to be in pg_listener in the first place. pg_listener should simply list the PIDs of listening backends, as well as the conditions upon which they are listening -- any data that is related to specific notifications should be put elsewhere.

(2) Multiple notifications on the same condition name in a short time span are delivered as a single notification. This isn't currently a problem because the NOTIFY itself doesn't carry any data (other than the backend PID); it just informs the listener that an event has occurred. If we allow NOTIFY to send a message to the listener, this is not good -- the listener should be notified for each and every notification, since the contents of the message could be important.

Solution: Create a new system catalog, pg_notify. This should contain 4 columns:

    relname:  the name of the NOTIFY condition that has been sent
    message:  the optional message sent by the NOTIFY
    sender:   the PID of the backend that sent the NOTIFY
    receiver: the PID of the listening backend

AFAICT, this should resolve the two issues mentioned above. The actual notification of a listening backend is still done at transaction commit, by sending a SIGUSR2: however, all this does is ask the backend to scan through pg_notify, looking for tuples containing its PID in "receiver". 
Therefore, even if Unix doesn't send multiple signals for multiple notifications, a single signal should be enough to ensure a scan of pg_notify, where any additional notifications will be found. If we continued to add columns to pg_listener, there would be a limit of 1 tuple per listening backend: thus, we would still run into problems with multiple notifications being ignored. Can anyone see a better way to do this? Are there any problems with the implementation I've outlined? Any feedback would be appreciated. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
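The proposed scheme can be illustrated with a toy simulation (hypothetical Python, not PostgreSQL source; the field and function names are invented, following the proposed columns): each NOTIFY inserts one tuple per listener, and a single signal is enough because the receiver drains every matching tuple in one scan.

```python
# Toy model of the proposed pg_notify catalog (illustration only, not
# PostgreSQL source; the four field names follow the proposed columns).

pg_notify = []  # stands in for the new system catalog

def notify(relname, message, sender_pid, listener_pids):
    # At transaction commit: insert one tuple per (condition, listener).
    for pid in listener_pids:
        pg_notify.append({"relname": relname, "message": message,
                          "sender": sender_pid, "receiver": pid})

def scan_notifications(my_pid):
    # What a listener does on SIGUSR2: collect and remove all of its
    # tuples.  One signal suffices to drain every pending notification.
    mine = [t for t in pg_notify if t["receiver"] == my_pid]
    pg_notify[:] = [t for t in pg_notify if t["receiver"] != my_pid]
    return mine

# Two notifies on the same condition in quick succession:
notify("parts_updated", "row 1", sender_pid=101, listener_pids=[202])
notify("parts_updated", "row 2", sender_pid=102, listener_pids=[202])

received = scan_notifications(202)
print(len(received))  # both notifications are delivered
```

Because each notification is a distinct tuple rather than a single per-listener row in pg_listener, nothing is coalesced, which is how issue (2) would disappear under this design.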
Neil Conway <nconway@klamath.dyndns.org> writes: > Solution: Create a new system catalog, pg_notify. It's not apparent to me why that helps much. There is a very significant performance problem with LISTEN/NOTIFY via pg_listener: in any application that generates notifications at a significant rate, pg_listener will accumulate dead tuples at that same rate, and we will soon find ourselves wasting lots of time scanning through dead tuples. Frequent VACUUMs might help, but the whole thing is really quite silly: why are we using a storage mechanism that's designed entirely for *stable* storage of data to pass inherently *transient* signals? If the system crashes, we have absolutely zero interest in the former contents of pg_listener (and indeed need to go to some trouble to get rid of them). So if someone wants to undertake a revision of the listen/notify code, I think the first thing to do ought to be to throw away pg_listener entirely and develop some lower-overhead, shared-memory-based communication mechanism. You could do worse than to use the shared cache inval code as a model --- or perhaps even incorporate LISTEN signaling into that mechanism. (Actually that seems like a good plan, so as not to use shared memory inefficiently by dedicating two separate memory pools to parallel purposes.) If you follow the SI model then NOTIFY messages would essentially be broadcast to all backends, and whether any given backend pays attention to one is its own problem; no one else cares. A deficiency of the SI implementation (and probably anything else that relies solely on shared memory) is that it can suffer from buffer overrun, since there's a fixed-size message pool. For the purposes of cache inval, we cope with buffer overrun by just invalidating everything in sight. It might be a workable tradeoff to cope with buffer overrun for LISTEN/NOTIFY by reporting notifies on all conditions currently listened for. 
Assuming that overrun is infrequent, the net performance gain from being able to use shared memory is probably worth the occasional episode of wasted work. BTW, I would like to see a spec for this "notify with parameter" feature before it's implemented, not after. Exactly what semantics do you have in mind? regards, tom lane
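The overrun tradeoff Tom describes can be sketched in miniature (hypothetical Python, not the sinval code; a deliberately tiny 2-slot pool and a single listener, so reading simply consumes the buffer): when the fixed-size pool overflows, precision is abandoned and the listener is told that everything it listens for may have fired.

```python
# Miniature sketch (hypothetical, not the sinval code) of coping with
# overrun of a fixed-size shared message pool by over-notifying.

BUFFER_SIZE = 2          # deliberately tiny so overflow is easy to show
buffer = []
overflowed = False

def post_notify(condition):
    global overflowed
    if len(buffer) >= BUFFER_SIZE:
        overflowed = True    # cache-inval style: give up on precision
        buffer.clear()
    else:
        buffer.append(condition)

def read_notifies(listened_for):
    # Listener-side read: after an overrun, report *every* condition the
    # listener listens for -- spurious work, but never a lost event.
    # (Single-listener toy: reading consumes the buffer.)
    global overflowed
    if overflowed:
        overflowed = False
        return set(listened_for)
    got = {c for c in buffer if c in listened_for}
    buffer.clear()
    return got

post_notify("a")
normal = read_notifies({"a", "b"})                     # precise delivery
post_notify("a"); post_notify("b"); post_notify("c")   # third one overruns
after_overflow = read_notifies({"a", "b", "x"})        # everything reported
print(normal, after_overflow)
```

The design choice being modeled: a listener can never miss an event, at the cost of occasionally doing wasted work on conditions that never actually fired.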
On Thu, 2002-03-21 at 22:41, Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > Solution: Create a new system catalog, pg_notify. > > It's not apparent to me why that helps much. Well, it solves the functional problem at hand -- this feature can now be implemented. However, I agree with you that there are still problems with NOTIFY and pg_listener, as you have outlined. > So if someone wants to undertake a revision of the listen/notify code, > I think the first thing to do ought to be to throw away pg_listener > entirely and develop some lower-overhead, shared-memory-based > communication mechanism. You could do worse than to use the shared > cache inval code as a model --- or perhaps even incorporate LISTEN > signaling into that mechanism. (Actually that seems like a good plan, > so as not to use shared memory inefficiently by dedicating two separate > memory pools to parallel purposes.) That's very interesting. I need to read the code you're referring to before I can comment further, but I'll definitely look into this. That's a good idea. > If you follow the SI model then NOTIFY messages would essentially be > broadcast to all backends, My apologies, but what's the SI model? > A deficiency of the SI implementation (and probably anything else that > relies solely on shared memory) is that it can suffer from buffer > overrun, since there's a fixed-size message pool. For the purposes > of cache inval, we cope with buffer overrun by just invalidating > everything in sight. It might be a workable tradeoff to cope with > buffer overrun for LISTEN/NOTIFY by reporting notifies on all conditions > currently listened for. This assumes that the NOTIFY condition we're waiting for is fairly routine (e.g. "table x is updated, refresh the cache"). If a NOTIFY actually represents the occurrence of a non-trivial condition, this could be a problem (e.g. "the site crashed, page the sys-admin", and the buffer happens to overflow at 2 AM :-) ). 
However, it's questionable whether that is an appropriate usage of NOTIFY. > BTW, I would like to see a spec for this "notify with parameter" feature > before it's implemented, not after. What information would you like to know? > Exactly what semantics do you have in mind? The current syntax I'm using is: NOTIFY condition_name [ [WITH MESSAGE] 'my message' ]; But I'm open to suggestions for improvement. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway <nconway@klamath.dyndns.org> writes: >> BTW, I would like to see a spec for this "notify with parameter" feature >> before it's implemented, not after. > The current syntax I'm using is: > NOTIFY condition_name [ [WITH MESSAGE] 'my message' ]; Hm. How are you going to transmit that to the client side without changing the FE/BE protocol? (While we will no doubt find reasons to change the protocol in the future, I'm not eager to force a protocol update right now; at least not without more reason than just NOTIFY parameters.) If we want to avoid a protocol break then it seems like the value transmitted to the client has to be a single string. I guess we could say that what's transmitted is a single string in the form condition_name.additional_text (or pick some other delimiter instead of dot, but doesn't seem like it matters much). Pretty grotty though. Another thought that comes to mind is that we could reinterpret the parameter of LISTEN as a pattern to match against the strings generated by NOTIFY --- then there's no need to draw a hard-and-fast distinction between condition name and parameter text; it's all in the eye of the beholder. However it's tough to see how to do this without breaking backwards compatibility at the syntax level --- you'd really want LISTEN to be accepting a string literal, rather than a name, to make this happen. That brings up the more general point that you'd want at least the "message" part of NOTIFY to be computable as an SQL expression, not just a literal. It might be entertaining to try to reimplement NOTIFY as something that's internally like a SELECT, just with a funny data destination. I find this attractive because if it were a SELECT then it could have (at least on the inside) a WHERE clause, which'd make it possible to handle NOTIFYs in conditional rules in a less broken fashion than we do now. regards, tom lane
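The single-string idea can be sketched as a hypothetical client-side convention (invented helper names; the delimiter choice is arbitrary, as noted): pack the condition name and optional message into the one string the existing FE/BE protocol already carries, and split on the first delimiter on receipt.

```python
# Hypothetical client-side convention (not an adopted protocol): pack
# condition name and optional message into the one string the existing
# FE/BE protocol already carries, splitting on the first delimiter.

DELIM = "."  # the example delimiter; any reserved character would do

def pack(condition, message=None):
    return condition if message is None else condition + DELIM + message

def unpack(wire_string):
    # Split on the *first* delimiter so the message itself may contain dots.
    condition, sep, message = wire_string.partition(DELIM)
    return condition, (message if sep else None)

assert unpack(pack("jobs_done")) == ("jobs_done", None)
assert unpack(pack("jobs_done", "batch 7 ok")) == ("jobs_done", "batch 7 ok")
assert unpack(pack("jobs", "v1.2")) == ("jobs", "v1.2")
# The grotty part: a dot in the condition *name* is ambiguous --
# unpack(pack("a.b")) comes back as ("a", "b"), not ("a.b", None).
```

The LISTEN-as-pattern alternative sidesteps this ambiguity entirely, since the whole string is then just one opaque value matched against the listener's pattern.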
> > Exactly what semantics do you have in mind? > > The current syntax I'm using is: > > NOTIFY condition_name [ [WITH MESSAGE] 'my message' ]; > > But I'm open to suggestions for improvement. Have you considered visiting the oracle site and finding their documentation for their NOTIFY statement and making sure you're using compatible syntax? They might have extra stuff as well. Chris
On Thu, 2002-03-21 at 23:41, Christopher Kings-Lynne wrote: > > > Exactly what semantics do you have in mind? > > > > The current syntax I'm using is: > > > > NOTIFY condition_name [ [WITH MESSAGE] 'my message' ]; > > > > But I'm open to suggestions for improvement. > > Have you considered visiting the oracle site and finding their documentation > for their NOTIFY statement and making sure you're using compatible syntax? Oracle's implementation uses a completely different syntax to begin with: it's called DBMS_ALERT. > They might have extra stuff as well. From a brief scan of their docs, it doesn't look like it. In fact, their implementation seems to be worse than PostgreSQL's in at least one respect: "A waiting application is blocked in the database and cannot do any other work." Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
> On Thu, 2002-03-21 at 23:41, Christopher Kings-Lynne wrote: > > > > Exactly what semantics do you have in mind? > > > > > > The current syntax I'm using is: > > > > > > NOTIFY condition_name [ [WITH MESSAGE] 'my message' ]; > > > > > > But I'm open to suggestions for improvement. > > > > Have you considered visiting the oracle site and finding their > documentation > > for their NOTIFY statement and making sure you're using > compatible syntax? > > Oracle's implementation uses a completely different syntax to begin > with: it's called DBMS_ALERT. OK - not Oracle then. Didn't you say some other db did it - what about their syntax? Chris
On Fri, 2002-03-22 at 06:40, Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > >> BTW, I would like to see a spec for this "notify with parameter" feature > >> before it's implemented, not after. > > > The current syntax I'm using is: > > NOTIFY condition_name [ [WITH MESSAGE] 'my message' ]; > > Hm. How are you going to transmit that to the client side without > changing the FE/BE protocol? (While we will no doubt find reasons > to change the protocol in the future, I'm not eager to force a protocol > update right now; at least not without more reason than just NOTIFY > parameters.) If we want to avoid a protocol break then it seems > like the value transmitted to the client has to be a single string. > > I guess we could say that what's transmitted is a single string in > the form > condition_name.additional_text > (or pick some other delimiter instead of dot, but doesn't seem like > it matters much). Pretty grotty though. > > Another thought that comes to mind is that we could reinterpret the > parameter of LISTEN as a pattern to match against the strings generated > by NOTIFY --- then there's no need to draw a hard-and-fast distinction > between condition name and parameter text; it's all in the eye of the > beholder. That's what I suggested a few weeks ago in a well-hidden message at the end of a reply to a somewhat related question ;) > However it's tough to see how to do this without breaking > backwards compatibility at the syntax level --- you'd really want LISTEN > to be accepting a string literal, rather than a name, to make this > happen. Can't we accept both -- a name for simple things and a string for regexes? > That brings up the more general point that you'd want at least > the "message" part of NOTIFY to be computable as an SQL expression, > not just a literal. I think this should be any expression that returns text. I even wouldn't mind if I had to use an explicit insert: insert into pg_notify select relname || '.' 
|| cast(myobjectid as text), listenerpid from pg_listener where 'inv' ~ relname Just the delivery has to be automatic. > It might be entertaining to try to reimplement > NOTIFY as something that's internally like a SELECT, just with a > funny data destination. I thought that NOTIFY is implemented as an INSERT internally, no? > I find this attractive because if it were > a SELECT then it could have (at least on the inside) a WHERE clause, > which'd make it possible to handle NOTIFYs in conditional rules in > a less broken fashion than we do now. -------------- Hannu
On Thu, 2002-03-21 at 22:41, Tom Lane wrote: > It might be a workable tradeoff to cope with > buffer overrun for LISTEN/NOTIFY by reporting notifies on all conditions > currently listened for. Assuming that overrun is infrequent, the net > performance gain from being able to use shared memory is probably worth > the occasional episode of wasted work. I've thought about this some more, and I don't think that solution will be sufficient. Spurious notifications seem like a pretty serious drawback, and I don't think they solve anything. As I mentioned earlier, if the event a notify signifies is non-trivial, this could have serious repercussions. But more importantly, what happens when the buffer overruns and we notify all backends? If a listening backend is in the middle of a transaction when it is notified, it just sets a flag and goes back to processing (i.e. it doesn't clear the buffer). If a listening backend is idle when it is notified, it checks the buffer: but since this is normal behavior, any idle & notified backend will have already checked the buffer! I don't see how the "notify everyone" scheme solves anything -- if a backend _could_ respond quickly, it would already have done so and we wouldn't have an overrun buffer in the first place. If we notify all backends and then clear the notification buffer, backends in the midst of a transaction will check the buffer when they finish their transaction but find it empty. Since this has the potential to destroy legitimate notifications, this is clearly not an option. Ultimately, we're just coming up with kludges to work around a fundamental flaw (we're using a static buffer for a dynamically sized resource). (Am I the only one who keeps running into shared memory limitations? :-) I can see two viable solutions: (1) Use the shared-memory-based buffer scheme you suggested. When a backend executes a NOTIFY, it stores it until transaction commit (as in current sources). 
When the transaction commits, it checks to see if there would be a buffer overflow if it added the NOTIFY to the buffer -- if so, it complains loudly to the log, and sleeps. When it awakens, it repeats (try to add to buffer; else, sleep). (2) The pg_notify scheme I suggested. It only marginally improves the situation, but it does preserve the behavior we have now. I think #1 isn't as bad as it might at first seem. The notification buffer only overflows in a rare (and arguably broken) situation: when the listening backend is in a (very) long-lived transaction, so that the notification buffer is never checked and eventually fills up. If we strongly suggest to application developers that they avoid this situation in the first place (by not starting long-running transactions in listening backends), and we also make the size of the buffer configurable, this situation is tolerable. Comments? Can anyone see a better solution? Is #1 reasonable behavior? Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
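Scheme #1 amounts to backpressure on the notifier; a toy, single-threaded rendering (hypothetical names; the real backend would sleep and be re-awakened, while `drain` here stands in for listeners emptying the buffer between attempts):

```python
# Toy, single-threaded rendering of scheme #1 (hypothetical; not the
# eventual implementation).  `drain` stands in for the listeners
# emptying the buffer while the notifier would be sleeping.

BUFFER_SIZE = 1
buffer = []
log = []

def post_with_backpressure(condition, drain):
    attempts = 0
    while len(buffer) >= BUFFER_SIZE:               # would overflow?
        log.append("notify buffer full, sleeping")  # complain loudly
        drain()                        # in reality: sleep, then recheck
        attempts += 1
    buffer.append(condition)
    return attempts

buffer.append("pending")   # buffer is already full at commit time
retries = post_with_backpressure("t1_changed", drain=buffer.clear)
print(retries, buffer)
```

The hazard is visible in the loop itself: if listeners never drain the buffer, the notifier never gets out of the retry loop, i.e. a notification can be postponed indefinitely.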
Neil Conway <nconway@klamath.dyndns.org> writes: > (1) Use the shared-memory-based buffer scheme you suggested. When a > backend executes a NOTIFY, it stores it until transaction commit (as in > current sources). When the transaction commits, it checks to see if > there would be a buffer overflow if it added the NOTIFY to the buffer -- > if so, it complains loudly to the log, and sleeps. When it awakens, it > repeats (try to add to buffer; else, sleep). This is NOT an improvement over the current arrangement. It implies that a notification might be postponed indefinitely, thereby allowing listeners to keep using stale data indefinitely. LISTEN/NOTIFY is basically designed for invalidate-your-cache arrangements (which is what led into this discussion originally, no?). In *any* caching arrangement, it is far better to have the occasional spurious data drop than to fail to drop stale data when you need to. Accordingly, a forced cache clear is an appropriate response to overrun of the communications buffer. I can certainly imagine applications where the messages are too important to trust to a not-fully-reliable transmission medium; but I don't think that LISTEN/NOTIFY should be loaded down with that sort of requirement. You can easily build 100% reliable (and correspondingly slow and expensive) communications mechanisms using standard SQL operations. I think the design center for LISTEN/NOTIFY should be exactly the case of maintaining client-side caches --- at least that's what I used it for when I had occasion to use it, several years ago when I first got involved with Postgres. And for that application, a cheap mechanism that never loses a notification, but might occasionally over-notify, is just what you want. regards, tom lane
What if we used a combination of the two approaches? That is, when an overflow occurs, overflow into a table? That way, nothing is lost and spurious random events don't have to occur. That way, things are faster when overflows are not occurring. When the system gets too far behind, it simply overflows into the existing table until the system can catch up. This way, we don't have to waste resources notifying listeners that would otherwise not need to be notified. Greg On Fri, 2002-03-22 at 23:13, Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > (1) Use the shared-memory-based buffer scheme you suggested. When a > > backend executes a NOTIFY, it stores it until transaction commit (as in > > current sources). When the transaction commits, it checks to see if > > there would be a buffer overflow if it added the NOTIFY to the buffer -- > > if so, it complains loudly to the log, and sleeps. When it awakens, it > > repeats (try to add to buffer; else, sleep). > > This is NOT an improvement over the current arrangement. It implies > that a notification might be postponed indefinitely, thereby allowing > listeners to keep using stale data indefinitely. > > LISTEN/NOTIFY is basically designed for invalidate-your-cache > arrangements (which is what led into this discussion originally, no?). > In *any* caching arrangement, it is far better to have the occasional > spurious data drop than to fail to drop stale data when you need to. > Accordingly, a forced cache clear is an appropriate response to > overrun of the communications buffer. > > I can certainly imagine applications where the messages are too > important to trust to a not-fully-reliable transmission medium; > but I don't think that LISTEN/NOTIFY should be loaded down with > that sort of requirement. You can easily build 100% reliable > (and correspondingly slow and expensive) communications mechanisms > using standard SQL operations. 
I think the design center for > LISTEN/NOTIFY should be exactly the case of maintaining client-side > caches --- at least that's what I used it for when I had occasion > to use it, several years ago when I first got involved with Postgres. > And for that application, a cheap mechanism that never loses a > notification, but might occasionally over-notify, is just what you > want. > > regards, tom lane
Greg Copeland <greg@CopelandConsulting.Net> writes: > What if we used a combination of the two approaches? That is, when an > overflow occurs, overflow into a table? I think this is a really bad idea. The major problem with it is that the overflow path would be complex, infrequently exercised, and therefore almost inevitably buggy. (Look at all the problems we had for so long with SI overflow response. I'd still not like to have to swear there are none left.) Also, I do not think you could get away with merging listen/notify with the system cache inval mechanism if you wanted to have table overflow for listen/notify. SI is too low level --- to point out just one problem, a new backend's access to the SI message queue has to be initialized long before we are ready to do any table access. So you'd be requiring dedicated shared memory space just for listen/notify. That's a hard sell in my book. > That way, nothing is lost and spurious random events don't have to > occur. I think this argument is spurious. Almost any client-side caching arrangement is going to have cases where it's best to issue a "flush everything" kind of event rather than expend the effort to keep track of exactly what has to be invalidated by particular kinds of changes. As long as such changes are infrequent, you have better performance and better reliability by not trying to do the extra bookkeeping for exact invalidation. Why shouldn't the signal transport mechanism be able to do the same thing? Also, the notion that the NOTIFY mechanism can't be lossy misses the fact that you've got a perfectly good non-lossy mechanism at hand already: user tables. The traditional way of using NOTIFY has been to stick the important data into tables and use NOTIFY simply to cue listeners to look in those tables. I don't foresee this changing; it'll simply be possible to give somewhat finer-grain notification of what/where to look. 
I don't think that forcing NOTIFY to have the same kinds of semantics as SQL tables do is the right design approach. IMHO the only reason NOTIFY exists at all is to provide a simpler, higher-performance communication pathway than you can get with tables. regards, tom lane
On Sat, 2002-03-23 at 12:46, Tom Lane wrote: > Also, the notion that the NOTIFY mechanism can't be lossy misses the > fact that you've got a perfectly good non-lossy mechanism at hand > already: user tables. The traditional way of using NOTIFY has been > to stick the important data into tables and use NOTIFY simply to > cue listeners to look in those tables. I don't foresee this changing; > it'll simply be possible to give somewhat finer-grain notification of > what/where to look. I don't think that forcing NOTIFY to have the > same kinds of semantics as SQL tables do is the right design approach. > IMHO the only reason NOTIFY exists at all is to provide a simpler, > higher-performance communication pathway than you can get with tables. Okay, I agree (of course, it would be nice to have a more reliable NOTIFY mechanism, but I can't see a way to implement a high-performance, reliable mechanism without at least one serious drawback). And as you rightly point out, there are other methods for people who need more reliability. So the new behavior of NOTIFY should be: when the notifying backend commits its transaction, the notification is stored in a shared memory buffer of fixed size, and the listening backend is sent a SIGUSR2. If the shared memory buffer is full, it is completely emptied. In the listening backend's SIGUSR2 signal handler, a flag is set and the backend goes back to its current transaction. When it becomes idle, it checks the shared buffer: if it can't find any matching elements in the buffer, it knows an overrun has occurred. When informing the front-end, a notification that results from an overrun is signified by a notification with a NULL message and with the PID of the notifying backend set to some constant (say, -1). This informs the front-end that an overrun has occurred, so it can take appropriate action. Is this behavior acceptable to everyone? I can see 1 potential problem: there is a race condition in the "detect an overrun" logic. 
If an overrun occurs and the buffer is flushed but then another notification for one of the listening backends arrives, a backend will only inform the front-end about the most recent notification: there will be no indication that an overrun occurred, or that there were other legitimate notifications in the buffer before the overrun. It would be nice to be able to tell clients 100% "an overrun just occurred, be careful", but apparently that's not even possible. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway <nconway@klamath.dyndns.org> writes: > I can see 1 potential problem: there is a race condition in the "detect > an overrun" logic. Only if you do it that way :-(. Take another look at the SI messaging logic: it will *not* lose overrun notifications. regards, tom lane
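The race-free bookkeeping Tom is pointing at can be sketched roughly as follows (a hedged approximation of the SI idea, not the actual sinval code): overrun is judged from a per-listener read position against a global message count, not from buffer contents, so a notification that arrives just after a reset cannot hide the overrun.

```python
# Hedged approximation (not the actual sinval code) of race-free
# overrun detection: a global append counter plus a per-listener
# read position.

BUFFER_SIZE = 4
messages = []    # conceptually a ring buffer; a plain list keeps it clear
listeners = {}   # pid -> index of the next message to read

def register(pid):
    listeners[pid] = len(messages)

def post(msg):
    messages.append(msg)

def read(pid):
    # A listener more than BUFFER_SIZE behind the head has definitely
    # missed messages, even if new ones arrived after old ones were
    # dropped -- positions, not contents, decide.
    pos, head = listeners[pid], len(messages)
    overrun = (head - pos) > BUFFER_SIZE
    start = max(pos, head - BUFFER_SIZE)  # oldest slot still retained
    listeners[pid] = head
    return overrun, messages[start:head]

register(202)
for i in range(6):          # push 6 messages through a 4-slot window
    post("msg%d" % i)
first = read(202)           # overrun reported alongside what survived
post("msg6")
second = read(202)          # back to precise delivery
print(first, second)
```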
Tom Lane wrote: > There is a very significant performance problem with LISTEN/NOTIFY > via pg_listener: in any application that generates notifications at > a significant rate, pg_listener will accumulate dead tuples at that > same rate, and we will soon find ourselves wasting lots of time > scanning through dead tuples. Frequent VACUUMs might help, but the That's unfortunate; maybe if the backend could reuse tuples on updates, that could help? > whole thing is really quite silly: why are we using a storage mechanism > that's designed entirely for *stable* storage of data to pass inherently > *transient* signals? If the system crashes, we have absolutely zero Because there is no other easy way to guarantee message delivery? > interest in the former contents of pg_listener (and indeed need to go > to some trouble to get rid of them). There is no free beer :) Regards, Mikhail Terekhov
On Wed, 3 Apr 2002, Mikhail Terekhov wrote: > > > Tom Lane wrote: > > > > There is a very significant performance problem with LISTEN/NOTIFY > > via pg_listener: in any application that generates notifications at > > a significant rate, pg_listener will accumulate dead tuples at that > > same rate, and we will soon find ourselves wasting lots of time > > scanning through dead tuples. Frequent VACUUMs might help, but the > > > That's unfortunate, may be if backend could reuse tuple on updates could help? There is already a TODO item to address this. But row reuse is the wrong solution to the problem. See below. > > > > whole thing is really quite silly: why are we using a storage mechanism > > that's designed entirely for *stable* storage of data to pass inherently > > *transient* signals? If the system crashes, we have absolutely zero > > > > Because there is no other easy way to guarantee message delivery? Shared memory is much easier and, to all intents and purposes, as reliable for this kind of usage. It is much faster and is the-right-way-to-do-it. I don't believe that the question 'what happens if there is a buffer overrun?' is a valid criticism of this approach. In the case of the backend cache invalidation system, the backends just blow away their cache to be on the safe side. A buffer overrun (rare as it would be, considering the different usage patterns of the shared memory for notification) would result in an elog(ERROR) from within the backend which has attempted to execute the notification. After all, running out of memory is an error in this case. Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: >> Because there is no other easy way to guarantee message delivery? > Shared memory is much easier and, to all intents and purposes, as reliable > for this kind of usage. It is much faster and is the-right-way-to-do-it. Right. Since we do not attempt to preserve NOTIFY messages over a system crash, there's no good reason to keep the messages in a table. Except for the problem that shared memory is of limited size. But if we are willing to define the semantics in a way that allows buffer overflow recovery, that can be dealt with. > A buffer overrun (rare as it would be, > considering the different usage patterns of the shared memory for > notification) would result in an elog(ERROR) from within the backend which > has attempted to execute the notification. Hmm. That's a different way of attacking the overflow problem. I don't much care for it, but I can see that some applications might prefer this behavior to cache-style overrun response (ie, issue forced NOTIFYs on all conditions). Maybe support both ways? regards, tom lane
Gavin Sherry wrote: > On Wed, 3 Apr 2002, Mikhail Terekhov wrote: >> >>Tom Lane wrote: >> >>>There is a very significant performance problem with LISTEN/NOTIFY >>>via pg_listener: in any application that generates notifications at >>>a significant rate, pg_listener will accumulate dead tuples at that >>>same rate, and we will soon find ourselves wasting lots of time >>>scanning through dead tuples. Frequent VACUUMs might help, but the >>> >>That's unfortunate; maybe if the backend could reuse tuples on updates, that could help? > > There is already a TODO item to address this. But row reuse is the wrong > solution to the problem. See below. > It is not a solution to the whole LISTEN/NOTIFY problem, but it is a solution to the dead tuple accumulation. > >> >>>whole thing is really quite silly: why are we using a storage mechanism >>>that's designed entirely for *stable* storage of data to pass inherently >>>*transient* signals? If the system crashes, we have absolutely zero >>> >>Because there is no other easy way to guarantee message delivery? >> > > Shared memory is much easier and, to all intents and purposes, as reliable > for this kind of usage. It is much faster and is the-right-way-to-do-it. > That highly depends on WHAT-you-want-to-do :) If the new shared-memory implementation guarantees message delivery to the same degree as the current implementation, then it is the-right-way-to-do-it. If not, then let's not break existing functionality! Let's implement it as additional functionality, say FASTNOTIFY or RIGHTNOTIFY ;) > I don't believe that the question 'what happens if there is a buffer > overrun?' is a valid criticism of this approach. In the case of the > backend cache invalidation system, the backends just blow away their cache Forgive my ignorance, do you mean the sending backend? > to be on the safe side. A buffer overrun (rare as it would be, Regards, Mikhail
Tom Lane wrote: > LISTEN/NOTIFY is basically designed for invalidate-your-cache > arrangements (which is what led into this discussion originally, no?). Why do you think so? Even if you are right and the original design was just for invalidate-your-cache arrangements, the current implementation has much more functionality and can be used as a reliable message transmission mechanism (we use it that way). There is no reason to break this reliability. > In *any* caching arrangement, it is far better to have the occasional > spurious data drop than to fail to drop stale data when you need to. > Accordingly, a forced cache clear is an appropriate response to > overrun of the communications buffer. > There are not only caching arrangements out there! This reminds me of the difference between poll(2) and select(2). They are both useful in different cases. > I can certainly imagine applications where the messages are too > important to trust to a not-fully-reliable transmission medium; That is exactly what we are using LISTEN/NOTIFY for. We don't need a separate message passing system, we don't need to waste system resources polling the database, and the application is simpler and easier to maintain. > but I don't think that LISTEN/NOTIFY should be loaded down with > that sort of requirement. You can easily build 100% reliable This functionality is already in Postgres. Maybe it is not perfect, but why remove it? > (and correspondingly slow and expensive) communications mechanisms > using standard SQL operations. I think the design center for Could you please elaborate on how to do that without polling? > LISTEN/NOTIFY should be exactly the case of maintaining client-side > caches --- at least that's what I used it for when I had occasion > to use it, several years ago when I first got involved with Postgres. > And for that application, a cheap mechanism that never loses a > notification, but might occasionally over-notify, is just what you > want. 
Again, a client-side cache is not the only application of LISTEN/NOTIFY.
If we need a cheap mechanism for maintaining client-side caches, let's
implement one. Why remove existing functionality?
Mikhail Terekhov <terekhov@emc.com> writes:
> Why do you think so? Even if you are right and original design was
> just for invalidate-your-cache arrangements, current implementation
> has much more functionality and can be used as a reliable message
> transmission mechanism (we use it that way).

It is *not* reliable, at least not in the sense of "the message is
guaranteed to be delivered even if there's a system crash".  Which is
the normal meaning of "reliable" in SQL environments.  If you want that
level of reliability, you need to pass your messages by storing them
in a regular table.

LISTEN/NOTIFY can optimize your message passing by avoiding unnecessary
polling of the table in the normal no-crash case.  But they are not a
substitute for having a table, and I don't see a reason to bog them down
with an intermediate level of reliability that isn't buying anything.

			regards, tom lane
Tom Lane wrote:
> It is *not* reliable, at least not in the sense of "the message is
> guaranteed to be delivered even if there's a system crash". Which is
> the normal meaning of "reliable" in SQL environments. If you want that

That is exactly what I mean by "reliable". Please correct me if I'm
wrong, but the buffer overrun problem in the new LISTEN/NOTIFY mechanism
means that it is perfectly possible for the sending backend to drop all
or some of the pending NOTIFY messages in case of such an overrun. If
this is the case, then the new mechanism would be a step backward in
functionality relative to the current implementation: there would be no
guarantee even in the no-crash case.

> level of reliability, you need to pass your messages by storing them
> in a regular table.

That is exactly what I do in my application. I store messages in a
regular table and then send a notify to the other clients. But I'd like
a guarantee that, absent a system crash, all my notifies will be
delivered. I use this method when I need to send additional information
beyond the notification's name.

Another case is similar to your cache invalidation example. The big
difference is that I need to maintain a kind of cache for a large number
of big tables, and I need to know promptly when those tables change. I
can't afford to update this cache frequently enough by polling. And when
there is no NOTIFY delivery guarantee, the only solution is polling;
occasional delivery of NOTIFY messages can only improve the polling
strategy somewhat. One cannot rely on them.

> LISTEN/NOTIFY can optimize your message passing by avoiding unnecessary
> polling of the table in the normal no-crash case. But they are not a

Guaranteed delivery in the normal no-crash case avoids polling
completely in the cache invalidation scenario. DB crash recovery is a
very complex task for an application; sometimes recovery is not possible
at all.
But for cache invalidation, a DB crash is nothing more than cache
reinitialization (you will get that crash notification without a
LISTEN/NOTIFY message ;) Even stronger: you can't receive a crash
notification through the LISTEN/NOTIFY mechanism). And again, this
no-crash-case guarantee is already here! We don't need to do anything!

> substitute for having a table, and I don't see a reason to bog them down

Sure, they are not a substitute, and I'm not the one who proposed to
extend the LISTEN/NOTIFY mechanism with additional information ;) This
whole thread was started to extend the LISTEN/NOTIFY mechanism to
support optional messages. If we agree that LISTEN/NOTIFY is not a
substitute for having a table for such messages, then what is the point
of reimplementing this feature with a loss of functionality?

> with an intermediate level of reliability that isn't buying anything.

If you mean reliability in the no-crash case, then it buys a lot: it
eliminates the need for polling completely. And once again, we already
have this level of reliability. What exactly will PG gain with the new
LISTEN/NOTIFY mechanism? If the benefit is really so great, let's
implement it as an additional feature, not as a replacement of the
existing one with a loss of functionality.

Regards,
Mikhail Terekhov
Mikhail Terekhov <terekhov@emc.com> writes:
> Please correct me if I'm wrong but the buffer overrun problem in the new
> LISTEN/NOTIFY mechanism means that it is perfectly possible that sending
> backend may drop all or some of the pending NOTIFY messages in case of
> such an overrun.

You would be guaranteed to get *some* notify.  You wouldn't be
guaranteed to receive the auxiliary info that's proposed to be added to
the basic message type; also you might get notify reports for conditions
that hadn't actually been signaled.

> If this is the case then this new mechanism would be step
> backward in terms of functionality relative to the current implementation.

The current mechanism is hardly perfect; it drops multiple occurrences
of the same NOTIFY.  Yes, the behavior would be different, but that
doesn't immediately translate to "a step backwards".

> That is exactly what I do in my application. I store messages in a regular
> table and then send a notify to other clients. But I'd like to have a
> guarantee that without system crash all my notifies will be delivered.

Please re-read the proposal.  It will not break your application.

			regards, tom lane
On Tue, 9 Apr 2002, Tom Lane wrote:
> Mikhail Terekhov <terekhov@emc.com> writes:
>> Please correct me if I'm wrong but the buffer overrun problem in the new
>> LISTEN/NOTIFY mechanism means that it is perfectly possible that sending
>> backend may drop all or some of the pending NOTIFY messages in case of
>> such an overrun.
>
> You would be guaranteed to get *some* notify. You wouldn't be
> guaranteed to receive the auxiliary info that's proposed to be added to
> the basic message type; also you might get notify reports for conditions
> that hadn't actually been signaled.

I poked around the notify code and had a think about the ideas which
have been put forward. I think the buffer overrun issue can be addressed
by allowing users to define the importance of the notify they are
making. Eg:

NOTIFY HARSH <condition>

If there is to be a buffer overrun, all conditions are notified and the
buffer is, eventually, reset.

NOTIFY SAFE <condition>

(Yes, bad keywords.) This, on the other hand, would check whether there
is to be a buffer overrun and, after a
SendPostmasterSignal(PMSIGNAL_WAKEN_CHILDREN) fails to reduce the
buffer, would invalidate the transaction with an elog(ERROR). This can
be done since AtCommit_Notify() is run before RecordTransactionCommit().

This does not deal with recovery from a crash. The only way it could is
by plugging the listen and notify signals into the xlog. That seems very
messy, though.

Gavin