Thread: Listen / Notify - what to do when the queue is full
We still need to decide what to do with queue-full situations in the proposed
listen/notify implementation.

I have a new version of the patch that allows for a variable payload size.
However, the whole notification must fit into one page, so the payload needs
to be less than 8K. I have also added the XID, so that we can write to the
queue before committing to clog, which allows for rollback if we encounter
write errors (disk full, for example). Especially the implications of this
change make the patch a lot more complicated.

The queue is slru-based; slru uses int page numbers, so we can use up to
2147483647 (INT_MAX) pages with some small changes in slru.c.

When do we have a full queue? The idea is that notifications are written to
the queue and are read as soon as the notifying transaction commits. Only if
a listening backend is busy will it fail to read the notifications, and so it
won't update its pointer for some time. With the current space we can
accommodate at least 2147483647 notifications, or more depending on the
payload length. That gives us something between 214 GB (100 bytes per
notification) and 17 TB (8000 bytes per notification). So in order to fill
the queue, we need to generate that amount of notifications while one backend
is still busy and not reading the accumulating notifications. In general,
chances are not too high that anyone will ever have a full notification
queue, but we need to define the behavior anyway...

These are the solutions that I currently see:

1) drop new notifications if the queue is full (silently or with rollback)

2) block until readers catch up (what if the backend that tries to write the
notifications actually is the "lazy" reader that everybody is waiting for to
proceed?)

3) invent a new signal reason and send SIGUSR1 to the "lazy" readers; they
need to interrupt whatever they are doing and copy the notifications into
their own address space (without delivering the notifications, since they are
in a transaction at that moment).

For 1) there can be warnings well ahead of when the queue is actually full,
like one when it is 50% full, another when it is 75% full, and so on, and
they could point to the backend that is most behind in reading
notifications...

I think that 2) is the least practical approach. If there is a pile of at
least 2,147,483,647 notifications, then a backend hasn't read the
notifications for a long, long time... Chances are low that it will read them
within the next few seconds. In a sense, 2) implies 3) for the special case
that the writing backend is the one that everybody is waiting for to proceed
reading notifications; in the end this backend is waiting for itself.

For 3) the question is whether we can just invent a new signal reason
PROCSIG_NOTIFYCOPY_INTERRUPT or similar, and upon reception the backend
copies the notification data to its private address space. Would this
function be called by every backend after at most a few seconds, even if it
is processing a long-running query?

Admittedly, once 3) is in place we can also put a smaller queue into shared
memory and remove the slru thing altogether, but then we need to be sure that
we can interrupt the backends at any time, since the queue size would be a
lot smaller than 200 GB...

Joachim
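For concreteness, option 3 could piggyback on the existing SIGUSR1
multiplexing in procsignal.c. A minimal sketch, assuming the
PROCSIG_NOTIFYCOPY_INTERRUPT reason named above and a hypothetical handler -
none of this is actual patch code:

    /* Writer side: when the queue is nearly full, nudge a lagging reader.
     * PROCSIG_NOTIFYCOPY_INTERRUPT would be a new ProcSignalReason. */
    SendProcSignal(laggingPid, PROCSIG_NOTIFYCOPY_INTERRUPT, laggingBackendId);

    /* Receiver side, dispatched from procsignal_sigusr1_handler(): copy
     * queue entries into backend-local memory only -- actual delivery still
     * waits until the backend is outside its transaction. The handler name
     * is hypothetical. */
    if (CheckProcSignal(PROCSIG_NOTIFYCOPY_INTERRUPT))
        HandleNotifyCopyInterrupt();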
> We still need to decide what to do with queue full situations in
> the proposed listen/notify implementation. I have a new version
> of the patch to allow for a variable payload size. However, the
> whole notification must fit into one page so the payload needs
> to be less than 8K.

That sounds fine to me, FWIW.

> I have also added the XID, so that we can write to the queue before
> committing to clog which allows for rollback if we encounter write
> errors (disk full for example). Especially the implications of this
> change make the patch a lot more complicated.

Can you elaborate on the use case for this?

> so it won't update its pointer for some time. With the current space we can
> accommodate at least 2147483647 notifications or more, depending on the
> payload length.

That's a whole lot of notifications. I doubt any program out there is using
anywhere near that number at the moment. In my applications, having a few
hundred notifications active at one time is "a lot" in my book. :)

> These are the solutions that I currently see:
>
> 1) drop new notifications if the queue is full (silently or with rollback)

I like this one best, but not with silence of course. While it's not the most
polite thing to do, this is for a super extreme edge case. I'd rather just
throw an exception if the queue is full rather than start messing with the
readers. It's a possible denial of service attack too, but so is the current
implementation in a way - at least I don't think apps would perform very
optimally with 2147483647 entries in the pg_listener table :)

If you need some real-world use cases involving payloads, let me know, I've
been waiting for this feature for some time and have it all mapped out.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200911160902
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Mon, Nov 16, 2009 at 9:05 AM, Greg Sabino Mullane wrote:
>> We still need to decide what to do with queue full situations in
>> the proposed listen/notify implementation. I have a new version
>> of the patch to allow for a variable payload size. However, the
>> whole notification must fit into one page so the payload needs
>> to be less than 8K.
>
> That sounds fine to me, FWIW.

+1! I think this should satisfy everyone.

>> I have also added the XID, so that we can write to the queue before
>> committing to clog which allows for rollback if we encounter write
>> errors (disk full for example). Especially the implications of this
>> change make the patch a lot more complicated.
>
> Can you elaborate on the use case for this?

Tom specifically asked for it: "The old implementation was ACID, so the new
one should be too."

>> These are the solutions that I currently see:
>>
>> 1) drop new notifications if the queue is full (silently or with rollback)
>
> I like this one best, but not with silence of course. While it's not the
> most polite thing to do, this is for a super extreme edge case. I'd rather
> just throw an exception if the queue is full rather than start messing with
> the readers. It's a possible denial of service attack too, but so is the
> current implementation in a way - at least I don't think apps would perform
> very optimally with 2147483647 entries in the pg_listener table :)
>
> If you need some real-world use cases involving payloads, let me know, I've
> been waiting for this feature for some time and have it all mapped out.

me too.

Joachim: when I benchmarked the original patch, I was seeing a few log
messages that suggested there might be something going on inside. In any
event, the performance was fantastic.

merlin
Greg Sabino Mullane wrote:
>> We still need to decide what to do with queue full situations in
>> the proposed listen/notify implementation. I have a new version
>> of the patch to allow for a variable payload size. However, the
>> whole notification must fit into one page so the payload needs
>> to be less than 8K.
>
> That sounds fine to me, FWIW.

Agreed. Thank you for all your work.

>> 1) drop new notifications if the queue is full (silently or with rollback)
>
> I like this one best, but not with silence of course. While it's not the
> most polite thing to do, this is for a super extreme edge case. I'd rather
> just throw an exception if the queue is full rather than start messing with
> the readers.

+1

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On Sun, Nov 15, 2009 at 7:19 PM, Joachim Wieland <joe@mcknight.de> wrote:
> These are the solutions that I currently see:
> 1) [...]
> 2) [...]
> 3) [...]

4) Allow readers to read uncommitted notifications as well. Instead of
delivering them, the backends just copy them over into their own address
space and deliver them later on...

Going with option 4) allows readers to always read all notifications in the
queue... This also allows a backend to send more notifications than the
queue can hold, so we are only limited by the backends' memory. Every
notification that is sent will eventually be delivered.

The queue can still fill up if one of the backends is busy for a long, long
time... Then the next writer just blocks and waits.

The attached patch implements this behavior as well as a variable payload
size, limited to 8000 characters. The variable payload also offers an
automatic speed control... The smaller your notifications are, the more
efficiently a page can be used and the faster you are. :-)

Once we agree that this is the way to go, I'll submit a documentation patch.

Joachim
Joachim Wieland <joe@mcknight.de> writes:
> 4) Allow readers to read uncommitted notifications as well.

The question that strikes me here is one of timing --- apparently, readers
will now have to check the queue *without* having received a signal? That
could amount to an unpleasant amount of extra overhead when the notify
system isn't even in use. (Users who don't care about notify will define
"unpleasant amount" as "not zero".)

I haven't read the patch, so maybe you have some cute solution to that, but
if so please explain what.

			regards, tom lane
On Mon, Nov 16, 2009 at 2:35 PM, Andrew Chernow <ac@esilo.com> wrote:
>>> 1) drop new notifications if the queue is full (silently or with
>>> rollback)
>>
>> I like this one best, but not with silence of course. While it's not the
>> most polite thing to do, this is for a super extreme edge case. I'd
>> rather just throw an exception if the queue is full rather than start
>> messing with the readers.
>
> +1

So if you guys are going to insist on turning the notification mechanism
into a queueing mechanism, I think it at least behooves you to have it
degrade gracefully into a notification mechanism and not become entirely
useless by dropping notification messages.

That is, if the queue overflows, what you should do is drop the payloads and
condense all the messages for a given class into a single notification for
that class with "unknown payload". That way, if a cache which wants to
invalidate specific objects gets a queue overflow condition, then at least
it knows it should rescan the original data and rebuild the cache and not
just serve invalid data.

I still think you're on the wrong path entirely and will end up with a
mechanism which serves neither use case very well, instead of two separate
mechanisms that are properly designed for the two use cases.

--
greg
On Thu, Nov 19, 2009 at 1:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joachim Wieland <joe@mcknight.de> writes:
>> 4) Allow readers to read uncommitted notifications as well.
>
> The question that strikes me here is one of timing --- apparently,
> readers will now have to check the queue *without* having received
> a signal? That could amount to an unpleasant amount of extra overhead
> when the notify system isn't even in use. (Users who don't care about
> notify will define "unpleasant amount" as "not zero".)

The sequence in CommitTransaction() is like that:

1) add notifications to queue
2) commit to clog
3) signal backends

Only those backends are signalled that listen to at least one channel; if
the notify system isn't in use, then nobody will ever be signalled anyway.

If a backend is reading a transaction id that has not yet committed, it will
not deliver the notification. It knows that it will eventually receive a
signal from that transaction, and then it first checks its list of
uncommitted notifications it has already read, and then checks the queue for
more pending notifications.

Joachim
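In code form, the sequence above would sit in CommitTransaction() roughly
like this; the two notify calls are the names used later in the thread, and
the comments are an outline rather than the patch itself:

    AtCommit_NotifyBeforeCommit();   /* 1) append our notifications to the queue */
    RecordTransactionCommit();       /* 2) commit to clog */
    AtCommit_NotifyAfterCommit();    /* 3) signal backends that LISTEN somewhere */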
> That is, if the queue overflows, what you should do is drop the
> payloads and condense all the messages for a given class into a single
> notification for that class with "unknown payload". That way, if a
> cache which wants to invalidate specific objects gets a queue overflow
> condition, then at least it knows it should rescan the original data
> and rebuild the cache and not just serve invalid data.

That's far more complicated than throwing an error, and it discards user
payload information. Let the error indicate a rescan is needed.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On 11/16/09 3:19 AM, Joachim Wieland wrote:
> 1) drop new notifications if the queue is full (silently or with rollback)
> 2) block until readers catch up (what if the backend that tries to write
> the notifications actually is the "lazy" reader that everybody is waiting
> for to proceed?)
> 3) invent a new signal reason and send SIGUSR1 to the "lazy" readers, they
> need to interrupt whatever they are doing and copy the notifications into
> their own address space (without delivering the notifications since they
> are in a transaction at that moment).

(4) drop *old* notifications if the queue is full.

Since everyone has made the point that LISTEN is not meant to be a full
queueing system, I have no problem dropping notifications LRU-style. If
we've run out of room, the oldest notifications should go first; we probably
don't care about them anyway.

We should probably also log the fact that we ran out of room, so that the
DBA knows that they have a design issue. For volume reasons, I don't think
we want to log every dropped message.

Alternately, it would be great to have a configuration option which would
allow the DBA to choose any of 3 behaviors via GUC:

drop-oldest (as above)
drop-largest (if we run out of room, drop the largest payloads first to save
space)
error (if we run out of room, error and rollback)

--Josh Berkus
> We should probably also log the fact that we ran out of room, so that
> the DBA knows that they have a design issue.

Can't they just bump allowed memory and avoid a redesign?

> Alternately, it would be great to have a configuration option which
> would allow the DBA to choose any of 3 behaviors via GUC:
>
> drop-oldest (as above)
> drop-largest (if we run out of room, drop the largest payloads first to
> save space)
> error (if we run out of room, error and rollback)

I mentioned this upthread. I completely agree that overflow behavior should
be tunable.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Joachim Wieland <joe@mcknight.de> writes:
> The sequence in CommitTransaction() is like that:
> 1) add notifications to queue
> 2) commit to clog
> 3) signal backends

> Only those backends are signalled that listen to at least one channel;
> if the notify system isn't in use, then nobody will ever be signalled
> anyway.

> If a backend is reading a transaction id that has not yet committed,
> it will not deliver the notification.

But you were saying that this patch would enable sending more data than
would fit in the queue. How will that happen if the other backends don't
look at the queue until you signal them?

			regards, tom lane
Josh Berkus <josh@agliodbs.com> writes:
> (4) drop *old* notifications if the queue is full.

> Since everyone has made the point that LISTEN is not meant to be a full
> queueing system, I have no problem dropping notifications LRU-style.

NO, NO, NO, a thousand times no!

That turns NOTIFY into an unreliable signaling system, and if I haven't made
this perfectly clear yet, any such change will be committed over my dead
body.

If we are unable to insert a new message into the queue, the correct
recourse is to fail the transaction that is trying to insert the *new*
message. Not to drop messages from already-committed transactions. Failing
the current transaction still leaves things in a consistent state, ie, you
don't get messages from aborted transactions but that's okay because they
didn't change the database state.

I think Greg has a legitimate concern about whether this redesign reduces
the usefulness of NOTIFY for existing use-cases, though. Formerly, since
pg_listener would effectively coalesce notifies across multiple sending
transactions instead of only one, it was impossible to "overflow the queue",
unless maybe you managed to bloat pg_listener to the point of being out of
disk space, and even that was pretty hard. There will now be a nonzero
chance of transactions failing at commit because of queue full. If the
chance is large this will be an issue. (Is it sane to wait for the queue to
be drained?)

BTW, did we discuss the issue of 2PC transactions versus notify? The current
behavior of 2PC with notify is pretty cheesy and will become more so if we
make this change --- you aren't really guaranteed that the notify will
happen, even though the prepared transaction did commit. I think it might be
better to disallow NOTIFY inside a prepared xact.

			regards, tom lane
Andrew Chernow <ac@esilo.com> writes:
> I mentioned this upthread. I completely agree that overflow behavior
> should be tunable.

There is only one correct overflow behavior.

			regards, tom lane
Tom Lane wrote:
> Andrew Chernow <ac@esilo.com> writes:
>> I mentioned this upthread. I completely agree that overflow behavior
>> should be tunable.
>
> There is only one correct overflow behavior.

I count three.

1. wait
2. error
3. skip

#1 and #2 are very similar to a file system. If FS buffers are full on
write, it makes you wait. In non-blocking mode, it throws an EAGAIN error.
IMHO those two behaviors are totally acceptable for handling notify
overflow. #3 is pretty weak, but I *think* there are uses for it.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes:
> Tom Lane wrote:
>> There is only one correct overflow behavior.

> I count three.

Waiting till you can insert is reasonable (especially if we have some
behavior that nudges other backends to empty the queue). If by "skip" you
mean losing the notify but still committing, that's incorrect. There is no
room for debate about that.

			regards, tom lane
Tom Lane wrote:
> Andrew Chernow <ac@esilo.com> writes:
>> Tom Lane wrote:
>>> There is only one correct overflow behavior.
>
>> I count three.
>
> Waiting till you can insert is reasonable (especially if we have some
> behavior that nudges other backends to empty the queue). If by "skip"
> you mean losing the notify but still committing, that's incorrect.
> There is no room for debate about that.

Yeah, like I said, skip felt weak.

In regards to waiting, what would happen if other backends couldn't help
empty the queue because they too are clogged? ISTM that any attempt to flush
to other non-disk queues is doomed to possible overflows as well. Then what?

Personally, I would just wait until room became available or the transaction
was canceled. We could get fancy and tack a timeout value onto the wait.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes:
> Personally, I would just wait until room became available or the
> transaction was canceled.

Works for me, as long as there's a CHECK_FOR_INTERRUPTS in there to allow a
cancel to happen. The current patch seems to have a lot of pointless logging
and no CHECK_FOR_INTERRUPTS ;-)

			regards, tom lane
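A minimal sketch of such a wait loop, assuming hypothetical queue-state
helpers; only CHECK_FOR_INTERRUPTS() and pg_usleep() are existing calls:

    while (asyncQueueIsFull())
    {
        CHECK_FOR_INTERRUPTS();       /* let a query cancel abort the wait */
        SignalLaggingListeners();     /* hypothetical: nudge slow readers */
        pg_usleep(100000L);           /* back off 100 ms before rechecking */
    }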
On Wed, 18 Nov 2009 22:12:18 -0500 Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> (4) drop *old* notifications if the queue is full.
>
>> Since everyone has made the point that LISTEN is not meant to be a full
>> queueing system, I have no problem dropping notifications LRU-style.
>
> NO, NO, NO, a thousand times no!
>
> That turns NOTIFY into an unreliable signaling system, and if I haven't
> made this perfectly clear yet, any such change will be committed over my
> dead body.
>
> If we are unable to insert a new message into the queue, the correct
> recourse is to fail the transaction that is trying to insert the *new*
> message. Not to drop messages from already-committed transactions.
> Failing the current transaction still leaves things in a consistent
> state, ie, you don't get messages from aborted transactions but that's
> okay because they didn't change the database state.

+1

And in addition I don't like the idea of having the sender sitting around
until there's room for more messages in the queue just because some very old
backends didn't remove their stuff from it. So, yes, just failing the
current transaction seems reasonable. We are talking about millions of
messages in the queue ...

Bye

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Thu, Nov 19, 2009 at 4:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There will now be a nonzero chance
> of transactions failing at commit because of queue full. If the
> chance is large this will be an issue. (Is it sane to wait for
> the queue to be drained?)

Exactly. The whole idea of putting the notification system into an slru
queue was to make this nonzero chance a very-close-to-zero nonzero chance.
Currently, with pages from 0..0xFFFF, we can have something between
160,000,000 (no payload) and 2,000,000 (biggest payload) notifications in
the queue at the same time. We are free to remove the slru limitation by
making slru.c work with 8-character file names; then you can multiply both
limits by 32,000, and it should be very-close-to-zero, at least in my point
of view...

The actual queue-full behavior is then (or maybe is already now) just a
theoretical aspect that we need to agree on to make the whole concept sound.

The current patch would just wait until some space becomes available in the
queue, and it guarantees that no notification is lost. Furthermore, it
guarantees that a transaction can listen on an unlimited number of channels
and that it can send an unlimited number of notifications, not related to
the size of the queue. It can also send that unlimited number of
notifications if it is one of the listeners of those notifications. The only
real limit is now the backend's memory, but as long as nobody proves that he
needs unlimited notifications with a limited amount of memory, we just keep
it like that.

I will add a CHECK_FOR_INTERRUPTS() and resubmit, so that you can cancel a
NOTIFY while the queue is full. Also, I've put in an optimization to only
signal those backends in a queue-full situation that are not yet up-to-date
(which will probably turn out to be only one backend - the slowest one, in a
long-running transaction - after some time...).

> BTW, did we discuss the issue of 2PC transactions versus notify?
> The current behavior of 2PC with notify is pretty cheesy and will
> become more so if we make this change --- you aren't really
> guaranteed that the notify will happen, even though the prepared
> transaction did commit. I think it might be better to disallow
> NOTIFY inside a prepared xact.

Yes, I have been thinking about that also. So what should happen when you
prepare a transaction that has sent a NOTIFY before?

Joachim
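For what it's worth, those figures can be reproduced under one reading of
the limits - if the 0..0xFFFF range refers to slru's four-hex-digit segment
file names, with 32 pages per segment and 8 KB pages (these are assumptions,
not statements from the patch):

    /* Back-of-the-envelope capacity under the stated assumptions:
     *
     *   0x10000 segments * 32 pages/segment * 8192 bytes/page = 16 GB queue
     *   16 GB / ~100 bytes per small notification    ~= 170 million entries
     *   16 GB / ~8000 bytes per maximal notification ~=   2 million entries
     *
     * which, allowing for per-entry headers, matches the 160,000,000 and
     * 2,000,000 figures above.
     */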
On Thu, Nov 19, 2009 at 1:51 PM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de> wrote:
> And in addition I don't like the idea of having the sender sitting
> around until there's room for more messages in the queue just because
> some very old backends didn't remove their stuff from it.

The only valid reason for a backend not having processed the notifications
in the queue is that it has been in a transaction since then (and executed
LISTEN some time before).

Joachim
>> (4) drop *old* notifications if the queue is full.
>>
>> Since everyone has made the point that LISTEN is not meant to be a full
>> queueing system, I have no problem dropping notifications LRU-style.
>
> NO, NO, NO, a thousand times no!

+1. Don't even think about going there. </me gives horrified shudder>

...

> even that was pretty hard. There will now be a nonzero chance
> of transactions failing at commit because of queue full. If the
> chance is large this will be an issue. (Is it sane to wait for
> the queue to be drained?)

I think this chance will be pretty small - you need a *lot* of unread
notifications before this edge case is reached, so I think we can be pretty
severe in our response, and put the responsibility for cleanup on the user,
rather than having the backend try to move things around, clean up the queue
selectively, etc.

> BTW, did we discuss the issue of 2PC transactions versus notify?
> The current behavior of 2PC with notify is pretty cheesy and will
> become more so if we make this change --- you aren't really
> guaranteed that the notify will happen, even though the prepared
> transaction did commit. I think it might be better to disallow
> NOTIFY inside a prepared xact.

That's a tough one. On the one hand, simply stating that NOTIFY and 2PC
don't play together in the docs would be a straightforward solution (and not
a bad one, as 2PC is already rare and delicate and should not be used
lightly). But what I really don't like is the idea of a notify that *may*
work or may not - so let's keep it boolean: it either works 100% of the time
with 2PC, or doesn't at all. Should we throw a warning or error if a client
attempts to combine the two?

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200911190857
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Joachim Wieland wrote:
> On Thu, Nov 19, 2009 at 4:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, did we discuss the issue of 2PC transactions versus notify?
>> The current behavior of 2PC with notify is pretty cheesy and will
>> become more so if we make this change --- you aren't really
>> guaranteed that the notify will happen, even though the prepared
>> transaction did commit. I think it might be better to disallow
>> NOTIFY inside a prepared xact.

That will make anyone currently using 2PC with notify/listen unhappy.

> Yes, I have been thinking about that also. So what should happen
> when you prepare a transaction that has sent a NOTIFY before?

From the user's point of view, nothing should happen at prepare.

At a quick glance, it doesn't seem hard to support 2PC. Messages should be
put to the queue at prepare, as just before normal commit, but the backends
won't see them until they see that the XID has committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Joachim Wieland wrote:
>> On Thu, Nov 19, 2009 at 4:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yes, I have been thinking about that also. So what should happen
>> when you prepare a transaction that has sent a NOTIFY before?
>
> From the user's point of view, nothing should happen at prepare.
>
> At a quick glance, it doesn't seem hard to support 2PC. Messages should
> be put to the queue at prepare, as just before normal commit, but the
> backends won't see them until they see that the XID has committed.

Yeah, but if the server is restarted after the PREPARE but before the
COMMIT, the notification will be lost, since all notification queue entries
are lost upon restart with the slru design, no?

best regards,
Florian Pflug
Florian G. Pflug wrote:
> Heikki Linnakangas wrote:
>> At a quick glance, it doesn't seem hard to support 2PC. Messages should
>> be put to the queue at prepare, as just before normal commit, but the
>> backends won't see them until they see that the XID has committed.
>
> Yeah, but if the server is restarted after the PREPARE but before the
> COMMIT, the notification will be lost, since all notification queue
> entries are lost upon restart with the slru design, no?

That's why they're stored in the 2PC state file in pg_twophase. See
AtPrepare_Notify().

Hmm, thinking about this a bit more, I don't think the messages should be
sent until commit (ie. 2nd phase). Although the information is safe in the
state file, if anyone starts to LISTEN between the PREPARE TRANSACTION and
COMMIT PREPARED calls, he would miss the notifications. I'm not sure if it's
well-defined what happens if someone starts to LISTEN while another
transaction has already sent a notification, but it would be rather
surprising if such a window existed where it doesn't exist with non-prepared
transactions.

A better approach is to do something similar to what we do now: at prepare,
just store the notifications in the state file like we do already. In
notify_twophase_postcommit(), copy the messages to the shared queue.
Although it's the same approach we have now, it becomes a lot cleaner with
the patch, because we're not piggybacking the messages on the
backend-private queue of the current transaction, but sending the messages
directly on behalf of the prepared transaction being committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
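In rough outline, that approach would hang off the existing two-phase hooks
like this; AtPrepare_Notify() and notify_twophase_postcommit() are the hooks
named above, while the record layout and the queue call are assumptions, not
the actual code:

    /* At PREPARE: stash the pending notifications in the 2PC state file. */
    static void
    AtPrepare_Notify(void)
    {
        /* a serialized form of the pending notification list is assumed */
        RegisterTwoPhaseRecord(TWOPHASE_RM_NOTIFY_ID, 0,
                               pendingNotifyData, pendingNotifyLen);
    }

    /* At COMMIT PREPARED: replay the saved messages into the shared queue
     * on behalf of the prepared transaction being committed. */
    void
    notify_twophase_postcommit(TransactionId xid, uint16 info,
                               void *recdata, uint32 len)
    {
        asyncQueueAddEntries(xid, recdata, len);   /* hypothetical queue call */
    }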
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> A better approach is to do something similar to what we do now: at
> prepare, just store the notifications in the state file like we do
> already. In notify_twophase_postcommit(), copy the messages to the
> shared queue. Although it's the same approach we have now, it becomes a
> lot cleaner with the patch, because we're not piggybacking the messages
> on the backend-private queue of the current transaction, but sending the
> messages directly on behalf of the prepared transaction being committed.

This is still ignoring the complaint: you are creating a clear risk that
COMMIT PREPARED will fail.

I'm not sure that it's really worth it, but one way this could be made safe
would be for PREPARE to "reserve" the required amount of queue space, such
that nobody else could use it during the window from PREPARE to COMMIT
PREPARED.

On the whole I'd be just as happy to disallow NOTIFY in a 2PC transaction.
We have no evidence that anyone out there is using the combination, and if
they are, they can do the work to make it safe.

			regards, tom lane
On Thu, 19 Nov 2009 14:23:57 +0100 Joachim Wieland wrote:
> On Thu, Nov 19, 2009 at 1:51 PM, Andreas 'ads' Scherbaum
> <adsmail@wars-nicht.de> wrote:
>> And in addition I don't like the idea of having the sender sitting
>> around until there's room for more messages in the queue just because
>> some very old backends didn't remove their stuff from it.
>
> The only valid reason for a backend not having processed the
> notifications in the queue is that it has been in a transaction since
> then (and executed LISTEN some time before).

Yes, I know. The same backend is probably causing more trouble anyway
(blocking vacuum, xid wraparound, ...).

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> A better approach is to do something similar to what we do now: at
>> prepare, just store the notifications in the state file like we do
>> already. In notify_twophase_postcommit(), copy the messages to the
>> shared queue. Although it's the same approach we have now, it
>> becomes a lot cleaner with the patch, because we're not
>> piggybacking the messages on the backend-private queue of the
>> current transaction, but sending the messages directly on behalf of
>> the prepared transaction being committed.
>
> This is still ignoring the complaint: you are creating a clear risk
> that COMMIT PREPARED will fail.
>
> I'm not sure that it's really worth it, but one way this could be
> made safe would be for PREPARE to "reserve" the required amount of
> queue space, such that nobody else could use it during the window
> from PREPARE to COMMIT PREPARED.

I'd see no problem with "COMMIT PREPARED" failing, as long as it was
possible to retry the COMMIT PREPARED at a later time. There surely are
other failure cases for COMMIT PREPARED too, like an IO error that prevents
the clog bit from being set, or a server crash half-way through COMMIT
PREPARED.

best regards,
Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes: > Tom Lane wrote: >> This is still ignoring the complaint: you are creating a clear risk >> that COMMIT PREPARED will fail. > I'd see no problem with "COMMIT PREPARED" failing, as long as it was > possible to retry the COMMIT PREPARED at a later time. There surely are > other failure cases for COMMIT PREPARED too, like an IO error that > prevents the clog bit from being set, or a server crash half-way through > COMMIT PREPARED. Yes, there are failure cases that are outside our control. That's no excuse for creating one that's within our control. regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Tom Lane wrote:
>>> This is still ignoring the complaint: you are creating a clear
>>> risk that COMMIT PREPARED will fail.
>
>> I'd see no problem with "COMMIT PREPARED" failing, as long as it
>> was possible to retry the COMMIT PREPARED at a later time. There
>> surely are other failure cases for COMMIT PREPARED too, like an IO
>> error that prevents the clog bit from being set, or a server crash
>> half-way through COMMIT PREPARED.
>
> Yes, there are failure cases that are outside our control. That's no
> excuse for creating one that's within our control.

True. On the other hand, people might prefer having to deal with (very
unlikely) COMMIT PREPARED *transient* failures over not being able to use
NOTIFY together with 2PC at all. Especially since any credible distributed
transaction manager has to deal with COMMIT PREPARED failures anyway.

Just my $0.02, though.

best regards,
Florian Pflug
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> A better approach is to do something similar to what we do now: at
>> prepare, just store the notifications in the state file like we do
>> already. In notify_twophase_postcommit(), copy the messages to the
>> shared queue. Although it's the same approach we have now, it becomes a
>> lot cleaner with the patch, because we're not piggybacking the messages
>> on the backend-private queue of the current transaction, but sending the
>> messages directly on behalf of the prepared transaction being committed.
>
> This is still ignoring the complaint: you are creating a clear risk
> that COMMIT PREPARED will fail.
>
> I'm not sure that it's really worth it, but one way this could be made
> safe would be for PREPARE to "reserve" the required amount of queue
> space, such that nobody else could use it during the window from
> PREPARE to COMMIT PREPARED.

Hmm, ignoring 2PC for a moment, I think the patch suffers from a little race
condition:

Session 1: BEGIN;
Session 1: INSERT INTO foo ..;
Session 1: NOTIFY 'foo';
Session 1: COMMIT -- commit begins
Session 1: [commit processing runs AtCommit_NotifyBeforeCommit()]
Session 2: LISTEN 'foo';
Session 2: SELECT * FROM foo;
Session 1: [AtCommit_NotifyAfterCommit() signals listening backends]
Session 2: [waits for notifications]

Because session 2 began listening after session 1 had already sent its
notifications, it missed them. But the SELECT didn't see the INSERT, because
the inserting transaction hadn't fully finished yet. The window isn't as
small as it might seem at first glance, because the WAL is fsynced between
the BeforeCommit and AfterCommit actions.

I think we could fix that by arranging things so that a backend refrains
from advancing its own 'pos' beyond the first notification it has written
itself, until commit is completely finished. I'm not sure, but that might
already be true if we don't receive interrupts between BeforeCommit and
AfterCommit. LISTEN can then simply start reading from QUEUE_TAIL instead of
QUEUE_HEAD, and in the above example session 2 will see the notifications
sent by session 1.

That will handle 2PC as well. We can send the notifications in
prepare-phase, and any LISTEN that starts after the prepare-phase will see
the notifications because they're still in the queue. There is no risk of
running out of disk space in COMMIT PREPARED, because the notifications have
already been written to disk. However, the notification queue can't be
truncated until the prepared transaction finishes; does anyone think that's
a show-stopper?

> On the whole I'd be just as happy to disallow NOTIFY in a 2PC
> transaction. We have no evidence that anyone out there is using the
> combination, and if they are, they can do the work to make it safe.

Yeah, I doubt we'd hear many complaints in practice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
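A bare sketch of those two rules - QUEUE_TAIL, QUEUE_HEAD and 'pos' are the
mail's own terms, while the variable names are invented for illustration:

    /* In LISTEN: start reading from the tail of the queue, so entries
     * already written by transactions that are mid-commit are not skipped. */
    MyQueuePos = QUEUE_TAIL;

    /* In the queue reader: refrain from advancing 'pos' past our own first
     * entry until our commit has completely finished. MyFirstNotifyPos and
     * CommitInProgress() are hypothetical bookkeeping. */
    if (QueuePosEqual(pos, MyFirstNotifyPos) && CommitInProgress())
        return;     /* resume once the commit is fully done */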
greg@turnstep.com ("Greg Sabino Mullane") writes:
>> BTW, did we discuss the issue of 2PC transactions versus notify?
>> The current behavior of 2PC with notify is pretty cheesy and will
>> become more so if we make this change --- you aren't really
>> guaranteed that the notify will happen, even though the prepared
>> transaction did commit. I think it might be better to disallow
>> NOTIFY inside a prepared xact.
>
> That's a tough one. On the one hand, simply stating that NOTIFY and 2PC
> don't play together in the docs would be a straightforward solution
> (and not a bad one, as 2PC is already rare and delicate and should not
> be used lightly). But what I really don't like is the idea of a
> notify that *may* work or may not - so let's keep it boolean: it either
> works 100% of the time with 2PC, or doesn't at all. Should we throw
> a warning or error if a client attempts to combine the two?

+1 from me... It should either work, or not work, as opposed to something
nondeterministic. While it's certainly a nice thing for features to be
orthogonal, and for interactions to "just work," I can see making a good
case for NOTIFY and 2PC not playing together.

--
select 'cbbrowne' || '@' || 'gmail.com';
http://linuxfinances.info/info/slony.html
Why isn't phonetic spelled the way it sounds?
On Thu, Nov 19, 2009 at 6:55 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, ignoring 2PC for a moment, I think the patch suffers from a little
> race condition:
>
> Session 1: BEGIN;
> Session 1: INSERT INTO foo ..;
> Session 1: NOTIFY 'foo';
> Session 1: COMMIT -- commit begins
> Session 1: [commit processing runs AtCommit_NotifyBeforeCommit()]

----> Session 2 must not read uncommitted notifications selectively

> Session 2: LISTEN 'foo';
> Session 2: SELECT * FROM foo;
> Session 1: [AtCommit_NotifyAfterCommit() signals listening backends]
> Session 2: [waits for notifications]
>
> Because session 2 began listening after session 1 had already sent its
> notifications, it missed them.

I think you are right. However, note that session 1 does not actively send
notifications to anybody; it just puts them into the queue. It's every
backend's own job to process the queue and see which messages are
interesting and which are not. The example you brought up fails if Session 2
disregards the notifications based on the current set of channels that it is
listening to at this point. If I understand you correctly, what you are
suggesting is to not read uncommitted notifications from the queue in a
reading backend, or to read all notifications (regardless of which channel
they have been sent to), such that the backend can apply the check ("Am I
listening on this channel?") later on.

> I think we could fix that by arranging things so that a backend refrains
> from advancing its own 'pos' beyond the first notification it has
> written itself, until commit is completely finished.

In the end this is similar to the idea of not reading uncommitted
notifications, which is what I did at the beginning. However, then you run
into a full queue a lot faster. Imagine a queue length of 1000 with 3
transactions writing 400 notifications each... All three might fail if they
run in parallel, even though space would be sufficient for at least two of
them; and if they are executed in sequence, all of them could deliver their
notifications.

Given your example, what I am proposing now is to stop reading from the
queue once we see a not-yet-committed notification, but once the queue is
full, to read the uncommitted notifications anyway, effectively copying them
over into the backend's own memory... Once the transaction commits and sends
a signal, we can process, send and discard the previously copied
notifications. In the above example, at some point one, two or all three
backends would see that the queue is full, and everybody would read the
uncommitted notifications of the other one, copy them into their own memory,
and space would be freed in the queue.

> That will handle 2PC as well. We can send the notifications in
> prepare-phase, and any LISTEN that starts after the prepare-phase will
> see the notifications because they're still in the queue. There is no
> risk of running out of disk space in COMMIT PREPARED, because the
> notifications have already been written to disk. However, the
> notification queue can't be truncated until the prepared transaction
> finishes; does anyone think that's a show-stopper?

Note that we don't preserve notifications when the database restarts. But
2PC can cope with restarts. How would that fit together? Also, I am not sure
how you are going to deliver notifications that happen between the PREPARE
TRANSACTION and the COMMIT PREPARED (because you have only one queue pointer
which you are not going to advance...)?

Joachim
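The reading rule proposed here might look roughly like the following;
TransactionIdIsInProgress() and TransactionIdDidCommit() are existing calls,
while the queue helpers are assumed names rather than the patch's actual
functions:

    /* Per-backend queue scan under the proposed rule. */
    while (!QueuePosEqual(pos, QUEUE_HEAD))
    {
        AsyncQueueEntry *entry = asyncQueueReadEntry(&pos);  /* assumed helper */

        if (TransactionIdIsInProgress(entry->xid))
        {
            if (!asyncQueueIsNearlyFull())
                break;                 /* stop; the commit signal resumes us */
            saveUncommittedLocally(entry); /* queue full: copy into local memory */
        }
        else if (TransactionIdDidCommit(entry->xid))
            processNotification(entry);  /* deliver if we listen on the channel */
        /* entries from aborted transactions are simply skipped */
    }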
Joachim Wieland wrote:
> The example you brought up fails if Session 2 disregards the
> notifications based on the current set of channels that it is listening
> to at this point.

Right. Session 2 might not be listening at all yet.

> If I understand you correctly, what you are suggesting is to not read
> uncommitted notifications from the queue in a reading backend, or to read
> all notifications (regardless of which channel they have been sent to),
> such that the backend can apply the check ("Am I listening on this
> channel?") later on.

Right.

> Note that we don't preserve notifications when the database restarts.
> But 2PC can cope with restarts. How would that fit together?

The notifications are written to the state file at prepare. They can be
recovered from there and written to the queue again at server start (see
twophase_rmgr.c).

> Also, I am not sure how you are going to deliver notifications that
> happen between the PREPARE TRANSACTION and the COMMIT PREPARED (because
> you have only one queue pointer which you are not going to advance...)?

Yeah, that's a problem. One uncommitted notification will block all others
too. In theory you have the same problem without 2PC, but it's OK because
you don't expect one COMMIT to take much longer to finish than others.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Nov 20, 2009 at 7:51 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> Note that we don't preserve notifications when the database restarts.
>> But 2PC can cope with restarts. How would that fit together?
>
> The notifications are written to the state file at prepare. They can be
> recovered from there and written to the queue again at server start (see
> twophase_rmgr.c).

Okay, but which of the backends would then leave its pointer at that place
in the queue upon restart?

This is also an issue for the non-restart case: what if you prepare the
transaction in one backend and commit it in another?

Joachim
On Thu, Nov 19, 2009 at 11:04 PM, Joachim Wieland <joe@mcknight.de> wrote:
> Given your example, what I am proposing now is to stop reading from
> the queue once we see a not-yet-committed notification, but once the
> queue is full, to read the uncommitted notifications anyway, effectively
> copying them over into the backend's own memory... Once the transaction
> commits and sends a signal, we can process, send and discard the
> previously copied notifications. In the above example, at some point
> one, two or all three backends would see that the queue is full, and
> everybody would read the uncommitted notifications of the other one,
> copy them into their own memory, and space would be freed in the queue.

Attached is the patch that implements the described modifications.

Joachim
Joachim Wieland wrote:
> On Fri, Nov 20, 2009 at 7:51 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Note that we don't preserve notifications when the database restarts.
>>> But 2PC can cope with restarts. How would that fit together?
>> The notifications are written to the state file at prepare. They can be
>> recovered from there and written to the queue again at server start (see
>> twophase_rmgr.c).
>
> Okay, but which of the backends would then leave its pointer at that
> place in the queue upon restart?
>
> This is also an issue for the non-restart case: what if you prepare
> the transaction in one backend and commit it in another?

The dummy procs that represent prepared transactions need to be treated as
backends. Each prepared transaction needs a slot of its own in the backends
array.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Joachim Wieland wrote:
> On Thu, Nov 19, 2009 at 11:04 PM, Joachim Wieland <joe@mcknight.de> wrote:
>> Given your example, what I am proposing now is to stop reading from
>> the queue once we see a not-yet-committed notification, but once the
>> queue is full, to read the uncommitted notifications anyway, effectively
>> copying them over into the backend's own memory... [...]
>
> Attached is the patch that implements the described modifications.

That's still not enough if the session 2 that issues the LISTEN wasn't
previously subscribed to any channels. It will still miss the notification.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, 2009-11-20 at 10:35 +0100, Joachim Wieland wrote:
> On Thu, Nov 19, 2009 at 11:04 PM, Joachim Wieland <joe@mcknight.de> wrote:
>> Given your example, what I am proposing now is to stop reading from
>> the queue once we see a not-yet-committed notification, but once the
>> queue is full, to read the uncommitted notifications anyway, effectively
>> copying them over into the backend's own memory... [...]
>
> Attached is the patch that implements the described modifications.

This is a first-pass review.

Comments:

* Why don't we read all notifications into backend-local memory at every
  opportunity? It looks like sometimes it's only reading the committed
  ones, and I don't see the advantage of leaving it in the SLRU.

Code comments:

* I see a compiler warning:
    ipci.c: In function ‘CreateSharedMemoryAndSemaphores’:
    ipci.c:222: warning: implicit declaration of function ‘AsyncShmemInit’
* sanity_check test fails, needs updating
* guc test fails, needs updating
* no docs

The overall design looks pretty good to me. From my very brief pass over the
code, it looks like:

* When the queue is full, the transaction waits until there is room before
  it's committed. This may have an effect on 2PC, but the consensus seemed
  to be that we could restrict the combination of 2PC + NOTIFY. This also
  introduces the possibility of starvation, I suppose, but that seems
  remote.
* When the queue is full, the inserter tries to signal the listening
  backends, and tries to make room in the queue.
* Backends read the notifications when signaled, or when inserting (in case
  the inserting backend is also the one preventing the queue from
  shrinking).

I haven't looked at everything yet, but this seems like it's in reasonable
shape from a high level. Joachim, can you clean the patch up, include docs,
and fix the tests? If so, I'll do a full review.

Regards,
Jeff Davis
Hi Jeff,

the current patch suffers from what Heikki recently spotted: if one backend
is putting notifications into the queue and meanwhile another backend
executes LISTEN and commits, then this listening backend committed earlier
and is supposed to receive the notifications of the notifying backend - even
though its transaction started later.

I have a new version that deals with this problem but I need to clean it up
a bit. I am planning to post it this week.

On Mon, Nov 30, 2009 at 6:15 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> * Why don't we read all notifications into backend-local memory at
> every opportunity? It looks like sometimes it's only reading the
> committed ones, and I don't see the advantage of leaving it in the SLRU.

Exactly because of the problem above, we cannot do it. Once a notification
is removed from the queue, no other backend can execute a LISTEN anymore,
because there is no way for it to get at that information. Also, we'd need
to read _all_ notifications, not only the committed ones, because we don't
know what our backend will LISTEN to in the future.

On the other hand, reading uncommitted notifications guarantees that we can
send an unlimited number of notifications (limited by main memory) and that
we don't run into a full queue in this example: queue length 1000, 3
notifying backends, 400 notifications to be sent by each backend. If all of
them send their notifications at the same time, we risk that all three run
into a full queue... We could still preserve that behavior, at the cost of
allowing LISTEN to block until the queue is within its limits again.

> * When the queue is full, the inserter tries to signal the listening
> backends, and tries to make room in the queue.
> * Backends read the notifications when signaled, or when inserting (in
> case the inserting backend is also the one preventing the queue from
> shrinking).

Exactly, but it doesn't solve the problem described above. :-(

ISTM that we have two options:

a) allow LISTEN to block if the queue is full - NOTIFY will never fail (but
   block as well) and will eventually succeed
b) NOTIFY could fail and make the transaction roll back - LISTEN always
   succeeds immediately

Again: this is corner-case behavior and only happens after some hundreds of
gigabytes of notifications have been put into the queue and have not yet
been processed by all listening backends. I like a) better, but b) is easier
to implement...

> I haven't looked at everything yet, but this seems like it's in
> reasonable shape from a high level. Joachim, can you clean the patch up,
> include docs, and fix the tests? If so, I'll do a full review.

As soon as everybody is fine with the approach, I will work on the docs
patch.

Joachim
On Mon, 2009-11-30 at 14:14 +0100, Joachim Wieland wrote:
> I have a new version that deals with this problem but I need to clean
> it up a bit. I am planning to post it this week.

Are you planning to send a new version soon? As it is, we're 12 days from
the end of this commitfest, so we don't have much time to hand the patch
back and forth before we're out of time.

Regards,
Jeff Davis
Jeff Davis wrote:
> On Mon, 2009-11-30 at 14:14 +0100, Joachim Wieland wrote:
>> I have a new version that deals with this problem but I need to clean
>> it up a bit. I am planning to post it this week.
>
> Are you planning to send a new version soon? As it is, we're 12 days from
> the end of this commitfest, so we don't have much time to hand the patch
> back and forth before we're out of time.

Joachim has been making good progress here, but given the current code
maturity vs. implementation complexity of this work, my guess is that even
if we got an updated version today it's not going to hit commit quality
given the time left. I'm going to mark this one "returned with feedback",
and I do hope that work continues on this patch so it's early in the queue
for the final CommitFest for this version. It seems like it just needs a bit
more time to let the design mature and get the kinks worked out, and it
could turn into a useful feature--I know I've wanted NOTIFY with a payload
for years.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
Hi,

On Mon, Dec 7, 2009 at 5:38 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> I'm going to mark this one "returned with feedback", and I do hope that
> work continues on this patch so it's early in the queue for the final
> CommitFest for this version. It seems like it just needs a bit more time
> to let the design mature and get the kinks worked out, and it could turn
> into a useful feature--I know I've wanted NOTIFY with a payload for years.

I am perfectly fine with postponing the patch to the next commitfest. To get
some more feedback and to allow everyone to play with it, I am attaching the
latest version of the patch.

What has changed:

Transactional processing is now hopefully correct. Examples:

Backend 1:                      Backend 2:

transaction starts
NOTIFY foo;
commit starts
                                transaction starts
                                LISTEN foo;
                                commit starts
commit to clog
                                commit to clog

=> Backend 2 will receive Backend 1's notification.

Backend 1:                      Backend 2:

transaction starts
NOTIFY foo;
commit starts
                                transaction starts
                                UNLISTEN foo;
                                commit starts
commit to clog
                                commit to clog

=> Backend 2 will not receive Backend 1's notification.

This is done by introducing an additional "AsyncCommitOrderLock". It is
grabbed exclusively by transactions that execute LISTEN / UNLISTEN, and in
shared mode by transactions that executed NOTIFY only. LISTEN/UNLISTEN
transactions then register the XIDs of the NOTIFYing transactions that are
about to commit at the same time, in order to later find out which
notifications are visible and which ones are not.

If the queue is full, any other transaction that is trying to place a
notification into the queue is rolled back! This is basically a consequence
of the former. There are two warnings that will show up in the log, one once
the queue is more than 50% full and another one if it is more than 75% full.
The biggest threat of running into a full queue are probably backends that
are LISTENing and are idle in transaction.

I have added a function pg_listening() which just returns the names of the
channels that a backend is listening to.

I especially invite people who know more about the transactional stuff than
I do to take a close look at what I have done regarding notification
visibility.

One open question regarding the payload is whether we need to limit it to
ASCII so as not to risk conversion issues between different backend
character sets.

The second open issue is what we should do regarding 2PC. These options have
been brought up so far:

1) allow NOTIFY in 2PC, but it can happen that the transaction needs to be
   rolled back if the queue is full
2) disallow NOTIFY in 2PC altogether
3) put notifications into the queue on PREPARE TRANSACTION and make backends
   not advance their pointers further than those notifications but wait for
   the 2PC transaction to commit. 2PC transactions would never fail, but you
   effectively stop the notification system until the 2PC transaction
   commits.

Comments?

Best regards,
Joachim
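Reading between the lines, the AsyncCommitOrderLock protocol might reduce to
something like this at commit time - the lock name is from the mail above,
while the list names and everything else are assumptions:

    /* Sketch of the commit-ordering protocol; not the patch's actual code. */
    if (pendingListenActions != NIL)
        LWLockAcquire(AsyncCommitOrderLock, LW_EXCLUSIVE); /* LISTEN/UNLISTEN */
    else if (pendingNotifies != NIL)
        LWLockAcquire(AsyncCommitOrderLock, LW_SHARED);    /* NOTIFY only */

    /* While holding it exclusively, a LISTEN/UNLISTEN transaction records
     * the XIDs of concurrently committing notifiers, so it can decide later
     * which of their notifications are visible to it. */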
On Wed, Dec 9, 2009 at 5:43 AM, Joachim Wieland <joe@mcknight.de> wrote:
> I am perfectly fine with postponing the patch to the next commitfest. To
> get some more feedback and to allow everyone to play with it, I am
> attaching the latest version of the patch.
> [...]
> Comments?

Joachim - This no longer applies - please rebase, repost, and add a link to
the new version to the commitfest app.

Jeff - Do you want to continue reviewing this for the next CommitFest, or
hand off to another reviewer?

Thanks,

...Robert
On Thu, 2010-01-07 at 13:40 -0500, Robert Haas wrote:
> Joachim - This no longer applies - please rebase, repost, and add a
> link to the new version to the commitfest app.
>
> Jeff - Do you want to continue reviewing this for the next CommitFest,
> or hand off to another reviewer?

Sure, I'll review it.

Regards,
Jeff Davis
On Thu, Jan 7, 2010 at 7:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Joachim - This no longer applies - please rebase, repost, and add a
> link to the new version to the commitfest app.

Updated patch attached.

From my point of view these are the current open questions:

- is the general approach reasonable (i.e. to put notifications into an slru-based queue instead of into shared memory, to allow for better handling of notification bursts)
- is the transactional behavior correct (see upthread)
- do we need to limit the payload to pure ASCII? I think yes, we need to. I also think we need to reject other payloads with elog(ERROR...).
- how to deal with 2PC (see upthread)?
- how to deal with hot standby (has not been discussed yet)

Joachim
On Fri, Jan 8, 2010 at 7:48 AM, Joachim Wieland <joe@mcknight.de> wrote: > - do we need to limit the payload to pure ASCII ? I think yes, we need > to. I also think we need to reject other payloads with elog(ERROR...). Just noticed this...don't you mean UTF8? Are we going to force non English speaking users to send all payloads in English? merlin
Merlin Moncure wrote:
> On Fri, Jan 8, 2010 at 7:48 AM, Joachim Wieland <joe@mcknight.de> wrote:
>> - do we need to limit the payload to pure ASCII ? I think yes, we need
>> to. I also think we need to reject other payloads with elog(ERROR...).
>
> Just noticed this...don't you mean UTF8? Are we going to force non
> English speaking users to send all payloads in English?

Hmm, ASCII-only sounds weird, though UTF8 would have obvious conversion problems in some circumstances - what about bytea (or why _do_ we have to limit this to something?).

Stefan
Merlin Moncure <mmoncure@gmail.com> writes: > On Fri, Jan 8, 2010 at 7:48 AM, Joachim Wieland <joe@mcknight.de> wrote: >> - do we need to limit the payload to pure ASCII ? I think yes, we need >> to. I also think we need to reject other payloads with elog(ERROR...). > Just noticed this...don't you mean UTF8? Are we going to force non > English speaking users to send all payloads in English? No, he meant ASCII. Otherwise we're going to have to deal with encoding conversion issues. regards, tom lane
On Fri, Jan 8, 2010 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Fri, Jan 8, 2010 at 7:48 AM, Joachim Wieland <joe@mcknight.de> wrote:
>>> - do we need to limit the payload to pure ASCII ? I think yes, we need
>>> to. I also think we need to reject other payloads with elog(ERROR...).
>
>> Just noticed this...don't you mean UTF8? Are we going to force non
>> English speaking users to send all payloads in English?
>
> No, he meant ASCII. Otherwise we're going to have to deal with encoding
> conversion issues.

That seems pretty awkward... instead of forcing an ancient encoding that is useless to 90% of the world, why not send bytea (if necessary hex/b64 encoded)? I'm just trying to imagine how databases encoded in non-ASCII-superset encodings would use this feature...

If we must use ASCII, we should probably offer conversion functions to/from text, right? I definitely understand the principle of the utility of laziness, but is this a proper case of simply dumping the problem onto the user?

merlin
Andrew Chernow wrote:
>> conversion problems in some circumstances - what about bytea (or why
>> _do_ we have to limit this to something?).
>
> I agree with bytea. Zero conversions and the most flexible. Payload
> encoding/format should be decided by the user.

Yeah, that is exactly why I think this would be the most flexible option...

Stefan
> conversion problems in some circumstances - what about bytea (or why
> _do_ we have to limit this to something?).

I agree with bytea. Zero conversions and the most flexible. Payload encoding/format should be decided by the user.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Merlin Moncure <mmoncure@gmail.com> writes:
> On Fri, Jan 8, 2010 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, he meant ASCII. Otherwise we're going to have to deal with encoding
>> conversion issues.

> That seems pretty awkward... instead of forcing an ancient encoding that
> is useless to 90% of the world, why not send bytea

You mean declare the notify parameter as bytea instead of text, and dump all encoding conversion issues onto the user? We could do that I guess. I'm not convinced that it's either more functional or more convenient to use than a parameter that's declared as text and restricted to ASCII.

> If we must use ASCII, we should probably offer conversion functions
> to/from text, right?

There's encode() and decode(), which is what people would wind up using quite a lot of the time if we declare the parameter to be bytea.

			regards, tom lane
On Fri, Jan 8, 2010 at 4:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Fri, Jan 8, 2010 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> No, he meant ASCII. Otherwise we're going to have to deal with encoding
>>> conversion issues.
>
>> That seems pretty awkward... instead of forcing an ancient encoding that
>> is useless to 90% of the world, why not send bytea
>
> You mean declare the notify parameter as bytea instead of text, and dump
> all encoding conversion issues onto the user? We could do that I guess.
> I'm not convinced that it's either more functional or more convenient to
> use than a parameter that's declared as text and restricted to ASCII.
>
>> If we must use ASCII, we should probably offer conversion functions
>> to/from text, right?
>
> There's encode() and decode(), which is what people would wind up using
> quite a lot of the time if we declare the parameter to be bytea.

All good points... IF the notify payload can be the result of an expression, so that for non-const cases where you are expecting non-ASCII data you can throw in a quick encode. I had assumed it couldn't (wrongly?), because the current notify relname takes a strict literal, so this would have to work (if the below works, disregard the rest and I concede ASCII is fine):

notify test encode('some_string', 'hex');

A quick look at the gram.y changes suggests this won't work, if I read it correctly. How would someone from Japan issue a notify/payload from the console?

merlin
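For reference, encode() and decode() are existing built-ins, so an ASCII-only payload can already carry arbitrary text if the application base64-wraps it. A minimal round trip, leaving the channel/payload framing aside and showing only the encoding step:

    -- Sending side: base64 turns any server-encoded string into pure ASCII.
    -- Receiving side: decode it back.  The whole expression round-trips
    -- losslessly:
    SELECT convert_from(
             decode(
               encode(convert_to('résumé für 東京', 'UTF8'), 'base64'),
               'base64'),
             'UTF8');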
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 >>> - do we need to limit the payload to pure ASCII ? I think yes, we need >>> to. I also think we need to reject other payloads with elog(ERROR...). >...[snip other followups] On the one hand, I don't see the problem with ASCII here - the payload is meant as a quick shorthand convenience, not a literal payload of important information. On the other, it should at least match the current rules for the listen and notify names themselves, which means allowing more than ASCII. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 201001102303 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAktKo40ACgkQvJuQZxSWSshg9ACg2uiDYuhBnRQqFS6Ej3O9VLcC 2TgAn035OrYcdERn4I1VI4NRQFBIcXZ/ =yJmK -----END PGP SIGNATURE-----
On mån, 2010-01-11 at 04:05 +0000, Greg Sabino Mullane wrote: > On the one hand, I don't see the problem with ASCII here - the > payload is meant as a quick shorthand convenience, not a literal payload > of important information. Is it not? The notify name itself is already a quick shorthand convenience. Who knows what the payload is actually meant for. Have use cases been presented and analyzed?
A use case: use NOTIFY in a rule to send the primary key of a row that has been updated (for instance to manage a cache).

This requires a patch on top of this one, and it really is a separate concern, but I thought I'd give the use case anyway, since I believe it is relevant to the issues here.

I can see four kinds of NOTIFY statements:

1) The existing case: NOTIFY channel
2) With Joachim's patch: NOTIFY channel 'payload'
3) My use case: NOTIFY channel 'pay'||'load' (actually NOTIFY channel '<table_name>#'||OLD.id)
4) Taken one step further: NOTIFY channel (SELECT payload FROM payloads WHERE ...)

I'm working on a proof of concept patch to use Joachim's new notify function to introduce case 3. I think this means going through the planner and executor, so I might as well do case 4 as well. A use case I can see for case 4 is sending information in a rule or trigger about an updated object, when that information is stored in a separate table (versioning or audit information for example).

Cases 1 and 2 could remain utility commands, while cases 3 and 4 could go through the planner and the executor, the notify plan node calling Joachim's new notify function on execution.

Best regards,
Arnaud Betremieux

On 11/01/2010 07:58, Peter Eisentraut wrote:
> On mån, 2010-01-11 at 04:05 +0000, Greg Sabino Mullane wrote:
>> On the one hand, I don't see the problem with ASCII here - the
>> payload is meant as a quick shorthand convenience, not a literal payload
>> of important information.
>
> Is it not? The notify name itself is already a quick shorthand
> convenience. Who knows what the payload is actually meant for. Have
> use cases been presented and analyzed?
Arnaud Betremieux <arnaud.betremieux@keyconsulting.fr> writes: > 3) My use case : NOTIFY channel 'pay'||'load' (actually NOTIFY > channel '<table_name>#'||OLD.id) > 4) Taken one step further : NOTIFY channel (SELECT payload FROM payloads > WHERE ...) > I'm working on a proof of concept patch to use Joachim's new notify > function to introduce case 3. I think this means going through the > planner and executor, so I might as well do case 4 as well. It would be a lot less work to introduce a function like send_notify() that could be invoked within a regular SELECT. Pushing a utility statement through the planner/executor code path will do enough violence to the system design that such a patch would probably be rejected out of hand. regards, tom lane
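To make the suggestion concrete, a sketch of what such a function would permit; send_notify() is only proposed at this point, and the customers table and its columns are made-up examples:

    -- Hypothetical: send_notify(channel text, payload text) invoked
    -- once per qualifying row inside an ordinary SELECT.
    SELECT send_notify('cache_inval', 'customers#' || id)
      FROM customers
     WHERE updated_at > now() - interval '1 minute';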
Arnaud Betremieux wrote:
> A use case: use NOTIFY in a rule to send the primary key of a row that
> has been updated (for instance to manage a cache).
>
> This requires a patch on top of this one, and it really is a separate
> concern, but I thought I'd give the use case anyway, since I believe it
> is relevant to the issues here.
>
> I can see four kinds of NOTIFY statements:
>
> 1) The existing case: NOTIFY channel
> 2) With Joachim's patch: NOTIFY channel 'payload'
> 3) My use case: NOTIFY channel 'pay'||'load' (actually NOTIFY
> channel '<table_name>#'||OLD.id)
> 4) Taken one step further: NOTIFY channel (SELECT payload FROM payloads
> WHERE ...)

I know I'd be looking to send UTF8 and byteas. Can notify as it stands today take an expression for the payload (#4)?

The other issue is that libpq expects a string, so if non-C-string-safe data is to be sent, a protocol change is needed, or the server must hex-encode all payloads before transit and libpq must decode them; that also requires a 'payload_len' member to be added to PGnotify. The latter is better IMHO, as protocol changes are nasty. Although that is only needed to support bytea; if all we want is UTF8, then there is no issue with libpq.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On Mon, Jan 11, 2010 at 8:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I'm working on a proof of concept patch to use Joachim's new notify >> function to introduce case 3. I think this means going through the >> planner and executor, so I might as well do case 4 as well. > > It would be a lot less work to introduce a function like send_notify() > that could be invoked within a regular SELECT. Pushing a utility +1 IMO, this neatly solves the problem and addresses some of the concerns I was raising upthread. A bytea version would be nice but a text only version is workable. merlin
On 11/01/2010 14:25, Tom Lane wrote:
> Arnaud Betremieux<arnaud.betremieux@keyconsulting.fr> writes:
>> 3) My use case: NOTIFY channel 'pay'||'load' (actually NOTIFY
>> channel '<table_name>#'||OLD.id)
>> 4) Taken one step further: NOTIFY channel (SELECT payload FROM payloads
>> WHERE ...)
>
>> I'm working on a proof of concept patch to use Joachim's new notify
>> function to introduce case 3. I think this means going through the
>> planner and executor, so I might as well do case 4 as well.
>
> It would be a lot less work to introduce a function like send_notify()
> that could be invoked within a regular SELECT. Pushing a utility
> statement through the planner/executor code path will do enough violence
> to the system design that such a patch would probably be rejected out of
> hand.

Introducing a send_notify function does sound a lot simpler and cleaner, and I think I'll try it this way. The only thing that bothers me is the syntax:

... DO ALSO SELECT send_notify('payload')
... DO ALSO SELECT send_notify(a) FROM b

How about a new grammar for NOTIFY <channel> a_expr, which would go through the rewriter to be transformed into a SELECT? So NOTIFY (SELECT a FROM b) would become SELECT send_notify(SELECT a FROM b)?
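Spelled out, the function-based variant of the rule use case might look like this; send_notify() and the customers table are assumptions from this subthread, only CREATE RULE ... DO ALSO itself is existing syntax:

    -- Notify listeners with the primary key of every updated row.
    CREATE RULE customers_cache_inval AS
        ON UPDATE TO customers
        DO ALSO SELECT send_notify('customers', 'customers#' || OLD.id);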
Initial comments:

* compiler warnings

ipci.c: In function ‘CreateSharedMemoryAndSemaphores’:
ipci.c:228: warning: implicit declaration of function ‘AsyncShmemInit’

* 2PC

Adds complexity, and I didn't see any clear, easy solution after reading the thread. I don't see this as a showstopper, so I'd leave this until later.

* Hot Standby

It would be nice to have NOTIFY work with HS, but I don't think that's going to happen this cycle. I don't think this is a showstopper, either.

* ASCII?

I tend to think that encoding should be handled properly here, or we should use BYTEA. My reasoning is that TEXT is a datatype, whereas ASCII is not, and I don't see a reason to invent it now. We might as well use real TEXT (with encoding support) or use BYTEA, which works fine for ASCII anyway.

* send_notify()

Someone suggested a function like this, which sounds useful to me. Not a showstopper, but if it's not too difficult it might be worth including. (Incidentally, the function signature should match the type of the payload, which is unclear if we're going to use ASCII.)

* AsyncCommitOrderLock

This is a big red flag to me. The name by itself is scary. The fact that it is held across a series of fairly interesting operations is even scarier. As you say in the comments, it's acquired before recording the transaction commit in the clog, and released afterward. What you actually have is something like:

    AtCommit_NotifyBeforeCommit();
    HOLD_INTERRUPTS();
    s->state = TRANS_COMMIT;
    latestXid = RecordTransactionCommit(false);
    TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);
    ProcArrayEndTransaction(MyProc, latestXid);
    CallXactCallbacks(XACT_EVENT_COMMIT);
    ResourceOwnerRelease(TopTransactionResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS,
                         true, true);
    AtEOXact_Buffers(true);
    AtEOXact_RelationCache(true);
    AtEarlyCommit_Snapshot();
    AtEOXact_Inval(true);
    smgrDoPendingDeletes(true);
    AtEOXact_MultiXact();
    AtCommit_NotifyAfterCommit();

That is a lot of stuff happening between the acquisition and release. There are two things particularly scary about this (to me):

* holding an LWLock while performing I/O (in smgrDoPendingDeletes())

* holding an LWLock while acquiring another LWLock (e.g. ProcArrayEndTransaction())

An LWLock-based deadlock is a hard deadlock -- not detected by the deadlock detector, and there's not much you can do even if it were -- right in the transaction-completing code. That means that the whole system would be locked up. I'm not sure that such a deadlock condition exists, but it seems likely (if not, it's only because other areas of the code avoided this practice), and it's hard to prove that it's safe. And it's probably bad for performance to increase the length of time transactions are serialized, even if only for cases that involve LISTEN/NOTIFY.

I believe this needs a re-think. What is the real purpose of AsyncCommitOrderLock, and can we achieve that another way? It seems that you're worried about a transaction that issues a LISTEN and commits not getting a notification from a NOTIFYing transaction that commits concurrently (and slightly after the LISTEN). But SignalBackends() is called after transaction commit, and should signal all backends who committed a LISTEN before that time, right?

* The transaction IDs are used because Send_Notify() is called before the AsyncCommitOrderLock acquire, and so the backend could potentially be reading uncommitted notifications that are "about" to be committed (or aborted). Then, the queue is not read further until that transaction completes. That's not really commented effectively, and I suspect the process could be simpler. For instance, why can't the backend always read all of the data from the queue, notifying if the transaction is committed and saving to a local list otherwise (which would be checked on the next wakeup)?

* Can you clarify the big comment at the top of async.c? It's a helpful overview, but it glosses over some of the touchy synchronization steps going on. I find myself trying to make a map of these steps.

Regards,
Jeff Davis
Hi Jeff,

thanks a lot for your review. I will reply to your review again in detail, but I'd like to answer your two main questions already now.

On Tue, Jan 19, 2010 at 8:08 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> * AsyncCommitOrderLock
>
> I believe this needs a re-think. What is the real purpose of
> AsyncCommitOrderLock, and can we achieve that another way? It seems
> that you're worried about a transaction that issues a LISTEN and
> commits not getting a notification from a NOTIFYing transaction
> that commits concurrently (and slightly after the LISTEN).

Yes, that is exactly the point. However, I am not worried about a notification getting lost but rather about determining the visibility of the notifications. In the end we need to know the order of LISTEN, UNLISTEN and NOTIFY commits to find out who should receive which notifications. As you cannot determine retrospectively whether xid1 committed before or after xid2, I enforced the order with an LWLock and by saving the list of xids currently being committed. There are also two examples about that issue in http://archives.postgresql.org/pgsql-hackers/2009-12/msg00790.php

> But SignalBackends() is called after transaction commit, and should signal
> all backends who committed a LISTEN before that time, right?

Yes, any listening backend is being signaled, but that doesn't help to find out about the exact order of the almost-concurrent events that happened before.

> * The transaction IDs are used because Send_Notify() is called before
> the AsyncCommitOrderLock acquire, and so the backend could potentially
> be reading uncommitted notifications that are "about" to be committed
> (or aborted). Then, the queue is not read further until that transaction
> completes. That's not really commented effectively, and I suspect the
> process could be simpler. For instance, why can't the backend always
> read all of the data from the queue, notifying if the transaction is
> committed and saving to a local list otherwise (which would be checked
> on the next wakeup)?

It's true that the backends could always read up to the end of the queue and copy everything into local memory. However, you still need to apply the same checks before you deliver the notifications: you need to make sure that the transaction has committed and that you were listening to the channels of the notifications at the time they got sent / committed. Also you need to copy really _everything_, because you could start to listen to a channel after copying its uncommitted notifications. There are other reasons (tail pointer management and signaling strategy), but in the end it seemed more straightforward to stop as soon as we hit an uncommitted notification. We will receive a signal for it eventually anyway and can then start again and read further. Also I think (but I have no numbers about it) that it makes the backends work more on the same slru pages.

Joachim
Regarding the send_notify function, I have been working on it and have a patch (attached) that applies on top of Joachim's. It introduces a send_notify SQL function that calls the Async_Notify C function.

It's pretty straightforward and will need to be refined to take into account the encoding decisions that are made in Async_Notify, but it should be a good starting point, and it doesn't introduce much in the way of complexity.

On a side note, since that might be a discussion for another thread, it would be nice and more DBA friendly to have some kind of syntactic sugar around it, to be able to use

NOTIFY channel (SELECT col FROM table);

instead of

SELECT send_notify('channel', (SELECT col FROM table));

Best regards,
Arnaud Betremieux

On 19/01/2010 08:08, Jeff Davis wrote:
> [Jeff's full review snipped -- quoted in its entirety above]
On Wed, 2009-12-09 at 11:43 +0100, Joachim Wieland wrote:
> Examples:
>
> Backend 1:              Backend 2:
>
> transaction starts
> NOTIFY foo;
> commit starts
>                         transaction starts
>                         LISTEN foo;
>                         commit starts
> commit to clog
>                         commit to clog
>
> => Backend 2 will receive Backend 1's notification.

How does the existing notification mechanism solve this problem? Is it really a problem? Why would Backend 2 expect to receive the notification?

> Backend 1:              Backend 2:
>
> transaction starts
> NOTIFY foo;
> commit starts
>                         transaction starts
>                         UNLISTEN foo;
>                         commit starts
> commit to clog
>                         commit to clog
>
> => Backend 2 will not receive Backend 1's notification.

This is the same problem, except that it doesn't matter. A spurious notification is not a bug, right?

Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
> How does the existing notification mechanism solve this problem? Is it
> really a problem? Why would Backend 2 expect to receive the notification?

The intended way to use LISTEN/NOTIFY for status tracking is

1. LISTEN foo; (and commit the listen)
2. examine current database state
3. assume that we'll get a NOTIFY for any change that commits subsequently to what we saw in step 2

In the current implementation, a transaction that is in process of commit during step 1 might possibly not see your pg_listener record as committed, and so it might not send you a NOTIFY for whatever it did. If it still hasn't committed when you perform step 2, then you'd fail to see its changes as part of your initial state, *and* you'd not get a NOTIFY when the changes did become visible.

The way we prevent this race condition is that a listener takes exclusive lock on pg_listener before entering its record, and doesn't release the lock until after committing. Notifiers likewise take exclusive lock. This serializes things so that either the modifying transaction commits before the listener completes step 1 (and hence the listener will see its updates in step 2), or the listener is guaranteed to have a committed record in pg_listener when the modifying process determines whom to notify.

I guess Joachim is trying to provide a similar guarantee for the new implementation, but I'm not clear on why it would require locking. The new implementation is broadcast and ISTM it shouldn't require the modifying transaction to know which processes are listening.

I haven't read the patch but I agree that the description you give is pretty scary from a performance standpoint. More locks around transaction commit doesn't seem like a good idea. If they're only taken when an actual LISTEN or NOTIFY has happened in the current transaction, that'd be okay (certainly no worse than what happens now) but the naming suggested that this'd happen unconditionally.

			regards, tom lane
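As SQL, the recipe looks like this (inventory_state is a placeholder table; the point is only the ordering of the steps):

    LISTEN inventory;                        -- step 1, committed at once
    SELECT item, qty FROM inventory_state;   -- step 2: snapshot the state
    -- step 3: any change committed after step 2 is expected to arrive
    -- as a notification on 'inventory'; nothing in between may be lost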
On Tue, 2010-01-19 at 19:05 -0500, Tom Lane wrote:
> I guess Joachim is trying to provide a similar guarantee for the new
> implementation, but I'm not clear on why it would require locking.
> The new implementation is broadcast and ISTM it shouldn't require the
> modifying transaction to know which processes are listening.

I think there is a better way. I'll dig into it a little more.

> I haven't read the patch but I agree that the description you give is
> pretty scary from a performance standpoint. More locks around
> transaction commit doesn't seem like a good idea.

I was also worried about holding multiple LWLocks at once -- is such practice generally avoided in the rest of the code?

> If they're only taken
> when an actual LISTEN or NOTIFY has happened in the current transaction,
> that'd be okay (certainly no worse than what happens now) but the naming
> suggested that this'd happen unconditionally.

It appears that the locks are only taken when LISTEN or NOTIFY is involved.

Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > I was also worried about holding multiple LWLocks at once -- is such > practice generally avoided in the rest of the code? It's allowed but remember that there is no deadlock detection in lwlock.c. You must be very certain that there is only one possible order in which such locks could be taken. Interactions with heavyweight locks would be bad news as well. > It appears that the locks are only taken when LISTEN or NOTIFY is > involved. On the whole it might be better if a heavyweight lock were used, such that it'll automatically clean up after commit. (I'm still wondering if we couldn't do without the lock altogether though.) regards, tom lane
On Tue, 2010-01-19 at 19:24 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> I was also worried about holding multiple LWLocks at once -- is such
>> practice generally avoided in the rest of the code?
>
> It's allowed but remember that there is no deadlock detection in lwlock.c.
> You must be very certain that there is only one possible order in which
> such locks could be taken. Interactions with heavyweight locks would be
> bad news as well.

That was my worry initially.

> On the whole it might be better if a heavyweight lock were used,
> such that it'll automatically clean up after commit. (I'm still
> wondering if we couldn't do without the lock altogether though.)

Yes, I think there's a better way as well. I'll look into it.

Regards,
Jeff Davis
On Wed, Jan 20, 2010 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I guess Joachim is trying to provide a similar guarantee for the new > implementation, but I'm not clear on why it would require locking. > The new implementation is broadcast and ISTM it shouldn't require the > modifying transaction to know which processes are listening. It is rather about a listening backend seeing a notification in the global queue without knowing if it should deliver the notification to its frontend or not. The backend needs to know if its own LISTEN committed before or after the NOTIFY committed that it sees in the queue. As I have understood there is no way to find out if a transaction has committed before or after another transaction. If we had this, it would be easy without a lock. > I haven't read the patch but I agree that the description you give is > pretty scary from a performance standpoint. More locks around > transaction commit doesn't seem like a good idea. If they're only taken > when an actual LISTEN or NOTIFY has happened in the current transaction, > that'd be okay (certainly no worse than what happens now) but the naming > suggested that this'd happen unconditionally. The lock is taken exclusively by transactions doing LISTEN/UNLISTEN and in shared mode by transactions that execute only NOTIFY. It should really not degrade performance but I understand Jeff's concerns about deadlocks. Joachim
Joachim Wieland <joe@mcknight.de> writes: > On Wed, Jan 20, 2010 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I guess Joachim is trying to provide a similar guarantee for the new >> implementation, but I'm not clear on why it would require locking. > It is rather about a listening backend seeing a notification in the > global queue without knowing if it should deliver the notification to > its frontend or not. The backend needs to know if its own LISTEN > committed before or after the NOTIFY committed that it sees in the > queue. In that case I think you've way overcomplicated matters. Just deliver the notification. We don't really care if the listener gets additional notifications; the only really bad case would be if it failed to get an event that was generated after it committed a LISTEN. regards, tom lane
On Wed, Jan 20, 2010 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > In that case I think you've way overcomplicated matters. Just deliver > the notification. We don't really care if the listener gets additional > notifications; the only really bad case would be if it failed to get an > event that was generated after it committed a LISTEN. Okay, what about unprocessed notifications in the queue and a backend executing UNLISTEN: can we assume that it is not interested in notifications anymore once it executes UNLISTEN and discard all of them even though there might be notifications that have been sent (and committed) before the UNLISTEN committed? Joachim
Joachim Wieland <joe@mcknight.de> writes: > Okay, what about unprocessed notifications in the queue and a backend > executing UNLISTEN: can we assume that it is not interested in > notifications anymore once it executes UNLISTEN and discard all of > them even though there might be notifications that have been sent (and > committed) before the UNLISTEN committed? Yes. That is the case with the existing implementation as well, no? We don't consider sending notifies until transaction end, so anything that commits during the xact in which you UNLISTEN will get dropped. Again, a little bit of sloppiness here doesn't seem important. Issuing UNLISTEN implies the client is not interested anymore. regards, tom lane
On Wed, 2010-01-20 at 15:54 -0500, Tom Lane wrote:
> Joachim Wieland <joe@mcknight.de> writes:
>> Okay, what about unprocessed notifications in the queue and a backend
>> executing UNLISTEN: can we assume that it is not interested in
>> notifications anymore once it executes UNLISTEN and discard all of
>> them even though there might be notifications that have been sent (and
>> committed) before the UNLISTEN committed?
>
> Yes. That is the case with the existing implementation as well, no?
> We don't consider sending notifies until transaction end, so anything
> that commits during the xact in which you UNLISTEN will get dropped.

Only if the transaction containing UNLISTEN commits. Are you saying it would also be OK to drop NOTIFYs if a backend's UNLISTEN transaction aborts?

> Again, a little bit of sloppiness here doesn't seem important. Issuing
> UNLISTEN implies the client is not interested anymore.

Thinking out loud: if we're taking this approach, I wonder if it might be a good idea to PreventTransactionChain for LISTEN and UNLISTEN? It might simplify things for users because they wouldn't be expecting transaction-like behavior, except for the NOTIFYs themselves.

Regards,
Jeff Davis
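For context, the transaction-like behavior in question is simply that LISTEN and UNLISTEN only take effect at commit; a quick demonstration of today's semantics:

    BEGIN;
    LISTEN foo;
    ROLLBACK;      -- the LISTEN is discarded with the transaction;
                   -- this backend is not listening on 'foo' afterwards

    LISTEN foo;    -- outside a transaction block, the statement commits
                   -- on its own and the registration takes effect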
On Wed, Jan 20, 2010 at 11:08 PM, Jeff Davis <pgsql@j-davis.com> wrote: >> Yes. That is the case with the existing implementation as well, no? >> We don't consider sending notifies until transaction end, so anything >> that commits during the xact in which you UNLISTEN will get dropped. > > Only if the transaction containing UNLISTEN commits. Are you saying it > would also be OK to drop NOTIFYs if a backend's UNLISTEN transaction > aborts? If the backend's UNLISTEN transaction aborts, then it has never executed UNLISTEN... So it will continue to get notifications (if it has executed a LISTEN before). Joachim
Jeff Davis <pgsql@j-davis.com> writes: > On Wed, 2010-01-20 at 15:54 -0500, Tom Lane wrote: >> Yes. That is the case with the existing implementation as well, no? >> We don't consider sending notifies until transaction end, so anything >> that commits during the xact in which you UNLISTEN will get dropped. > Only if the transaction containing UNLISTEN commits. Are you saying it > would also be OK to drop NOTIFYs if a backend's UNLISTEN transaction > aborts? No, I would say not, but that wasn't being proposed was it? The decisions about what to do are only made at/after commit. > Thinking out loud: If we're taking this approach, I wonder if it might > be a good idea to PreventTransactionChain for LISTEN and UNLISTEN? That shouldn't be necessary IMO. There's never been such a restriction before. regards, tom lane
On Tue, 2010-01-19 at 19:24 -0500, Tom Lane wrote:
> (I'm still
> wondering if we couldn't do without the lock altogether though.)

Here's the problem as I see it: if we insert the notifications into the queue before actually recording the commit, there's a window in between where another backend could perform the expected sequence as you wrote:

1. LISTEN foo; (and commit the listen)
2. examine current database state
3. assume that we'll get a NOTIFY for any change that commits subsequently to what we saw in step 2

and miss the NOTIFYs, and not see the updated database state.

But I don't think that the NOTIFYs will actually be missed. Once put into the queue, the notification will only be removed from the queue after all backends have read it. But no backend will advance past it as long as the notification is from an uncommitted transaction. By the time the notifying transaction is committed, the listening transaction will also be committed, and therefore subscribed to the queue. The newly-listening backend will be awakened properly as well, because that's done after the notifying transaction commits, and therefore will wake up any listening transactions that committed earlier.

However, there's still a problem inserting into the queue when no backends are listening. Perhaps that can be solved right before we wake up the listening backends after the notifying transaction commits: if there are no listening backends, clear the queue.

We still might get spurious notifications if they were committed before the LISTEN transaction was committed. And we also might get spurious notifications if the UNLISTEN doesn't take effect quite quickly enough. Those are both acceptable.

If the above scheme is too complex, we can always use a heavyweight lock. However, there's no pg_listener, so it's not obvious what LOCKTAG to use. We can just pick something arbitrary, like the Oid of the new pg_listening() function, I suppose. Is there any precedent for that?

Thoughts?

Regards,
Jeff Davis
On Thu, Jan 21, 2010 at 3:06 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Here's the problem as I see it:

You are stating a lot of true facts, but I fail to see the actual problem... What exactly do you see as a problem? The only time you write "problem" is in this paragraph:

> However, there's still a problem inserting into the queue when no
> backends are listening. Perhaps that can be solved right before we wake
> up the listening backends after the notifying transaction commits: if
> there are no listening backends, clear the queue.

This gets already done: in SignalBackends(), a notifying transaction counts the number of listening backends. If no other backend is listening, then it signals itself so that the queue gets cleaned (i.e. the global pointer gets forwarded and the slru pages will be truncated if possible).

I have been working on simplifying the patch yesterday; I still need to adapt comments and review it again, but I am planning to post the new version tonight. Then we have a common base again to discuss further :-)

Thanks for your review,
Joachim
On Thu, 2010-01-21 at 10:14 +0100, Joachim Wieland wrote:
> On Thu, Jan 21, 2010 at 3:06 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> > Here's the problem as I see it:
>
> You are stating a lot of true facts, but I fail to see the actual
> problem... What exactly do you see as a problem?

I worded that in a confusing way, I apologize. My point was that I don't think we need a lock, because I don't see any situation in which the notifications would be lost.

> I have been working on simplifying the patch yesterday; I still need
> to adapt comments and review it again, but I am planning to post the
> new version tonight. Then we have a common base again to discuss
> further :-)

Sounds good.

Regards,
Jeff Davis
On Thu, Jan 21, 2010 at 6:21 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> I have been working on simplifying the patch yesterday; I still need
>> to adapt comments and review it again, but I am planning to post the
>> new version tonight. Then we have a common base again to discuss
>> further :-)
>
> Sounds good.

Attached is a new version of the patch. I haven't changed it with regard to 2PC or the ASCII / bytea issues, but the transactional behavior is now less strict, which allows us to remove the AsyncCommitOrderLock.

I will leave the commitfest status on "Waiting on author", but I will be travelling for the weekend and wanted to share my current version.

Joachim
Comments:

* In standard_ProcessUtility(), case NotifyStmt, you add a comment:

    /* XXX the new listen/notify version can be enabled
     * for Hot Standby */

but I don't think that's correct. We may be able to support LISTEN on the standby, but not NOTIFY (right?). I don't think we should be adding speculative comments anyway, because there is some work still needed before HS can support LISTEN (notably, WAL-logging NOTIFY).

* You have a TODO list as a comment. Can you remove it and explain those items on list if they aren't already?

* You have the comment:

    /*
     * How long can a payload string possibly be? Actually it needs to be one
     * byte less to provide space for the trailing terminating '\0'.
     */

That should be written more simply, like "Maximum size of the payload, including terminating NULL."

* You have the Assert:

    Assert(strlen(n->payload) <= NOTIFY_PAYLOAD_MAX_LENGTH);

which is inconsistent with the earlier test:

    if (stmt->payload && strlen(stmt->payload) > NOTIFY_PAYLOAD_MAX_LENGTH - 1)
        ereport(ERROR, ...

* ASCII-only is still an open issue.

* 2PC is still an open issue (notifications are lost on restart, and there may be other problems as well). I think it's easy enough to throw an error on PREPARE TRANSACTION if there are any notifications, right?

* There's a bug where an UNLISTEN can abort, and yet you still miss the notification. This is because you unlisten before you actually commit the transaction, and an error between those times will cause the UNLISTEN to take effect even though the rest of the transaction fails. For example:

    -- session 1
    LISTEN foo;
    BEGIN;
    UNLISTEN foo;

    -- session 2
    NOTIFY foo;

    -- gdb in session 1
    (gdb) break AtCommit_NotifyBeforeCommit
    (gdb) c

    -- session 1
    COMMIT;

    -- gdb in session 1
    (gdb) finish
    (gdb) p op_strict(7654322)
    (gdb) quit

The notification is missed. It's fixed easily enough by doing the UNLISTEN step in AtCommit_NotifyAfterCommit.

I'm still looking through some of the queue stuff, and I'll send an update soon. I wanted to give you some feedback now though.

Regards,
Jeff Davis
On Sat, Jan 30, 2010 at 12:02 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Comments:
>
> * In standard_ProcessUtility(), case NotifyStmt, you add a comment:
>
>     /* XXX the new listen/notify version can be enabled
>      * for Hot Standby */
>
> but I don't think that's correct. We may be able to support LISTEN
> on the standby, but not NOTIFY (right?). I don't think we should
> be adding speculative comments anyway, because there is some work
> still needed before HS can support LISTEN (notably, WAL-logging
> NOTIFY).

I admit that it was not clear what I meant. The comment should only address LISTEN / NOTIFY on the standby server. Do you see any problems allowing it? Of course it's not all that useful because all transactions are read-only, but it still allows different clients to communicate. Listening on the standby server to notifications from the primary server is not currently possible, but from my point of view that is a completely new use case for LISTEN/NOTIFY, as it is not local but across servers.

> * You have a TODO list as a comment. Can you remove it and explain
> those items on list if they aren't already?

Sure. I was wondering if we should have a hard limit on the maximum number of notifications per transaction. You can now easily fill up your backend's memory with notifications. However, we had the same problem with the old implementation already and nobody complained. The difference is just that now you can fill it up a lot faster because you can send a large payload.

The second doubt I had is about the truncation behavior of slru. ISTM that it doesn't truncate at the end of the page range once the head pointer has already wrapped around. There is the following comment in slru.c describing this fact:

/*
 * While we are holding the lock, make an important safety check: the
 * planned cutoff point must be <= the current endpoint page. Otherwise we
 * have already wrapped around, and proceeding with the truncation would
 * risk removing the current segment.
 */

I wanted to check if we can do anything about it and if we need to do anything at all...

> * You have the comment:
> That should be written more simply [...]

done

> * You have the Assert:

done

> * ASCII-only is still an open issue.

still an open issue...

> * 2PC is still an open issue (notifications are lost on restart,
> and there may be other problems as well). I think it's easy enough
> to throw an error on PREPARE TRANSACTION if there are any
> notifications, right?

We could forbid them, yes. Notifications aren't lost on restart, however; they get recorded in the 2PC state files. Currently the patch works fine with 2PC (including server restart), with the exception that in the queue-full case we might need to roll back the prepared transaction. Now the question is, should we forbid NOTIFY for 2PC altogether only because in the unlikely event of a full queue we cannot guarantee that we can commit the transaction?

One solution is to treat a 2PC transaction like a backend with its own pointer to the queue. As long as the prepared transaction is not committed, its pointer does not move and so we don't move forward the global tail pointer. Here the NOTIFYs sent by the 2PC transaction are already in the queue and the transaction can always commit. The drawback of this idea is that if you forget to commit the prepared transaction and leave it around uncommitted, your queue will fill up inevitably because you do not truncate anymore...

> * There's a bug where an UNLISTEN can abort, and yet you still miss
> the notification.
> [...]
> The notification is missed. It's fixed easily enough by doing the
> UNLISTEN step in AtCommit_NotifyAfterCommit.

Thanks, very well spotted... Actually the same is true for LISTEN... I have reworked the patch to do the changes to listenChannels only in the post-commit functions.

I have also included Arnaud Betremieux's send_notify function. Should we really call the function send_notify? What about pg_send_notification or just pg_notify? I am not a native English speaker, but to me it sounds strange to send a verb (notify), and I'd rather prefix the name with pg_...? Currently the function always returns void. Is this the preferred return type?

Joachim
On Wed, 2010-02-03 at 00:10 +0100, Joachim Wieland wrote:
> I admit that it was not clear what I meant. The comment should only
> address LISTEN / NOTIFY on the standby server. Do you see any problems
> allowing it?

The original comment was a part of the NotifyStmt case, and I don't think we can support NOTIFY issued on a standby system -- surely there's no way for the standby to communicate the notification to the master. Anyway, this is getting a little sidetracked; I don't think we need to worry about HS right now.

> I was wondering if we should have a hard limit on the maximum number
> of notifications per transaction. You can now easily fill up your
> backend's memory with notifications. However, we had the same problem
> with the old implementation already and nobody complained. The
> difference is just that now you can fill it up a lot faster because
> you can send a large payload.

I don't see a need for that. The only way that is likely to happen is with triggers, and there are other ways to fill up the memory using triggers. Even if the option is there, all we can do is abort.

> The second doubt I had is about the truncation behavior of slru. ISTM
> that it doesn't truncate at the end of the page range once the head
> pointer has already wrapped around.
> There is the following comment in slru.c describing this fact:
>
> /*
>  * While we are holding the lock, make an important safety check: the
>  * planned cutoff point must be <= the current endpoint page. Otherwise we
>  * have already wrapped around, and proceeding with the truncation would
>  * risk removing the current segment.
>  */
>
> I wanted to check if we can do anything about it and if we need to do
> anything at all...

I'll have to take a look in more detail.

> Now the question is, should we forbid NOTIFY for 2PC
> altogether only because in the unlikely event of a full queue we
> cannot guarantee that we can commit the transaction?

I believe that was the consensus in the thread. The people who use 2PC are the people that want the most rock-solid guarantees. I don't like forbidding feature combinations, but I don't see a good alternative.

> One solution is to treat a 2PC transaction like a backend with its own
> pointer to the queue. As long as the prepared transaction is not
> committed, its pointer does not move and so we don't move forward the
> global tail pointer. Here the NOTIFYs sent by the 2PC transaction are
> already in the queue and the transaction can always commit. The
> drawback of this idea is that if you forget to commit the prepared
> transaction and leave it around uncommitted, your queue will fill up
> inevitably because you do not truncate anymore...

There's also a problem if the power goes out, you restart, and then the queue fills up before you COMMIT PREPARED.

> Thanks, very well spotted... Actually the same is true for LISTEN... I
> have reworked the patch to do the changes to listenChannels only in
> the post-commit functions.

I'm worried that this creates the opposite problem: that a LISTEN transaction might commit before a NOTIFY transaction, and yet miss the notification. It seems safest to me to add a backend (LISTEN) to the list before commit, and remove a backend (UNLISTEN) after commit. That way we are sure to only receive spurious notifications, and can't miss any.

> I have also included Arnaud Betremieux's send_notify function. Should
> we really call the function send_notify? What about
> pg_send_notification or just pg_notify?

Agreed: pg_notify sounds good to me.

> Is [void] the preferred return type?

I can't think of anything else that it might return. NOTIFY doesn't return much ;)

Regards,
Jeff Davis
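With the rename agreed here, usage would look like this (assuming the signature discussed in this subthread: pg_notify(channel text, payload text) returning void):

    -- Equivalent to the utility statement, but usable wherever an
    -- expression is allowed -- e.g. per row in a SELECT, or inside
    -- a PL/pgSQL trigger body:
    SELECT pg_notify('jobs', 'id=42;action=refresh');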
On Wed, Feb 3, 2010 at 2:05 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> Thanks, very well spotted... Actually the same is true for LISTEN... I
>> have reworked the patch to do the changes to listenChannels only in
>> the post-commit functions.
>
> I'm worried that this creates the opposite problem: that a LISTEN
> transaction might commit before a NOTIFY transaction, and yet miss the
> notification.

See the following comment and let me know if you agree...

! /*
!  * Exec_ListenBeforeCommit --- subroutine for AtCommit_NotifyBeforeCommit
!  *
!  * Note that we do only set our pointer here and do not yet add the channel to
!  * listenChannels. Since our transaction could still roll back we do this only
!  * after commit. We know that our tail pointer won't move between here and
!  * directly after commit, so we won't miss a notification.
!  */

However, this introduces a new problem when an initial LISTEN aborts: then we are not listening to anything, but to other backends it looks like we were. This is tracked by the boolean variable backendExecutesInitialListen and gets cleaned up in AtAbort_Notify().

> It seems safest to me to add a backend (LISTEN) to the list before
> commit, and remove a backend (UNLISTEN) after commit. That way we are
> sure to only receive spurious notifications, and can't miss any.

If a LISTEN aborted, we would not only receive a few spurious notifications from it but would receive notifications on this channel forever, even though we have never executed LISTEN on it successfully.

Joachim
On Wed, Feb 3, 2010 at 4:34 AM, Joachim Wieland <joe@mcknight.de> wrote:
> [discussion of Exec_ListenBeforeCommit snipped -- quoted in its entirety above]

Jeff, do you think this patch is ready for committer? If so, please mark it as such on commitfest.postgresql.org - otherwise, please clarify what you think the action items are.

Thanks,

...Robert
On Sun, 2010-02-07 at 00:18 -0500, Robert Haas wrote:
> Jeff, do you think this patch is ready for committer? If so, please
> mark it as such on commitfest.postgresql.org - otherwise, please
> clarify what you think the action items are.

I'll post an update tomorrow.

Regards,
Jeff Davis
On Wed, Feb 3, 2010 at 2:05 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> The original comment was a part of the NotifyStmt case, and I don't
> think we can support NOTIFY issued on a standby system -- surely there's
> no way for the standby to communicate the notification to the master.
> Anyway, this is getting a little sidetracked; I don't think we need to
> worry about HS right now.

True, but I was not talking about moving any notifications to different servers. Clients listening on one server should receive the notifications from NOTIFYs executed on this server, no matter if it is a standby or the master server. It seems to me that currently we cannot support Listen/Notify on the Hot Standby because it calls GetCurrentTransactionId(), which is not available there. On the other hand, this is only done because we fear that a transaction could roll back while actually committing. On the read-only standby, however, a transaction that commits always commits successfully because it has not changed any data (right?)...

>> The second doubt I had is about the truncation behavior of slru. ISTM
>> that it doesn't truncate at the end of the page range once the head
>> pointer has already wrapped around.
>> There is the following comment in slru.c describing this fact:
>>
>> /*
>>  * While we are holding the lock, make an important safety check: the
>>  * planned cutoff point must be <= the current endpoint page. Otherwise we
>>  * have already wrapped around, and proceeding with the truncation would
>>  * risk removing the current segment.
>>  */
>>
>> I wanted to check if we can do anything about it and if we need to do
>> anything at all...
>
> I'll have to take a look in more detail.

This is still kind of an open item, but it's an slru issue and should also be true for other functionality that uses slru queues.

>> Now the question is, should we forbid NOTIFY for 2PC
>> altogether only because in the unlikely event of a full queue we
>> cannot guarantee that we can commit the transaction?
>
> I believe that was the consensus in the thread. The people who use 2PC
> are the people that want the most rock-solid guarantees. I don't like
> forbidding feature combinations, but I don't see a good alternative.

If this was the consensus, then fine... I have forbidden it in the latest patch now.

>> One solution is to treat a 2PC transaction like a backend with its own
>> pointer to the queue. As long as the prepared transaction is not
>> committed, its pointer does not move and so we don't move forward the
>> global tail pointer. Here the NOTIFYs sent by the 2PC transaction are
>> already in the queue and the transaction can always commit. The
>> drawback of this idea is that if you forget to commit the prepared
>> transaction and leave it around uncommitted, your queue will fill up
>> inevitably because you do not truncate anymore...
>
> There's also a problem if the power goes out, you restart, and then the
> queue fills up before you COMMIT PREPARED.

This I don't understand... If power goes out and we restart, we'd first put all notifications from the prepared transactions into the queue. We know that they fit because they have fit earlier as well (we wouldn't allow user connections until we have worked through all 2PC state files).

> I'm worried that this creates the opposite problem: that a LISTEN
> transaction might commit before a NOTIFY transaction, and yet miss the
> notification.

(see my other email on this item...)

>> I have also included Arnaud Betremieux's send_notify function. Should
>> we really call the function send_notify? What about
>> pg_send_notification or just pg_notify?
>
> Agreed: pg_notify sounds good to me.

Function name changed.

There was another problem that the slru files did not all get deleted at server restart, which is fixed now.

Regarding the famous ASCII-restriction open item, I have now realized what I hadn't thought of previously: notifications are not transferred between databases, they always stay in one database. Since libpq does the conversion between server and client encoding, it is questionable if we really need to restrict this at all... But I am not an encoding expert, so whoever feels like he can confirm or refute this, please speak up.

Joachim
Joachim Wieland wrote:

> + typedef struct AsyncQueueEntry
> + {
> +     /*
> +      * this record has the maximal length, but usually we limit it to
> +      * AsyncQueueEntryEmptySize + strlen(payload).
> +      */
> +     Size            length;
> +     Oid             dboid;
> +     TransactionId   xid;
> +     int32           srcPid;
> +     char            channel[NAMEDATALEN];
> +     char            payload[NOTIFY_PAYLOAD_MAX_LENGTH];
> + } AsyncQueueEntry;
> + #define AsyncQueueEntryEmptySize \
> +     (sizeof(AsyncQueueEntry) - NOTIFY_PAYLOAD_MAX_LENGTH + 1)

These are the on-disk notifications, right? It seems to me a bit wasteful to store the channel name always as NAMEDATALEN bytes. Can we truncate it at its strlen? I realize that this would make the struct definition uglier (you will no longer be able to have both channel and payload pointers, only a char[1] pointer to a data area to which you write both). Typical channel names should be short, so IMHO this is worthwhile. Besides, I think the uglification of code this causes should be fairly contained ...

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
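To make the suggestion concrete, here is a compilable toy sketch of such a single-data-area layout. All names (SketchQueueEntry, entry_size, entry_payload) are invented stand-ins, plain C types replace Size/Oid/TransactionId, and this is not the patch's actual code:

    /* Toy sketch, not PostgreSQL code: channel and payload share one
     * variable-length data area, NUL-terminated back to back. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct SketchQueueEntry
    {
        size_t      length;     /* total size of this entry on disk */
        unsigned    dboid;      /* stand-in for Oid */
        unsigned    xid;        /* stand-in for TransactionId */
        int         srcPid;
        char        data[1];    /* channel '\0' payload '\0', variable */
    } SketchQueueEntry;

    /* bytes needed for a given channel/payload pair */
    static size_t
    entry_size(const char *channel, const char *payload)
    {
        return offsetof(SketchQueueEntry, data)
             + strlen(channel) + 1 + strlen(payload) + 1;
    }

    /* the payload starts right after the channel's terminating NUL */
    static const char *
    entry_payload(const SketchQueueEntry *qe)
    {
        return qe->data + strlen(qe->data) + 1;
    }

    int
    main(void)
    {
        const char *channel = "jobs";       /* example names only */
        const char *payload = "id=42";
        SketchQueueEntry *qe = malloc(entry_size(channel, payload));

        qe->length = entry_size(channel, payload);
        strcpy(qe->data, channel);
        strcpy(qe->data + strlen(channel) + 1, payload);

        printf("channel=%s payload=%s size=%zu\n",
               qe->data, entry_payload(qe), qe->length);
        free(qe);
        return 0;
    }

The entry shrinks to the channel's actual length, at the price of locating the payload with a strlen() walk instead of reading a named field.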
In this version of the patch, there is a compiler warning:

    async.c: In function ‘AtPrepare_Notify’:
    async.c:593: warning: unused variable ‘p’

and also two trivial merge conflicts: an OID conflict for the functions you added, and a minor code conflict.

On Sun, 2010-02-07 at 17:32 +0100, Joachim Wieland wrote:
> On Wed, Feb 3, 2010 at 2:05 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> > The original comment was a part of the NotifyStmt case, and I don't
> > think we can support NOTIFY issued on a standby system -- surely there's
> > no way for the standby to communicate the notification to the master.
> > Anyway, this is getting a little sidetracked; I don't think we need to
> > worry about HS right now.
>
> True but I was not talking about moving any notifications to different
> servers. Clients listening on one server should receive the
> notifications from NOTIFYs executed on this server, no matter if it is
> a standby or the master server.

I'm not sure I agree with that philosophy. If the driving use-case for LISTEN/NOTIFY is cache invalidation, then a NOTIFY on the master should make it to all listening backends on the slaves.

> This is still kind of an open item but it's an slru issue and should
> also be true for other functionality that uses slru queues.

I haven't found out anything new here.

> This I don't understand... If power goes out and we restart, we'd
> first put all notifications from the prepared transactions into the
> queue. We know that they fit because they have fit earlier as well (we
> wouldn't allow user connections until we have worked through all 2PC
> state files).

OK, it appears you've thought the 2PC interaction through more than I have. Even if we don't include it this time, I'm glad to hear that there is a plan to do so. Feel free to include support if you have it ready, but it's late in the CF so I don't want you to get sidetracked on that issue.

> There was another problem that the slru files did not all get deleted
> at server restart, which is fixed now.

Good catch.

> Regarding the famous ASCII-restriction open item I have now realized
> what I haven't thought of previously: notifications are not
> transferred between databases, they always stay in one database. Since
> libpq does the conversion between server and client encoding, it is
> questionable if we really need to restrict this at all... But I am not
> an encoding expert so whoever feels like he can confirm or refute
> this, please speak up.

I would like to support encoded text, but I think there are other problems. For instance, what if one server has a client_encoding that doesn't support some of the glyphs being sent by the notifying backend? Then we have a mess, because we can't deliver it. I think we just have to say "ASCII only" here, because there's no reasonable way to handle this, regardless of implementation. If the user issues SELECT and the client_encoding can't support some of the glyphs, we can throw an error. But for a notification? We just have no mechanism for that.

Regards, Jeff Davis
On Mon, 2010-02-08 at 22:13 -0800, Jeff Davis wrote:
> I would like to support encoded text, but I think there are other
> problems. For instance, what if one server has a client_encoding that
> doesn't support some of the glyphs being sent by the notifying backend?
> Then we have a mess, because we can't deliver it.

I was thinking more about this. It seems clear that we want the backend that issues the notify to only put 7-bit ASCII in the payload. But if the client sends the letters 'string' as a payload, and the representation in the server encoding is something other than the normal 7-bit ASCII representation of 'string', it will be incorrect, right?

Looking at the documentation, it appears that all of the server encodings represent 7-bit ASCII characters using the same 7-bit ASCII representation. Is that true?

Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Looking at the documentation, it appears that all of the server > encodings represent 7-bit ascii characters using the same 7-bit ascii > representation. Is that true? Correct. We only support ASCII-superset encodings, both for frontend and backend. Limiting NOTIFY payloads to 7-bit would definitely avoid the issue. The question is if that's more of a pain than a benefit. regards, tom lane
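Concretely, the restriction under discussion amounts to a per-byte check like the following sketch; payload_is_ascii is a made-up name for illustration, not the patch's actual code:

    /* Illustrative sketch, not PostgreSQL code. */
    #include <stdbool.h>
    #include <stdio.h>

    /* true if every byte of s is 7-bit ASCII, and therefore has the same
     * representation in all supported server and client encodings */
    static bool
    payload_is_ascii(const char *s)
    {
        for (; *s != '\0'; s++)
        {
            if ((unsigned char) *s > 127)
                return false;
        }
        return true;
    }

    int
    main(void)
    {
        printf("%d\n", payload_is_ascii("plain ascii payload"));   /* 1 */
        printf("%d\n", payload_is_ascii("caf\xc3\xa9"));           /* 0 */
        return 0;
    }

Because every supported encoding maps bytes 0-127 identically, a payload that passes this check reads the same under any client_encoding.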
On Tue, Feb 9, 2010 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Davis <pgsql@j-davis.com> writes: >> Looking at the documentation, it appears that all of the server >> encodings represent 7-bit ascii characters using the same 7-bit ascii >> representation. Is that true? > > Correct. We only support ASCII-superset encodings, both for frontend > and backend. > > Limiting NOTIFY payloads to 7-bit would definitely avoid the issue. > The question is if that's more of a pain than a benefit. I think it's a reasonable restriction for now. We have limited time remaining here; and we can always consider relaxing the restriction in the future when we have more time to think through the issues. It'll still be a big improvement over what we have now. ...Robert
On Tue, 2010-02-09 at 16:51 -0500, Tom Lane wrote:
> Limiting NOTIFY payloads to 7-bit would definitely avoid the issue.
> The question is if that's more of a pain than a benefit.

I don't see any alternative. If one backend sends a NOTIFY payload that contains a non-ASCII character, there's a risk that we won't be able to deliver it to another backend with a client_encoding that can't represent that character.

Also, just the fact that client_encoding can be changed at pretty much any time is a potential problem, because it's difficult to know whether a particular notification was sent using the old client_encoding or the new one (because it's asynchronous).

Regards, Jeff Davis
Jeff Davis wrote: > On Tue, 2010-02-09 at 16:51 -0500, Tom Lane wrote: >> Limiting NOTIFY payloads to 7-bit would definitely avoid the issue. >> The question is if that's more of a pain than a benefit. > > I don't see any alternative. If one backend sends a NOTIFY payload that Wouldn't binary payloads be an alternative? NOTE: I may have missed this discussion. Sorry if it has already been covered. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
On Tue, 2010-02-09 at 19:02 -0500, Andrew Chernow wrote:
> Wouldn't binary payloads be an alternative? NOTE: I may have missed this
> discussion. Sorry if it has already been covered.

The PGnotify struct has a "char *" field, which can't hold embedded NULL bytes, so it can't really be binary. But it can't be arbitrary text either, because it has to be encoded in a way that works for every possible client encoding (otherwise there's a possibility of an error, and no way to handle it).

Also, the query starts out as text, so we need a way to interpret the text in an encoding-independent way.

So, I think ASCII is the natural choice here.

Regards, Jeff Davis
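For reference, this is roughly how a libpq client receives notifications; PQconsumeInput(), PQnotifies(), and PQfreemem() are the documented libpq calls, while the channel name "jobs" is just an example. It also shows why the payload must be text: the extra field is a NUL-terminated char *, so embedded zero bytes cannot survive:

    /* Minimal libpq listener; channel name is an example only. */
    #include <stdio.h>
    #include <sys/select.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("");   /* connection info from environment */
        PGnotify *notify;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }
        PQclear(PQexec(conn, "LISTEN jobs"));

        for (;;)
        {
            int    sock = PQsocket(conn);
            fd_set input_mask;

            FD_ZERO(&input_mask);
            FD_SET(sock, &input_mask);
            if (select(sock + 1, &input_mask, NULL, NULL, NULL) < 0)
                break;                      /* select() failed */

            PQconsumeInput(conn);           /* read whatever arrived */
            while ((notify = PQnotifies(conn)) != NULL)
            {
                /* relname is the channel; extra is the (possibly empty) payload */
                printf("notify on \"%s\" from pid %d: %s\n",
                       notify->relname, notify->be_pid, notify->extra);
                PQfreemem(notify);
            }
        }
        PQfinish(conn);
        return 0;
    }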
Jeff Davis wrote:
> Also, the query starts out as text, so we need a way to interpret the
> text in an encoding-independent way.
>
> So, I think ASCII is the natural choice here.

It's not worth hanging up this facility over this issue, ISTM. If we want something more than ASCII, then a base64 or hex encoded string could possibly meet the need in the first instance.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Jeff Davis wrote:
>> So, I think ASCII is the natural choice here.

> It's not worth hanging up this facility over this issue, ISTM. If we
> want something more than ASCII then a base64 or hex encoded string could
> possibly meet the need in the first instance.

Yeah, that would serve people who want to push either binary or non-ASCII data through the pipe. It would leave all risks of encoding problems on the user's head, though.

regards, tom lane
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Jeff Davis wrote:
>>> So, I think ASCII is the natural choice here.
>> It's not worth hanging up this facility over this issue, ISTM. If we
>> want something more than ASCII then a base64 or hex encoded string could
>> possibly meet the need in the first instance.
> Yeah, that would serve people who want to push either binary or
> non-ASCII data through the pipe. It would leave all risks of encoding
> problems on the user's head, though.

True. It's a workaround, but I think it's acceptable at this stage. We need to get some experience with this facility before we can refine it.

cheers

andrew
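A sketch of that workaround in practice, with a made-up channel name and sample bytes; encode()/decode() are the standard built-ins, and pg_notify() is the function form of NOTIFY added by this patch:

    /* Sketch of the base64 workaround; channel and data are examples. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("");   /* connection info from environment */
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /*
         * decode() builds a bytea from hex digits; encode(..., 'base64')
         * turns it into 7-bit-safe text that fits the ASCII-only payload.
         * Listeners recover the bytes with decode(payload, 'base64').
         */
        res = PQexec(conn,
                     "SELECT pg_notify('jobs', "
                     "encode(decode('00ff10', 'hex'), 'base64'))");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "notify failed: %s", PQerrorMessage(conn));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }

The listener receives 7-bit-safe text and applies decode(payload, 'base64') to recover the original bytes, so any encoding risk stays with the application, as noted above.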
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Jeff Davis wrote:
>>>> So, I think ASCII is the natural choice here.
>>> It's not worth hanging up this facility over this issue, ISTM.
>>> If we want something more than ASCII then a base64 or hex
>>> encoded string could possibly meet the need in the first
>>> instance.
>> Yeah, that would serve people who want to push either binary or
>> non-ASCII data through the pipe. It would leave all risks of encoding
>> problems on the user's head, though.
>
> True. It's a workaround, but I think it's acceptable at this stage.
> We need to get some experience with this facility before we can
> refine it.

Hmm? If we decide now that it's not going to have encoding conversion, we won't be able to change it later.

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Andrew Dunstan wrote:
>> True. It's a workaround, but I think it's acceptable at this stage.
>> We need to get some experience with this facility before we can
>> refine it.

> Hmm? If we decide now that it's not going to have encoding conversion,
> we won't be able to change it later.

How so? If the feature currently allows only ASCII data, the behavior would be upward compatible with a future version that is able to accept and convert non-ASCII characters.

regards, tom lane
On Mon, Feb 8, 2010 at 5:16 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> These are the on-disk notifications, right? It seems to me a bit
> wasteful to store channel name always as NAMEDATALEN bytes. Can we
> truncate it at its strlen?

Attached is a new and hopefully more or less final patch for LISTEN / NOTIFY. The following items have been addressed in this patch:

- only store strlen(channel) instead of NAMEDATALEN bytes on disk
- limit to 7-bit ASCII
- forbid 2PC and LISTEN/NOTIFY for now
- documentation changes
- add missing tab completion for NOTIFY
- fix pg_notify() behavior with respect to NULLs, too long and too short parameters
- rebased to current HEAD, OID conflicts resolved

Joachim
On Thu, 2010-02-11 at 00:52 +0100, Joachim Wieland wrote:
> Attached is a new and hopefully more or less final patch for LISTEN / NOTIFY.

Some minor review comments without having taken in much of previous discussion:

* Patch doesn't apply cleanly anymore, close.

* In async.c you say "new async notification model". Please say "async notification model in 9.0 is". In (2) you say there is a central queue, but don't describe the purpose of the queue or what it contains. Some other typos in header comments.

* There is no mention of what to do with pg_notify at checkpoint. Look at how pg_subtrans handles this. Should pg_notify do the same?

* Is there a lazy backend avoidance scheme as exists for relcache invalidation messages? See storage/ipc/sinval... OK, I see asyncQueueFillWarning(), but nowhere else says it exists and there aren't any comments in it to say what it does or why.

* What do you expect the DBA to do when they receive a queue fill warning? Where is that written down?

* Not clear on the purpose of backendSendsNotifications. In AtCommit_NotifyAfterCommit() the logic seems strange. The code is

    /* Allow transactions that have not executed LISTEN/UNLISTEN/NOTIFY to
     * return as soon as possible */
    if (!pendingActions && !backendSendsNotifications)
        return;

but since backendSendsNotifications is set true earlier, there is no fast path.

* AtSubCommit_Notify doesn't seem to have a fast path when no notify commands have been executed.

* In Send_Notify() you throw ERROR while still holding the lock. It seems better practice to drop the lock, then ERROR.

* Why is Send_Notify() a separate function? It's only called from one point in the code.

* We know that sometimes a FATAL error can occur without properly processing the abort. Do we depend upon this never happening?

* Can we make NUM_ASYNC_BUFFERS = 8? 4 just seems too small.

* backendSendsNotifications is future tense. The variable should be called something like hasSentNotifications. A couple of other variables have similar names.

Hope that helps

-- Simon Riggs www.2ndQuadrant.com
On Sun, Feb 14, 2010 at 2:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Some minor review comments without having taken in much of previous
> discussion:
>
> * Patch doesn't apply cleanly anymore, close.

New version is rebased against current HEAD.

> * In async.c you say "new async notification model". Please say "async
> notification model in 9.0 is".

I haven't changed that line at all... ;-) But the fact that the former "new async model" is now the old one really shows that a version number makes sense here.

> In (2) you say there is a central queue,

fixed with some additional comments.

> * There is no mention of what to do with pg_notify at checkpoint. Look
> at how pg_subtrans handles this. Should pg_notify do the same?

Actually we don't care... We even hope that the pg_notify pages are not flushed at all. Notifications don't survive a server restart anyway, and upon restart we just delete whatever is in the directory.

> * Is there a lazy backend avoidance scheme as exists for relcache
> invalidation messages? see storage/ipc/sinval...

We cannot avoid a lazy backend. If a listening backend is (probably idle) in a running transaction, we don't clean up the notification queue anymore but wait for the slowest backend to do it. But there are other effects of long-running transactions that you will probably notice long before your notification queue gets filled up.

> OK, I see asyncQueueFillWarning() but nowhere else says it exists and
> there aren't any comments in it to say what it does or why
> * What do you expect the DBA to do when they receive a queue fill
> warning? Where is that written down?

Comment added, as well as an errdetail for whoever sees this message:

    WARNING: pg_notify queue is more than 50% full. Among the slowest backends: 27744
    DETAIL: Cleanup can only proceed if this backend ends its current transaction

I have also added a note in the NOTIFY documentation about this.

> * Not clear of the purpose of backendSendsNotifications. In
> AtCommit_NotifyAfterCommit() the logic seems strange.

What exactly is the strange logic that you're seeing? The fast path is for transactions that do neither LISTEN/UNLISTEN nor NOTIFY. Without LISTEN/UNLISTEN they have pendingActions = NIL, and without NOTIFY they have backendSendsNotifications = false and will return immediately. backendSendsNotifications is only set to true if the backend has executed NOTIFY...

> * AtSubCommit_Notify doesn't seem to have a fastpath when no notify
> commands have been executed.

I haven't changed this function at all. I don't see much benefit either, as it's just concatenating empty lists. This immediately returns from the respective list handling function.

> * In Send_Notify() you throw ERROR while still holding the lock. It
> seems better practice to drop the lock then ERROR.

Explicit LWLockRelease() added.

> * Why is Send_Notify() a separate function? It's only called from one
> point in the code.

I have now inlined Send_Notify().

> * We know that sometimes a FATAL error can occur without properly
> processing the abort. Do we depend upon this never happening?

What exactly could happen? I thought that a backend either does a clean shutdown or that the postmaster starts over. We depend on a listening backend to execute its on_shmem_exit() handlers before leaving the game...

> * Can we make NUM_ASYNC_BUFFERS = 8? 4 just seems too small

I have done some tests with NUM_ASYNC_BUFFERS and couldn't find any measurable performance difference at all, so I kept it at 4. 8 isn't exaggerated either and seems to be a fine number, though. Changed to 8.

> * backendSendsNotifications is future tense. The variable should be
> called something like hasSentNotifications. Couple of other variables
> with similar names

Variable names changed as proposed.

New patch attached, thanks for the review.

Joachim
On Sun, 2010-02-14 at 17:22 +0100, Joachim Wieland wrote: > > * There is no mention of what to do with pg_notify at checkpoint. > Look > > at how pg_subtrans handles this. Should pg_notify do the same? > > Actually we don't care... We even hope that the pg_notify pages are > not flushed at all. Notifications don't survive a server restart > anyway and upon restart we just delete whatever is in the directory. Suspected that was true, just checking it was commented somewhere. -- Simon Riggs www.2ndQuadrant.com
Joachim Wieland <joe@mcknight.de> writes: > + #define ERRCODE_TOO_MANY_ENTRIES MAKE_SQLSTATE('5','4', '0','3','1') Do you have any evidence that there is actually a DB2 error code matching this, or is this errcode just invented? The one page Google finds doesn't list it: http://publib.boulder.ibm.com/iseries/v5r1/ic2924/index.htm?info/rzala/rzalastc.html regards, tom lane
On Sun, 2010-02-14 at 17:22 +0100, Joachim Wieland wrote:
> New patch attached, thanks for the review.

Next set of questions:

* Will this work during Hot Standby now? The barrier was that it wrote to a table and so we could not allow that. ISTM this new version can and should work with Hot Standby. Can you test that and, if so, remove the explicit barrier code and change tests and docs to enable it?

* We also discussed the idea of having a NOTIFY command that would work from Primary to Standby. All this would need is some code to WAL-log the NOTIFY if not in Hot Standby, plus some recovery code to send the NOTIFY to any listeners on the standby. I would suggest that would be an option on NOTIFY to WAL-log the notification, e.g. NOTIFY me 'with_payload' FOR STANDBY ALSO;

* Don't really like pg_listening() as a name. Perhaps pg_listening_to() or pg_listening_on() or pg_listening_for() or pg_listening_channels() or pg_listen_channels()

* I think it's confusing that pg_notify is both a data structure and a function. Suggest changing one of those to avoid issues in understanding. "Use pg_notify" might be confused by a DBA.

-- Simon Riggs www.2ndQuadrant.com
On Sun, Feb 14, 2010 at 11:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Next set of questions
>
> * Will this work during Hot Standby now? The barrier was that it wrote
> to a table and so we could not allow that. ISTM this new version can and
> should work with Hot Standby. Can you test that and if so, remove the
> explicit barrier code and change tests and docs to enable it?

I have tested it already. The point where it currently fails is the following line:

    qe->xid = GetCurrentTransactionId();

We record the TransactionId (of the notifying transaction) in the notification in order to later check if this transaction has committed successfully or not. If you tell me how we can find this out in HS, we might be done...

The reason we are doing all this is that we fear we cannot write the notifications to disk once we have committed to clog... So we write them to disk before committing to clog and therefore need to record the TransactionId.

> * We also discussed the idea of having a NOTIFY command that would work
> from Primary to Standby. All this would need is some code to WAL log the
> NOTIFY if not in Hot Standby and for some recovery code to send the
> NOTIFY to any listeners on the standby. I would suggest that would be an
> option on NOTIFY to WAL log the notification:
> e.g. NOTIFY me 'with_payload' FOR STANDBY ALSO;

What should happen if you wanted to replay a NOTIFY WAL record on the standby but cannot write to the pg_notify/ directory?

> * Don't really like pg_listening() as a name. Perhaps pg_listening_to()
> or pg_listening_on() or pg_listening_for() or pg_listening_channels() or
> pg_listen_channels()

pg_listen_channels() sounds best to me, but I leave this decision to a native speaker.

> * I think it's confusing that pg_notify is both a data structure and a
> function. Suggest changing one of those to avoid issues in
> understanding. "Use pg_notify" might be confused by a DBA.

You are talking about the libpq data structure PGnotify, I suppose... I don't see it as overly confusing, but I wouldn't object to changing it. There was a previous discussion about the name, see the last paragraph of http://archives.postgresql.org/message-id/dc7b844e1002021510i4aaa879fy8bbdd003729d28da@mail.gmail.com

Joachim
On Mon, 2010-02-15 at 12:59 +0100, Joachim Wieland wrote:
> > * I think it's confusing that pg_notify is both a data structure and
> > a function. Suggest changing one of those to avoid issues in
> > understanding. "Use pg_notify" might be confused by a DBA.
>
> You are talking about the libpq data structure PGnotify, I suppose... I
> don't see it as overly confusing, but I wouldn't object to changing it.
> There was a previous discussion about the name, see the last paragraph of
> http://archives.postgresql.org/message-id/dc7b844e1002021510i4aaa879fy8bbdd003729d28da@mail.gmail.com

No, which illustrates the confusion nicely! Function and data structure.

-- Simon Riggs www.2ndQuadrant.com
On Mon, 2010-02-15 at 12:59 +0100, Joachim Wieland wrote: > On Sun, Feb 14, 2010 at 11:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Next set of questions > > > > * Will this work during Hot Standby now? The barrier was that it wrote > > to a table and so we could not allow that. ISTM this new version can and > > should work with Hot Standby. Can you test that and if so, remove the > > explicit barrier code and change tests and docs to enable it? > > I have tested it already. The point where it currently fails is the > following line: > > qe->xid = GetCurrentTransactionId(); > > We record the TransactionId (of the notifying transaction) in the > notification in order to later check if this transaction has committed > successfully or not. If you tell me how we can find this out in HS, we > might be done... > > The reason why we are doing all this is because we fear that we can > not write the notifications to disk once we have committed to clog... > So we write them to disk before committing to clog and therefore need > to record the TransactionId. That's a shame. So it will never work in Hot Standby mode unless you can think of a different way. > > * We also discussed the idea of having a NOTIFY command that would work > > from Primary to Standby. All this would need is some code to WAL log the > > NOTIFY if not in Hot Standby and for some recovery code to send the > > NOTIFY to any listeners on the standby. I would suggest that would be an > > option on NOTIFY to WAL log the notification: > > e.g. NOTIFY me 'with_payload' FOR STANDBY ALSO; > > What should happen if you wanted to replay a NOTIFY WAL record in the > standby but cannot write to the pg_notify/ directory? Same thing that happens to any action that cannot be replayed. Why should that be a problem? -- Simon Riggs www.2ndQuadrant.com
On Mon, Feb 15, 2010 at 1:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2010-02-15 at 12:59 +0100, Joachim Wieland wrote:
>> I have tested it already. The point where it currently fails is the
>> following line:
>>
>> qe->xid = GetCurrentTransactionId();
>
> That's a shame. So it will never work in Hot Standby mode unless you can
> think of a different way.

We could probably fake this on the Hot Standby in the following way: we introduce a commit record for every notifying transaction and write it into the queue itself. So right before writing anything else, we write an entry which informs readers that the following records are not yet committed. Then we write the actual notifications and commit. In post-commit we go back to the commit record and flip its status. Reading backends would stop at the commit record, and we'd signal them so that they can continue.

This actually plays nicely with Tom's intent to not have interleaved notifications in the queue (makes things a bit easier but would probably work either way)... However, we'd need to make sure that we clean up that commit record even if something weird happens (similar to TransactionIdDidAbort() returning true) in order to allow the readers to proceed.

Comments?

Joachim
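A toy, single-process sketch of the proposed ordering, just to pin it down. Every name is invented, all shared-memory and locking details are elided, and (as the next reply argues) the scheme has an atomicity problem; this is illustration, not a working design:

    /* Toy sketch: status record first, notifications second, status
     * flipped in "post-commit", readers stop at an in-progress record. */
    #include <stdio.h>

    typedef enum { IN_PROGRESS, COMMITTED, ABORTED } TxStatus;

    typedef struct
    {
        TxStatus   *status;     /* non-NULL marks a commit record */
        const char *payload;    /* non-NULL marks a notification */
    } Entry;

    static Entry queue[16];
    static int   head;

    static void
    append(TxStatus *status, const char *payload)
    {
        queue[head].status = status;
        queue[head].payload = payload;
        head++;
    }

    int
    main(void)
    {
        TxStatus tx = IN_PROGRESS;
        int      i;

        append(&tx, NULL);               /* commit record, written first */
        append(NULL, "cache invalid");   /* notifications follow */
        append(NULL, "refresh stats");

        tx = COMMITTED;                  /* flipped in post-commit */

        for (i = 0; i < head; i++)
        {
            if (queue[i].status != NULL && *queue[i].status == IN_PROGRESS)
            {
                printf("reader stops at entry %d and waits\n", i);
                break;
            }
            if (queue[i].payload != NULL)
                printf("deliver: %s\n", queue[i].payload);
        }
        return 0;
    }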
Joachim Wieland <joe@mcknight.de> writes: > We could probably fake this on the Hot Standby in the following way: > We introduce a commit record for every notifying transaction and write > it into the queue itself. So right before writing anything else, we > write an entry which informs readers that the following records are > not yet committed. Then we write the actual notifications and commit. > In post-commit we return back to the commit record and flip its > status. This doesn't seem likely to work --- it essentially makes commit non atomic. There has to be one and only one authoritative reference as to whether transaction X committed. I think that having HS slave sessions issue notifies is a fairly silly idea anyway. They can't write the database, so exactly what condition are they going to be notifying others about? What *would* be useful is for HS slaves to be able to listen for notify messages issued by writing sessions on the master. This patch gets rid of the need for LISTEN to change on-disk state, so in principle we can do it. The only bit we seem to lack is WAL transmission of the messages (plus of course synchronization in case a slave session is too slow about picking up messages). Definitely a 9.1 project at this point though. regards, tom lane
On Sun, 2010-02-14 at 22:44 +0000, Simon Riggs wrote:
> * We also discussed the idea of having a NOTIFY command that would work
> from Primary to Standby. All this would need is some code to WAL log the
> NOTIFY if not in Hot Standby and for some recovery code to send the
> NOTIFY to any listeners on the standby. I would suggest that would be an
> option on NOTIFY to WAL log the notification:
> e.g. NOTIFY me 'with_payload' FOR STANDBY ALSO;

My first reaction is that it should not be optional. If we allow a slave system to LISTEN on a condition, what's the point if it doesn't receive the notifications from the master?

Cache invalidation seems to be the driving use case for LISTEN/NOTIFY. Only the master can invalidate the cache (as Tom points out downthread); and users on the slave system want to know about that invalidation if they are explicitly listening for it.

Regards, Jeff Davis
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > * We also discussed the idea of having a NOTIFY command that > would work from Primary to Standby. Just curious, what's a use case for this? - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201002161102 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkt6wZ4ACgkQvJuQZxSWSsjrYwCfSWvHlTBFT/fIYcBToX9C57GO toAAoOLQhBj6NdVTayaVtRH8L7nk16qM =LBAH -----END PGP SIGNATURE-----
On Tue, 2010-02-16 at 16:02 +0000, Greg Sabino Mullane wrote:
> > * We also discussed the idea of having a NOTIFY command that
> > would work from Primary to Standby.
>
> Just curious, what's a use case for this?

If you have some kind of cache above the DBMS, you need to invalidate it when a part of the database is updated. It makes sense that every reader would want to know about the update, not just those connected to the master.

Regards, Jeff Davis
Joachim Wieland <joe@mcknight.de> writes: > [ listen/notify patch ] Applied after rather a lot of hacking. Aside from the issues previously raised, I changed the NOTIFY syntax to include a comma between channel name and payload. The submitted syntax with no comma looked odd to me, and it would have been a real nightmare to extend if we ever decide we want to support expressions in NOTIFY. I found a number of implementation problems having to do with wraparound behavior and error recovery. I think they're all fixed, but any remaining bugs are probably my fault not yours. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: > * Don't really like pg_listening() as a name. Perhaps pg_listening_to() > or pg_listening_on() or pg_listening_for() or pg_listening_channels() or > pg_listen_channels() BTW, I used pg_listening_channels() for that. > * I think it's confusing that pg_notify is both a data structure and a > function. Suggest changing one of those to avoid issues in > understanding. "Use pg_notify" might be confused by a DBA. I didn't change that. The data structure is PGnotify, which seems enough different from pg_notify to not be a real serious problem. There is a duplication with the $PGDATA subdirectory pg_notify/, but that one is not a user-facing name, so I thought it wasn't really an issue. regards, tom lane
On Tue, Feb 16, 2010 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joachim Wieland <joe@mcknight.de> writes:
>> [ listen/notify patch ]
>
> I found a number of implementation problems having to do with wraparound
> behavior and error recovery. I think they're all fixed, but any
> remaining bugs are probably my fault not yours.

First, thanks for the rework you have done and thanks for applying this. While I can see a lot of improvements over my version, I think the logic in asyncQueueProcessPageEntries() needs to be reordered:

    + static bool
    + asyncQueueProcessPageEntries(QueuePosition *current,
    +                              QueuePosition stop,
    +                              char *page_buffer)
    [...]
    +     do
    +     {
    [...]
    +         /*
    +          * Advance *current over this message, possibly to the next page.
    +          * As noted in the comments for asyncQueueReadAllNotifications, we
    +          * must do this before possibly failing while processing the message.
    +          */
    +         reachedEndOfPage = asyncQueueAdvance(current, qe->length);
    [...]
    +         if (TransactionIdDidCommit(qe->xid))
    [...]
    +         else if (TransactionIdDidAbort(qe->xid))
    [...]
    +         else
    +         {
    +             /*
    +              * The transaction has neither committed nor aborted so far,
    +              * so we can't process its message yet. Break out of the loop.
    +              */
    +             reachedStop = true;
    +             break;

In the beginning you are advancing *current, but later on you could find out that the transaction is still running. As the position in the queue has already advanced, you would miss one notification here because you'd restart directly behind this notification in the queue...

Joachim
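One way to express the reordering being requested, as a standalone sketch with invented types and names (not the fix that was actually committed): remember the position before advancing, and back up when the entry's transaction is still in flight so the entry is re-examined on the next pass:

    /* Toy sketch of the save/restore idea; not PostgreSQL code. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int page; int offset; } QueuePos;

    typedef enum { XID_COMMITTED, XID_ABORTED, XID_IN_PROGRESS } XidState;

    /* returns false when the caller should stop and retry later */
    static bool
    process_entry(QueuePos *current, int entry_len, XidState state)
    {
        QueuePos saved = *current;      /* where this entry starts */

        current->offset += entry_len;   /* tentatively advance */

        if (state == XID_IN_PROGRESS)
        {
            *current = saved;           /* fate unknown: back up and stop */
            return false;
        }
        /* committed: deliver; aborted: skip. Either way, keep the advance. */
        return true;
    }

    int
    main(void)
    {
        QueuePos pos = {0, 0};

        process_entry(&pos, 64, XID_COMMITTED);     /* advances past entry */
        process_entry(&pos, 32, XID_IN_PROGRESS);   /* backs up and stops  */
        printf("reader resumes at page %d, offset %d\n",
               pos.page, pos.offset);               /* prints 0, 64 */
        return 0;
    }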
Joachim Wieland <joe@mcknight.de> writes: > While I can see a lot of improvements over my version, I think the > logic in asyncQueueProcessPageEntries() needs to be reordered: Hmmm ... I was intending to cover the case of a failure in TransactionIdDidCommit too, but I can see it will take a bit more thought. regards, tom lane
On Mon, 2010-02-15 at 15:00 -0500, Tom Lane wrote: > Joachim Wieland <joe@mcknight.de> writes: > > We could probably fake this on the Hot Standby in the following way: > > > We introduce a commit record for every notifying transaction and write > > it into the queue itself. So right before writing anything else, we > > write an entry which informs readers that the following records are > > not yet committed. Then we write the actual notifications and commit. > > In post-commit we return back to the commit record and flip its > > status. > > This doesn't seem likely to work --- it essentially makes commit non > atomic. There has to be one and only one authoritative reference as > to whether transaction X committed. I thought a bit more about this and don't really understand why we need an xid at all. When we discussed this before the role of a NOTIFY was to remind us to refresh a cache, not as a way of delivering a transactional payload. If the cache refresh use case is still the objective why does it matter whether we commit or not when we issue a NOTIFY? Surely, the rare case where we actually abort right at the end of the transaction will just cause an unnecessary cache refresh. > I think that having HS slave sessions issue notifies is a fairly silly > idea anyway. They can't write the database, so exactly what condition > are they going to be notifying others about? Agreed > What *would* be useful is for HS slaves to be able to listen for notify > messages issued by writing sessions on the master. This patch gets rid > of the need for LISTEN to change on-disk state, so in principle we can > do it. The only bit we seem to lack is WAL transmission of the messages > (plus of course synchronization in case a slave session is too slow > about picking up messages). Definitely a 9.1 project at this point > though. OK -- Simon Riggs www.2ndQuadrant.com
On 2/18/10 9:58 AM, Simon Riggs wrote: > I thought a bit more about this and don't really understand why we need > an xid at all. When we discussed this before the role of a NOTIFY was to > remind us to refresh a cache, not as a way of delivering a transactional > payload. If the cache refresh use case is still the objective why does > it matter whether we commit or not when we issue a NOTIFY? Surely, the > rare case where we actually abort right at the end of the transaction > will just cause an unnecessary cache refresh. Actually, even for that use, it doesn't wash to have notifies being sent for transactions which have not yet committed. Think of cases (and I have two applications) where data is being pushed into an external non-transactional cache or queue (like Memcached, Redis or ApacheMQ) from PostgreSQL on the backend. If the transaction fails, it's important that the data not get pushed. I guess I'm not following why HS would be different from a single server in this regard, though? Mind you, this is all 9.1 discussion, no? --Josh Berkus
On Thu, Feb 18, 2010 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I thought a bit more about this and don't really understand why we need
> an xid at all. When we discussed this before the role of a NOTIFY was to
> remind us to refresh a cache, not as a way of delivering a transactional
> payload. If the cache refresh use case is still the objective why does
> it matter whether we commit or not when we issue a NOTIFY? Surely, the
> rare case where we actually abort right at the end of the transaction
> will just cause an unnecessary cache refresh.

Notifications serve many more purposes than cache refreshes... it's a generic 'wake up and do something' signal to the client. For example, one of those things could be for the client to shut down. If the server errors out of the transaction that set up the client to shut down, you probably wouldn't want the client to shut down. I don't think that's a big deal really, but it conflicts with the old behavior.

However, being able to send notifications immediately (not at end of transaction) would be exceptionally useful in some cases. This happens when the notifying backend is waiting on some sort of response from the notified client. If you could NOTIFY IMMEDIATELY, then you could ping the client and get the response in a single transaction without using dblink-based hacks.

merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Thu, Feb 18, 2010 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I thought a bit more about this and don't really understand why we need >> an xid at all. When we discussed this before the role of a NOTIFY was to >> remind us to refresh a cache, not as a way of delivering a transactional >> payload. If the cache refresh use case is still the objective why does >> it matter whether we commit or not when we issue a NOTIFY? Surely, the >> rare case where we actually abort right at the end of the transaction >> will just cause an unnecessary cache refresh. > notifications serve many more purposes than cache refreshes...it's a > generic 'wake up and do something' to the client. The point to my mind is that the previous implementation guaranteed that failed transactions would not send notifies. I don't think we can just drop that semantic consistency statement and not break applications. Also, as Josh notes, even for cache refresh uses it is *critical* that the notifies not be delivered to listeners till after the sender commits; else you have race conditions where the listeners look for changes before they can see them. So it's difficult to make it much simpler than this anyhow. regards, tom lane