Thread: Replication protocol doc fix
The docs currently say (introduced in commit 91fa853): "In the event of a backend-detected error during copy-both mode, the backend will issue an ErrorResponse message, discard frontend messages until a Sync message is received, and then issue ReadyForQuery and return to normal processing." But that doesn't seem to be correct: Sync is only used for the extended query protocol, and CopyBoth can only be initiated with the simple query protocol. So the actual behavior seems to be more like a "COPY FROM STDIN" initiated with the simple query protocol: "In the event of a backend-detected error during copy-in mode (including receipt of a CopyFail message), the backend will issue an ErrorResponse message. ... If the COPY command was issued in a simple Query message, the rest of that message is discarded and ReadyForQuery is issued ... any subsequent CopyData, CopyDone, or CopyFail messages issued by the frontend will simply be dropped." If the client does send a Sync, it results in an extra ReadyForQuery message. Diagnosed and reported by Petros Angelatos (petrosagg on Github). Regards, Jeff Davis
On Thu, Jun 10, 2021 at 9:26 PM Jeff Davis <pgsql@j-davis.com> wrote: > The docs currently say (introduced in commit 91fa853): > > "In the event of a backend-detected error during copy-both mode, the > backend will issue an ErrorResponse message, discard frontend messages > until a Sync message is received, and then issue ReadyForQuery and > return to normal processing." > > But that doesn't seem to be correct: Sync is only used for the extended > query protocol, and CopyBoth can only be initiated with the simple > query protocol. My impression was that CopyBoth can be initiated either way, but if you use the extended query protocol, then the result is a hopeless mess, because the protocol is badly designed: https://www.postgresql.org/message-id/CA+Tgmoa4eA+cPXaiGQmEBp9XisVd3ZE9dbvnbZEvx9UcMiw2tg@mail.gmail.com But I think you're correct in saying that the discard-until-Sync behavior only happens if the extended query protocol is used, so I agree that the current text is wrong. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, 2021-06-11 at 16:57 -0400, Robert Haas wrote: > My impression was that CopyBoth can be initiated either way, The docs say: "In either physical replication or logical replication walsender mode, only the simple query protocol can be used." Is there some way to initiate CopyBoth outside of walsender? > but if > you use the extended query protocol, then the result is a hopeless > mess, because the protocol is badly designed: > > https://www.postgresql.org/message-id/CA+Tgmoa4eA+cPXaiGQmEBp9XisVd3ZE9dbvnbZEvx9UcMiw2tg@mail.gmail.com It seems like you're saying that CopyIn and CopyBoth are both equally broken in extended query mode. Is that right? > But I think you're correct in saying that the discard-until-Sync > behavior only happens if the extended query protocol is used, so I > agree that the current text is wrong. Should we just document how CopyBoth works with the simple query protocol, or should we make it match the CopyIn docs? Regards, Jeff Davis
On Fri, Jun 11, 2021 at 6:12 PM Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, 2021-06-11 at 16:57 -0400, Robert Haas wrote: > > My impression was that CopyBoth can be initiated either way, > > The docs say: "In either physical replication or logical replication > walsender mode, only the simple query protocol can be used." Is there > some way to initiate CopyBoth outside of walsender? Currently, no, or at least not to my knowledge. I just meant that there seems to be nothing in the protocol specification which prevents CopyBothResponse from being sent in response to a query sent using the extended protocol. > > but if > > you use the extended query protocol, then the result is a hopeless > > mess, because the protocol is badly designed: > > > https://www.postgresql.org/message-id/CA+Tgmoa4eA+cPXaiGQmEBp9XisVd3ZE9dbvnbZEvx9UcMiw2tg@mail.gmail.com > > It seems like you're saying that CopyIn and CopyBoth are both equally > broken in extended query mode. Is that right? Yeah. > > But I think you're correct in saying that the discard-until-Sync > > behavior only happens if the extended query protocol is used, so I > > agree that the current text is wrong. > > Should we just document how CopyBoth works with the simple query > protocol, or should we make it match the CopyIn docs? I think it would make sense to make it match the CopyIn docs. Possibly the CopyOut docs should be made more similar as well. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, 2021-06-14 at 10:51 -0400, Robert Haas wrote: > but if > > > you use the extended query protocol, then the result is a > > > hopeless > > > mess, because the protocol is badly designed: > > > After looking in more detail, I think I understand a bit better. Clients don't differentiate between: * A normal command, where you know that you've sent everything that you will send. In this case, the client needs to send the Sync message in order to get the ReadyForQuery message. * A command that initiates CopyIn/CopyBoth, where you are going to send more data after the command. In this case, sending the Sync eagerly is wrong, and you can't pipeline more queries in the middle of CopyIn/CopyBoth mode. Instead, the client should send Sync after receiving an ErrorResponse, or after sending a CopyDone/CopyFail (right?). One thing I don't fully understand is what would happen if the client issued the Sync as the *first* message in an extended-protocol series. > > > But I think you're correct in saying that the discard-until-Sync > > > behavior only happens if the extended query protocol is used, so > > > I > > > agree that the current text is wrong. > > > > Should we just document how CopyBoth works with the simple query > > protocol, or should we make it match the CopyIn docs? > > I think it would make sense to make it match the CopyIn docs. > Possibly > the CopyOut docs should be made more similar as well. I attached a doc patch that hopefully clarifies this point as well as the weirdness around CopyIn/CopyBoth and the extended protocol. I reorganized the sections, as well. Regards, Jeff Davis
Attachment
Jeff Davis <pgsql@j-davis.com> writes: > One thing I don't fully understand is what would happen if the client > issued the Sync as the *first* message in an extended-protocol series. That'd cause the backend to send ReadyForQuery, which'd likely confuse the client. > But I think you're correct in saying that the discard-until-Sync > behavior only happens if the extended query protocol is used, Certainly, because otherwise there is no Sync. regards, tom lane
On Wed, Jun 16, 2021 at 5:15 PM Jeff Davis <pgsql@j-davis.com> wrote: > * A normal command, where you know that you've sent everything that you > will send. In this case, the client needs to send the Sync message in > order to get the ReadyForQuery message. > > * A command that initiates CopyIn/CopyBoth, where you are going to send > more data after the command. In this case, sending the Sync eagerly is > wrong, and you can't pipeline more queries in the middle of > CopyIn/CopyBoth mode. Instead, the client should send Sync after > receiving an ErrorResponse, or after sending a CopyDone/CopyFail > (right?). Well, that's one view of it. I would argue that the protocol ought not to be designed in such a way that the client has to guess what response the server might send back. How is it supposed to know? If the user says, hey, go run this via the extended query protocol, we don't want libpq to have to try to parse the query text and figure out whether it looks COPY-ish. That's expensive, hacky, and might create cross-version compatibility hazards if, say, a new replication command that uses the copy protocol is added. Nor do we want the user to have to specify what it thinks the server is going to do. Right now, we have this odd situation where the client indeed does not try to guess what the server will do and always send Sync, but the server acts as if the client is doing what you propose here - only sending the CopyDone/CopyFail at the end of everything associated with the command. > One thing I don't fully understand is what would happen if the client > issued the Sync as the *first* message in an extended-protocol series. I don't think that will break anything, because I think you can send a Sync message to try to reestablish protocol synchronization whenever you want. But I don't think it will accomplish anything either, because presumably you've already got protocol synchronization at the beginning of the sequence. The tricky part is getting resynchronized after you've done some stuff. > I attached a doc patch that hopefully clarifies this point as well as > the weirdness around CopyIn/CopyBoth and the extended protocol. I > reorganized the sections, as well. On a casual read-through this seems pretty reasonable, but it essentially documents that libpq is doing the wrong thing by sending Sync unconditionally. As I say above, I disagree with that from a philosophical perspective. Then again, unless we're willing to redefine the wire protocol, I don't have an alternative to offer. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 2021-06-17 at 12:42 -0400, Robert Haas wrote: > On a casual read-through this seems pretty reasonable, but it > essentially documents that libpq is doing the wrong thing by sending > Sync unconditionally. As I say above, I disagree with that from a > philosophical perspective. Then again, unless we're willing to > redefine the wire protocol, I don't have an alternative to offer. What if we simply mandate that a Sync must be sent before the server will respond with CopyInResponse/CopyBothResponse, and the client must send another Sync after CopyDone/CopyFail (or after receiving an ErrorResponse, if the client isn't going to send a CopyDone/CopyFail)? This will follow what libpq is already doing today, as far as I can tell, and it will leave the server in a definite state. In theory, it could break a client that issues Parse+Bind+Execute for a CopyIn/CopyBoth command without a Sync, but I'm not sure there are any clients that do that, and it's arguable whether the documentation permitted that or not anyway. I hacked together a quick patch; attached. Regards, Jeff Davis
Attachment
On Thu, Jun 17, 2021 at 7:37 PM Jeff Davis <pgsql@j-davis.com> wrote: > What if we simply mandate that a Sync must be sent before the server > will respond with CopyInResponse/CopyBothResponse, and the client must > send another Sync after CopyDone/CopyFail (or after receiving an > ErrorResponse, if the client isn't going to send a CopyDone/CopyFail)? I am not sure whether this works or not. Holding off cancel interrupts across possible network I/O seems like a non-starter. We have to be able to kill off connections that have wedged. Also, if we have to postpone sending ErrorResponse until we see the Sync, that's also bad: I think we need to be able to error out whenever. But, hmm, maybe it's OK to send ErrorResponse either before or after sending Copy{In,Both}Response. Then the client knows that if ErrorResponse shows up before Copy{In,Both}Response, the server sent it before consuming the Sync and will stop skipping messages when it sees the Sync; whereas if the ErrorResponse shows up after the Copy{In,Both}Response then the client knows the Sync was eaten and it has to send another one. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, 2021-06-30 at 12:25 -0400, Robert Haas wrote: > I am not sure whether this works or not. Holding off cancel > interrupts > across possible network I/O seems like a non-starter. We have to be > able to kill off connections that have wedged. I was following a pattern that I saw in CopyGetData() and SocketBackend(). If I understand correctly, the idea is to avoid a cancel leaving part of a message unread, which would desync the protocol. > Also, if we have to > postpone sending ErrorResponse until we see the Sync, that's also > bad: > I think we need to be able to error out whenever. I think we could still send an ErrorResponse whenever we want, and then just ignore messages until we get a Sync (just like for an ordinary extended protocol sequence). > But, hmm, maybe it's > OK to send ErrorResponse either before or after sending > Copy{In,Both}Response. Then the client knows that if ErrorResponse > shows up before Copy{In,Both}Response, the server sent it before > consuming the Sync and will stop skipping messages when it sees the > Sync; whereas if the ErrorResponse shows up after the > Copy{In,Both}Response then the client knows the Sync was eaten and it > has to send another one. That's what I had in mind. Regards, Jeff Davis
On Fri, Jul 2, 2021 at 1:55 AM Jeff Davis <pgsql@j-davis.com> wrote: > On Wed, 2021-06-30 at 12:25 -0400, Robert Haas wrote: > > I am not sure whether this works or not. Holding off cancel > > interrupts > > across possible network I/O seems like a non-starter. We have to be > > able to kill off connections that have wedged. > > I was following a pattern that I saw in CopyGetData() and > SocketBackend(). If I understand correctly, the idea is to avoid a > cancel leaving part of a message unread, which would desync the > protocol. Right, that seems like a good goal. Thinking about this a little more, it's only holding off *cancel* interrupts, not *all* interrupts, so presumably you can still terminate the backend in this state. That's not so bad, and it's not clear how we could do any better. So I withdraw my previous complaint about this point. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, 2021-07-02 at 08:44 -0400, Robert Haas wrote: > Right, that seems like a good goal. Thinking about this a little > more, > it's only holding off *cancel* interrupts, not *all* interrupts, so > presumably you can still terminate the backend in this state. That's > not so bad, and it's not clear how we could do any better. So I > withdraw my previous complaint about this point. Further thoughts on this? I don't feel comfortable making this change without a stronger endorsement. Regards, Jeff Davis
Hi, On 2021-06-17 16:37:51 -0700, Jeff Davis wrote: > In theory, it could break a client that issues Parse+Bind+Execute for a > CopyIn/CopyBoth command without a Sync, but I'm not sure there are any > clients that do that, and it's arguable whether the documentation > permitted that or not anyway. I'm worried about that breaking things and us only noticing down the road. This doesn't fix a problem that we are actively hitting, and as you say it's arguably compliant to do it differently. Potential protocol incompatibilities are a dangerous area. I think before doing something like this we ought to at least verify that the most popular native drivers won't have a problem with the change. Maybe pgjdbc, npgsql, the popular go ones and rust-postgres? Greetings, Andres Freund
The commitfest CI times out on all platforms and never finishes when running make check with this patch, so unless the patch is dropped due to concerns already raised then that seems like a good thing to fix. -- Daniel Gustafsson https://vmware.com/
> On 3 Nov 2021, at 12:14, Daniel Gustafsson <daniel@yesql.se> wrote: > > The commitfest CI times out on all platforms and never finishes when running > make check with this patch, so unless the patch is dropped due to concerns > already raised then that seems like a good thing to fix. As the thread has stalled, I'm marking this Returned with Feedback. Please feel free to resubmit when/if a new patch is available. -- Daniel Gustafsson https://vmware.com/