Thread: More thoughts about FE/BE protocol
I've been thinking some more about the FE/BE protocol redesign, specifically the desire to ensure that we can recover from error conditions without losing synchronization. The fact that the existing protocol doesn't do very well at this shows up in several places:
* having to drop and restart the connection after a COPY error
* fastpath function calls also lose sync if there's an error
* libpq gets terribly confused if it runs out of memory for a query result

I'm coming around to the idea that the cleanest solution is to require *all* protocol messages, in both directions, to have an initial length word. That is, the general message format would look like
	<message type>      1 byte
	<payload length>    number of following bytes (4 bytes MSB-first)
	... message data as needed ...

The advantage of doing it this way is that the recipient can absorb the whole message before starting to parse the contents; then, errors detected while processing the message contents don't cause us to lose protocol synchronization. Also, even if the message is so large as to run the recipient out of memory, it can still use the <payload length> to count off the number of bytes it has to drop before looking for another message. This would make it quite a bit easier for libpq to cope with out-of-memory, as an example.

These advantages seem enough to me to justify adding three or four bytes to the length of each message. Does anyone have a problem with that?

A related point is that I had been thinking of the new "extended query" facility (separate PARSE/BIND/DESCRIBE/EXECUTE steps) in terms of sending just one big message to the backend per interaction cycle, with the processing-step commands appearing as fields within that message. But putting a length word in front would effectively require the frontend to marshal the whole sequence before sending any of it. It seems better to send each of the processing-step commands as an independent message. To do that, we need to introduce an additional processing step, call it SYNC, that substitutes for the functionality associated with the overall message boundary in the other way. Specifically:
* ReadyForQuery (Z) is sent in response to SYNC.
* If no BEGIN has been issued, SYNC is the point at which an implicit COMMIT is done.
* After an error occurs in an extended-query command, the backend reads and discards messages until it finds a SYNC, then issues ReadyForQuery and resumes processing messages. This allows the frontend to be certain what has been processed and what hasn't.

Comments?

			regards, tom lane
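A minimal sketch of the receive side of this framing, assuming the length word counts only the bytes that follow it, as described above. The ProtoMessage struct, the read_fully() helper, and the return conventions are illustrative assumptions, not libpq or backend code; the point is that even an out-of-memory receiver can count off and discard the payload without losing sync.

#include <stdlib.h>
#include <stdint.h>

typedef struct
{
	char		type;	/* one-byte message type code, e.g. 'Z' */
	uint32_t	len;	/* number of payload bytes that follow */
	char	   *data;	/* payload, or NULL if it had to be discarded */
} ProtoMessage;

/* Assumed helper: read exactly n bytes, or return -1 on connection failure. */
extern int	read_fully(int fd, void *buf, size_t n);

/* Returns 1 = message buffered, 0 = in sync but payload discarded, -1 = dead connection. */
static int
read_message(int fd, ProtoMessage *msg)
{
	unsigned char hdr[5];

	if (read_fully(fd, hdr, sizeof(hdr)) < 0)
		return -1;

	msg->type = hdr[0];
	msg->len = ((uint32_t) hdr[1] << 24) | ((uint32_t) hdr[2] << 16) |
			   ((uint32_t) hdr[3] << 8) | (uint32_t) hdr[4];

	msg->data = malloc(msg->len);
	if (msg->data != NULL)
	{
		if (read_fully(fd, msg->data, msg->len) < 0)
		{
			free(msg->data);
			return -1;
		}
		return 1;
	}

	/* Out of memory: drop exactly msg->len bytes so the next message
	 * still starts where we expect it to. */
	{
		char		sink[4096];
		uint32_t	left = msg->len;

		while (left > 0)
		{
			size_t		chunk = (left > sizeof(sink)) ? sizeof(sink) : left;

			if (read_fully(fd, sink, chunk) < 0)
				return -1;
			left -= chunk;
		}
	}
	return 0;
}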
tgl@sss.pgh.pa.us wrote:
> I've been thinking some more about the FE/BE protocol redesign, specifically the desire to ensure that we can recover from error conditions without losing synchronization. The fact that the existing protocol doesn't do very well at this shows up in several places:
> * having to drop and restart the connection after a COPY error
> * fastpath function calls also lose sync if there's an error
> * libpq gets terribly confused if it runs out of memory for a query result
>
> I'm coming around to the idea that the cleanest solution is to require *all* protocol messages, in both directions, to have an initial length word. That is, the general message format would look like
> <message type>      1 byte
> <payload length>    number of following bytes (4 bytes MSB-first)
> ... message data as needed ...
>
> The advantage of doing it this way is that the recipient can absorb the whole message before starting to parse the contents; then, errors detected while processing the message contents don't cause us to lose protocol synchronization. Also, even if the message is so large as to run the recipient out of memory, it can still use the <payload length> to count off the number of bytes it has to drop before looking for another message. This would make it quite a bit easier for libpq to cope with out-of-memory, as an example.
>
> These advantages seem enough to me to justify adding three or four bytes to the length of each message. Does anyone have a problem with that?
> ...

Well, just to be a bit contrary, I suggest that the loss-of-sync problem does not seem to be caused by the lack of an overall message length word.

In the case of fastpath function calls, the length of each parameter is there now, so the backend could already read all the data before doing any error checking. The comment in tcop/fastpath.c:HandleFunctionRequest() says this, but then it loses me when it goes on to say this is impossible because the message can't be read until doing type lookups. I don't understand why, but if this is true, how will an overall length word help? You either read the whole thing into memory before error checking, or you don't.

In the case of COPY, what would your overall length word apply to, since the copy data is stream oriented, not message oriented? I don't understand backend error handling, but if the "copy" function loses control when an error occurs (for example, bad data type for a column), I don't see how knowing the overall message or data length helps in this case.
ljb <lbayuk@mindspring.com> writes:
> In the case of fastpath function calls, the length of each parameter is there now, so the backend could already read all the data before doing any error checking. The comment in tcop/fastpath.c:HandleFunctionRequest() says this, but then it loses me when it goes on to say this is impossible because the message can't be read until doing type lookups.

Yeah, I was just looking at that. The comment is not quite accurate; the message could be parsed as-is, but it could not be converted into the internal form that is needed (since you need to know if the type is pass-by-ref or not). So one could imagine an implementation that reads the message but just holds it in an internal buffer till it's all read, then goes back and processes the info a second time to detect errors and make the conversion. Then, if you report an error, you don't have a partially-read message still sitting in the input stream.

What I'm suggesting is that we could implement that logic in a more straightforward fashion if the "read into a buffer" part is driven by an initial byte count and doesn't have to duplicate the knowledge of the specific layout of the message type. If it were only fastpath involved, I'd say let's just rewrite it and be happy --- but there is exactly this same problem appearing in COPY error recovery, libpq memory-overrun recovery, etc. In all these places it looks like "buffer the message first, parse it later" is the way to go. Rather than having two sets of logic that understand the detailed format of each message type, we should just adjust the protocol to make this painless.

Another objection, with perhaps more force, is that this requires the sender to marshal the whole message before sending (or at least be able to precompute its length, but in practice people will probably just assemble the whole message in a buffer). But it turns out that the backend already does that anyway, precisely so that it can be sure it never sends a partial message. And frontends that don't do it that way probably should, for the same reason --- if you fail partway through sending the message, you've got a problem.

> In the case of COPY, what would your overall length word apply to, since the copy data is stream oriented, not message oriented?

I've been debating that. We could say that the sender is allowed to chop the COPY datastream into arbitrary-length messages, or we could require the message boundaries to be semantically meaningful (say, one message per data row). I feel the latter is probably cleaner in the long run, but it'd take more adjustment of existing code to do it that way. Any thoughts?

> I don't understand backend error handling, but if the "copy" function loses control when an error occurs (for example, bad data type for a column), I don't see how knowing the overall message or data length helps in this case.

The point is to not raise an error when there is a partial message remaining in the input stream. If there are whole messages remaining, it's easy to design the main loop to discard COPY-data messages until it sees something it likes (probably, a SYNC message denoting the end of the series of COPY-data messages). If there's a partial message remaining then you are out of sync and there's no good way for the main loop to recover.

			regards, tom lane
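The COPY recovery loop described in that last paragraph could then be as simple as the following sketch, reusing the hypothetical read_message() from the framing example earlier; the 'd' (COPY data) and 'S' (SYNC) type codes are placeholders, not settled protocol codes.

/* After a COPY error has been reported, consume and discard whole
 * COPY-data messages until the end marker arrives.  This only works
 * because every message carries its own byte count; a partially read
 * message would leave the stream out of sync with no way to recover. */
static void
skip_copy_data_after_error(int fd)
{
	ProtoMessage msg;

	for (;;)
	{
		if (read_message(fd, &msg) < 0)
			return;			/* connection lost; caller must abandon it */

		free(msg.data);		/* contents don't matter, only the framing */

		if (msg.type == 'S')
			break;			/* back in a known state; resume normally */
	}
}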
Tom Lane wrote:
> I'm coming around to the idea that the cleanest solution is to require *all* protocol messages, in both directions, to have an initial length word. That is, the general message format would look like
> <message type>      1 byte
> <payload length>    number of following bytes (4 bytes MSB-first)
> ... message data as needed ...

Is there any message - speaking from the standpoint of a normal user and not a source hacker WRT postgresql - where the length of the response is either unknown or expensive (in buffering) to find out? This would be the only disadvantage I can immediately see.

Sorry if I have the wrong end of the stick.

Peter
"Peter Galbavy" <peter.galbavy@knowtion.net> writes: > Is there any message - speaking from a standpoint of a normal user and not a > source hacker WRT postgresql - where knowing the length of the response is > either unknown or is expensive (in buffering) to find out ? See my response to ljb --- I think that in practice people assemble each message before sending anyway. If you don't do it that way, you've got a problem with recovering if you hit an error condition after sending a partial message. regards, tom lane
Steve Crawford <scrawford@pinpointresearch.com> writes:
> What would be the recovery/re-sync mechanism for those cases where the message is, either accidentally or maliciously, longer or shorter than the described length?

Once you're out of sync, there's not much to do except abandon the connection. The detection mechanism for this would have two parts: (a) noticing an invalid message type code; (b) some kind of sanity check on the message length field. Also, if we insist that the internal layout of each message still permits detection of the end (eg, we still use null-terminated strings and so on), we could test for bytes being left over in the byte count.

> Without proper timeout/recovery mechanisms a too-short message could cause the receiver to effectively hang.

I see no need to try to solve the Byzantine-generals problem here. A malicious attacker who's been able to connect to your database can do lots worse damage than just make the backend hang up.

In the years I've been working with Postgres, I've never seen an out-of-sync problem that didn't arise directly from the lack-of-error-recovery deficiencies that this proposal addresses. I don't see any point in complicating the protocol still further to handle failures that don't arise in practice.

			regards, tom lane
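A sketch of those two detection checks; the set of valid type codes and the upper bound on the length are illustrative assumptions, not values taken from the proposal.

#include <string.h>
#include <stdint.h>
#include <stdbool.h>

#define MAX_SANE_MSG_LEN	(64 * 1024 * 1024)	/* assumed ceiling */

static bool
header_looks_sane(char type, uint32_t len)
{
	/* (a) reject type codes we have never heard of ('\0' would otherwise
	 * match the string terminator inside strchr) */
	if (type == '\0' || strchr("CDEGHINPRSTZd", type) == NULL)
		return false;

	/* (b) reject lengths that cannot possibly be legitimate */
	if (len > MAX_SANE_MSG_LEN)
		return false;

	return true;
}

/* The third check happens after parsing a buffered message: if the number
 * of bytes the parser consumed differs from the header's byte count, the
 * two ends disagree about the message layout and the only safe response
 * is to abandon the connection. */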
Hannu Krosing <hannu@tm.ee> writes:
> Tom Lane kirjutas N, 10.04.2003 kell 16:57:
>> See my response to ljb --- I think that in practice people assemble each message before sending anyway.

> I just tested it by running "select *" on 68M records (6.5 GB data) table and you seem to be wrong - while psql shows nothing, its size starts rapidly growing (I ^C it at ~500M), while backend stays at stable 32M, which indicates that postgres starts to push data out as fast as it can get it.

Sure. "Message" here is at the granularity of one data row, not an entire query result.

> If you hit an error condition after sending a partial message then I'm in trouble anyway. Assembling the message beforehand just makes hitting an error less likely.

But when you assemble the message beforehand, the only possible part-way-through failures are communications failures, for which you may as well abandon the connection anyhow.

> I would propose something like X11 protocol (from memory)

As I was saying to Steve, I don't want to complicate the protocol more than is needed to handle the problems we have actually had. We don't use unreliable transport mechanisms and are not likely to start doing so in future, so I see no need to invent features to deal with problems that are already solved by the transport mechanism.

> Also there should be a way to tell the backend not to send some types of notices/warnings.

We already have that, see client_min_message_level.

			regards, tom lane
Barry Lind <blind@xythos.com> writes:
> When an application needs to do a lot of the same thing (i.e. insert a thousand rows), the application tells the driver to insert a 'batch' of 1000 rows instead of performing 1000 regular inserts. This allows the driver to optimize this operation as one network roundtrip instead of 1000 roundtrips.
> ... How could this be accomplished with the new FE/BE protocol "extended query" facility?

Well, as far as network roundtrips go, it's always been true that you don't really have to wait for the backend's response before sending the next command. The proposal to decouple SYNC from individual commands should make this easier: you fire off N commands "blind", then a SYNC. When the sync response comes back, it's done. If any of the commands fail, all else up to the SYNC will be ignored, so you don't have the problem of commands executing against an unexpected state.

(I'm not sure it'd be bright to issue thousands of commands with no SYNC, but certainly reasonable-size batches would be sensible.)

As for lots of instances of the same kind of command, you could PARSE the SQL insert command itself just once (with parameter placeholders for the data values), then repeat BIND/EXECUTE pairs as often as you want. That's probably about as efficient as you're going to get without switching to COPY mode.

Does that address your concern, or is there more to do?

			regards, tom lane
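In rough C terms, a driver-side batch built on that advice might look like the sketch below. The send_*() helpers, wait_for_ready_for_query(), and the statement name "ins" are hypothetical stand-ins (each send_*() is assumed to write one framed message); only the shape - Parse once, Bind/Execute repeated, a single Sync at the end - comes from the proposal.

/* Assumed helpers, each writing one framed message of the proposed format. */
extern void send_parse(int fd, const char *stmt_name, const char *query);
extern void send_bind(int fd, const char *stmt_name, int nparams,
					  const char *const *params);
extern void send_execute(int fd, const char *stmt_name);
extern void send_sync(int fd);
extern int	wait_for_ready_for_query(int fd);	/* drains responses up to 'Z' */

static int
batch_insert(int fd, const char *const rows[][2], int nrows)
{
	int			i;

	/* Parse the statement once, with parameter placeholders. */
	send_parse(fd, "ins", "INSERT INTO t (a, b) VALUES ($1, $2)");

	/* Fire off the Bind/Execute pairs "blind", without waiting for replies. */
	for (i = 0; i < nrows; i++)
	{
		send_bind(fd, "ins", 2, rows[i]);
		send_execute(fd, "ins");
	}

	/* One SYNC ends the batch; if any command failed, everything up to the
	 * SYNC is discarded by the backend and the error shows up here. */
	send_sync(fd);
	return wait_for_ready_for_query(fd);
}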
Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
>> Tom Lane kirjutas N, 10.04.2003 kell 16:57:
>>> See my response to ljb --- I think that in practice people assemble each message before sending anyway.
>
>> I just tested it by running "select *" on 68M records (6.5 GB data) table and you seem to be wrong - while psql shows nothing, its size starts rapidly growing (I ^C it at ~500M), while backend stays at stable 32M, which indicates that postgres starts to push data out as fast as it can get it.
>
> Sure. "Message" here is at the granularity of one data row, not an entire query result.

Could even be smaller, since TOASTed items don't get loaded at the row level but rather one after another. So a 68MB row consisting of four 17MB fields doesn't require 68MB of memory to be sent to the client.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Barry Lind <blind@xythos.com> writes:
> This all sounds great. I am really looking forward to seeing all of this implemented. Are you still on schedule for getting this into 7.4?

I gotta get off my duff and get with it, but I still plan to do it.

> And will there be time left for the clients to make the changes necessary to use the new protocol?

Well, that depends how long they take ;-)

I had originally planned to do the error message revisions first, but I now think they should be last, or at least the long/tedious part of converting all the backend elog calls should be put off to last. That will provide an interval where all the protocol-level revisions are done and so people can hack on the clients in parallel with the backend work.

I'll try to put out a protocol spec document first, which would give you something to shoot at, but I'm not really expecting anyone to work very hard on it until there's backend code to test against ...

			regards, tom lane
On Fri, 2003-04-11 at 04:15, Tom Lane wrote:
> Well, as far as network roundtrips go, it's always been true that you don't really have to wait for the backend's response before sending the next command. The proposal to decouple SYNC from individual commands should make this easier: you fire off N commands "blind", then a SYNC. When the sync response comes back, it's done. If any of the commands fail, all else up to the SYNC will be ignored, so you don't have the problem of commands executing against an unexpected state.

Is SYNC going to be a new kind of message? Is the SYNC response yet another?

Either way, could this be used as a keep-alive for long-lived connections? (some users of the current Smalltalk drivers report that long lived connections over the Internet sometimes just die)

Also, with the new protocol, will the number of affected rows be returned in a way that does not require parsing to fish it out?

Thanks,
Bruce
On 11 Apr 2003, Bruce Badger wrote:
> On Fri, 2003-04-11 at 04:15, Tom Lane wrote:
>> Well, as far as network roundtrips go, it's always been true that you don't really have to wait for the backend's response before sending the next command. The proposal to decouple SYNC from individual commands should make this easier: you fire off N commands "blind", then a SYNC. When the sync response comes back, it's done. If any of the commands fail, all else up to the SYNC will be ignored, so you don't have the problem of commands executing against an unexpected state.
>
> Is SYNC going to be a new kind of message? Is the SYNC response yet another?
>
> Either way, could this be used as a keep-alive for long-lived connections? (some users of the current Smalltalk drivers report that long lived connections over the Internet sometimes just die)

If there's talk of keep-alive messages, what of keep-alive from server to client? There have been a few reports of backends being left hanging around because clients have dropped the connection, or because network issues have blocked connections in such a manner that the server hasn't seen the connection close.

I believe this is only going to be an issue on systems configured to not use keep-alive packets, so it may be deemed nothing to do with postgresql, and if it's an issue the sys/net admin has to get involved.

-- 
Nigel J. Andrews
Bruce Badger <bruce_badger@badgerse.com> writes:
> Is SYNC going to be a new kind of message? Is the SYNC response yet another?

Yes; no. SYNC response already exists: it's ReadyForQuery (Z).

> Either way, could this be used as a keep-alive for long-lived connections? (some users of the current Smalltalk drivers report that long lived connections over the Internet sometimes just die)

If you're worried about that, Q with an empty query already suffices, though SYNC will work too.

I'd be inclined to think that such breakage isn't our problem though; anyone suffering from it needs to fix their firewall timeouts ...

> Also, with the new protocol, will the number of affected rows be returned in a way that does not require parsing to fish it out?

I'm not planning to change the contents of messages more than I have to. What's so hard about parsing "UPDATE nnn" ?

			regards, tom lane
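For what it's worth, the parsing in question really is a one-liner. A sketch, assuming tag holds the CompletedResponse string (note that the INSERT tag also carries an OID ahead of the row count):

#include <stdio.h>

/* Returns the affected-row count from a command tag such as "UPDATE 42",
 * or -1 for tags that carry no count (e.g. "CREATE TABLE"). */
static long
rows_from_command_tag(const char *tag)
{
	long		rows;

	if (sscanf(tag, "INSERT %*u %ld", &rows) == 1 ||
		sscanf(tag, "UPDATE %ld", &rows) == 1 ||
		sscanf(tag, "DELETE %ld", &rows) == 1)
		return rows;

	return -1;
}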
On Fri, 2003-04-11 at 09:29, Tom Lane wrote:
> Bruce Badger <bruce_badger@badgerse.com> writes:
>> Is SYNC going to be a new kind of message? Is the SYNC response yet another?
>
> Yes; no. SYNC response already exists: it's ReadyForQuery (Z).
>
>> Either way, could this be used as a keep-alive for long-lived connections? (some users of the current Smalltalk drivers report that long lived connections over the Internet sometimes just die)
>
> If you're worried about that, Q with an empty query already suffices, though SYNC will work too.
>
> I'd be inclined to think that such breakage isn't our problem though; anyone suffering from it needs to fix their firewall timeouts ...

Oh, I agree. But not everyone has control over the firewalls they have to work through, and not all admins have enough (any?) motivation to help. A keep-alive could be a useful option in an unhelpful world.

>> Also, with the new protocol, will the number of affected rows be returned in a way that does not require parsing to fish it out?
>
> I'm not planning to change the contents of messages more than I have to. What's so hard about parsing "UPDATE nnn" ?

Nothing, of course. However, the fewer easy things we *have* to do, the more other things we have time for. Also, some things that could return a row count don't, e.g. SELECT.

And to turn the question around: what's so hard about adding an Int32 to the 'C' (CompletedResponse) message which gives a row count? Then, if people want to display "UPDATE nnn", they can concatenate the string 'UPDATE' with the number - even easier than parsing - at least in Smalltalk :-) ... and then, having a code to indicate the kind of thing completed (like the code used in the 'R' (AuthenticationXxx) messages) means you could lose the string in the message altogether.
Bruce Badger <bruce_badger@badgerse.com> writes:
> On Fri, 2003-04-11 at 09:29, Tom Lane wrote:
>> I'm not planning to change the contents of messages more than I have to. What's so hard about parsing "UPDATE nnn" ?

> Nothing, of course. However, the fewer easy things we *have* to do, the more other things we have time for.

The other side of that coin is that making low-value changes takes time away from dealing with the important problems. We're not working in a green field here --- we have existing code that we're planning to change.

> Also, some things that could return a row count don't, e.g. SELECT.

But the client has surely already accumulated a row count while collecting the SELECT result. Doesn't seem like there's much value-added to be found there.

			regards, tom lane
Tom,

This all sounds great. I am really looking forward to seeing all of this implemented. Are you still on schedule for getting this into 7.4? And will there be time left for the clients to make the changes necessary to use the new protocol?

thanks,
--Barry

Tom Lane wrote:
> Barry Lind <blind@xythos.com> writes:
>> When an application needs to do a lot of the same thing (i.e. insert a thousand rows), the application tells the driver to insert a 'batch' of 1000 rows instead of performing 1000 regular inserts. This allows the driver to optimize this operation as one network roundtrip instead of 1000 roundtrips.
>> ... How could this be accomplished with the new FE/BE protocol "extended query" facility?
>
> Well, as far as network roundtrips go, it's always been true that you don't really have to wait for the backend's response before sending the next command. The proposal to decouple SYNC from individual commands should make this easier: you fire off N commands "blind", then a SYNC. When the sync response comes back, it's done. If any of the commands fail, all else up to the SYNC will be ignored, so you don't have the problem of commands executing against an unexpected state.
>
> (I'm not sure it'd be bright to issue thousands of commands with no SYNC, but certainly reasonable-size batches would be sensible.)
>
> As for lots of instances of the same kind of command, you could PARSE the SQL insert command itself just once (with parameter placeholders for the data values), then repeat BIND/EXECUTE pairs as often as you want. That's probably about as efficient as you're going to get without switching to COPY mode.
>
> Does that address your concern, or is there more to do?
>
> 			regards, tom lane
What would be the recovery/re-sync mechanism for those cases where the message is, either accidentally or maliciously, longer or shorter than the described length?

Without proper timeout/recovery mechanisms a too-short message could cause the receiver to effectively hang. A too-long message could cause overflows, loss of sync or other problems if the receiver attempts to interpret the extra data as the next message header.

Cheers,
Steve

On Wednesday 09 April 2003 3:30 pm, Tom Lane wrote:
> I've been thinking some more about the FE/BE protocol redesign, specifically the desire to ensure that we can recover from error conditions without losing synchronization. The fact that the existing protocol doesn't do very well at this shows up in several places:
> * having to drop and restart the connection after a COPY error
> * fastpath function calls also lose sync if there's an error
> * libpq gets terribly confused if it runs out of memory for a query result
>
> I'm coming around to the idea that the cleanest solution is to require *all* protocol messages, in both directions, to have an initial length word. That is, the general message format would look like
> <message type>      1 byte
> <payload length>    number of following bytes (4 bytes MSB-first)
> ... message data as needed ...
>
> The advantage of doing it this way is that the recipient can absorb the whole message before starting to parse the contents; then, errors detected while processing the message contents don't cause us to lose protocol synchronization. Also, even if the message is so large as to run the recipient out of memory, it can still use the <payload length> to count off the number of bytes it has to drop before looking for another message. This would make it quite a bit easier for libpq to cope with out-of-memory, as an example.
>
> These advantages seem enough to me to justify adding three or four bytes to the length of each message. Does anyone have a problem with that?
>
> A related point is that I had been thinking of the new "extended query" facility (separate PARSE/BIND/DESCRIBE/EXECUTE steps) in terms of sending just one big message to the backend per interaction cycle, with the processing-step commands appearing as fields within that message. But putting a length word in front would effectively require the frontend to marshal the whole sequence before sending any of it. It seems better to send each of the processing-step commands as an independent message. To do that, we need to introduce an additional processing step, call it SYNC, that substitutes for the functionality associated with the overall message boundary in the other way. Specifically:
> * ReadyForQuery (Z) is sent in response to SYNC.
> * If no BEGIN has been issued, SYNC is the point at which an implicit COMMIT is done.
> * After an error occurs in an extended-query command, the backend reads and discards messages until it finds a SYNC, then issues ReadyForQuery and resumes processing messages. This allows the frontend to be certain what has been processed and what hasn't.
>
> Comments?
>
> 			regards, tom lane
Tom Lane kirjutas N, 10.04.2003 kell 16:57:
> "Peter Galbavy" <peter.galbavy@knowtion.net> writes:
>> Is there any message - speaking from the standpoint of a normal user and not a source hacker WRT postgresql - where the length of the response is either unknown or expensive (in buffering) to find out?
>
> See my response to ljb --- I think that in practice people assemble each message before sending anyway.

I just tested it by running "select *" on 68M records (6.5 GB data) table and you seem to be wrong - while psql shows nothing, its size starts rapidly growing (I ^C it at ~500M), while backend stays at stable 32M, which indicates that postgres starts to push data out as fast as it can get it.

> If you don't do it that way, you've got a problem with recovering if you hit an error condition after sending a partial message.

If you hit an error condition after sending a partial message then I'm in trouble anyway. Assembling the message beforehand just makes hitting an error less likely.

I would propose something like X11 protocol (from memory):

each request (query) has a serial number.

each response is sent in reasonable sized chunks, and each chunk has the serial number of the request, the chunk number within the response and an indicator for EndOfResponse (or perhaps two, for Completed or Aborted).

This would make resyncing easier as you can safely ignore responses you don't want to see anymore even if for some reason some of them are still in the pipeline.

What could be useful (and is often requested) is sending the number of records in a response *if it is known without extra work*.

Also there should be a way to tell the backend not to send some types of notices/warnings.

-----------------
Hannu
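Purely to illustrate the shape of the X11-style framing Hannu is describing (request serials, numbered response chunks, an end-of-response indicator), a chunk header might be declared as below. Every field here is hypothetical, and the idea is declined by Tom elsewhere in the thread.

#include <stdint.h>

typedef struct
{
	uint32_t	request_serial;	/* which request this chunk answers */
	uint32_t	chunk_number;	/* position of the chunk within the response */
	uint32_t	chunk_length;	/* bytes of payload that follow, per Tom's size suggestion */
	uint8_t		status;			/* 0 = more to come, 1 = completed, 2 = aborted */
} ResponseChunkHeader;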
Hannu Krosing kirjutas N, 10.04.2003 kell 19:50:
> I would propose something like X11 protocol (from memory):
>
> each request (query) has a serial number.
>
> each response is sent in reasonable sized chunks, and each chunk has the serial number of the request, the chunk number within the response and an indicator for EndOfResponse (or perhaps two, for Completed or Aborted).

They should both have a size as well, as Tom suggested.

> This would make resyncing easier as you can safely ignore responses you don't want to see anymore even if for some reason some of them are still in the pipeline.
>
> What could be useful (and is often requested) is sending the number of records in a response *if it is known without extra work*.
>
> Also there should be a way to tell the backend not to send some types of notices/warnings.

-----------------
Hannu
Tom Lane kirjutas N, 10.04.2003 kell 19:59:
> Hannu Krosing <hannu@tm.ee> writes:
>> Also there should be a way to tell the backend not to send some types of notices/warnings.
>
> We already have that, see client_min_message_level.

What I would like would be the possibility to better support interactive applications, namely updating a progress bar by sending some extra out-of-band info (expected rows, how much work the backend has done, etc.) as notices. At the same time I would like to not get some warnings. Thus just having a min-level is not good enough.

-----------------
Hannu
Tom,

This SYNC message reminds me to ask you one thing about the FE/BE protocol redesign. One of the features of the jdbc spec is a "batch" interface. When an application needs to do a lot of the same thing (i.e. insert a thousand rows), the application tells the driver to insert a 'batch' of 1000 rows instead of performing 1000 regular inserts. This allows the driver to optimize this operation as one network roundtrip instead of 1000 roundtrips.

The current jdbc driver code doesn't do any optimization here and stupidly still does 1000 roundtrips. However, when I or someone has the time, this could be optimized in the current BE/FE protocol by concatenating all 1000 individual insert statements into a single query to be issued to the backend.

How could this be accomplished with the new FE/BE protocol "extended query" facility? If I understand the SYNC message, would it just be the case that the client would send the PARSE/BIND/DESCRIBE/EXECUTE messages for all 1000 inserts before issuing a single SYNC?

thanks,
--Barry

Tom Lane wrote:
> I've been thinking some more about the FE/BE protocol redesign, specifically the desire to ensure that we can recover from error conditions without losing synchronization. The fact that the existing protocol doesn't do very well at this shows up in several places:
> * having to drop and restart the connection after a COPY error
> * fastpath function calls also lose sync if there's an error
> * libpq gets terribly confused if it runs out of memory for a query result
>
> I'm coming around to the idea that the cleanest solution is to require *all* protocol messages, in both directions, to have an initial length word. That is, the general message format would look like
> <message type>      1 byte
> <payload length>    number of following bytes (4 bytes MSB-first)
> ... message data as needed ...
>
> The advantage of doing it this way is that the recipient can absorb the whole message before starting to parse the contents; then, errors detected while processing the message contents don't cause us to lose protocol synchronization. Also, even if the message is so large as to run the recipient out of memory, it can still use the <payload length> to count off the number of bytes it has to drop before looking for another message. This would make it quite a bit easier for libpq to cope with out-of-memory, as an example.
>
> These advantages seem enough to me to justify adding three or four bytes to the length of each message. Does anyone have a problem with that?
>
> A related point is that I had been thinking of the new "extended query" facility (separate PARSE/BIND/DESCRIBE/EXECUTE steps) in terms of sending just one big message to the backend per interaction cycle, with the processing-step commands appearing as fields within that message. But putting a length word in front would effectively require the frontend to marshal the whole sequence before sending any of it. It seems better to send each of the processing-step commands as an independent message. To do that, we need to introduce an additional processing step, call it SYNC, that substitutes for the functionality associated with the overall message boundary in the other way. Specifically:
> * ReadyForQuery (Z) is sent in response to SYNC.
> * If no BEGIN has been issued, SYNC is the point at which an implicit COMMIT is done.
> * After an error occurs in an extended-query command, the backend reads and discards messages until it finds a SYNC, then issues ReadyForQuery and resumes processing messages. This allows the frontend to be certain what has been processed and what hasn't.
>
> Comments?
>
> 			regards, tom lane
Tom Lane wrote:
> Barry Lind <blind@xythos.com> writes:
>> This all sounds great. I am really looking forward to seeing all of this implemented. Are you still on schedule for getting this into 7.4?
>
> I gotta get off my duff and get with it, but I still plan to do it.

I am wondering if the start of 7.5 may not be a better time for such a massive change, and to give interfaces time to catch up. Also, is Patrick going to need your help for PITR in 7.4?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I gotta get off my duff and get with it, but I still plan to do it.

> I am wondering if the start of 7.5 may not be a better time for such a massive change, and to give interfaces time to catch up.

(a) the problems are "swapped in" now; if I don't continue with this work I'll probably forget a lot of the issues.

(b) there are some things in 7.3 --- autocommit being a big one --- that we can *not* put off changing, or we'll be stuck with them forevermore.

(c) the interfaces do not have to catch up in time for 7.4 release, because we'll still support the old protocol.

			regards, tom lane
> (a) the problems are "swapped in" now; if I don't continue with this work I'll probably forget a lot of the issues.
>
> (b) there are some things in 7.3 --- autocommit being a big one --- that we can *not* put off changing, or we'll be stuck with them forevermore.
>
> (c) the interfaces do not have to catch up in time for 7.4 release, because we'll still support the old protocol.
>
> 			regards, tom lane

What is this "auto-commit" about? Any place where I can read about it?

Thanks,
Wei