Thread: More thoughts about FE/BE protocol

More thoughts about FE/BE protocol

From
Tom Lane
Date:
I've been thinking some more about the FE/BE protocol redesign,
specifically the desire to ensure that we can recover from error
conditions without losing synchronization.  The fact that the existing
protocol doesn't do very well at this shows up in several places:
    * having to drop and restart the connection after a COPY error
    * fastpath function calls also lose sync if there's an error
    * libpq gets terribly confused if it runs out of memory for
      a query result
 

I'm coming around to the idea that the cleanest solution is to require
*all* protocol messages, in both directions, to have an initial length
word.  That is, the general message format would look like
    <message type>        1 byte
    <payload length>    number of following bytes (4 bytes MSB-first)
    ... message data as needed ...
 

The advantage of doing it this way is that the recipient can absorb the
whole message before starting to parse the contents; then, errors detected
while processing the message contents don't cause us to lose protocol
synchronization.  Also, even if the message is so large as to run the
recipient out of memory, it can still use the <payload length> to count
off the number of bytes it has to drop before looking for another message.
This would make it quite a bit easier for libpq to cope with
out-of-memory, as an example.
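
To make the framing concrete, here is a minimal Python sketch of a receiver that uses the length word to stay in sync. The 1-byte type and 4-byte MSB-first length follow the layout above; `MAX_PAYLOAD`, `read_exact`, `skip_exact`, and `read_message` are hypothetical names chosen for illustration, and the stream is assumed to be any object with a blocking `read(n)` method.

```python
import io
import struct

MAX_PAYLOAD = 1024  # hypothetical memory budget for a single message


def read_exact(stream, n):
    """Read exactly n bytes, looping over short reads."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-message")
        buf += chunk
    return buf


def skip_exact(stream, n, chunk_size=4096):
    """Count off n bytes without keeping them in memory."""
    while n > 0:
        got = stream.read(min(n, chunk_size))
        if not got:
            raise EOFError("connection closed mid-message")
        n -= len(got)


def read_message(stream):
    """Return (type, payload); payload is None if it was too big to buffer."""
    msg_type, length = struct.unpack("!cI", read_exact(stream, 5))
    if length > MAX_PAYLOAD:
        # Too large to hold: drop it, but remain aligned on the next message.
        skip_exact(stream, length)
        return msg_type, None
    return msg_type, read_exact(stream, length)
```

In this sketch an oversized message is reported with a `None` payload, and the next call to `read_message` starts cleanly at the following header.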

These advantages seem enough to me to justify adding three or four bytes
to the length of each message.  Does anyone have a problem with that?


A related point is that I had been thinking of the new "extended query"
facility (separate PARSE/BIND/DESCRIBE/EXECUTE steps) in terms of
sending just one big message to the backend per interaction cycle, with
the processing-step commands appearing as fields within that message.
But putting a length word in front would effectively require the
frontend to marshal the whole sequence before sending any of it.
It seems better to send each of the processing-step commands as an
independent message.  To do that, we need to introduce an additional
processing step, call it SYNC, that substitutes for the functionality
associated with the overall message boundary in the other way.
Specifically:
* ReadyForQuery (Z) is sent in response to SYNC.
* If no BEGIN has been issued, SYNC is the point at which an implicit
  COMMIT is done.
* After an error occurs in an extended-query command, the backend reads
  and discards messages until it finds a SYNC, then issues ReadyForQuery
  and resumes processing messages.  This allows the frontend to be
  certain what has been processed and what hasn't.
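
The skip-until-SYNC rule can be sketched as a small loop. This is only an illustration in Python: messages are modelled as `(type, payload)` tuples that have already been read off the wire, `process` and the `handle` callback are hypothetical, and the implicit COMMIT at SYNC is not modelled.

```python
def process(messages, handle):
    """Run messages through handle(); after an error, discard until SYNC.

    messages: iterable of (type, payload) tuples.
    handle:   callable that returns a response or raises ValueError.
    Returns the list of responses the frontend would see.
    """
    responses = []
    skipping = False
    for msg_type, payload in messages:
        if msg_type == "SYNC":
            # SYNC always ends skip mode and is answered with ReadyForQuery.
            skipping = False
            responses.append("ReadyForQuery")
            continue
        if skipping:
            continue  # discarded: an earlier command in this cycle failed
        try:
            responses.append(handle(msg_type, payload))
        except ValueError as exc:
            responses.append(f"Error: {exc}")
            skipping = True
    return responses
```

Because every message between the failure and the next SYNC is dropped, the frontend can tell from the single error response followed by ReadyForQuery exactly which commands took effect.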
 

Comments?
        regards, tom lane



Re: More thoughts about FE/BE protocol

From
ljb
Date:
tgl@sss.pgh.pa.us wrote:
> I've been thinking some more about the FE/BE protocol redesign,
> specifically the desire to ensure that we can recover from error
> conditions without losing synchronization.  The fact that the existing
> protocol doesn't do very well at this shows up in several places:
>     * having to drop and restart the connection after a COPY error
>     * fastpath function calls also lose sync if there's an error
>     * libpq gets terribly confused if it runs out of memory for
>       a query result
> 
> I'm coming around to the idea that the cleanest solution is to require
> *all* protocol messages, in both directions, to have an initial length
> word.  That is, the general message format would look like
>     <message type>        1 byte
>     <payload length>    number of following bytes (4 bytes MSB-first)
>     ... message data as needed ...
> 
> The advantage of doing it this way is that the recipient can absorb the
> whole message before starting to parse the contents; then, errors detected
> while processing the message contents don't cause us to lose protocol
> synchronization.  Also, even if the message is so large as to run the
> recipient out of memory, it can still use the <payload length> to count
> off the number of bytes it has to drop before looking for another message.
> This would make it quite a bit easier for libpq to cope with
> out-of-memory, as an example.
> 
> These advantages seem enough to me to justify adding three or four bytes
> to the length of each message.  Does anyone have a problem with that?
> ...

Well, just to be a bit contrary, I suggest that the loss of synch problem
seems to not be caused by lack of an overall message length word.

In the case of fastpath function calls, the length of each parameter is
there now, so the backend could already read all the data before doing
any error checking.  The comment in tcop/fastpath.c:HandleFunctionRequest()
says this, but then it loses me when it goes on to say this is impossible
because the message can't be read until doing type lookups. I don't
understand why, but if this is true, how will an overall length word help?
You either read the whole thing into memory before error checking, or you don't.

In the case of COPY, what would your overall length word apply to, since
the copy data is stream oriented, not message oriented? I don't understand
backend error handling, but if the "copy" function loses control when an
error occurs (for example, bad data type for a column), I don't see how
knowing the overall message or data length helps in this case.



Re: More thoughts about FE/BE protocol

From
Tom Lane
Date:
ljb <lbayuk@mindspring.com> writes:
> In the case of fastpath function calls, the length of each parameter is
> there now, so the backend could already read all the data before doing
> any error checking.  The comment in tcop/fastpath.c:HandleFunctionRequest()
> says this, but then it loses me when it goes on to say this is impossible
> because the message can't be read until doing type lookups.

Yeah, I was just looking at that.  The comment is not quite accurate;
the message could be parsed as-is, but it could not be converted into
the internal form that is needed (since you need to know if the type is
pass-by-ref or not).  So one could imagine an implementation that reads
the message but just holds it in an internal buffer till it's all read,
then goes back and processes the info a second time to detect errors
and make the conversion.  Then, if you report an error, you don't have
a partially-read message still sitting in the input stream.

What I'm suggesting is that we could implement that logic in a more
straightforward fashion if the "read into a buffer" part is driven by
an initial byte count and doesn't have to duplicate the knowledge of the
specific layout of the message type.  If it were only fastpath involved,
I'd say let's just rewrite it and be happy --- but there is exactly this
same problem appearing in COPY error recovery, libpq memory-overrun
recovery, etc.  In all these places it looks like "buffer the message
first, parse it later" is the way to go.  Rather than having two sets
of logic that understand the detailed format of each message type, we
should just adjust the protocol to make this painless.

Another objection, with perhaps more force, is that this requires the
sender to marshal the whole message before sending (or at least be able
to precompute its length, but in practice people will probably just
assemble the whole message in a buffer).  But it turns out that the
backend already does that anyway, precisely so that it can be sure it
never sends a partial message.  And frontends that don't do it that way
probably should, for the same reason --- if you fail partway through
sending the message, you've got a problem.
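
A minimal sketch of the "assemble first, then send" approach on the sending side, in Python. `build_message` and `cstring` are hypothetical helpers, and the payload here is assumed to be a list of already-encoded fields; any failure while building the payload happens before a single byte reaches the connection.

```python
import struct


def cstring(text: str) -> bytes:
    """Encode a string as a null-terminated field."""
    return text.encode("utf-8") + b"\x00"


def build_message(msg_type: bytes, fields) -> bytes:
    """Marshal a whole message: type byte, 4-byte MSB-first length, payload."""
    payload = b"".join(fields)  # errors surface here, before anything is sent
    return msg_type + struct.pack("!I", len(payload)) + payload
```

The caller would then hand the returned bytes to the socket in one write, so the only part-way-through failures left are communication failures.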

> In the case of COPY, what would your overall length word apply to, since
> the copy data is stream oriented, not message oriented?

I've been debating that.  We could say that the sender is allowed to
chop the COPY datastream into arbitrary-length messages, or we could
require the message boundaries to be semantically meaningful (say, one
message per data row).  I feel the latter is probably cleaner in the
long run, but it'd take more adjustment of existing code to do it that
way.  Any thoughts?
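
The two COPY framings can be compared with a short Python sketch. The type code `b"d"` and both function names are assumptions made for illustration only; nothing in this thread fixes them.

```python
import struct


def frame(msg_type: bytes, payload: bytes) -> bytes:
    """Type byte, 4-byte MSB-first payload length, then the payload."""
    return msg_type + struct.pack("!I", len(payload)) + payload


def copy_rows_as_messages(rows, type_code=b"d"):
    """Semantically meaningful boundaries: one message per data row."""
    return [frame(type_code, row) for row in rows]


def copy_stream_as_messages(data, chunk_size, type_code=b"d"):
    """Arbitrary boundaries: chop the stream wherever chunk_size falls."""
    return [
        frame(type_code, data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]
```

With per-row messages the receiver can reject a single bad row and still know where the next one starts; with arbitrary chunks it has to reassemble rows itself before it can do so.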

> I don't understand
> backend error handling, but if the "copy" function loses control when an
> error occurs (for example, bad data type for a column), I don't see how
> knowing the overall message or data length helps in this case.

The point is to not raise an error when there is a partial message
remaining in the input stream.  If there are whole messages remaining,
it's easy to design the main loop to discard COPY-data messages until it
sees something it likes (probably, a SYNC message denoting the end of
the series of COPY-data messages).  If there's a partial message
remaining then you are out of sync and there's no good way for the main
loop to recover.
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
"Peter Galbavy"
Date:
Tom Lane wrote:
> I'm coming around to the idea that the cleanest solution is to require
> *all* protocol messages, in both directions, to have an initial length
> word.  That is, the general message format would look like
> <message type> 1 byte
> <payload length> number of following bytes (4 bytes MSB-first)
> ... message data as needed ...

Is there any message - speaking from a standpoint of a normal user and not a
source hacker WRT postgresql - where knowing the length of the response is
either unknown or is expensive (in buffering) to find out ? This would be
the only disadvantage I can immediately see.

Sorry if I have the wrong end of the stick.

Peter



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
"Peter Galbavy" <peter.galbavy@knowtion.net> writes:
> Is there any message - speaking from a standpoint of a normal user and not a
> source hacker WRT postgresql - where knowing the length of the response is
> either unknown or is expensive (in buffering) to find out ?

See my response to ljb --- I think that in practice people assemble each
message before sending anyway.  If you don't do it that way, you've got
a problem with recovering if you hit an error condition after sending a
partial message.
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Steve Crawford <scrawford@pinpointresearch.com> writes:
> What would be the recovery/re-sync mechanism for those cases where the 
> message is, either accidentally or maliciously, longer or shorter than the 
> described length?

Once you're out of sync, there's not much to do except abandon the
connection.  The detection mechanism for this would have two parts:
(a) noticing an invalid message type code; (b) some kind of sanity
check on the message length field.  Also, if we insist that the internal
layout of each message still permits detection of the end (eg, we still
use null-terminated strings and so on), we could test for bytes being
left over in the byte count.
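
Those three checks might look like the following Python sketch. `KNOWN_TYPES` and `MAX_SANE_LENGTH` are hypothetical values, `OutOfSync` is an invented exception, and the query payload is assumed to be a single null-terminated string.

```python
import struct

KNOWN_TYPES = {b"Q", b"Z", b"C"}  # hypothetical subset of valid type codes
MAX_SANE_LENGTH = 1 << 30         # hypothetical upper bound on payload size


class OutOfSync(Exception):
    """Raised when the stream no longer looks like valid messages."""


def check_header(header: bytes):
    """Checks (a) and (b): valid type code and plausible length."""
    msg_type, length = struct.unpack("!cI", header)
    if msg_type not in KNOWN_TYPES:
        raise OutOfSync(f"invalid message type {msg_type!r}")
    if length > MAX_SANE_LENGTH:
        raise OutOfSync(f"implausible payload length {length}")
    return msg_type, length


def parse_query(payload: bytes) -> str:
    """Parse one null-terminated string and insist nothing is left over."""
    end = payload.find(b"\x00")
    if end == -1:
        raise OutOfSync("string is not null-terminated")
    if end != len(payload) - 1:
        raise OutOfSync("bytes left over in the byte count")
    return payload[:end].decode("utf-8")
```

Once any of these raises, the caller in this sketch would simply close the connection, as described above.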

> Without proper timeout/recovery mechanisms a too-short message could cause 
> the receiver to effectively hang.

I see no need to try to solve the Byzantine-generals problem here.
A malicious attacker who's been able to connect to your database can
do lots worse damage than just make the backend hang up.

In the years I've been working with Postgres, I've never seen an
out-of-sync problem that didn't arise directly from the lack-of-error-
recovery deficiencies that this proposal addresses.  I don't see any
point in complicating the protocol still further to handle failures that
don't arise in practice.
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> Tom Lane wrote on Thu, 10.04.2003 at 16:57:
>> See my response to ljb --- I think that in practice people assemble each
>> message before sending anyway. 

> I just tested it by running "select *" on 68M records (6.5 GB data)
> table and you seem to be wrong - while psql shows nothing, its size
> starts rapidly growing (I ^C it at ~500M) , while backend stays at
> stable 32M, which indicates that postgres starts to push data out as
> fast as it can get it.

Sure.  "Message" here is at the granularity of one data row, not an
entire query result.

> If you hit an error condition after sending a partial message then I'm
> in trouble anyway. Assembling the message beforehand just makes hitting
> error less likely.

But when you assemble the message beforehand, the only possible
part-way-through failures are communications failures, for which you may
as well abandon the connection anyhow.

> I would propose something like X11 protocol (from memory)

As I was saying to Steve, I don't want to complicate the protocol more
than is needed to handle the problems we have actually had.  We don't
use unreliable transport mechanisms and are not likely to start doing so
in future, so I see no need to invent features to deal with problems
that are already solved by the transport mechanism.

> Also there should be a way to tell the backend not to send some types of
> notices/warnings.

We already have that, see client_min_messages.
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Barry Lind <blind@xythos.com> writes:
> When an application needs to do a lot of the same thing (i.e 
> insert a thousand rows), the application tells the driver to insert a 
> 'batch' of 1000 rows instead of performing 1000 regular inserts. This 
> allows the driver to optimize this operation as one network roundtrip 
> instead of 1000 roundtrips.
> ... How could this be accomplished with the 
> new FE/BE protocol "extended query" facility?

Well, as far as network roundtrips go, it's always been true that you
don't really have to wait for the backend's response before sending the
next command.  The proposal to decouple SYNC from individual commands
should make this easier: you fire off N commands "blind", then a SYNC.
When the sync response comes back, it's done.  If any of the commands
fail, all else up to the SYNC will be ignored, so you don't have the
problem of commands executing against an unexpected state.

(I'm not sure it'd be bright to issue thousands of commands with no
SYNC, but certainly reasonable-size batches would be sensible.)

As for lots of instances of the same kind of command, you could PARSE
the SQL insert command itself just once (with parameter placeholders for
the data values), then repeat BIND/EXECUTE pairs as often as you want.
That's probably about as efficient as you're going to get without
switching to COPY mode.
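
As a rough Python sketch of that batch: one PARSE, a BIND/EXECUTE pair per row, and a single SYNC at the end, all concatenated so they can be sent in one write. The type codes `P`, `B`, `E`, `S`, the statement name `"stmt"`, the `$1`-style placeholders, and the payload layouts are assumptions invented for illustration, not a wire format defined in this thread.

```python
import struct


def frame(msg_type: bytes, payload: bytes = b"") -> bytes:
    """Type byte, 4-byte MSB-first payload length, then the payload."""
    return msg_type + struct.pack("!I", len(payload)) + payload


def cstr(text: str) -> bytes:
    return text.encode("utf-8") + b"\x00"


def build_insert_batch(sql: str, rows) -> bytes:
    """PARSE once, then BIND/EXECUTE per row, then one SYNC."""
    out = [frame(b"P", cstr("stmt") + cstr(sql))]
    for row in rows:
        params = b"".join(cstr(value) for value in row)
        out.append(frame(b"B", cstr("stmt") + params))
        out.append(frame(b"E"))
    out.append(frame(b"S"))
    return b"".join(out)


def message_types(data: bytes):
    """Walk the framed stream and list the type code of each message."""
    types, pos = [], 0
    while pos < len(data):
        msg_type, length = struct.unpack_from("!cI", data, pos)
        types.append(msg_type)
        pos += 5 + length
    return types
```

For example, `build_insert_batch("INSERT INTO t VALUES ($1, $2)", [("1", "a"), ("2", "b")])` yields six messages, and the frontend only needs to wait for the one ReadyForQuery that answers the final SYNC.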

Does that address your concern, or is there more to do?
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Jan Wieck
Date:
Tom Lane wrote:
> 
> Hannu Krosing <hannu@tm.ee> writes:
> > Tom Lane wrote on Thu, 10.04.2003 at 16:57:
> >> See my response to ljb --- I think that in practice people assemble each
> >> message before sending anyway.
> 
> > I just tested it by running "select *" on 68M records (6.5 GB data)
> > table and you seem to be wrong - while psql shows nothing, its size
> > starts rapidly growing (I ^C it at ~500M) , while backend stays at
> > stable 32M, which indicates that postgres starts to push data out as
> > fast as it can get it.
> 
> Sure.  "Message" here is at the granularity of one data row, not an
> entire query result.

Could even be smaller since TOASTed items don't get loaded at the row
level but rather one after another. So a 68M row consisting of 4 17M
fields doesn't require 68M of memory to be sent to the client.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Barry Lind <blind@xythos.com> writes:
> This all sounds great.  I am really looking forward to seeing all of this 
> implemented.  Are you still on schedule for getting this into 7.4?

I gotta get off my duff and get with it, but I still plan to do it.

> And will there be time left for the clients to make the changes
> necessary to use the new protocol?

Well, that depends how long they take ;-)

I had originally planned to do the error message revisions first, but
I now think they should be last, or at least the long/tedious part of
converting all the backend elog calls should be put off to last.  That
will provide an interval where all the protocol-level revisions are done
and so people can hack on the clients in parallel with the backend work.

I'll try to put out a protocol spec document first, which would give you
something to shoot at, but I'm not really expecting anyone to work very
hard on it until there's backend code to test against ...
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Bruce Badger
Date:
On Fri, 2003-04-11 at 04:15, Tom Lane wrote:

> Well, as far as network roundtrips go, it's always been true that you
> don't really have to wait for the backend's response before sending the
> next command.  The proposal to decouple SYNC from individual commands
> should make this easier: you fire off N commands "blind", then a SYNC.
> When the sync response comes back, it's done.  If any of the commands
> fail, all else up to the SYNC will be ignored, so you don't have the
> problem of commands executing against an unexpected state.

Is SYNC going to be a new kind of message?  Is the SYNC response yet
another?

Either way, could this be used as a keep-alive for long-lived
connections?  (some users of the current Smalltalk drivers report that
long lived connections over the Internet sometimes just die)

Also, with the new protocol, will the number of affected rows be 
returned in a way that does not require parsing to fish it out?

Thanks,
Bruce



Re: [HACKERS] More thoughts about FE/BE protocol

From
"Nigel J. Andrews"
Date:
On 11 Apr 2003, Bruce Badger wrote:

> On Fri, 2003-04-11 at 04:15, Tom Lane wrote:
> 
> > Well, as far as network roundtrips go, it's always been true that you
> > don't really have to wait for the backend's response before sending the
> > next command.  The proposal to decouple SYNC from individual commands
> > should make this easier: you fire off N commands "blind", then a SYNC.
> > When the sync response comes back, it's done.  If any of the commands
> > fail, all else up to the SYNC will be ignored, so you don't have the
> > problem of commands executing against an unexpected state.
> 
> Is SYNC going to be a new kind of message?  Is the SYNC response yet
> another?
> 
> Either way, could this be used as a keep-alive for long-lived
> connections?  (some users of the current Smalltalk drivers report that
> long lived connections over the Internet sometimes just die)

If there's talk of keep-alive messages, what of keep-alive from server to
client.

There have been a few reports of backends being left hanging around because clients
have dropped the connection or network issues have blocked connections in such
a manner that the server hasn't seen the connection close. I believe this is
only going to be an issue on systems configured to not use keep-alive packets.
So it may be deemed nothing to do with postgresql and if it's an issue the
sys/net admin has to get involved.


-- 
Nigel J. Andrews



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Bruce Badger <bruce_badger@badgerse.com> writes:
> Is SYNC going to be a new kind of message?  Is the SYNC response yet
> another?

Yes; no.  SYNC response already exists: it's ReadyForQuery (Z).

> Either way, could this be used as a keep-alive for long-lived
> connections?  (some users of the current Smalltalk drivers report that
> long lived connections over the Internet sometimes just die)

If you're worried about that, Q with an empty query already suffices,
though SYNC will work too.

I'd be inclined to think that such breakage isn't our problem though;
anyone suffering from it needs to fix their firewall timeouts ...

> Also, with the new protocol, will the number of affected rows be 
> returned in a way that does not require parsing to fish it out?

I'm not planning to change the contents of messages more than I have to.
What's so hard about parsing "UPDATE nnn" ?
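
For reference, the parsing in question is small. A minimal Python sketch, assuming the completion string is a command word optionally followed by numbers with the row count last; `parse_command_tag` is a hypothetical helper name:

```python
def parse_command_tag(tag: str):
    """Split a completion tag such as "UPDATE 42" into (command, row_count).

    Tags without a trailing number are returned whole with a count of None.
    """
    parts = tag.split()
    if len(parts) >= 2 and parts[-1].isdigit():
        return parts[0], int(parts[-1])
    return tag, None
```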
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Bruce Badger
Date:
On Fri, 2003-04-11 at 09:29, Tom Lane wrote:
> Bruce Badger <bruce_badger@badgerse.com> writes:
> > Is SYNC going to be a new kind of message?  Is the SYNC response yet
> > another?
> 
> Yes; no.  SYNC response already exists: it's ReadyForQuery (Z).
> 
> > Either way, could this be used as a keep-alive for long-lived
> > connections?  (some users of the current Smalltalk drivers report that
> > long lived connections over the Internet sometimes just die)
> 
> If you're worried about that, Q with an empty query already suffices,
> though SYNC will work too.
> 
> I'd be inclined to think that such breakage isn't our problem though;
> anyone suffering from it needs to fix their firewall timeouts ...

Oh, I agree.  But not everyone has control over the firewalls they have
to work through, and not all admins have enough (any?) motivation to
help.  A keep-alive could be a useful option in an unhelpful world.

> > Also, with the new protocol, will the number of affected rows be 
> > returned in a way that does not require parsing to fish it out?
> 
> I'm not planning to change the contents of messages more than I have to.
> What's so hard about parsing "UPDATE nnn" ?

Nothing, of course.  However the fewer easy things we *have* to do, the
more other things we have time for.  Also, some things that could return
a row count don't, e.g. SELECT.

And to turn the question around: What's so hard about adding in an Int32
in the 'C' (CompletedResponse) message which gives a row count?  Then,
if people want to display "UPDATE nnn", they can concatenate the string
'UPDATE' with the number - even easier than parsing - at least in
Smalltalk :-)

... then, having a code to indicate the kind of thing completed (like
the code used in the 'R' (AuthenticationXxx) messages) means you could
lose the string in the message altogether.



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Bruce Badger <bruce_badger@badgerse.com> writes:
> On Fri, 2003-04-11 at 09:29, Tom Lane wrote:
>> I'm not planning to change the contents of messages more than I have to.
>> What's so hard about parsing "UPDATE nnn" ?

> Nothing, of course.  However the fewer easy things we *have* to do, the
> more other things we have time for.

The other side of that coin is that making low-value changes takes time
away from dealing with the important problems.  We're not working in a
green field here --- we have existing code that we're planning to change.

> Also, some things that could return
> a row count don't, e.g. SELECT.

But the client has surely already accumulated a row count while
collecting the SELECT result.  Doesn't seem like there's much
value-added to be found there.
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Barry Lind
Date:
Tom,

This all sounds great.  I am really looking forward to seeing all of this 
implemented.  Are you still on schedule for getting this into 7.4?  And 
will there be time left for the clients to make the changes necessary to 
use the new protocol?

thanks,
--Barry


Tom Lane wrote:
> Barry Lind <blind@xythos.com> writes:
> 
>>When an application needs to do a lot of the same thing (i.e 
>>insert a thousand rows), the application tells the driver to insert a 
>>'batch' of 1000 rows instead of performing 1000 regular inserts. This 
>>allows the driver to optimize this operation as one network roundtrip 
>>instead of 1000 roundtrips.
>>... How could this be accomplished with the 
>>new FE/BE protocol "extended query" facility?
> 
> 
> Well, as far as network roundtrips go, it's always been true that you
> don't really have to wait for the backend's response before sending the
> next command.  The proposal to decouple SYNC from individual commands
> should make this easier: you fire off N commands "blind", then a SYNC.
> When the sync response comes back, it's done.  If any of the commands
> fail, all else up to the SYNC will be ignored, so you don't have the
> problem of commands executing against an unexpected state.
> 
> (I'm not sure it'd be bright to issue thousands of commands with no
> SYNC, but certainly reasonable-size batches would be sensible.)
> 
> As for lots of instances of the same kind of command, you could PARSE
> the SQL insert command itself just once (with parameter placeholders for
> the data values), then repeat BIND/EXECUTE pairs as often as you want.
> That's probably about as efficient as you're going to get without
> switching to COPY mode.
> 
> Does that address your concern, or is there more to do?
> 
>             regards, tom lane
> 



Re: [HACKERS] More thoughts about FE/BE protocol

From
Steve Crawford
Date:
What would be the recovery/re-sync mechanism for those cases where the
message is, either accidentally or maliciously, longer or shorter than the
described length?

Without proper timeout/recovery mechanisms a too-short message could cause
the receiver to effectively hang. A too long message could cause overflows,
loss of sync or other problems if the receiver attempts to interpret the
extra data as the next message header.

Cheers,
Steve

On Wednesday 09 April 2003 3:30 pm, Tom Lane wrote:
> I've been thinking some more about the FE/BE protocol redesign,
> specifically the desire to ensure that we can recover from error
> conditions without losing synchronization.  The fact that the existing
> protocol doesn't do very well at this shows up in several places:
>     * having to drop and restart the connection after a COPY error
>     * fastpath function calls also lose sync if there's an error
>     * libpq gets terribly confused if it runs out of memory for
>       a query result
>
> I'm coming around to the idea that the cleanest solution is to require
> *all* protocol messages, in both directions, to have an initial length
> word.  That is, the general message format would look like
>     <message type>        1 byte
>     <payload length>    number of following bytes (4 bytes MSB-first)
>     ... message data as needed ...
>
> The advantage of doing it this way is that the recipient can absorb the
> whole message before starting to parse the contents; then, errors detected
> while processing the message contents don't cause us to lose protocol
> synchronization.  Also, even if the message is so large as to run the
> recipient out of memory, it can still use the <payload length> to count
> off the number of bytes it has to drop before looking for another message.
> This would make it quite a bit easier for libpq to cope with
> out-of-memory, as an example.
>
> These advantages seem enough to me to justify adding three or four bytes
> to the length of each message.  Does anyone have a problem with that?
>
>
> A related point is that I had been thinking of the new "extended query"
> facility (separate PARSE/BIND/DESCRIBE/EXECUTE steps) in terms of
> sending just one big message to the backend per interaction cycle, with
> the processing-step commands appearing as fields within that message.
> But putting a length word in front would effectively require the
> frontend to marshal the whole sequence before sending any of it.
> It seems better to send each of the processing-step commands as an
> independent message.  To do that, we need to introduce an additional
> processing step, call it SYNC, that substitutes for the functionality
> associated with the overall message boundary in the other way.
> Specifically:
> * ReadyForQuery (Z) is sent in response to SYNC.
> * If no BEGIN has been issued, SYNC is the point at which an implicit
>   COMMIT is done.
> * After an error occurs in an extended-query command, the backend reads
>   and discards messages until it finds a SYNC, then issues ReadyForQuery
>   and resumes processing messages.  This allows the frontend to be
>   certain what has been processed and what hasn't.
>
> Comments?
>
>             regards, tom lane
>
>



Re: [HACKERS] More thoughts about FE/BE protocol

From
Hannu Krosing
Date:
Tom Lane wrote on Thu, 10.04.2003 at 16:57:
> "Peter Galbavy" <peter.galbavy@knowtion.net> writes:
> > Is there any message - speaking from a standpoint of a normal user and not a
> > source hacker WRT postgresql - where knowing the length of the response is
> > either unknown or is expensive (in buffering) to find out ?
> 
> See my response to ljb --- I think that in practice people assemble each
> message before sending anyway. 

I just tested it by running "select *" on 68M records (6.5 GB data)
table and you seem to be wrong - while psql shows nothing, its size
starts rapidly growing (I ^C it at ~500M) , while backend stays at
stable 32M, which indicates that postgres starts to push data out as
fast as it can get it.

> If you don't do it that way, you've got
> a problem with recovering if you hit an error condition after sending a
> partial message.

If you hit an error condition after sending a partial message then I'm
in trouble anyway. Assembling the message beforehand just makes hitting
error less likely.

I would propose something like X11 protocol (from memory)

each request (query) has a serial number.

each response is sent in reasonably sized chunks, and each chunk has a
serial number of request, chunk number of response and an indicator for
EndOfResponse (or perhaps two, for Completed or Aborted).

This would make resyncing easier as you can safely ignore responses you
don't want to see anymore even if for some reason some of them are
still in the pipeline.
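
A sketch of how a client might use such chunk headers, in Python. The field names on `Chunk` and the `collect` function are invented for illustration and were never part of any agreed design; the point is only that chunks tagged with an unwanted request serial can be dropped without losing track of the stream.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    request_serial: int    # serial number of the request being answered
    chunk_number: int      # position of this chunk within the response
    end_of_response: bool  # True on the final chunk of a response
    data: bytes


def collect(chunks, wanted_serial):
    """Gather the response for one request, ignoring all other chunks."""
    parts = []
    for chunk in chunks:
        if chunk.request_serial != wanted_serial:
            continue  # stale response still in the pipeline: ignore it
        parts.append(chunk.data)
        if chunk.end_of_response:
            return b"".join(parts)
    raise EOFError("response ended before EndOfResponse was seen")
```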

What could be useful (and is often requested) is sending the number of
records in response *if it is known without extra work*.

Also there should be a way to tell the backend not to send some types of
notices/warnings.

-----------------
Hannu



Re: [HACKERS] More thoughts about FE/BE protocol

From
Hannu Krosing
Date:
Hannu Krosing wrote on Thu, 10.04.2003 at 19:50:

> I would propose something like X11 protocol (from memory)
> 
> each request (query) has a serial number.
> 
> each response is sent in reasonably sized chunks, and each chunk has a
> serial number of request, chunk number of response and an indicator for
> EndOfResponse (or perhaps two, for Completed or Aborted).

They should both have size as well, as Tom suggested.

> This would make resyncing easier as you can safely ignore responses you
> don't want to see anymore even if for some reason some of them are
> still in the pipeline.
> 
> What could be useful (and is often requested) is sending the number of
> records in response *if it is known without extra work*.
> 
> Also there should be a way to tell the backend not to send some types of
> notices/warnings.

-----------------
Hannu



Re: [HACKERS] More thoughts about FE/BE protocol

From
Hannu Krosing
Date:
Tom Lane wrote on Thu, 10.04.2003 at 19:59:
> Hannu Krosing <hannu@tm.ee> writes:

> > Also there should be a way to tell the backend not to send some types of
> > notices/warnings.
> 
> We already have that, see client_min_message_level.

What I would like is a way to better support interactive applications,
namely updating a progress bar by having the backend send some extra
out-of-band info (expected rows, how much work the backend has done,
etc.) as notices.

At the same time I would like to not get some warnings. Thus just having
a min-level is not good enough.
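
One way to picture the difference between a single minimum level and per-type selection, as a Python sketch. The categories (`progress`, `deprecation`), the severity ordering, and the `Notice` and `NoticeFilter` names are made up for illustration; the backend has no such setting.

```python
from dataclasses import dataclass, field

# Severity ordering assumed for this sketch only.
LEVELS = {"DEBUG": 0, "NOTICE": 1, "WARNING": 2, "ERROR": 3}


@dataclass
class Notice:
    level: str
    category: str  # hypothetical tag, e.g. "progress" or "deprecation"
    text: str


@dataclass
class NoticeFilter:
    min_level: str = "NOTICE"
    always: set = field(default_factory=set)  # categories shown at any level
    never: set = field(default_factory=set)   # categories hidden at any level

    def accepts(self, notice):
        if notice.category in self.never:
            return False
        if notice.category in self.always:
            return True
        return LEVELS[notice.level] >= LEVELS[self.min_level]


flt = NoticeFilter(min_level="WARNING", always={"progress"}, never={"deprecation"})
shown = [
    n.text
    for n in [
        Notice("NOTICE", "progress", "40% of expected rows sent"),
        Notice("WARNING", "deprecation", "old syntax used"),
        Notice("ERROR", "general", "division by zero"),
    ]
    if flt.accepts(n)
]
```

Here the progress notice gets through although it is below the minimum level, and the deprecation warning is dropped although it is above it, which a single threshold cannot express.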

-----------------
Hannu



Re: [HACKERS] More thoughts about FE/BE protocol

From
Barry Lind
Date:
Tom,

This SYNC message reminds me to ask you one thing about the FE/BE
protocol redesign.  One of the features of the jdbc spec is a "batch"
interface.  When an application needs to do a lot of the same thing (e.g.
insert a thousand rows), the application tells the driver to insert a
'batch' of 1000 rows instead of performing 1000 regular inserts. This
allows the driver to optimize this operation as one network roundtrip
instead of 1000 roundtrips.

The current jdbc driver code doesn't do any optimization here and
stupidly still does 1000 roundtrips.  However, when I or someone else has
the time, this could be optimized in the current BE/FE protocol by
concatenating all 1000 individual insert statements into a single query
to be issued to the backend.  How could this be accomplished with the
new FE/BE protocol "extended query" facility?

If I understand the SYNC message, would it just be the case that the 
client would send the PARSE/BIND/DESCRIBE/EXECUTE messages for all 1000 
inserts before issuing a single SYNC?
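
Under the framing Tom proposed (a one-byte message type followed by a four-byte MSB-first count of the following bytes), the batch case asked about above might look roughly like this Python sketch. The type letters `P`, `B`, `E`, `S` and the payload contents are placeholders chosen for illustration, since the thread does not define them. Parsing once and binding per row is also an assumption, and the DESCRIBE step is left out for brevity.

```python
import struct


def frame(msg_type, payload=b""):
    """One message: type byte, 4-byte MSB-first payload length, payload."""
    return msg_type + struct.pack(">I", len(payload)) + payload


def build_batch(sql, rows):
    """Queue PARSE once, then BIND/EXECUTE per row, then a single SYNC."""
    out = [frame(b"P", sql.encode())]          # placeholder PARSE message
    for row in rows:
        params = b"\0".join(str(v).encode() for v in row)
        out.append(frame(b"B", params))        # placeholder BIND message
        out.append(frame(b"E"))                # placeholder EXECUTE message
    out.append(frame(b"S"))                    # one SYNC closes the batch
    return b"".join(out)


rows = [(i, f"name-{i}") for i in range(1000)]
batch = build_batch("INSERT INTO t VALUES ($1, $2)", rows)
```

The whole buffer could then be written to the socket at once, so the client would wait for a single ReadyForQuery rather than 1000 of them.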

thanks,
--Barry

Tom Lane wrote:
> I've been thinking some more about the FE/BE protocol redesign,
> specifically the desire to ensure that we can recover from error
> conditions without losing synchronization.  The fact that the existing
> protocol doesn't do very well at this shows up in several places:
>     * having to drop and restart the connection after a COPY error
>     * fastpath function calls also lose sync if there's an error
>     * libpq gets terribly confused if it runs out of memory for
>       a query result
> 
> I'm coming around to the idea that the cleanest solution is to require
> *all* protocol messages, in both directions, to have an initial length
> word.  That is, the general message format would look like
>     <message type>        1 byte
>     <payload length>    number of following bytes (4 bytes MSB-first)
>     ... message data as needed ...
> 
> The advantage of doing it this way is that the recipient can absorb the
> whole message before starting to parse the contents; then, errors detected
> while processing the message contents don't cause us to lose protocol
> synchronization.  Also, even if the message is so large as to run the
> recipient out of memory, it can still use the <payload length> to count
> off the number of bytes it has to drop before looking for another message.
> This would make it quite a bit easier for libpq to cope with
> out-of-memory, as an example.
> 
> These advantages seem enough to me to justify adding three or four bytes
> to the length of each message.  Does anyone have a problem with that?
> 
> 
> A related point is that I had been thinking of the new "extended query"
> facility (separate PARSE/BIND/DESCRIBE/EXECUTE steps) in terms of
> sending just one big message to the backend per interaction cycle, with
> the processing-step commands appearing as fields within that message.
> But putting a length word in front would effectively require the
> frontend to marshal the whole sequence before sending any of it.
> It seems better to send each of the processing-step commands as an
> independent message.  To do that, we need to introduce an additional
> processing step, call it SYNC, that substitutes for the functionality
> associated with the overall message boundary in the other way.
> Specifically:
> * ReadyForQuery (Z) is sent in response to SYNC.
> * If no BEGIN has been issued, SYNC is the point at which an implicit
>   COMMIT is done.
> * After an error occurs in an extended-query command, the backend reads
>   and discards messages until it finds a SYNC, then issues ReadyForQuery
>   and resumes processing messages.  This allows the frontend to be
>   certain what has been processed and what hasn't.
> 
> Comments?
> 
>             regards, tom lane
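
The two recovery rules in Tom's message (count off the payload using the length word; after an error, discard everything up to SYNC) can be sketched in Python as follows. The size limit, the type letters, and the `frame`, `handle`, and `serve` names are assumptions for illustration, not the backend's actual code.

```python
import io
import struct

MAX_PAYLOAD = 1 << 20  # arbitrary limit for this sketch


def frame(msg_type, payload=b""):
    """Type byte, 4-byte MSB-first payload length, payload."""
    return msg_type + struct.pack(">I", len(payload)) + payload


def read_message(stream):
    """Return (type, payload); payload is None if it was too big to keep."""
    header = stream.read(5)
    if len(header) < 5:
        return None  # connection closed
    msg_type, length = header[:1], struct.unpack(">I", header[1:])[0]
    if length > MAX_PAYLOAD:
        # Count off the bytes without storing them, so we stay in sync.
        remaining = length
        while remaining:
            chunk = stream.read(min(remaining, 65536))
            if not chunk:
                return None
            remaining -= len(chunk)
        return msg_type, None
    return msg_type, stream.read(length)


def serve(stream, handle):
    """Process messages; after an error, skip ahead to the next SYNC."""
    replies, failed = [], False
    while (msg := read_message(stream)) is not None:
        msg_type, payload = msg
        if msg_type == b"S":            # placeholder letter for SYNC
            replies.append("ReadyForQuery")
            failed = False
        elif not failed:
            try:
                if payload is None:
                    raise MemoryError("message too large")
                replies.append(handle(msg_type, payload))
            except Exception as exc:
                replies.append(f"Error: {exc}")
                failed = True           # discard until SYNC


    return replies


def handle(msg_type, payload):
    """Stand-in for real command processing; fails on the payload b"bad"."""
    if payload == b"bad":
        raise ValueError("boom")
    return "ok"


wire = io.BytesIO(
    frame(b"E", b"a") + frame(b"E", b"bad") + frame(b"E", b"skipped")
    + frame(b"S") + frame(b"E", b"c") + frame(b"S")
)
replies = serve(wire, handle)
```

The message after the failing one is read and dropped without being handled, and processing resumes only once the SYNC has been answered, so the frontend knows exactly which commands took effect.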



Re: [HACKERS] More thoughts about FE/BE protocol

From
Bruce Momjian
Date:
Tom Lane wrote:
> Barry Lind <blind@xythos.com> writes:
> > This all sounds great.  I am really looking to seeing all of this 
> > implemented.  Are you still on schedule for getting this into 7.4?
> 
> I gotta get off my duff and get with it, but I still plan to do it.

I am wondering if the start of 7.5 may not be a better time for such a
massive change, and to give interfaces time to catch up.

Also, is Patrick going to need your help for PITR in 7.4?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 



Re: [HACKERS] More thoughts about FE/BE protocol

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I gotta get off my duff and get with it, but I still plan to do it.

> I am wondering if the start of 7.5 may not be a better time for such a
> massive change, and to give interfaces time to catch up.

(a) the problems are "swapped in" now; if I don't continue with this
work I'll probably forget a lot of the issues.

(b) there are some things in 7.3 --- autocommit being a big one --- that we
can *not* put off changing, or we'll be stuck with them forevermore.

(c) the interfaces do not have to catch up in time for 7.4 release,
because we'll still support the old protocol.
        regards, tom lane



Re: [HACKERS] More thoughts about FE/BE protocol

From
Wei Weng
Date:
>
>
>(a) the problems are "swapped in" now; if I don't continue with this
>work I'll probably forget a lot of the issues.
>
>(b) there are some things in 7.3 --- autocommit being a big one --- that we
>can *not* put off changing, or we'll be stuck with them forevermore.
>
>(c) the interfaces do not have to catch up in time for 7.4 release,
>because we'll still support the old protocol.
>
>            regards, tom lane
>
>  
>
What is this "auto-commit" about? Any place where I can read about it?

Thanks

Wei