Re: More thoughts about FE/BE protocol - Mailing list pgsql-interfaces
From: Tom Lane
Subject: Re: More thoughts about FE/BE protocol
Date:
Msg-id: 21689.1049940929@sss.pgh.pa.us
In response to: Re: More thoughts about FE/BE protocol (ljb <lbayuk@mindspring.com>)
List: pgsql-interfaces
ljb <lbayuk@mindspring.com> writes:
> In the case of fastpath function calls, the length of each parameter is
> there now, so the backend could already read all the data before doing
> any error checking. The comment in tcop/fastpath.c:HandleFunctionRequest()
> says this, but then it loses me when it goes on to say this is impossible
> because the message can't be read until doing type lookups.

Yeah, I was just looking at that. The comment is not quite accurate; the message could be parsed as-is, but it could not be converted into the internal form that is needed (since you need to know whether the type is pass-by-reference or not). So one could imagine an implementation that reads the message but just holds it in an internal buffer till it's all read, then goes back and processes the info a second time to detect errors and make the conversion. Then, if you report an error, you don't have a partially-read message still sitting in the input stream.

What I'm suggesting is that we could implement that logic in a more straightforward fashion if the "read into a buffer" part is driven by an initial byte count and doesn't have to duplicate the knowledge of the specific layout of each message type.

If it were only fastpath involved, I'd say let's just rewrite it and be happy --- but exactly this same problem appears in COPY error recovery, libpq memory-overrun recovery, etc. In all these places it looks like "buffer the message first, parse it later" is the way to go. Rather than having two sets of logic that understand the detailed format of each message type, we should just adjust the protocol to make this painless.

Another objection, with perhaps more force, is that this requires the sender to marshal the whole message before sending (or at least be able to precompute its length, but in practice people will probably just assemble the whole message in a buffer).
But it turns out that the backend already does that anyway, precisely so that it can be sure it never sends a partial message. And frontends that don't do it that way probably should, for the same reason --- if you fail partway through sending a message, you've got a problem.

> In the case of COPY, what would your overall length word apply to, since
> the copy data is stream oriented, not message oriented?

I've been debating that. We could say that the sender is allowed to chop the COPY datastream into arbitrary-length messages, or we could require the message boundaries to be semantically meaningful (say, one message per data row). I feel the latter is probably cleaner in the long run, but it'd take more adjustment of existing code to do it that way. Any thoughts?

> I don't understand backend error handling, but if the "copy" function
> loses control when an error occurs (for example, bad data type for a
> column), I don't see how knowing the overall message or data length
> helps in this case.

The point is to not raise an error while there is a partial message remaining in the input stream. If only whole messages remain, it's easy to design the main loop to discard COPY-data messages until it sees something it likes (probably a SYNC message denoting the end of the series of COPY-data messages). If a partial message remains, then you are out of sync and there's no good way for the main loop to recover.

			regards, tom lane