Thread: Last round (I think) of FE/BE protocol changes

Last round (I think) of FE/BE protocol changes

From: Tom Lane

Okay, based on the recent discussions, here are concrete proposals for
the last few adjustments to the 3.0 FE/BE protocol.

Let's use int16 (2-byte integers) as format selector codes; this seems a
reasonable compromise between bandwidth and flexibility.  As of 7.4 the
only supported values will be 0 = text and 1 = binary, but future versions
can add more codes.

In client-sent messages, format codes appear primarily in the Bind
message.  Bind needs to be able to specify two sets of formats: one for
the parameters it is supplying, and one for the query result columns if
any.  I propose representing each set as a count N followed by N format
codes.  If the count is zero, then all the parameters (or result columns)
have the default format (which will always be 0 = text in 7.4, though we
might later allow it to be set to something else).  If the count is one,
then the single format code is applied to all of them.  Otherwise the count
must match the number
of parameters or output columns.  (Note that this moves the output format
request from Execute to Bind, so that formats can't be changed from one
row to the next in a portal's result.  This allows more server-side
optimization of formatting routine setup.)
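To make the zero/one/N rule concrete, here is a minimal Python sketch of how a client might encode one format-code set and how a receiver might expand it to one code per parameter or column. It assumes the count is an int16 like the codes themselves (the proposal does not state its width) and that everything is in network byte order. The function names are hypothetical illustrations, not libpq or backend APIs.

```python
import struct

TEXT, BINARY = 0, 1  # the only format codes 7.4 will accept


# Hypothetical helpers (not libpq or backend APIs); assumes an int16 count
# and int16 codes, all in network byte order.
def encode_format_codes(codes):
    """Encode a format-code set: count N, then N format codes."""
    out = struct.pack("!h", len(codes))
    for code in codes:
        out += struct.pack("!h", code)
    return out


def resolve_format_codes(codes, ncols):
    """Expand a format-code set to exactly one code per parameter/column."""
    if len(codes) == 0:
        return [TEXT] * ncols        # count 0: everything uses the default (text in 7.4)
    if len(codes) == 1:
        return [codes[0]] * ncols    # count 1: the single code applies to all
    if len(codes) != ncols:
        raise ValueError(f"{len(codes)} format codes for {ncols} columns")
    return list(codes)               # count N: one code per column


print(resolve_format_codes([BINARY], 3))  # [1, 1, 1]
```

Rejecting a mismatched count, rather than padding or truncating it, follows the rule that the count must be zero, one, or exactly the number of parameters or output columns.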

FunctionCall likewise needs to specify the format codes for the data it is
supplying and the result to be returned.
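One way FunctionCall could carry both pieces of information is sketched below in Python. The field order (function OID, argument format-code set, argument count, arguments, requested result format code) is an assumption chosen to mirror Bind, and the arguments use the uniform int4-length representation proposed further down in this message. `function_call_body` and `encode_value` are hypothetical names used only for illustration.

```python
import struct


# Hypothetical sketch: the field order (function OID, argument format codes,
# arguments, result format code) is an assumption for illustration.
def encode_value(value):
    """int4 byte count (not counting itself), then the bytes; None -> -1."""
    if value is None:
        return struct.pack("!i", -1)
    return struct.pack("!i", len(value)) + value


def function_call_body(func_oid, arg_formats, args, result_format):
    parts = [struct.pack("!I", func_oid)]              # function OID
    parts.append(struct.pack("!h", len(arg_formats)))  # argument format-code set
    parts += [struct.pack("!h", c) for c in arg_formats]
    parts.append(struct.pack("!h", len(args)))         # argument count
    parts += [encode_value(a) for a in args]           # the arguments themselves
    parts.append(struct.pack("!h", result_format))     # format wanted for the result
    return b"".join(parts)
```

Putting the result format code in the request lets the server format the result correctly without keeping any extra session state.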

In server-sent messages, format codes will be added to RowDescription
messages, one per column.  (A RowDescription sent in response to statement
Describe will show the default zero format code for all columns.  A
RowDescription sent in response to portal Describe or simple Query will
show the actual format codes in use for the result.)  The CopyInResponse
and CopyOutResponse messages will be changed to include a column count and
per-column format codes.  (Currently, the per-column codes will all be the
same: all zero for plain COPY and all one for binary COPY.  But someday we
might extend COPY to do something different.)
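As an illustration of the proposed CopyInResponse/CopyOutResponse payload, the Python sketch below builds the column count and the per-column codes. It assumes int16 widths for both, as in Bind (the proposal does not give them). `copy_response_body` is a hypothetical helper, not an existing function.

```python
import struct

TEXT, BINARY = 0, 1


# Hypothetical helper; int16 widths for the count and codes are assumed.
def copy_response_body(ncols, binary):
    """Column count, then one format code per column.  With COPY as it is
    today the codes are uniform: all 0 for plain COPY, all 1 for BINARY."""
    code = BINARY if binary else TEXT
    out = struct.pack("!h", ncols)
    for _ in range(ncols):
        out += struct.pack("!h", code)
    return out
```

Because the codes are already sent per column, a future COPY that mixed formats could reuse this layout unchanged.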

We will move to a single uniform representation of data items at the
protocol level: an int4 byte count (not including self) followed by that
many data bytes.  NULL is represented by byte count -1 (and no data bytes,
of course).  The interpretation of the data bytes depends on the format
code.  This will be used in DataRow output, Bind parameters, FunctionCall,
and FunctionResultResponse messages (the separate representation of
FunctionVoidResponse goes away).  This will also become the data
representation in COPY BINARY files.  I will change the header signature
for COPY BINARY so that the files can't be mistaken for old-style
server-internal-representation binary files.
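A minimal Python sketch of the proposed data-item representation follows: an int4 byte count that excludes itself, followed by that many bytes, with -1 and no bytes for NULL. Network byte order is assumed, and `encode_item` and `decode_item` are hypothetical names.

```python
import struct


# Hypothetical helpers illustrating the proposed representation.
def encode_item(value):
    """int4 byte count (not counting itself), then that many data bytes.
    None stands for NULL and is sent as count -1 with no data bytes."""
    if value is None:
        return struct.pack("!i", -1)
    return struct.pack("!i", len(value)) + value


def decode_item(buf, offset=0):
    """Decode one item at offset; return (value, offset just past it)."""
    (length,) = struct.unpack_from("!i", buf, offset)
    offset += 4
    if length == -1:
        return None, offset
    if length < 0 or offset + length > len(buf):
        raise ValueError("bad or truncated data item")
    return bytes(buf[offset:offset + length]), offset + length
```

An empty value (count 0) and NULL (count -1) remain distinguishable. Because the data bytes are opaque at this level, the same framing serves every format code.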

The BinaryRow message type goes away; DataRow will serve for all format
codes.  The content of DataRow will be a field count N followed by N
fields in the above representation.  Note that the null bitmap goes away.
This representation is a little bulkier than the old one for rows
containing many NULLs, but the same or smaller when there are no NULLs.
It has a major advantage over the old representation in that the field
contents can be extracted without any external knowledge --- in the old
layout, if you didn't know the number of fields in advance, you were
completely lost.  libpq, for example, cannot support receiving Execute
results without a preceding Describe result unless it can parse DataRow
without knowing the number of columns in advance.
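The following Python sketch shows that a receiver can parse a DataRow body in the proposed layout with no preceding RowDescription. It assumes an int16 field count (the proposal does not state the width) and network byte order. `parse_data_row` is a hypothetical name, not a libpq function. Interpreting each field's bytes still requires the format code from Bind or RowDescription, but finding the field boundaries does not.

```python
import struct


# Hypothetical parser; an int16 field count is assumed (the proposal does
# not give its width).  Field contents are returned as raw bytes.
def parse_data_row(body):
    (nfields,) = struct.unpack_from("!h", body, 0)
    offset = 2
    fields = []
    for _ in range(nfields):
        (length,) = struct.unpack_from("!i", body, offset)
        offset += 4
        if length == -1:
            fields.append(None)                  # NULL: no data bytes follow
        else:
            fields.append(bytes(body[offset:offset + length]))
            offset += length
    if offset != len(body):
        raise ValueError("DataRow length does not match its fields")
    return fields
```

With the old null bitmap, the parser would first need the column count just to know how many bitmap bytes to read.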

Any objections?
        regards, tom lane



Re: Last round (I think) of FE/BE protocol changes

From: Tom Lane

Peter Eisentraut <peter_e@gmx.net> writes:
> I think the "future versions" in this are going to be making this choice a
> datatype-specific session state.  How can we make this transition
> smoother?  Maybe 0 can be default, 1 text, 2 binary?

Why would a variable default be a good idea?  The client *always* wants
to know what format the data is being returned in; I can't imagine
wanting a default that might be unknown to (any layer of) client
software.

One of the things that I think I have learned from this redesign is
that hidden state variables that affect the low-level protocol are
a bad idea.  If we were to provide a changeable default format, I'd
want it to be reported by ParameterStatus messages.  But I don't really
see the argument for providing it.  You'd just have to clutter the
client-side stack with mechanisms for finding out what the default is.
That's about the same amount of grunge in the API as labeling data with
the format code in the first place ... but it's a lot easier to shoot
yourself in the foot by forgetting to handle it.
        regards, tom lane



Re: Last round (I think) of FE/BE protocol changes

From: Peter Eisentraut

Tom Lane writes:

> Let's use int16 (2-byte integers) as format selector codes; this seems a
> reasonable compromise between bandwidth and flexibility.  As of 7.4 the
> only supported values will be 0 = text and 1 = binary, but future versions
> can add more codes.

I think the "future versions" in this are going to be making this choice a
datatype-specific session state.  How can we make this transition
smoother?  Maybe 0 can be default, 1 text, 2 binary?  Or maybe we should
implement a simple version of the session-state system right now with only
two predefined transform groups for each type?  Then we don't need to send
a format to the server at all, and the messages to the client would
contain an OID field in the RowDescription message.

-- 
Peter Eisentraut   peter_e@gmx.net