Thread: FE/BE Protocol 3.0 Draft - Some Comments

FE/BE Protocol 3.0 Draft - Some Comments

From
ljb
Date:
I looked at the new draft protocol document, and it looks pretty good. I
have no real problems with anything, but just some comments. (Wouldn't want
you to think nobody is reading your stuff.)

1) I think the ErrorResponse message is way too complicated. I realize the
previous format (just an error string) wasn't good enough, and you need
error codes and such. But the current v3 message spec calls for parsing out
an indefinite number of up to 10 different sub-types (3 required, 7
optional), and I don't see the need for this complexity. What if something
goes wrong while parsing an error message?  Would you consider something
simpler, like:  Byte1('E'), Int32 length  - The standard message header  Byte1 Severity: one of E F P W N D I L for
ErrorFatal Panic...  Byte5 Code: the SQLSTATE  String Message: Human-readable message, suitable for showing the user.
 
To me, the other fields are not useful to most of us (file, linenumber,
call-stack traceback), overkill (detail), or silly (hint).     > SELECT * from MYTABEL;     ERROR:  Relation "mytabel"
doesnot exist     Hint: Learn to type.
 

2) Please don't deprecate FunctionCall, nor say that a prepared "SELECT
function($1)" can replace it. What about sending binary data to the backend
(without having to escape it)? Is there some other way than FunctionCall?
How would the large object interface work without FunctionCall?

3) There is still no way to pass NULL as a FunctionCall argument; now is
your chance to fix it if you want to. Not that I've seen a need for it.

4) Is there a good reason to continue having DataRow's field size value
include itself, but BinaryRow's field size value not include itself? It
bothers me that these two messages have identical syntax except that the
size value is +4 in DataRow. For the record, I prefer to have lengths not
include themselves. This applies to the new overall message length.
See: type = read_byte(); length = read_int32(); message = read(length-4);
That -4 seems unnecessary.

5) Backward compatibility: Realizing that the primary focus will be
V2 clients talking to V2/V3 servers ("upgrade your servers before your
clients"), it would still be nice if V2/V3 clients could talk to a V2 server.
I see a problem here. The V2/V3 client writes a V3 startup packet to the
server, and the V2 server responds with an error "FATAL:  unsupported
frontend protocol" and closes the connection. OK, the client can then
retry connecting with V2 protocol. The problem is that the error message
from the V2 server is in V2 format: byte1('E'), String Message. The
V2/V3 client will be looking for a V3 error message: byte1('E'), int32
length, ... If the client separates "reading the message" from "parsing
the message", it hasn't got a chance. This would have to be special-cased,
like relying on length = 0x46415441 (this is 'FATA'). Is there a better way?



Re: FE/BE Protocol 3.0 Draft - Some Comments

From
Tom Lane
Date:
ljb <lbayuk@mindspring.com> writes:
> 1) I think the ErrorResponse message is way too complicated.

Agreed, it's much richer than it was before, but I disagree with your
opinion that empty complexity is being added.  There are good examples
in the existing error message repertoire for each of the concepts I
propose adding.  For instance, here are two examples of existing
messages that are offering hints:
   elog(ERROR, "Unable to identify an operator '%s' for types '%s' and '%s'"        "\n\tYou will have to retype this
queryusing an explicit cast",        NameListToString(op),        format_type_be(arg1), format_type_be(arg2));
 

   fprintf(stderr, "IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %s\n",         (int) memKey, size, (IPC_CREAT
|IPC_EXCL | IPCProtection),           strerror(errno));
 
   if (errno == EINVAL)       fprintf(stderr,               "\nThis error usually means that PostgreSQL's request for a
sharedmemory\n"               "segment exceeded your kernel's SHMMAX parameter.  You can either\n"
"reducethe request size or reconfigure the kernel with larger SHMMAX.\n"               "To reduce the request size
(currently%u bytes), reduce\n"               "PostgreSQL's shared_buffers parameter (currently %d) and/or\n"
  "its max_connections parameter (currently %d).\n"               "\n"               "If the request size is already
small,it's possible that it is less than\n"               "your kernel's SHMMIN parameter, in which case raising the
requestsize or\n"               "reconfiguring SHMMIN is called for.\n"               "\n"               "The
PostgreSQLdocumentation contains more information about shared\n"               "memory configuration.\n\n",
  size, NBuffers, MaxBackends);
 

Continuing to classify all that verbiage as primary error message isn't
going to make anyone's life simpler.  On the other hand, if we split it
up into message/detail/hint as I was proposing, I think life *will* get
simpler for GUI clients (pgAdmin and suchlike) which won't have to
be prepared to cope with many-line primary error messages.  Admittedly,
it will take them some work to make the extra info accessible via popup
windows or whatever --- but I think they'll be glad to do it.  And if
they don't do it, well, not that much is being lost if the extra
verbiage doesn't get displayed.

Please note that I'm *not* proposing that all or even most error
messages should carry detail or hint fields.  I'm just planning to split
existing messages into those parts where it seems appropriate, which is
probably only a few dozen places.  The hints involved aren't silly, for
the most part anyway --- they got there through bitter experience.

> But the current v3 message spec calls for parsing out
> an indefinite number of up to 10 different sub-types (3 required, 7
> optional), and I don't see the need for this complexity.

You're welcome to ignore fields you don't care about.  I do see a spec
deficiency here, which is that I intended at most one of any particular
field type per message, whereas you're evidently worrying what to do
with multiple instances.  Will improve the document.

> To me, the other fields are not useful to most of us (file, linenumber,
> call-stack traceback), overkill (detail), or silly (hint).

Call-stack tracebacks are already there, in a limited form, and are
*very* useful to the people who get them:

regression=# create function foo(float8) returns float8 as '
regression'# begin
regression'#   return 1.0 / $1;
regression'# end' language plpgsql;
CREATE FUNCTION

regression=# select foo(0);
WARNING:  Error occurred while executing PL/pgSQL function foo
WARNING:  line 2 at return
ERROR:  division by zero
regression=#

Again, I'm not proposing adding new functionality, just relabeling
information that is already being issued so that receiving clients
can actually understand what it is and do something more reasonable
than they do now.  Those "warning" lines are not obviously related
to the ERROR as far as existing client code can make out, and too
many clients just drop them on the floor.

As for file and linenumber, agreed those are of use to hackers only
--- but those of us who have been puzzling over trouble reports for
any length of time know how badly they are needed when they're needed.
I would rather put them into a separate field of the error report,
in which client software has the option to ignore them if it's not
showing error details, than put them right in the main error text.
(That was the only alternative we had up to now, thus it never got
done because the clutter factor was too high.)

Now we could go down a different path wherein we try to deal these
things on the server side --- for example, have some GUC variables
that control whether hints and location and such get embedded into
a single error string that's transmitted to the client.  But it seems
to me that that's catering to dumb clients rather than giving the
clients the tools they need to be smart.  I'd rather design a protocol
that lets user-interface decisions be made on the client side.

> 2) Please don't deprecate FunctionCall, nor say that a prepared "SELECT
> function($1)" can replace it. What about sending binary data to the backend
> (without having to escape it)? Is there some other way than FunctionCall?

The Bind message can do it.

> How would the large object interface work without FunctionCall?

You could build it out of spare parts using a set of prepared
statements, one for each underlying LO function.  Execution would send
Bind and Execute in place of the FunctionCall message.  I'm not currently
planning to actually rewrite libpq's LO code in this round, but in the
long run it would make sense to do so.

> 3) There is still no way to pass NULL as a FunctionCall argument; now is
> your chance to fix it if you want to. Not that I've seen a need for it.

I was originally planning to do that, but since I now see FunctionCall
as vestigial, there's not much point in adding more incompatibility to
it than we absolutely must.

> 4) Is there a good reason to continue having DataRow's field size value
> include itself, but BinaryRow's field size value not include itself?

No, this bothers me too.  I would like to rationalize the representation
of binary data more, but exactly what to do is still an open question.
We have some conflicting precedents (FunctionCall vs. COPY BINARY file
formats), and the wild card in the whole thing is the possibility of
interposing format-translation routines (the old send/receive functions
again) to achieve some semblance of platform independence.

> For the record, I prefer to have lengths not
> include themselves. This applies to the new overall message length.

I would have done it that way, but that would require making
StartupMessage a special case (or more special than it is already),
since the precedent is already established that its length word includes
itself.  We can't change that if we want to continue accepting
connections from old clients.

Also, if we don't introduce send/receive functions then I think we
really need to migrate BinaryRow towards length-includes-itself, because
that's what the underlying backend convention for varlena is.  The
existing behavior of the protocol and libpq is actively broken on
machines with MAXALIGN > 4 because the representation exposed by libpq
doesn't align the contents of a varlena field correctly.  (But adding
send/receive functions would allow us to avoid that issue, since the
on-the-wire representation wouldn't have to be the same as the backend's
anymore.)

> 5) Backward compatibility: Realizing that the primary focus will be
> V2 clients talking to V2/V3 servers ("upgrade your servers before your
> clients"), it would still be nice if V2/V3 clients could talk to a V2 server.

Yeah, I know, but I'm not personally willing to expend the effort needed
to make a dual-protocol V2/V3 libpq.  If someone else wants to look at
that after the dust settles, they're surely welcome to.  (We did get
away without a backwards-compatible libpq in the last protocol revision,
back at release 6.4, and there were relatively few complaints.  So I'm
assuming we can do it again.)

> I see a problem here. The V2/V3 client writes a V3 startup packet to the
> server, and the V2 server responds with an error "FATAL:  unsupported
> frontend protocol" and closes the connection. OK, the client can then
> retry connecting with V2 protocol. The problem is that the error message
> from the V2 server is in V2 format: byte1('E'), String Message.

Yeah, I've been wondering about that myself.  One fairly straightforward
hack that could be used is to have a small limit (a few hundred) on the
expected response-message length to the startup packet.  Whatever your
endianness, the V2 packet would look to have a huge length, and that
could trigger fallback code that parses it as V2 format.  Or just punt
and assume the message must be complaining about protocol mismatch.
If you've got a better idea I'm all ears ...
        regards, tom lane



Re: FE/BE Protocol 3.0 Draft - Some Comments

From
Peter Eisentraut
Date:
Tom Lane writes:

> Agreed, it's much richer than it was before, but I disagree with your
> opinion that empty complexity is being added.  There are good examples
> in the existing error message repertoire for each of the concepts I
> propose adding.  For instance, here are two examples of existing
> messages that are offering hints:

Message + hint is OK, but can you give an example of a "detail"?
(Better more than one.)

-- 
Peter Eisentraut   peter_e@gmx.net



Re: FE/BE Protocol 3.0 Draft - Some Comments

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Message + hint is OK, but can you give an example of a "detail"?

Basically I see the 'detail' field as a safety valve for cases where you
otherwise have to compromise between having a reasonably short primary
error message and including all the info that might be useful.

For example, xlog.c's
elog(emode, "ReadRecord: out-of-sequence SUI %u (after %u) in log file %u, segment %u, offset %u",

might be better rendered
ereport(emode,    errmsg("out-of-sequence SUI in xlog page header"),    errdetail("SUI %u appeared after %u in log file
%u,segment %u, offset %u", ...));
 

The same file also has a bunch of messages along the lines of
       elog(PANIC,            "The database cluster was initialized with NAMEDATALEN %d,\n"            "\tbut the
backendwas compiled with NAMEDATALEN %d.\n"            "\tIt looks like you need to recompile or initdb.",
ControlFile->nameDataLen,NAMEDATALEN);
 

While the advice to re-initdb is clearly a hint, I'm not as sure what to
do with the info about exactly what was found in pg_control.  It might
be best to reword all of these to use a common primary error message,
viz
ereport(PANIC,    errmsg("Database cluster is not compatible with server compilation options"),    errdetail("Cluster
hasNAMEDATALEN %d, server was compiled with NAMEDATALEN %d", ...),    errhint("You need to recompile or initdb"));
 

A point worth making is that with this approach, the translator has
the option to translate just the one shared primary error message,
and not bother with the half-dozen slightly different detail messages,
and still have done a creditable job.  I don't know how much info the
gettext tools can provide about the context in which they found each
string extracted from the source, but it strikes me that it'd be real
handy to label strings as primary message, detail, or hint, so that
translators would know which ones to put priority on translating.

Here's another example that cries out to be broken down into primary
message and detail:
               elog(ERROR, "COPY command, running in backend with "                    "effective uid %d, could not
openfile '%s' for "                    "reading.  Errno = %s (%d).",                    (int) geteuid(), filename,
strerror(errno),errno);
 

There's no hint there --- it's all factual --- but it seems a tad on
the verbose side.

Same here:
               elog(ERROR, "Can't add class \"%s\" as default for type %s"                    "\n\tclass \"%s\" already
isthe default",                    opcname,                    TypeNameToString(stmt->datatype),
NameStr(opclass->opcname));

The part after \n\t isn't a hint, so it's detail; or we try to claim
that it's part of the primary error message, which is awkward; or we
drop the information, which is bad.

Or here:
           elog(ERROR, "CREATE SCHEMA: permission denied"                "\n\t\"%s\" is not a superuser, so cannot
createa schema for \"%s\"",                owner_name, authId);
 

Or here:
           elog(FATAL, "pg_nofile: insufficient file descriptors available to start backend.\n"
"\tSystemallows %ld, we need at least %d.",                no_files, RESERVE_FOR_LD + FD_MINFREE);
 

I think that without a provision for a detail field, the notion that
primary error messages should fit on one line is going to be widely
ignored.  We don't actually have to have it, but it will encourage good
error-message style.  IMHO anyway.
        regards, tom lane