Tom Lane writes:
> > OCTET_LENGTH returns the size of its argument, not the size of some
> > possible future shape of that argument.
>
> That would serve equally well as an argument for returning the
> compressed length of the string, I think. You'll need to do better.

TOAST is not part of the conceptual computational model. The fact that
the compressed representation is available to functions at all is somewhat
peculiar (although I'm not questioning it). I've already attempted to
show that returning the size of the compressed representation doesn't fit
the letter of the standard.

> My take on it is that when a particular client encoding is specified,
> Postgres does its best to provide the illusion that your data actually
> is stored in that encoding. If we don't make OCTET_LENGTH agree, then
> we're breaking the illusion.

The way I've seen it, we consider the encoding conversion to happen "on the
wire," while the server and the client each run in their own encoding. In
that model it's appropriate that computations in the server use the
server's encoding.
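
To make the difference concrete, here is a minimal sketch, purely
illustrative and not PostgreSQL code, of what OCTET_LENGTH would report
for the same string under each model. The helper name octet_length and
the choice of UTF-8 as the server encoding and LATIN1 as the client
encoding are assumptions made up for the example:

```python
# Hypothetical illustration, not PostgreSQL source: the byte count of one
# string under the two models. Encoding names are Python codec names.

def octet_length(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies when encoded in `encoding`."""
    return len(text.encode(encoding))

value = "café"
server_encoding = "utf-8"    # assumed database (server) encoding
client_encoding = "latin-1"  # assumed client encoding

# "On the wire" model: computed in the server's encoding.
print(octet_length(value, server_encoding))  # 5 ("é" takes 2 bytes in UTF-8)

# "Illusion" model: as if the data were stored in the client's encoding.
print(octet_length(value, client_encoding))  # 4 ("é" takes 1 byte in LATIN1)
```

The character count is 4 either way; only the octet count depends on
which encoding the computation is deemed to happen in.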

However, if the model is that it should appear to clients that the entire
setup magically runs in "their" encoding, then the other behaviour would be
better. In that case the database encoding is really only an optimization
hint, because the actual encoding used in the server does not matter. This
model would certainly be attractive as well, but it could raise a few
problems. For instance, I don't know whether the convert() function would
still make sense then. (Does it even make sense now?)

Also, we do need to consider carefully how this "illusion" would interact
with operations contained strictly within the server (e.g., CREATE TABLE AS,
column defaults) and with procedural languages that may or may not come with
encoding ideas of their own.

--
Peter Eisentraut peter_e@gmx.net