Thread: v3 protocol & string encoding

v3 protocol & string encoding

From: Oliver Jowett
Couple of quick protocol questions:

1) What encoding is used for strings sent and received during the 
startup phase? I can set client_encoding to a known value as a parameter 
in the startup packet, but the protocol spec doesn't appear to say how 
the startup packet itself and the various strings sent/received during 
startup (e.g. authentication, error messages) are encoded.

2) At what point in the stream does a client_encoding change take effect 
-- immediately after the corresponding ParameterStatus message, or at 
some other point?

-O
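
For reference, the startup packet in question can be sketched as follows. This is a minimal illustration, not driver code: the function name `build_startup_packet` is hypothetical, and it assumes every name is 7-bit ASCII so that the bytes do not depend on the server encoding. The layout (Int32 length that counts itself, Int32 protocol version 196608, null-terminated name/value pairs, final null byte) is the v3 StartupMessage format.

```python
import struct

PROTOCOL_V3 = 196608  # (3 << 16) | 0, i.e. protocol version 3.0


def build_startup_packet(user, database, client_encoding="UNICODE"):
    """Build a v3 StartupMessage as raw bytes.

    Assumption for this sketch: all strings are 7-bit ASCII. Non-ASCII
    input raises UnicodeEncodeError rather than guessing an encoding.
    """
    params = [
        ("user", user),
        ("database", database),
        ("client_encoding", client_encoding),
    ]
    body = struct.pack("!i", PROTOCOL_V3)
    for name, value in params:
        body += name.encode("ascii") + b"\x00"
        body += value.encode("ascii") + b"\x00"
    body += b"\x00"  # terminator after the last name/value pair
    # The length field counts itself (4 bytes) plus everything after it.
    return struct.pack("!i", len(body) + 4) + body
```

Refusing non-ASCII names outright is a design choice of the sketch; it sidesteps the open question of which encoding the server would assume for them.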


Re: v3 protocol & string encoding

From: Tom Lane
Oliver Jowett <oliver@opencloud.com> writes:
> 1) What encoding is used for strings sent and received during the 
> startup phase?

The startup packet itself will not get any encoding conversion AFAIR,
so one way to look at it is that the data therein must be in server
encoding.  In practice, there are no strings therein that really need
conversion anyway.  (If you use characters outside 7-bit-ASCII for user
or database names, you're going to have much worse problems than just
this one.)

Any client_encoding received from the client is not going to be applied
until after the authentication exchange is complete, so the rest of that
is going to be in server encoding as well.  The only part of this that
seems like it might be an issue is that an ERROR message reporting a
failure would be in server encoding, but the client wouldn't have any
good way to know what that is ...

> 2) At what point in the stream does a client_encoding change take effect 
> -- immediately after the corresponding ParameterStatus message, or at 
> some other point?

ParameterStatus is sent when the change is made.
        regards, tom lane


Re: v3 protocol & string encoding

From: Oliver Jowett
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> 
>>1) What encoding is used for strings sent and received during the 
>>startup phase?
> 
> 
> The startup packet itself will not get any encoding conversion AFAIR,
> so one way to look at it is that the data therein must be in server
> encoding.  In practice, there are no strings therein that really need
> conversion anyway.  (If you use characters outside 7-bit-ASCII for user
> or database names, you're going to have much worse problems than just
> this one.)

The encoding of user & database names was my main concern. If they can 
only be 7-bit ASCII in practice, that's easy.

>>2) At what point in the stream does a client_encoding change take effect 
>>-- immediately after the corresponding ParameterStatus message, or at 
>>some other point?
> 
> 
> ParameterStatus is sent when the change is made.

Are the strings in the ParameterStatus encoded with the old or new 
client_encoding? I need to know the point in the stream to switch 
encodings. I suppose this is only an issue if there are pairs of 
encodings where "client_encoding" or the encoding names encode 
differently in the two encodings. Is it safe to assume that 7-bit ASCII 
is always encoded unchanged regardless of the encoding in use?

-O


Re: v3 protocol & string encoding

From: Tom Lane
Oliver Jowett <oliver@opencloud.com> writes:
> The encoding of user & database names was my main concern. If they can 
> only be 7-bit ASCII in practice, that's easy.

Well, you can *try* using other encodings, but there are enough known
problems that I don't think it will work pleasantly unless client and
server encodings are the same all the time.

>>> 2) At what point in the stream does a client_encoding change take effect 
>>> -- immediately after the corresponding ParameterStatus message, or at 
>>> some other point?
>> 
>> ParameterStatus is sent when the change is made.

> Are the strings in the ParameterStatus encoded with the old or new 
> client_encoding?

Okay, make that "sent just after the change is made".  So it looks like
you should receive a string in the new encoding.  I can't offhand think
of a way to test this though --- are any of the reported settings
interesting from an encoding standpoint?

> Is it safe to assume that 7-bit ASCII 
> is always encoded unchanged regardless of the encoding in use?

Hm.  This is true for all the "backend-safe" encodings but I believe
not for all the supported client encodings.  Tatsuo might have more of
a clue than me about likely failure cases.
        regards, tom lane
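
The property being asked about (7-bit ASCII encoded unchanged) can be checked mechanically for any codec a client library knows. The sketch below is only an illustration: `is_ascii_transparent` is a hypothetical helper, and the codec names in the usage lines are Python codec names chosen to show both outcomes, not a list of PostgreSQL client encodings. Which PostgreSQL encodings fail the check is left open in this thread.

```python
def is_ascii_transparent(codec_name):
    """Return True if every 7-bit ASCII character encodes to its own single byte."""
    for code_point in range(128):
        try:
            encoded = chr(code_point).encode(codec_name)
        except UnicodeEncodeError:
            return False
        if encoded != bytes([code_point]):
            return False
    return True


# Python codec names, used here purely as examples.
for codec in ("utf-8", "latin-1", "utf-16", "cp037"):
    print(codec, is_ascii_transparent(codec))
```

In an ASCII-transparent encoding the string "client_encoding" has the same bytes as in plain ASCII, so a client can recognize it before it knows which encoding is in effect.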


Re: v3 protocol & string encoding

From: Oliver Jowett
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>>>>2) At what point in the stream does a client_encoding change take effect 
>>>>-- immediately after the corresponding ParameterStatus message, or at 
>>>>some other point?
>>>
>>>ParameterStatus is sent when the change is made.
> 
>>Are the strings in the ParameterStatus encoded with the old or new 
>>client_encoding?
> 
> Okay, make that "sent just after the change is made".  So it looks like
> you should receive a string in the new encoding.  I can't offhand think
> of a way to test this though --- are any of the reported settings
> interesting from an encoding standpoint?

This timing makes it harder for a client to recognize a change in 
client_encoding -- how is it supposed to know to change encoding before 
interpreting the ParameterStatus message?

I'd like to add some robustness to the JDBC driver such that if the user 
changes client_encoding, the driver throws an error rather than garbling 
data (it is expecting client_encoding = 'UNICODE'). If the user can set 
client_encoding such that the driver won't recognize the ParameterStatus 
message (i.e. the string "client_encoding" does not encode as it would 
in UNICODE), it's not so useful. I don't know if there is such an 
encoding, however.

>>Is it safe to assume that 7-bit ASCII 
>>is always encoded unchanged regardless of the encoding in use?
> 
> 
> Hm.  This is true for all the "backend-safe" encodings but I believe
> not for all the supported client encodings.  Tatsuo might have more of
> a clue than me about likely failure cases.

By "backend-safe" do you mean "can be used as a database encoding"?

If so, it solves my problem, which is handling the switchover from 
default client_encoding (== database encoding) to UNICODE in the JDBC 
driver's connection setup code. I can initially use 7-bit ASCII 
regardless of the actual database encoding, and switch to UNICODE when 
possible (this is what the current driver does in most cases, I'm just 
verifying that the assumptions it makes are correct).

-O
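
The robustness check described above could look roughly like this. It is a sketch in Python rather than the JDBC driver's Java, and every name in it (`check_parameter_status`, `make_parameter_status`, `ClientEncodingChanged`) is hypothetical. It compares raw bytes, so it only works under the assumption discussed in this thread: that the string "client_encoding" has its ASCII byte form in whatever encoding the server has just switched to. The expected value is a parameter because the exact spelling the server reports is itself an assumption here.

```python
import struct


class ClientEncodingChanged(Exception):
    """Raised when the server reports an unexpected client_encoding."""


def make_parameter_status(name, value):
    """Build a raw ParameterStatus message (used only to exercise the check)."""
    body = name + b"\x00" + value + b"\x00"
    return b"S" + struct.pack("!i", len(body) + 4) + body


def check_parameter_status(message, expected=b"UNICODE"):
    """Inspect one raw ParameterStatus: b'S', Int32 length, name, value.

    Name and value are null-terminated and compared as raw bytes, without
    decoding, so the check does not need to know the current encoding.
    """
    if message[:1] != b"S":
        raise ValueError("not a ParameterStatus message")
    (length,) = struct.unpack("!i", message[1:5])
    # The length counts itself but not the leading type byte.
    payload = message[5:1 + length]
    name, value, _rest = payload.split(b"\x00", 2)
    if name == b"client_encoding" and value.upper() != expected:
        raise ClientEncodingChanged(value.decode("ascii", "replace"))
    return name, value
```

For example, `check_parameter_status(make_parameter_status(b"client_encoding", b"LATIN1"))` raises `ClientEncodingChanged`, while a message reporting `UNICODE` is returned unchanged as a `(name, value)` pair.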