Thread: Windows locale cause server to send invalid data encoding to client

Windows locale cause server to send invalid data encoding to client

From
Antoine
Date:
I set up a PostgreSQL server on Windows 10 and connected to it using Rust, but the Rust client reports invalid UTF-8 data when the password is wrong. I use Windows with a French locale, whose messages contain accented characters such as "éèê".

Windows uses WTF-16, but the Rust client asks the server for UTF-8, so we think it is a server bug not to send UTF-8, even if Windows doesn't use it for its locale translations.

Details can be found in GitHub issue #803. Here is a Wireshark capture of the problem: output.pcapng.gz.

The version is: `psql (PostgreSQL) 13.3`

I have never used a mailing list before, so I'm not used to it.

--
Earth is the cradle of humanity, but who would want to spend their whole life in a cradle?

Re: Windows locale cause server to send invalid data encoding to client

From
Julien Rouhaud
Date:
On Wed, Jul 14, 2021 at 01:49:24PM +0200, Antoine wrote:
> I set up a postgresql server on Windows 10 and connected it using Rust, but
> the Rust client reports invalid UTF-8 data when the password is wrong. I
> use a french locale windows that contain some accents "éèê" etc.
> 
> Windows use WTF-16 <https://simonsapin.github.io/wtf-8/> but the Rust
> client asks UTF-8 to the serveur so we think it's a bug from the server to
> not send UTF-8 even if Windows doesn't use it for its locale translation.

This is unfortunately working as designed.  The client encoding can't be set
during startup (and authentication is part of it), see
https://github.com/postgres/postgres/blob/master/src/backend/utils/mb/mbutils.c#L85-L88
for more details about it:

> /*
>  * During backend startup we can't set client encoding because we (a)
>  * can't look up the conversion functions, and (b) may not know the database
>  * encoding yet either.  So SetClientEncoding() just accepts anything and
>  * remembers it for InitializeClientEncoding() to apply later.
>  */

The driver should be prepared to receive non-UTF-8 messages until
authentication has succeeded.
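
As a rough illustration of what "being prepared" could look like on the driver
side, here is a minimal Rust sketch. It is not the code of any existing driver:
the function names decode_startup_message and is_valid_utf8 are hypothetical,
and replacing undecodable bytes with U+FFFD is only one possible policy. It
assumes the driver has the raw bytes of a message field in hand and does not
know which encoding the server used.

```rust
use std::borrow::Cow;

/// Decode a message field received before authentication has completed.
/// Assumption: the bytes may be in the server's locale encoding rather than
/// UTF-8, so invalid sequences become U+FFFD instead of causing an error.
pub fn decode_startup_message(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}

/// Returns true when the field is valid UTF-8 and can be used unchanged.
pub fn is_valid_utf8(raw: &[u8]) -> bool {
    std::str::from_utf8(raw).is_ok()
}
```

With this policy the accented characters are lost, but the rest of the message
stays readable and the connection attempt fails with the server's error rather
than with a decoding error.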



Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Jul 14, 2021 at 01:49:24PM +0200, Antoine wrote:
>> I set up a postgresql server on Windows 10 and connected it using Rust, but
>> the Rust client reports invalid UTF-8 data when the password is wrong. I
>> use a french locale windows that contain some accents "éèê" etc.

> This is unfortunately working as designed.  The client encoding can't be set
> during startup (and authentication is part of it), see
> https://github.com/postgres/postgres/blob/master/src/backend/utils/mb/mbutils.c#L85-L88
> for more details about it:

It seems like the core problem is that the "authentication failed" error
text may be sent in an unexpected encoding.  I wonder if we should decline
to translate any error messages until we've established the requested
client encoding.  Sending the message in English isn't ideal either,
but it'd avoid this hazard.

            regards, tom lane



Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Jul 14, 2021 at 10:19:59AM -0400, Tom Lane wrote:
>> I wonder if we should decline
>> to translate any error messages until we've established the requested
>> client encoding.  Sending the message in English isn't ideal either,
>> but it'd avoid this hazard.

> I'm not sure which one is worse.  On the bright side, there aren't that
> many messages that can be sent before the client encoding is set up, so I'm
> +0.5 for this change.

Yeah, it's ugly either way.  I think though that the reason we don't hear
more complaints about this is that such messages are currently sent using
the language and encoding derived from the postmaster's environment.
In simple cases that'll be the same as the client's environment and
everything works.  So after thinking harder, I'm afraid that breaking that
scenario would make this idea a net loss.
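
To make the "same environment" case concrete, here is a small Rust sketch.
It assumes, purely for illustration, that the postmaster's environment uses
ISO-8859-1 (Latin-1); the thread does not say which code page was actually in
use. The helper names encode_latin1 and decode_latin1 are hypothetical.

```rust
/// Encode text as ISO-8859-1; returns None if a character does not fit.
pub fn encode_latin1(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let cp = c as u32;
            // Latin-1 covers exactly the code points U+0000..=U+00FF.
            if cp <= 0xFF { Some(cp as u8) } else { None }
        })
        .collect()
}

/// Decode ISO-8859-1 bytes: each byte maps to the code point of equal value.
pub fn decode_latin1(raw: &[u8]) -> String {
    raw.iter().map(|&b| char::from(b)).collect()
}
```

A client that assumes the same encoding as the server gets the text back
intact through decode_latin1, while a client that insists on UTF-8 rejects the
very same bytes, because a lone 0xE9 byte followed by an ASCII letter is not a
valid UTF-8 sequence.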

            regards, tom lane