Thread: Patch for an encoding bug in the derive_locale_encoding function

From

Mario De Frutos

Date:

09 February 2018, 20:02:33

Hello!

I've found a bug while working with the driver. It seems that
when the driver derives the encoding from the locale of the local
environment, it takes the whole locale string, for example:


LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8

It then strips everything up to and including the first dot and uses
the rest as the encoding:


UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8

This causes an error in the following code, because the result is not a
valid encoding name:

https://github.com/postgres/postgres/blob/master/src/backend/utils/mb/encnames.c#L570
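The behavior described above can be reproduced with a few lines of C. The following is a minimal sketch, not the attached patch: `encoding_after_first_dot` mimics the "everything after the first dot" step, and `extract_locale_encoding` is a hypothetical helper showing one way to isolate the codeset of the LC_CTYPE entry. Both function names, and the choice to stop at `;` and `@`, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mimics the reported behavior: return everything after the first dot. */
static const char *encoding_after_first_dot(const char *locale)
{
    const char *dot = strchr(locale, '.');
    return dot ? dot + 1 : NULL;
}

/*
 * Hypothetical helper (not the attached patch): copy the codeset of the
 * LC_CTYPE entry into out. Accepts a plain locale name ("en_US.UTF-8")
 * or a composite string ("LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;...").
 * Returns 0 on success, -1 if no codeset is found or out is too small.
 */
static int extract_locale_encoding(const char *locale, char *out, size_t outlen)
{
    const char *entry, *dot, *end;
    size_t len = 0;

    assert(locale != NULL && out != NULL);

    /* In a composite string use the LC_CTYPE entry, else the whole string. */
    entry = strstr(locale, "LC_CTYPE=");
    entry = entry ? entry + strlen("LC_CTYPE=") : locale;
    end = entry + strcspn(entry, ";");      /* the entry ends at ';' or NUL */

    dot = memchr(entry, '.', (size_t) (end - entry));
    if (dot == NULL)
        return -1;                          /* e.g. "C" carries no codeset */
    dot++;

    /* The codeset ends at a modifier such as "@euro" or at the entry end. */
    while (dot + len < end && dot[len] != '@')
        len++;
    if (len == 0 || len >= outlen)
        return -1;

    memcpy(out, dot, len);
    out[len] = '\0';
    return 0;
}
```

With the composite string from the example, the first function yields the long "UTF-8;LC_NUMERIC=C;..." tail, while the second yields just "UTF-8".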

There are two problems there:

1. First, you get an error because the encoding name is invalid.
2. Second, the connection hangs because Postgres calls ereport instead
of returning -1, so it gets stuck.
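One way to guard against the second problem on the caller's side is to validate the name before handing it on, and to signal failure with -1. The sketch below assumes a small hypothetical table `known_encodings` and hypothetical helpers `same_encoding_name` and `find_encoding_index`; it is not the Postgres lookup and not the attached patch. It compares names loosely (ignoring case and punctuation), which is a design choice of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated list used only for this illustration. */
static const char *const known_encodings[] = {
    "UTF8", "LATIN1", "SQL_ASCII", "EUC_JP"
};

/* Compare two names, ignoring case and non-alphanumeric characters,
 * so that "utf-8" and "UTF8" are treated as the same name. */
static int same_encoding_name(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        while (*a && !isalnum((unsigned char) *a))
            a++;
        while (*b && !isalnum((unsigned char) *b))
            b++;
        if (toupper((unsigned char) *a) != toupper((unsigned char) *b))
            return 0;
        if (*a == '\0')
            return 1;           /* both strings ended together */
        a++;
        b++;
    }
}

/* Return the index of name in known_encodings, or -1 if it is unknown.
 * The caller can then fall back to a default instead of raising an error. */
static int find_encoding_index(const char *name)
{
    size_t i;

    if (name == NULL || *name == '\0')
        return -1;
    for (i = 0; i < sizeof known_encodings / sizeof known_encodings[0]; i++)
        if (same_encoding_name(name, known_encodings[i]))
            return (int) i;
    return -1;
}
```

A string such as "UTF-8;LC_NUMERIC=C;LC_TIME=C" matches nothing in the table, so the function returns -1 and the caller can decide what to do next.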

At first, I thought the error was in the ifdef clause of the
Postgres function, but that seems correct. However, I don't know how to
catch this kind of error to avoid this behavior in cases like
this one.

I've attached a patch to this mail that fixes the bug. Hope it helps :)

Attachment

multibyte_encoding_fix.patch

Re: Patch for an encoding bug in the derive_locale_encoding function

From

"Inoue, Hiroshi"

Date:

14 February 2018, 03:25:22

Hi Mario,

I will take care of the patch.

Thanks.
Hiroshi Inoue