Thread: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15772
Logged by:          Eugene Podshivalov
Email address:      yaugenka@gmail.com
PostgreSQL version: 11.2
Operating system:   Windows 10
Description:

My postgresql.conf has the following locale settings
----
#client_encoding = sql_ascii        # actually, defaults to database encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Russian_Russia.1251'            # locale for system error message strings
lc_monetary = 'Russian_Russia.1251'            # locale for monetary formatting
lc_numeric = 'Russian_Russia.1251'            # locale for number formatting
lc_time = 'Russian_Russia.1251'                # locale for time formatting
----
Server encoding is "UTF8".
Messages in the log file are usually in UTF8, but some messages are logged
in ANSI encoding.
Here are some example cases (in the Russian language)  when ANSI is used
instead of UTF8
--
СООБЩЕНИЕ:  контрольные точки происходят слишком часто (через 19 сек.)
ПОДСКАЗКА:  Возможно, стоит увеличить параметр "max_wal_size".
--
СООБЩЕНИЕ:  получен запрос на быстрое выключение
СООБЩЕНИЕ:  прерывание всех активных транзакций
--
СООБЩЕНИЕ:  система БД была выключена:
СООБЩЕНИЕ:  система БД готова принимать подключения


On Thu, Apr 18, 2019 at 01:53:18PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15772
> Logged by:          Eugene Podshivalov
> Email address:      yaugenka@gmail.com
> PostgreSQL version: 11.2
> Operating system:   Windows 10
> Description:        
> 
> My postgresql.conf has the following locale settings
> ----
> #client_encoding = sql_ascii        # actually, defaults to database encoding
> 
> # These settings are initialized by initdb, but they can be changed.
> lc_messages = 'Russian_Russia.1251'            # locale for system error message strings
> lc_monetary = 'Russian_Russia.1251'            # locale for monetary formatting
> lc_numeric = 'Russian_Russia.1251'            # locale for number formatting
> lc_time = 'Russian_Russia.1251'                # locale for time formatting
> ----
> Server encoding is "UTF8".
> Messages in the log file are usually in UTF8, but some messages are logged
> in ANSI encoding.
> Here are some example cases (in the Russian language)  when ANSI is used
> instead of UTF8
> --
> СООБЩЕНИЕ:  контрольные точки происходят слишком часто (через 19 сек.)
> ПОДСКАЗКА:  Возможно, стоит увеличить параметр "max_wal_size".
> --
> СООБЩЕНИЕ:  получен запрос на быстрое выключение
> СООБЩЕНИЕ:  прерывание всех активных транзакций
> --
> СООБЩЕНИЕ:  система БД была выключена:
> СООБЩЕНИЕ:  система БД готова принимать подключения

I am kind of confused since all the messages look like Russian to me,
except for the mention of "max_wal_size".  When you say ANSI, do you
mean ISO-8859-5 - Cyrillic, or ASCII?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Bruce,
Here is a screenshot of how it looks when I open the log file in Notepad++ and switch the encoding from UTF8 to ANSI.
image.png

Regards,
Eugene
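
The effect described above (text that reads correctly under one encoding and turns to garbage under the other) can be reproduced in a few lines. This is a minimal sketch, assuming that "ANSI" in Notepad++ on a Russian Windows system means Windows-1251 (cp1251); the message text is copied from the report, and the variable names are purely illustrative.

```python
# Illustrative only: the message text is copied from the bug report.
message = "система БД готова принимать подключения"

utf8_bytes = message.encode("utf-8")      # bytes a UTF-8 writer would produce
cp1251_bytes = message.encode("cp1251")   # bytes a Windows-1251 writer would produce

# Viewing UTF-8 bytes as Windows-1251 gives readable-looking but wrong text
# (mojibake), because every Cyrillic letter occupies two bytes in UTF-8.
utf8_seen_as_cp1251 = utf8_bytes.decode("cp1251", errors="replace")

# Viewing Windows-1251 bytes as UTF-8 produces invalid sequences, which are
# shown here as U+FFFD replacement characters.
cp1251_seen_as_utf8 = cp1251_bytes.decode("utf-8", errors="replace")

print(utf8_seen_as_cp1251)
print(cp1251_seen_as_utf8)
```

A log file that mixes both kinds of lines will therefore always show some lines as garbage, whichever of the two encodings the viewer is set to.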

On Thu, Apr 18, 2019 at 05:40:59PM +0300, Eugene Podshivalov wrote:
> Bruce,
> Here is a screenshot of how looks like when I open the log file in notepad++
> and switch encoding from UTF8 to ANSI.
> image.png

Uh, I see what you mean.  Can you give us one message that is OK and one
that is messed up, in their English versions?  I still don't know what
ANSI is.  What does the output look like in UTF8 mode?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



On 2019-Apr-18, Eugene Podshivalov wrote:

> Bruce,
> Here is a screenshot of how looks like when I open the log file in
> notepad++ and switch encoding from UTF8 to ANSI.
> [image: image.png]

I suppose you have databases with the single-byte encoding amidst your
UTF8 ones.  AFAIK the log file registers the log entries in the same
encoding that the database uses.  Different databases can use different
encodings.

That's pretty broken, but it's how it is.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I suppose you have databases with the single-byte encoding amidst your
> UTF8 ones.  AFAIK the log file registers the log entries in the same
> encoding that the database uses.  Different databases can use different
> encodings.

> That's pretty broken, but it's how it is.

Yeah, and it's not easy to improve on.  If we tried to convert all
log messages to the same encoding, which one would that be?
(Please, no nonsense about UTF8 being a universal solution.
The Japanese don't think so, for instance.)

Also, what do you do if you get an encoding conversion failure?

That's even before you get into implementation-dependent problems,
like what to do early in process startup before the encoding
conversion machinery is operational.

A more realistic idea might be to have separate log files for
different encodings, though that has a bunch of management issues
to solve as well.

            regards, tom lane
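
For someone who only needs to read a log file that already contains lines in more than one encoding, a reader-side workaround is possible without touching the server. The following is a rough sketch and not something proposed in this thread: it assumes the only two encodings present are UTF-8 and Windows-1251, and `decode_log_line` is a hypothetical helper name. It relies on the fact that Cyrillic text encoded as Windows-1251 is almost never valid UTF-8, so a failed UTF-8 decode is a reasonable signal to fall back.

```python
def decode_log_line(raw: bytes, fallback: str = "cp1251") -> tuple[str, str]:
    """Return (text, encoding_used) for one raw log line.

    Hypothetical helper: tries UTF-8 first and falls back to a single-byte
    encoding (Windows-1251 by assumption) when the bytes are not valid UTF-8.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace"), fallback


# Sample lines built in memory; the message texts are taken from this thread.
lines = [
    "LOG:  database system is ready to accept connections".encode("ascii"),
    "СООБЩЕНИЕ:  получен запрос на быстрое выключение".encode("utf-8"),
    "СООБЩЕНИЕ:  система БД готова принимать подключения".encode("cp1251"),
]

for raw in lines:
    text, used = decode_log_line(raw)
    print(f"{used:>7}  {text}")
```

Pure-ASCII lines are reported as UTF-8 here, since ASCII is a subset of it. The heuristic does nothing for the harder cases raised above, such as several different single-byte encodings in one file.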



I guess that the issue is related to this setting in the postgresql.conf file:
lc_messages = 'Russian_Russia.1251'                   # locale for system error message

I tried changing it to 'en_US.UTF-8', and all new messages in the log file are in English and look good regardless of whether I view the file in UTF-8 or ANSI encoding.

I don't know what ANSI stands for either, but it comes first in the list of encodings in the Notepad++ Encoding menu.
I guess it refers to Windows-1251 in my case.

The English variants of the messed-up messages in the UTF8 section of the screenshot above are:
LOG:  database system was shut down at ...
LOG:  database system is ready to accept connections

All my databases have encoding=UTF8, collate=Russian_Russia.1251, ctype=Russian_Russia.1251

Regards,
Eugene
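
The observation that English messages look fine in either view has a simple explanation: they consist only of ASCII characters, and both UTF-8 and Windows-1251 encode ASCII with the same single bytes. A small sketch of that check, using one of the messages quoted above (the variable names are illustrative):

```python
line = "LOG:  database system is ready to accept connections"
raw = line.encode("ascii")  # English messages contain only ASCII characters

# Both encodings map the ASCII range to identical bytes, so decoding the
# same bytes either way gives back the same text.
as_utf8 = raw.decode("utf-8")
as_cp1251 = raw.decode("cp1251")

print(as_utf8 == as_cp1251 == line)
```

So switching lc_messages to English hides the symptom rather than changing which encoding each message is written in.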

Could the issue be that not all messages take the lc_messages setting into account?
That is, in my case all messages should be in ANSI (Windows-1251) instead of UTF-8.

Regards,
Eugene
