Thread: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 15772
Logged by: Eugene Podshivalov
Email address: yaugenka@gmail.com
PostgreSQL version: 11.2
Operating system: Windows 10
Description:

My postgresql.conf has the following locale settings
----
#client_encoding = sql_ascii # actually, defaults to database encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Russian_Russia.1251' # locale for system error message strings
lc_monetary = 'Russian_Russia.1251' # locale for monetary formatting
lc_numeric = 'Russian_Russia.1251' # locale for number formatting
lc_time = 'Russian_Russia.1251' # locale for time formatting
----
Server encoding is "UTF8".
Messages in the log file are usually in UTF8, but some messages are logged in ANSI encoding.
Here are some example cases (in the Russian language) when ANSI is used instead of UTF8
--
СООБЩЕНИЕ: контрольные точки происходят слишком часто (через 19 сек.)
ПОДСКАЗКА: Возможно, стоит увеличить параметр "max_wal_size".
--
СООБЩЕНИЕ: получен запрос на быстрое выключение
СООБЩЕНИЕ: прерывание всех активных транзакций
--
СООБЩЕНИЕ: система БД была выключена:
СООБЩЕНИЕ: система БД готова принимать подключения
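One way to find out which lines of such a log are not valid UTF-8 is sketched below in Python. This is only an illustration of the symptom described in the report: the function name classify_log_lines and the Windows-1251 fallback are assumptions, not anything PostgreSQL provides. Pure-ASCII lines are valid in both encodings, so they are always reported as UTF-8.

```python
def classify_log_lines(raw: bytes, fallback: str = "cp1251"):
    """Return (line_number, encoding_label, text) for every line in raw.

    A line that decodes cleanly as UTF-8 is labeled "utf-8"; anything else
    is decoded with the fallback single-byte encoding (an assumption here).
    """
    results = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            results.append((number, "utf-8", line.decode("utf-8")))
        except UnicodeDecodeError:
            # Undefined bytes in the fallback code page become U+FFFD.
            text = line.decode(fallback, errors="replace")
            results.append((number, fallback, text))
    return results


# A small in-memory stand-in for a mixed-encoding log file.
sample = (
    "LOG: database system is ready to accept connections\n".encode("utf-8")
    + "СООБЩЕНИЕ: получен запрос на быстрое выключение\n".encode("utf-8")
    + "СООБЩЕНИЕ: система БД готова принимать подключения\n".encode("cp1251")
)

for number, encoding, text in classify_log_lines(sample):
    print(number, encoding, text)
```

To run it against a real file, read the file in binary mode and pass the bytes to the function; the path and the fallback code page depend on the installation.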
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Bruce Momjian
Date:
On Thu, Apr 18, 2019 at 01:53:18PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15772
> Logged by: Eugene Podshivalov
> Email address: yaugenka@gmail.com
> PostgreSQL version: 11.2
> Operating system: Windows 10
> Description:
>
> My postgresql.conf has the following locale settings
> ----
> #client_encoding = sql_ascii # actually, defaults to database encoding
>
> # These settings are initialized by initdb, but they can be changed.
> lc_messages = 'Russian_Russia.1251' # locale for system error message
> strings
> lc_monetary = 'Russian_Russia.1251' # locale for monetary formatting
> lc_numeric = 'Russian_Russia.1251' # locale for number formatting
> lc_time = 'Russian_Russia.1251' # locale for time formatting
> ----
> Server encoding is "UTF8".
> Messages in the log file are usually in UTF8, but some messages are logged
> in ANSI encoding.
> Here are some example cases (in the Russian language) when ANSI is used
> instead of UTF8
> --
> СООБЩЕНИЕ: контрольные точки происходят слишком часто (через 19 сек.)
> ПОДСКАЗКА: Возможно, стоит увеличить параметр "max_wal_size".
> --
> СООБЩЕНИЕ: получен запрос на быстрое выключение
> СООБЩЕНИЕ: прерывание всех активных транзакций
> --
> СООБЩЕНИЕ: система БД была выключена:
> СООБЩЕНИЕ: система БД готова принимать подключения

I am kind of confused since all the messages look like Russian to me,
except for the mention of "max_wal_size". When you say ANSI, do you
mean ISO-8859-5 - Cyrillic, or ASCII?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Eugene Podshivalov
Date:
Bruce,
Here is a screenshot of how it looks when I open the log file in notepad++ and switch encoding from UTF8 to ANSI.
Regards,
Eugene
On Thu, Apr 18, 2019 at 17:31, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Apr 18, 2019 at 01:53:18PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15772
> Logged by: Eugene Podshivalov
> Email address: yaugenka@gmail.com
> PostgreSQL version: 11.2
> Operating system: Windows 10
> Description:
>
> My postgresql.conf has the following locale settings
> ----
> #client_encoding = sql_ascii # actually, defaults to database encoding
>
> # These settings are initialized by initdb, but they can be changed.
> lc_messages = 'Russian_Russia.1251' # locale for system error message
> strings
> lc_monetary = 'Russian_Russia.1251' # locale for monetary formatting
> lc_numeric = 'Russian_Russia.1251' # locale for number formatting
> lc_time = 'Russian_Russia.1251' # locale for time formatting
> ----
> Server encoding is "UTF8".
> Messages in the log file are usually in UTF8, but some messages are logged
> in ANSI encoding.
> Here are some example cases (in the Russian language) when ANSI is used
> instead of UTF8
> --
> СООБЩЕНИЕ: контрольные точки происходят слишком часто (через 19 сек.)
> ПОДСКАЗКА: Возможно, стоит увеличить параметр "max_wal_size".
> --
> СООБЩЕНИЕ: получен запрос на быстрое выключение
> СООБЩЕНИЕ: прерывание всех активных транзакций
> --
> СООБЩЕНИЕ: система БД была выключена:
> СООБЩЕНИЕ: система БД готова принимать подключения
I am kind of confused since all the messages look like Russian to me,
except for the mention of "max_wal_size". When you say ANSI, do you
mean ISO-8859-5 - Cyrillic, or ASCII?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
[Attachment: image.png]
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Bruce Momjian
Date:
On Thu, Apr 18, 2019 at 05:40:59PM +0300, Eugene Podshivalov wrote:
> Bruce,
> Here is a screenshot of how it looks when I open the log file in notepad++
> and switch encoding from UTF8 to ANSI.
> image.png

Uh, I see what you mean. Can you give us a message that is OK and one
that is messed up, but the English versions of those? I still don't
know what ANSI is. What does the output look like in UTF8 mode?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
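The effect of switching the viewer between UTF8 and ANSI can be reproduced without the log file. The Python sketch below assumes, as suggested later in the thread, that "ANSI" means Windows-1251 on this machine. It takes one of the reported messages, encodes it both ways, and then decodes each byte string with the other encoding, which is what the editor does when the wrong mode is selected.

```python
message = "СООБЩЕНИЕ: прерывание всех активных транзакций"

utf8_bytes = message.encode("utf-8")      # two bytes per Cyrillic letter
cp1251_bytes = message.encode("cp1251")   # one byte per Cyrillic letter

# UTF-8 bytes shown in "ANSI" (Windows-1251) mode: each Cyrillic letter
# turns into two unrelated characters. Byte 0x98 is undefined in
# Windows-1251, hence errors="replace".
utf8_seen_as_cp1251 = utf8_bytes.decode("cp1251", errors="replace")

# Windows-1251 bytes shown in UTF-8 mode: the Cyrillic bytes are not valid
# UTF-8 sequences, so they are rendered as replacement characters.
cp1251_seen_as_utf8 = cp1251_bytes.decode("utf-8", errors="replace")

print(utf8_seen_as_cp1251)
print(cp1251_seen_as_utf8)
```

Whichever mode is chosen, lines written in the other encoding look broken, which matches the observation that only some messages are unreadable at any one time.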
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Alvaro Herrera
Date:
On 2019-Apr-18, Eugene Podshivalov wrote:
> Bruce,
> Here is a screenshot of how it looks when I open the log file in
> notepad++ and switch encoding from UTF8 to ANSI.
> [image: image.png]

I suppose you have databases with the single-byte encoding amidst your
UTF8 ones. AFAIK the log file registers the log entries in the same
encoding that the database uses. Different databases can use different
encodings.

That's pretty broken, but it's how it is.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I suppose you have databases with the single-byte encoding amidst your
> UTF8 ones. AFAIK the log file registers the log entries in the same
> encoding that the database uses. Different databases can use different
> encodings.
> That's pretty broken, but it's how it is.

Yeah, and it's not easy to improve on. If we tried to convert all
log messages to the same encoding, which one would that be?
(Please, no nonsense about UTF8 being a universal solution.
The Japanese don't think so, for instance.)
Also, what do you do if you get an encoding conversion failure?
That's even before you get into implementation-dependent problems,
like what to do early in process startup before the encoding
conversion machinery is operational.

A more realistic idea might be to have separate log files for
different encodings, though that has a bunch of management issues
to solve as well.

regards, tom lane
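The conversion-failure question can be illustrated with a small Python sketch. It is not how PostgreSQL handles log output; the function name convert_for_log and the escape-on-failure policy are hypothetical choices made only to show that a single target encoding cannot represent every message, so some fallback has to be picked.

```python
def convert_for_log(text: str, target: str):
    """Encode text into the target log encoding.

    Returns (data, lossless). When the target cannot represent some
    character, the unencodable characters are written as backslash
    escapes and lossless is False.
    """
    try:
        return text.encode(target), True
    except UnicodeEncodeError:
        return text.encode(target, errors="backslashreplace"), False


# Russian text fits into Windows-1251 without loss.
russian, russian_ok = convert_for_log("система БД готова", "cp1251")

# Japanese text does not: every character has to be escaped.
japanese, japanese_ok = convert_for_log("データベース", "cp1251")

print(russian_ok, russian)
print(japanese_ok, japanese)
```

Escaping keeps the log writable but no longer readable as text, which is one reason a single common encoding is not an obvious improvement.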
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Eugene Podshivalov
Date:
I guess that the issue is related to this setting in the postgresql.conf file:
lc_messages = 'Russian_Russia.1251' # locale for system error message
I tried changing it to 'en_US.UTF-8' and all new messages in the log file are in English and look good regardless of whether I view them in UTF-8 or ANSI encoding.
I don't know what ANSI stands for either but it goes first in the list of encodings in notepad++ Encodings menu.
I guess it refers to Windows-1251 in my case.
The English variants of the messed-up messages in the UTF8 section of the screenshot above are:
LOG: database system was shut down at ...
LOG: database system is ready to accept connections
All my databases have encoding=UTF8, collate=Russian_Russia.1251, ctype=Russian_Russia.1251
Regards,
Eugene
On Thu, Apr 18, 2019 at 19:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I suppose you have databases with the single-byte encoding amidst your
> UTF8 ones. AFAIK the log file registers the log entries in the same
> encoding that the database uses. Different databases can use different
> encodings.
> That's pretty broken, but it's how it is.
Yeah, and it's not easy to improve on. If we tried to convert all
log messages to the same encoding, which one would that be?
(Please, no nonsense about UTF8 being a universal solution.
The Japanese don't think so, for instance.)
Also, what do you do if you get an encoding conversion failure?
That's even before you get into implementation-dependent problems,
like what to do early in process startup before the encoding
conversion machinery is operational.
A more realistic idea might be to have separate log files for
different encodings, though that has a bunch of management issues
to solve as well.
regards, tom lane
Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
From
Eugene Podshivalov
Date:
Could the issue be that not all messages take the lc_messages setting into account?
I.e., in my case all messages should be in ANSI (Windows-1251) instead of UTF-8.
Regards,
Eugene
On Thu, Apr 18, 2019 at 19:26, Eugene Podshivalov <yaugenka@gmail.com> wrote:
I guess that the issue is related to this setting in the postgresql.conf file:
lc_messages = 'Russian_Russia.1251' # locale for system error message
I tried changing it to 'en_US.UTF-8' and all new messages in the log file are in English and look good regardless of whether I view them in UTF-8 or ANSI encoding.
I don't know what ANSI stands for either but it goes first in the list of encodings in notepad++ Encodings menu.
I guess it refers to Windows-1251 in my case.
The English variants of the messed-up messages in the UTF8 section of the screenshot above are:
LOG: database system was shut down at ...
LOG: database system is ready to accept connections
All my databases have encoding=UTF8, collate=Russian_Russia.1251, ctype=Russian_Russia.1251
Regards,
Eugene
On Thu, Apr 18, 2019 at 19:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I suppose you have databases with the single-byte encoding amidst your
> UTF8 ones. AFAIK the log file registers the log entries in the same
> encoding that the database uses. Different databases can use different
> encodings.
> That's pretty broken, but it's how it is.
Yeah, and it's not easy to improve on. If we tried to convert all
log messages to the same encoding, which one would that be?
(Please, no nonsense about UTF8 being a universal solution.
The Japanese don't think so, for instance.)
Also, what do you do if you get an encoding conversion failure?
That's even before you get into implementation-dependent problems,
like what to do early in process startup before the encoding
conversion machinery is operational.
A more realistic idea might be to have separate log files for
different encodings, though that has a bunch of management issues
to solve as well.
regards, tom lane