Thread: BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

From
Олег Самойлов
Date:
I think this is a bug, because logging in the database encoding must work even in this case. The main problem is with the Pacemaker module pgsqlms, which launches pg_ctl in an empty environment and thus with broken logging.
Let me explain with examples.
Empty databases:

-bash-4.2$ psql -p 5433 -l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | ru_RU.UTF-8 | ru_RU.UTF-8 |
 template0 | postgres | UTF8     | ru_RU.UTF-8 | ru_RU.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | ru_RU.UTF-8 | ru_RU.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

In an empty environment:

-bash-4.2$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

-bash-4.2$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/krogan0a start
waiting for server to start....2018-10-17 12:04:20.762 MSK [788] ?????????:  ??? ?????? ??????????? ?? ?????? IPv4 "0.0.0.0" ?????? ???? 5433
2018-10-17 12:04:20.763 MSK [788] ?????????:  ??? ?????? ??????????? ?? ?????? IPv6 "::" ?????? ???? 5433
2018-10-17 12:04:20.770 MSK [788] ?????????:  ??? ?????? ??????????? ?????? Unix-????? "/var/run/postgresql/.s.PGSQL.5433"
2018-10-17 12:04:20.784 MSK [788] ?????????:  ??? ?????? ??????????? ?????? Unix-????? "/tmp/.s.PGSQL.5433"
2018-10-17 12:04:20.811 MSK [788] ?????????:  ???????? ?????? ? ???????? ???????? ????? ??????????
2018-10-17 12:04:20.811 MSK [788] ?????????:  ? ?????????? ????????? ????? ?????????? ? ??????? "log".
 done

And the log file is also unreadable:

2018-10-17 11:56:57.328 MSK [579] ?????????:  ??????? ?? ???? ?????????: 2018-10-17 11:55:41 MSK
2018-10-17 11:56:57.339 MSK [577] ?????????:  ??????? ?? ?????? ????????? ???????????
2018-10-17 12:04:14.754 MSK [577] ?????????:  ??????? ?????? ?? ??????? ??????????
2018-10-17 12:04:14.808 MSK [577] ?????????:  ?????????? ???? ???????? ??????????
2018-10-17 12:04:14.817 MSK [577] ?????????:  ??????? ???????: logical replication launcher (PID 586) ?????????? ? ????? ?????? 1
2018-10-17 12:04:14.817 MSK [580] ?????????:  ??????????
2018-10-17 12:04:16.930 MSK [577] ?????????:  ??????? ?? ?????????
2018-10-17 12:04:20.821 MSK [790] ?????????:  ??????? ?? ???? ?????????: 2018-10-17 12:04:15 MSK
2018-10-17 12:04:20.858 MSK [788] ?????????:  ??????? ?? ?????? ????????? ???????????

But inside a connection to the database all is fine:

-bash-4.2$ psql -p 5433
psql (10.5)
Type "help" for help.

postgres=# selectt;
ОШИБКА:  ошибка синтаксиса (примерное положение: "selectt")
LINE 1: selectt;
        ^

And in log file:

2018-10-17 12:05:08.048 MSK [801] ОШИБКА:  ошибка синтаксиса (примерное положение: "selectt") (символ 1)
2018-10-17 12:05:08.048 MSK [801] ОПЕРАТОР:  selectt;

And on stop:

-bash-4.2$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/krogan0a stop
waiting for server to shut down..... done
server stopped

But the log file:

2018-10-17 12:09:10.376 MSK [788] ?????????:  ??????? ?????? ?? ??????? ??????????
2018-10-17 12:09:10.384 MSK [788] ?????????:  ?????????? ???? ???????? ??????????
2018-10-17 12:09:10.390 MSK [788] ?????????:  ??????? ???????: logical replication launcher (PID 797) ?????????? ? ????? ?????? 1
2018-10-17 12:09:10.390 MSK [791] ?????????:  ??????????
2018-10-17 12:09:12.365 MSK [788] ?????????:  ??????? ?? ?????????

A workaround is to set a UTF-8 locale or LANGUAGE:

-bash-4.2$ LANG=ru_RU.UTF-8 /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/krogan0a start
ожидание запуска сервера....2018-10-17 12:11:05.508 MSK [951] СООБЩЕНИЕ:  для приёма подключений по адресу IPv4 "0.0.0.0" открыт порт 5433
2018-10-17 12:11:05.508 MSK [951] СООБЩЕНИЕ:  для приёма подключений по адресу IPv6 "::" открыт порт 5433
2018-10-17 12:11:05.515 MSK [951] СООБЩЕНИЕ:  для приёма подключений открыт Unix-сокет "/var/run/postgresql/.s.PGSQL.5433"
2018-10-17 12:11:05.529 MSK [951] СООБЩЕНИЕ:  для приёма подключений открыт Unix-сокет "/tmp/.s.PGSQL.5433"
2018-10-17 12:11:05.553 MSK [951] СООБЩЕНИЕ:  передача вывода в протокол процессу сбора протоколов
2018-10-17 12:11:05.553 MSK [951] ПОДСКАЗКА:  В дальнейшем протоколы будут выводиться в каталог "log".
 готово
сервер запущен

Log file:

2018-10-17 12:11:05.562 MSK [953] СООБЩЕНИЕ:  система БД была выключена: 2018-10-17 12:09:10 MSK
2018-10-17 12:11:05.576 MSK [951] СООБЩЕНИЕ:  система БД готова принимать подключения

Or:

-bash-4.2$ LANG=en_US.UTF-8 /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/krogan0a start
waiting for server to start....2018-10-17 12:12:35.182 MSK [1030] СООБЩЕНИЕ:  для приёма подключений по адресу IPv4 "0.0.0.0" открыт порт 5433
2018-10-17 12:12:35.182 MSK [1030] СООБЩЕНИЕ:  для приёма подключений по адресу IPv6 "::" открыт порт 5433
2018-10-17 12:12:35.189 MSK [1030] СООБЩЕНИЕ:  для приёма подключений открыт Unix-сокет "/var/run/postgresql/.s.PGSQL.5433"
2018-10-17 12:12:35.203 MSK [1030] СООБЩЕНИЕ:  для приёма подключений открыт Unix-сокет "/tmp/.s.PGSQL.5433"
2018-10-17 12:12:35.229 MSK [1030] СООБЩЕНИЕ:  передача вывода в протокол процессу сбора протоколов
2018-10-17 12:12:35.229 MSK [1030] ПОДСКАЗКА:  В дальнейшем протоколы будут выводиться в каталог "log".
 done
server started

Log file:

2018-10-17 12:12:35.243 MSK [1032] СООБЩЕНИЕ:  система БД была выключена: 2018-10-17 12:12:30 MSK
2018-10-17 12:12:35.257 MSK [1030] СООБЩЕНИЕ:  система БД готова принимать подключения

Strangely, setting the US locale helps to write Russian messages. %) Or:

-bash-4.2$ LANGUAGE=english /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/krogan0a start
waiting for server to start....2018-10-17 12:14:51.252 MSK [1106] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2018-10-17 12:14:51.252 MSK [1106] LOG:  listening on IPv6 address "::", port 5433
2018-10-17 12:14:51.275 MSK [1106] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2018-10-17 12:14:51.358 MSK [1106] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2018-10-17 12:14:51.435 MSK [1106] LOG:  redirecting log output to logging collector process
2018-10-17 12:14:51.435 MSK [1106] HINT:  Future log output will appear in directory "log".
 done
server started

Log file:

2018-10-17 12:14:51.469 MSK [1108] LOG:  database system was shut down at 2018-10-17 12:14:33 MSK
2018-10-17 12:14:51.522 MSK [1106] LOG:  database system is ready to accept connections

There is no problem with systemctl if the system locale set by localectl is the same as the database locale. But, as I said, there is a problem with the Pacemaker pgsqlms module, and I think this is incorrect behavior. The database may write log messages in the database locale if these messages arise inside a connection to that database. But common messages, such as messages or errors on cluster start, must be written according to the locale settings, and if the locale is empty (POSIX) such messages must be in English. And if messages are not in English, they must be visible as is, not as ‘???’. :) It works inside a connection, so why does it not work on start/stop?
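
For a launcher that starts pg_ctl from an empty environment, the workaround above amounts to adding one variable before the call. A minimal sketch in Python, assuming a hypothetical helper named pg_ctl_env (it is not part of pgsqlms, which is written in Perl, nor of PostgreSQL):

```python
def pg_ctl_env(base_env=None, lang="en_US.UTF-8"):
    """Return a copy of base_env with a UTF-8 LANG set.

    Hypothetical helper for illustration only. An empty base_env
    mimics the empty environment described in this report.
    """
    env = dict(base_env or {})
    # LC_ALL takes precedence over LANG, so drop it if it was inherited.
    env.pop("LC_ALL", None)
    env["LANG"] = lang
    return env


# The result could be passed as env= to subprocess.run() when calling pg_ctl.
print(pg_ctl_env({"LC_ALL": "C"}))
```

Whether the launcher should hard-code a locale or read it from its own configuration is a design question this sketch does not answer.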

-bash-4.2$ LANGUAGE=english /usr/pgsql-10/bin/pg_ctl --version
pg_ctl (PostgreSQL) 10.5

-bash-4.2$ rpm -q postgresql10
postgresql10-10.5-1PGDG.rhel7.x86_64

-bash-4.2$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

Re: BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

From
Adrian Klaver
Date:
On 10/17/18 2:29 AM, Олег Самойлов wrote:
> I think this is a bug, because database encoding logging must work even 
> in this case. The main problem is with pacemaker module pgsqlms, which 
> launch pg_ctl in empty environment and thus with broken logging.

I suggest filing an issue here:

https://github.com/ClusterLabs/PAF/issues




-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

From
Олег Самойлов
Date:
I don’t agree. pgsqlms just runs pg_ctl with the POSIX locale; this is not a bug. But unreadable messages when the locale
charset doesn’t match the database charset are a bug in pg_ctl. When messages come from within a connection all is fine,
so why not do the same for messages on start/stop?

> On 17 Oct 2018, at 15:13, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 10/17/18 2:29 AM, Олег Самойлов wrote:
>> I think this is a bug, because database encoding logging must work even in this case. The main problem is with
>> pacemaker module pgsqlms, which launch pg_ctl in empty environment and thus with broken logging.
>
> I suggest filing an issue here:
>
> https://github.com/ClusterLabs/PAF/issues
>
>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com



Re: BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

From
"Jehan-Guillaume (ioguix) de Rorthais"
Date:
On Wed, 17 Oct 2018 12:29:52 +0300
Олег Самойлов <splarv@ya.ru> wrote:

> There is not problem with systemctl, if system locale setted by localectl is
> the same as database locale. But, as I said, there is problem with pacemaker
> pgsqlms module. And I think this is incorrect behavior. Database may write in
> log messages in database locale, if this messages arises inside connection to
> this database. But common messages, such as messages on start cluster or
> errors on start cluster must written according to locale settings, and if
> locale empty (POSIX) such messages must be in english. And if messages not in
> english they must be visible as is, not as ‘???’. :) It works inside
> connection, why this does not work on start/stop?

It seems that POSIX supports ASCII and some additional characters. So maybe the
character encoding used in the database is just incompatible with POSIX, i.e.
POSIX does not support wide chars. As you can see, characters from the ASCII
table are printed correctly.
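
The pattern in the report (Cyrillic replaced, ASCII intact) can be reproduced with a plain ASCII conversion. This is only a model of the observed effect, not the code path the server actually takes; the helper name render_in_ascii is made up for the example:

```python
def render_in_ascii(message: str) -> str:
    """Convert to ASCII, replacing each unencodable character with '?'."""
    return message.encode("ascii", errors="replace").decode("ascii")


# Readable line taken from the LANG=ru_RU.UTF-8 run in the first message.
readable = 'СООБЩЕНИЕ:  для приёма подключений по адресу IPv4 "0.0.0.0" открыт порт 5433'
print(render_in_ascii(readable))
# prints: ?????????:  ??? ?????? ??????????? ?? ?????? IPv4 "0.0.0.0" ?????? ???? 5433
```

The result has the same shape as the first garbled line printed by pg_ctl in the original report.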


Re: BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

From
Tom Lane
Date:
Олег Самойлов <splarv@ya.ru> writes:
> [ postmaster's localized messages are printed as garbage if LANG is C or unset ]

I'm not quite convinced that this is a bug.  The reason it's misbehaving
is that in the postmaster process (and, probably, non-backend children)
LC_MESSAGES gets set to whatever you said in postgresql.conf, but LC_CTYPE
is never changed away from what it was in the postmaster's environment.
So if the prevailing environment setting is C/POSIX, gettext() throws up
its hands and substitutes "?" for non-ASCII characters, because it has
no idea which encoding to render them in.
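
One way to see what the C library reports as the character set for a given LC_CTYPE is to ask nl_langinfo(CODESET), which is, as far as I know, what GNU gettext consults by default when choosing its output encoding. A small Python sketch (Unix only; the helper name ctype_codeset is an assumption of this example):

```python
import locale


def ctype_codeset(lc_ctype: str) -> str:
    """Return the codeset the C library reports for the given LC_CTYPE."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, lc_ctype)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, previous)


print(ctype_codeset("C"))
```

On glibc systems this typically reports ANSI_X3.4-1968, i.e. plain ASCII; other C libraries may answer differently.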

This is sort of intentional, in that the environment LC_CTYPE ought to
reflect the "console encoding" that you're operating in; if you run your
terminal in say KOI8R, then you set LC_CTYPE=ru_RU.koi8r and messages
should get printed in the encoding the terminal is expecting.
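
To illustrate why the output encoding has to match the terminal, here is the same word encoded both ways in Python. The codec name koi8_r is Python's spelling; the variable names are chosen only for this example:

```python
message = "СООБЩЕНИЕ"

utf8_bytes = message.encode("utf-8")
koi8r_bytes = message.encode("koi8_r")

# Same text, different byte sequences: 18 bytes in UTF-8, 9 in KOI8-R.
print(len(utf8_bytes), len(koi8r_bytes))

# A KOI8-R terminal fed UTF-8 bytes would interpret them like this (mojibake):
print(utf8_bytes.decode("koi8_r"))

# Bytes in the encoding the terminal expects come back intact.
print(koi8r_bytes.decode("koi8_r"))
```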

We could maybe make a case for forcing gettext to use the encoding
implied by LC_MESSAGES if LC_CTYPE is C/POSIX, but I'm not really
convinced that there's anything principled about that.
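
The fallback floated here could look roughly like the following. It is a sketch of the idea only, not existing PostgreSQL or gettext behavior, and both function names are invented for the example:

```python
from typing import Optional


def codeset_of(locale_name: str) -> Optional[str]:
    """Extract the codeset from a name like 'ru_RU.UTF-8'.

    Returns None for C/POSIX, an empty name, or a name without a codeset.
    """
    if locale_name in ("", "C", "POSIX"):
        return None
    # Strip an optional @modifier, then take what follows the dot.
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1]


def output_codeset(lc_ctype: str, lc_messages: str) -> str:
    """Prefer LC_CTYPE's codeset; fall back to the one implied by
    LC_MESSAGES when LC_CTYPE is C/POSIX or names no codeset."""
    return codeset_of(lc_ctype) or codeset_of(lc_messages) or "ASCII"


print(output_codeset("POSIX", "ru_RU.UTF-8"))
print(output_codeset("ru_RU.koi8r", "ru_RU.UTF-8"))
```

With the settings from this report (LC_CTYPE=POSIX, messages in ru_RU.UTF-8) the first call yields UTF-8, while an explicit LC_CTYPE such as ru_RU.koi8r still wins.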

On the other hand, the current behavior in this situation surely
isn't useful to anybody.  Arguably, gettext() is being pretty
unhelpful here, but I doubt we could get them to change.

Peter, any thoughts?

            regards, tom lane


Re: BUG: Incorrect working with POSIX locale if database in UTF-8 encoding

From
Олег Самойлов
Date:
I think the correct behavior would be to take the whole locale either from postgresql.conf (as the backend processes do)
or from the environment. Which of the two places the locale should come from is perhaps open to question, but obviously
it should be only one. Do not take LC_MESSAGES from one place (postgresql.conf) and LC_CTYPE from another (the
environment). The bug is here.

> On 18 Oct 2018, at 19:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Олег Самойлов <splarv@ya.ru> writes:
>> [ postmaster's localized messages are printed as garbage if LANG is C or unset ]
>
> I'm not quite convinced that this is a bug.  The reason it's misbehaving
> is that in the postmaster process (and, probably, non-backend children)
> LC_MESSAGES gets set to whatever you said in postgresql.conf, but LC_CTYPE
> is never changed away from what it was in the postmaster's environment.
> So if the prevailing environment setting is C/POSIX, gettext() throws up
> its hands and substitutes "?" for non-ASCII characters, because it has
> no idea which encoding to render them in.
>
> This is sort of intentional, in that the environment LC_CTYPE ought to
> reflect the "console encoding" that you're operating in; if you run your
> terminal in say KOI8R, then you set LC_CTYPE=ru_RU.koi8r and messages
> should get printed in the encoding the terminal is expecting.
>
> We could maybe make a case for forcing gettext to use the encoding
> implied by LC_MESSAGES if LC_CTYPE is C/POSIX, but I'm not really
> convinced that there's anything principled about that.
>
> On the other hand, the current behavior in this situation surely
> isn't useful to anybody.  Arguably, gettext() is being pretty
> unhelpful here, but I doubt we could get them to change.
>
> Peter, any thoughts?
>
>             regards, tom lane