Thread: 13.x, stream replication and locale(?) issues
Hello.
I have a 13.4 pgsql instance on Linux which has a bunch of databases with UTF-8/ru_RU.utf8 encoding/collation set.
I've stream replicated it to the 13.10 instance on FreeBSD (may be this is the part where it all has gone wrong way, but at this moment I believe streaming replication should work since both run one major version). And the funny things started to happen.
First, the instance said
sql: error: FATAL: database locale is incompatible with operating system
DETAIL: The database was initialized with LC_COLLATE "ru_RU.utf8", which is not recognized by setlocale().
HINT: Recreate the database with another locale or install the missing locale.
I've decided to go the easy way and just symlinked the ru_RU.UTF-8 locale to ru_RU.utf8, because from my understanding it's the same locale, just cased differently (seems like I was totally wrong).
The database was running fine for quite some time and then I've got tonnes of complains about weird SQL queries behavior. I'll just illustrate the difference:
Master instance:
tpark-rbac=# select id, service_name from micro_service where service_name='profile';
id | service_name
----+--------------
17 | profile
(1 row)
tpark-rbac=# select operation_id, micro_service_id from micro_service_operation where operation_id='admin.member-request.list';
operation_id | micro_service_id
---------------------------+------------------
admin.member-request.list | 17
(1 row)
tpark-rbac=# SELECT ms.id FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' AND mso.operation_id='admin.member-request.list';
id
----
17
(1 row)
Standby instance:
tpark-rbac=# select id, service_name from micro_service where service_name='profile';
id | service_name
----+--------------
17 | profile
(1 row)
tpark-rbac=# select operation_id, micro_service_id from micro_service_operation where operation_id='admin.member-request.list';
operation_id | micro_service_id
---------------------------+------------------
admin.member-request.list | 17
(1 row)
tpark-rbac=# SELECT ms.id FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' AND mso.operation_id='admin.member-request.list';
id
----
(0 rows)
The thing is, as it seems, that the value "admin.member-request.list" isn't quite "admin.member-request.list" on a standby:
tpark-rbac=# SELECT ms.id, mso.operation_id, ms.service_name, length(mso.operation_id) as msolength FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' and mso.operation_id like 'admin.member-request.list%';
id | operation_id | service_name | msolength
----+---------------------------+--------------+-----------
17 | admin.member-request.list | profile | 25
(1 row)
tpark-rbac=# SELECT ms.id, mso.operation_id, ms.service_name, length(mso.operation_id) as msolength FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' and mso.operation_id like 'admin.member-request.list';
id | operation_id | service_name | msolength
----+--------------+--------------+-----------
(0 rows)
And I suppose this is because of the locale hack.
Now a bunch of stupid questions:
1) why the utf-8 locales behave differently when working with what appears to be clearly latin1 characters ? From my understanding latin1 characters shouldn't be affected at all.
2) why does the query where I just ask for equality of the value to the "admin.member-request.list" work perfectly when FROM clause contains one table, but fails as soon as FROM starts to contain multiple tables ?
3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and compile it on FreeBSD - will it help ?
4) the most disappointing thing is that I have long-term relationship with pgsql replication and to this day I was able to do any kinds of juggling - replicating from Linux to Solaris, from Solaris to FreeBSD and vice-versa, all possible combinations as long as UTF-8 was the encoding - what changed now ?
5) will the downgrading to 13.4 on the standby help me ?
Thanks.
Eugene.
"Eugene M. Zheganin" <eugene@zhegan.in> writes: > I have a 13.4 pgsql instance on Linux which has a bunch of databases > with UTF-8/ru_RU.utf8 encoding/collation set. > I've stream replicated it to the 13.10 instance on FreeBSD (may be this > is the part where it all has gone wrong way, but at this moment I > believe streaming replication should work since both run one major > version). And the funny things started to happen. Sadly, ru_RU.utf8 on Linux and ru_RU.utf8 on FreeBSD are almost certainly *not* exactly compatible. You could probably ignore the differences if you were using logical replication, but with physical replication any difference in sort order is going to mean that indexes on text columns appear corrupt on the standby. Which is exactly what your troubles sound like. You could verify this theory by seeing whether contrib/amcheck reports any ordering problems in the indexes of the troublesome tables. (You'll probably have to promote the standby to primary in order to install the amcheck extension, but you need to treat that installation as hosed anyway ...) There's more info about this general class of problems at https://wiki.postgresql.org/wiki/Locale_data_changes although that focuses on the even-more-annoying case where locale sort order changes between releases of a single OS. regards, tom lane
On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin <eugene@zhegan.in> wrote: > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and compile it on FreeBSD - will it help ? Out of curiosity (I'm not saying it's a good idea!), do you know if FreeBSD's localedef can compile glibc's collation definitions? In theory they are in a format standardised by POSIX... I suspect there may be extensions and quirks... At a wild guess, since the data you showed doesn't even look like it contains non-ASCII characters (it looks like machine readable identifiers or something, and perhaps its the sort order of '-' that is causing you trouble), so it might also be possible to use "ucs_basic" locale for that column and then all computers will agree on the sort order, but of course that doesn't address the more general problem; presumably you might also have Russian language text in your system too. As for ".utf8" vs ".UTF-8", which one is selected by initdb as the database default seems to be something that varies between Linux distributions, so I guess maybe the installers use different techniques for discovering and selecting default locale names. Unlike glibc, FreeBSD doesn't do any name mangling at all when mapping LC_COLLATE to a pathname to find the file, whereas glibc downcases and removes '-' so you can find both formats of name in the various places...
On Wed, Mar 1, 2023 at 10:30 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin <eugene@zhegan.in> wrote: > > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and compile it on FreeBSD - will it help ? > > Out of curiosity (I'm not saying it's a good idea!), do you know if > FreeBSD's localedef can compile glibc's collation definitions? In > theory they are in a format standardised by POSIX... I suspect there > may be extensions and quirks... Another thought: if you upgrade to 15, you could use ICU as the default collation provider, and then make sure you have the same major version of ICU on your Linux and FreeBSD systems (which probably means compiling from source on FreeBSD, as FreeBSD tends to have a newish ICU in packages, while popular stable Linux distributions tend to have a fairly old one). As far as I know, Linux and FreeBSD systems on amd64 arch should match in every other important detail already (they both follow the System V ABI, so there shouldn't be any layout differences in eg structs that are sent over the wire AFAIK). For what it's worth, for the future we are trying to figure out how to support multi-version ICU so that you could explicitly set the provider to ICU v72 to get that sort of thing working across OS versions and even "close enough" cousins like your case, without having to compile anything from source, but unfortunately we didn't manage to get it ready in time for 16.