Thread: 13.x, stream replication and locale(?) issues

13.x, stream replication and locale(?) issues

From
"Eugene M. Zheganin"
Date:

Hello.

I have a 13.4 pgsql instance on Linux which has a bunch of databases with UTF-8/ru_RU.utf8 encoding/collation set.

I've stream replicated it to the 13.10 instance on FreeBSD (may be this is the part where it all has gone wrong way, but at this moment I believe streaming replication should work since both run one major version). And the funny things started to happen.

First, the instance said

sql: error: FATAL:  database locale is incompatible with operating system
DETAIL:  The database was initialized with LC_COLLATE "ru_RU.utf8",  which is not recognized by setlocale().
HINT:  Recreate the database with another locale or install the missing locale.

I've decided to go the easy way and just symlinked the ru_RU.UTF-8 locale to ru_RU.utf8, because from my understanding it's the same locale, just cased differently (seems like I was totally wrong).

The database was running fine for quite some time and then I've got tonnes of complains about weird SQL queries behavior. I'll just illustrate the difference:

Master instance:

tpark-rbac=# select id, service_name from micro_service where service_name='profile';                                   
id | service_name  
----+--------------
17 | profile
(1 row)

tpark-rbac=# select operation_id, micro_service_id from micro_service_operation where operation_id='admin.member-request.list';
      operation_id        | micro_service_id  
---------------------------+------------------
admin.member-request.list |               17
(1 row)

tpark-rbac=# SELECT ms.id FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' AND mso.operation_id='admin.member-request.list';
id  
----
17
(1 row)

Standby instance:

tpark-rbac=# select id, service_name from micro_service where service_name='profile';
id | service_name  
----+--------------
17 | profile
(1 row)

tpark-rbac=# select operation_id, micro_service_id from micro_service_operation where operation_id='admin.member-request.list';
      operation_id        | micro_service_id  
---------------------------+------------------
admin.member-request.list |               17
(1 row)

tpark-rbac=# SELECT ms.id FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' AND mso.operation_id='admin.member-request.list';
id  
----
(0 rows)


The thing is, as it seems, that the value "admin.member-request.list" isn't quite "admin.member-request.list" on a standby:


tpark-rbac=# SELECT ms.id, mso.operation_id, ms.service_name, length(mso.operation_id) as msolength FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' and mso.operation_id like 'admin.member-request.list%';
id |       operation_id        | service_name | msolength  
----+---------------------------+--------------+-----------
17 | admin.member-request.list | profile      |        25
(1 row)

tpark-rbac=# SELECT ms.id, mso.operation_id, ms.service_name, length(mso.operation_id) as msolength FROM micro_service_operation mso, micro_service ms where mso.micro_service_id=ms.id and ms.service_name='profile' and mso.operation_id like 'admin.member-request.list';  
id | operation_id | service_name | msolength  
----+--------------+--------------+-----------
(0 rows)


And I suppose this is because of the locale hack.

Now a bunch of stupid questions:

1) why the utf-8 locales behave differently when working with what appears to be clearly latin1 characters ? From my understanding latin1 characters shouldn't be affected at all.

2) why does the query where I just ask for equality of the value to the "admin.member-request.list" work perfectly when FROM clause contains one table, but fails as soon as FROM starts to contain multiple tables ?

3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and compile it on FreeBSD  - will it help ?

4) the most disappointing thing is that I have long-term relationship with pgsql replication and to this day I was able to do any kinds of juggling - replicating from Linux to Solaris, from Solaris to FreeBSD and vice-versa, all possible combinations as long as UTF-8 was the encoding - what changed now ?

5) will the downgrading to 13.4 on the standby help me ?


Thanks.

Eugene.

Re: 13.x, stream replication and locale(?) issues

From
Tom Lane
Date:
"Eugene M. Zheganin" <eugene@zhegan.in> writes:
> I have a 13.4 pgsql instance on Linux which has a bunch of databases 
> with UTF-8/ru_RU.utf8 encoding/collation set.

> I've stream replicated it to the 13.10 instance on FreeBSD (may be this 
> is the part where it all has gone wrong way, but at this moment I 
> believe streaming replication should work since both run one major 
> version). And the funny things started to happen.

Sadly, ru_RU.utf8 on Linux and ru_RU.utf8 on FreeBSD are almost
certainly *not* exactly compatible.  You could probably ignore the
differences if you were using logical replication, but with physical
replication any difference in sort order is going to mean that indexes
on text columns appear corrupt on the standby.  Which is exactly what
your troubles sound like.

You could verify this theory by seeing whether contrib/amcheck reports
any ordering problems in the indexes of the troublesome tables.
(You'll probably have to promote the standby to primary in order to
install the amcheck extension, but you need to treat that installation
as hosed anyway ...)

There's more info about this general class of problems at

https://wiki.postgresql.org/wiki/Locale_data_changes

although that focuses on the even-more-annoying case where locale
sort order changes between releases of a single OS.

            regards, tom lane



Re: 13.x, stream replication and locale(?) issues

From
Thomas Munro
Date:
On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin <eugene@zhegan.in> wrote:
> 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and compile it on FreeBSD  - will it help
?

Out of curiosity (I'm not saying it's a good idea!), do you know if
FreeBSD's localedef can compile glibc's collation definitions?  In
theory they are in a format standardised by POSIX...  I suspect there
may be extensions and quirks...

At a wild guess, since the data you showed doesn't even look like it
contains non-ASCII characters (it looks like machine readable
identifiers or something, and perhaps its the sort order of '-' that
is causing you trouble), so it might also be possible to use
"ucs_basic" locale for that column and then all computers will agree
on the sort order, but of course that doesn't address the more general
problem; presumably you might also have Russian language text in your
system too.

As for ".utf8" vs ".UTF-8", which one is selected by initdb as the
database default seems to be something that varies between Linux
distributions, so I guess maybe the installers use different
techniques for discovering and selecting default locale names.  Unlike
glibc, FreeBSD doesn't do any name mangling at all when mapping
LC_COLLATE to a pathname to find the file, whereas glibc downcases and
removes '-' so you can find both formats of name in the various
places...



Re: 13.x, stream replication and locale(?) issues

From
Thomas Munro
Date:
On Wed, Mar 1, 2023 at 10:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin <eugene@zhegan.in> wrote:
> > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and compile it on FreeBSD  - will it help
?
>
> Out of curiosity (I'm not saying it's a good idea!), do you know if
> FreeBSD's localedef can compile glibc's collation definitions?  In
> theory they are in a format standardised by POSIX...  I suspect there
> may be extensions and quirks...

Another thought: if you upgrade to 15, you could use ICU as the
default collation provider, and then make sure you have the same major
version of ICU on your Linux and FreeBSD systems (which probably means
compiling from source on FreeBSD, as FreeBSD tends to have a newish
ICU in packages, while popular stable Linux distributions tend to have
a fairly old one).  As far as I know, Linux and FreeBSD systems on
amd64 arch should match in every other important detail already (they
both follow the System V ABI, so there shouldn't be any layout
differences in eg structs that are sent over the wire AFAIK).

For what it's worth, for the future we are trying to figure out how to
support multi-version ICU so that you could explicitly set the
provider to ICU v72 to get that sort of thing working across OS
versions and even "close enough" cousins like your case, without
having to compile anything from source, but unfortunately we didn't
manage to get it ready in time for 16.