Thread: BUG #18523: String compare not consistent
The following bug has been logged on the website: Bug reference: 18523 Logged by: Patrick van Dijk Email address: patrick.van.dijk@gmail.com PostgreSQL version: 16.3 Operating system: Windows vs Linux/Azure Description: When I try the following on a Azure/Linux version of PostgreSQL i see some strange results that are not correct. On Windows the behavior is correct. select '|' < '0'; -- true, true, this is correct select '|1' < '01'; -- false, true, Azure/Linux is wrong select '||' < '0|'; -- true, true, this is correct select '||1' < '0|1'; -- false, true, Azure/Linux is wrong When the first character compares <, then no matter what follows, it should be <...
On Windows: PostgreSQL 16.2, compiled by Visual C++ build 1937, 64-bit
On Azure: PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.2.0, 64-bit
On Wed, Jun 26, 2024 at 11:44 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18523
Logged by: Patrick van Dijk
Email address: patrick.van.dijk@gmail.com
PostgreSQL version: 16.3
Operating system: Windows vs Linux/Azure
Description:
When I try the following on a Azure/Linux version of PostgreSQL i see some
strange results that are not correct.
On Windows the behavior is correct.
select '|' < '0'; -- true, true, this is correct
select '|1' < '01'; -- false, true, Azure/Linux is wrong
select '||' < '0|'; -- true, true, this is correct
select '||1' < '0|1'; -- false, true, Azure/Linux is wrong
When the first character compares <, then no matter what follows, it should
be <...
I am getting more curious, WIndows is consistent, but Azure/Linux is not.
select '.' < '0'; -- true, true
select '.x' < '0x'; -- false, true
select '.1' < '01'; -- false, true
select '..' < '0.'; -- true, true
select '..1' < '0.1'; -- false, true
select '.x' < '0x'; -- false, true
select '.1' < '01'; -- false, true
select '..' < '0.'; -- true, true
select '..1' < '0.1'; -- false, true
On Wed, Jun 26, 2024 at 11:44 AM Patrick van Dijk <patrick.van.dijk@gmail.com> wrote:
On Windows: PostgreSQL 16.2, compiled by Visual C++ build 1937, 64-bitOn Azure: PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.2.0, 64-bitOn Wed, Jun 26, 2024 at 11:44 AM PG Bug reporting form <noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 18523
Logged by: Patrick van Dijk
Email address: patrick.van.dijk@gmail.com
PostgreSQL version: 16.3
Operating system: Windows vs Linux/Azure
Description:
When I try the following on a Azure/Linux version of PostgreSQL i see some
strange results that are not correct.
On Windows the behavior is correct.
select '|' < '0'; -- true, true, this is correct
select '|1' < '01'; -- false, true, Azure/Linux is wrong
select '||' < '0|'; -- true, true, this is correct
select '||1' < '0|1'; -- false, true, Azure/Linux is wrong
When the first character compares <, then no matter what follows, it should
be <...
On Wed, 2024-06-26 at 09:43 +0000, PG Bug reporting form wrote: > PostgreSQL version: 16.3 > Operating system: Windows vs Linux/Azure > > When I try the following on a Azure/Linux version of PostgreSQL i see some > strange results that are not correct. > On Windows the behavior is correct. > > select '|' < '0'; -- true, true, this is correct > select '|1' < '01'; -- false, true, Azure/Linux is wrong > select '||' < '0|'; -- true, true, this is correct > select '||1' < '0|1'; -- false, true, Azure/Linux is wrong > > When the first character compares <, then no matter what follows, it should > be <... That is certainly not a PostgreSQL bug, because PostgreSQL uses collations defined by the C library (or the ICU library, if you use that). So you'd have to complain to the authors of the respective library. Interestingly, glibc and ICU disagree about that: SELECT '|1' < '01' COLLATE "de_DE.utf8" AS glibc, '|1' < '01' COLLATE "de-AT-x-icu" AS icu; glibc │ icu ═══════╪═════ f │ t (1 row) Yours, Laurenz Albe
PG Bug reporting form <noreply@postgresql.org> writes: > When I try the following on a Azure/Linux version of PostgreSQL i see some > strange results that are not correct. > On Windows the behavior is correct. "Correct" is in the eye of the beholder, and non-C collations tend to act in ways far more complicated than you seem to expect. Check the locale settings on the two machines; unless they are both "C", there's not really any guarantee of consistency between Windows and anybody else. As a demonstration that this isn't some Postgres-specific weirdness, you can try asking sort(1) what it thinks. On a RHEL8 machine I get $ echo -e '|\n0' | LANG=C sort 0 | $ echo -e '|1\n01' | LANG=C sort 01 |1 $ echo -e '|\n0' | LANG=en_US.utf8 sort | 0 $ echo -e '|1\n01' | LANG=en_US.utf8 sort 01 |1 and I'd expect the same from any Linux machine with a similar vintage of glibc. regards, tom lane
On 26.06.24 17:59, Laurenz Albe wrote: > Interestingly, glibc and ICU disagree about that: > > SELECT '|1' < '01' COLLATE "de_DE.utf8" AS glibc, > '|1' < '01' COLLATE "de-AT-x-icu" AS icu; > > glibc │ icu > ═══════╪═════ > f │ t > (1 row) Let's work this out. The collation weights of the characters involved are: 0030 ; [.209F.0020.0002] # DIGIT ZERO 0031 ; [.20A0.0020.0002] # DIGIT ONE 007C ; [*06AF.0020.0002] # VERTICAL LINE So under the ICU default "noignore" behavior, the sort keys are: '|1' => 06AF 20A0 0000 0020 0020 0000 0002 0002 '01' => 209F 20A0 0000 0020 0020 0000 0002 0002 Therefore, '|1' < '01'. Under "shifted", the effective collation weights are: 0030 ; [.209F.0020.0002.FFFF] # DIGIT ZERO 0031 ; [.20A0.0020.0002.FFFF] # DIGIT ONE 007C ; [.0000.0000.0000.06AF] # VERTICAL LINE and the sort keys are: '|1' => 20A0 0000 0020 0000 0002 0000 06AF FFFF '01' => 209F 20A0 0000 0020 0020 0000 0002 0002 0000 FFFF FFFF Therefore, '|1' > '01'. That is what you get in glibc. (Older glibc use "shift-trimmed" behavior, which trims the trailing FFFF's, which doesn't affect the result in this case.) Under ICU (or rather CLDR), the '|' character is not actually a variable collation element by default, so you actually get the same result if you ask for "noignore" or "shifted" behavior. But you can make symbols variable with 'und-u-ka-shifted-kv-symbol', and then you will also get the same result as glibc. Further reading: - https://peter.eisentraut.org/blog/2023/04/12/how-collation-of-punctuation-and-whitespace-works - https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings