Thread: Sorting Discrepancy in PostgreSQL 14.13

Sorting Discrepancy in PostgreSQL 14.13

From
[3반]김민지_4904
Date:
==============================================
        POSTGRESQL BUG REPORT TEMPLATE
==============================================

Your name              : minji-kim
Your email address : hzuiw33@gmail.com


# System Configuration:
---------------------
Architecture (example: Intel Pentium)                      : Intel(R) Core(TM) Ultra 7 155H
Operating System (example: Linux 2.4.18)              : VMware Workstation Pro (Ubuntu-22.04)
PostgreSQL version (example: PostgreSQL 9.6.6)  : PostgreSQL 14.13
Compiler used (example: gcc 3.3.5)                         : X (sudo apt install postgresql postgresql-contrib)


# Please enter a FULL description of your problem:
------------------------------------------------

I ran the following commands in PostgreSQL 14.13:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0, MIN(ABS(CAST(c0 AS BIGINT))) OVER (ORDER BY c0 NULLS FIRST) AS min_function_cast FROM t0;
DROP TABLE IF EXISTS t0;
```

The result is:

```
 c0  | min_function_cast
-----+-------------------
 -10 |                10
 15  |                10
 20  |                10
 -25 |                10
 -5  |                 5
(5 rows)
```

However, in other DBMSs (SQLite, MySQL, OracleDB) and in PostgreSQL 17.0, the output is:

```
 c0  | min_function_cast
-----+-------------------
 -10 |                10
 -25 |                10
 -5  |                 5
 15  |                 5
 20  |                 5
(5 rows)
```

This discrepancy is due to different string sorting orders.

The minimized PoC is:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0 FROM t0 ORDER BY c0;
```

In PostgreSQL 14.13, the order is incorrect:
```
-10
15
20
-25
-5
```

The correct order should be:

```
-10
-25
-5
15
20
```

as '-' is smaller than '1' or '2' in ASCII.

I'm doubtful this is a collation issue, as most collations basically respect ASCII order.

Even if this issue appears to be related to collation, no warnings are provided when migrating from this version.

# Please describe a way to repeat the problem.  
# Please try to provide a concise reproducible example, if at all possible:
----------------------------------------------------------------------

Run the following commands in PostgreSQL 14.13:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0, MIN(ABS(CAST(c0 AS BIGINT))) OVER (ORDER BY c0 NULLS FIRST) AS min_function_cast FROM t0;
DROP TABLE IF EXISTS t0;
```

This produces the incorrect sort order described above.


# If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
Using the correct sort criteria will solve this problem.

Re: Sorting Discrepancy in PostgreSQL 14.13

From
Tomas Vondra
Date:
Hi,

On 11/14/24 13:49, [3반]김민지_4904 wrote:
>
> ...
>
> The minimized PoC is:
> 
> ```sql
> CREATE TABLE t0 (c0 TEXT);
> INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
> SELECT c0 FROM t0 ORDER BY c0;
> ```
> 
> In PostgreSQL 14.13, the order is incorrect:
> ```
> -10
> 15
> 20
> -25
> -5
> ```
> 
> While the correct order should be:
> 
> ```
> -10
> -25
> -5
> 15
> 20
> ```
> 
> as '-' is smaller than '1', or '2' in ascii.
> 
> I'm doubtful this is a collation issue, as most collations basically
> respect ASCII order.
> 

This is 99.999% due to the collation, so which collations are being used
on these systems? Also, I don't get this "incorrect" behavior on 14.13,
it behaves the same as 17 for me, producing the expected result.


regards

-- 
Tomas Vondra




Re: Sorting Discrepancy in PostgreSQL 14.13

From
Tom Lane
Date:
Tomas Vondra <tomas@vondra.me> writes:
> On 11/14/24 13:49, [3반]김민지_4904 wrote:
>> I'm doubtful this is a collation issue, as most collations basically
>> respect ASCII order.

> This is 99.999% due to the collation, so which collations are being used
> on these systems? Also, I don't get this "incorrect" behavior on 14.13,
> it behaves the same as 17 for me, producing the expected result.

It surely is a collation issue.  Using a glibc-based system, I get

u8=# CREATE TABLE t0 (c0 TEXT);
CREATE TABLE
u8=# INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
INSERT 0 5
u8=# select * from t0 order by c0 collate "C";
 c0  
-----
 -10
 -25
 -5
 15
 20
(5 rows)

u8=# select * from t0 order by c0 collate "en_US";
 c0  
-----
 -10
 15
 20
 -25
 -5
(5 rows)

(In point of fact, most glibc collations do NOT "respect ASCII order".
They tend to ignore punctuation until it's needed as a tiebreaker.)
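
Applied to the window query from the original report, an explicit
collation in the window's ORDER BY should take the database default out
of the picture. This is only a sketch: it assumes table t0 from the
report still exists, and it uses the built-in "C" collation, which
compares strings byte by byte.

```sql
-- Same query as in the report, but the window is ordered with an
-- explicit "C" collation instead of the database's default collation.
SELECT c0,
       MIN(ABS(CAST(c0 AS BIGINT)))
           OVER (ORDER BY c0 COLLATE "C" NULLS FIRST) AS min_function_cast
FROM t0;
```

With "C" the window sees the rows in the order -10, -25, -5, 15, 20, so
the running minimum is computed over the same sequence on every
installation, whatever locale initdb was run with.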

So this is surely down to the PG 14.13 installation having a different
default collation than whatever it's compared to, which most likely
is caused by having run initdb with a different locale environment.
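
One way to check this is to compare the collation settings recorded for
each database on both installations. A minimal sketch, assuming a
connection with the usual read access to the system catalogs:

```sql
-- Collation and character classification locale of every database,
-- as recorded when the database was created.
SELECT datname, datcollate, datctype
FROM pg_database;
```

If the 14.13 installation reports something like en_US.UTF-8 where the
other one reports C, that alone would account for the different order.
On releases that have the datlocprovider column (added in version 15),
that column is worth including as well, since the libc collation is not
necessarily the one in effect there.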

            regards, tom lane



Re: Sorting Discrepancy in PostgreSQL 14.13

From
Peter Eisentraut
Date:
On 14.11.24 13:49, [3반]김민지_4904 wrote:
> as '-' is smaller than '1', or '2' in ascii.
> 
> I'm doubtful this is a collation issue, as most collations basically 
> respect ASCII order.

See also here for a possible explanation: 
https://peter.eisentraut.org/blog/2023/04/12/how-collation-of-punctuation-and-whitespace-works