Thread: Unexpected behavior sorting strings

Unexpected behavior sorting strings

From

"Jimmy Thrasher"

Date:

08 April 2020, 14:35:32

I'm seeing some unexpected behavior when sorting some strings, and it indicates I don't fully understand how postgresql
stringsorting works.
 

As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value.

In particular, I'm seeing the following. I would expect "< S" to come first, because "<" (0x3c) is less than ">"
(0x3e).

```
supercatdev=# select unnest(array['> N', '< S']) as s order by s;
  s
-----
 > N
 < S
(2 rows)
```

I've broken this down further:
```
supercatdev=# select '> N' < '< S';
 ?column?
----------
 t
(1 row)
```

Am I missing something about how sorting works?

Metadata:
- postgresql 9.5.19, running on Ubuntu 16LTS
- encoding, collate, and ctype are all UTF8 or en_US.UTF-8, as appropriate

Thanks!

Jimmy

Re: Unexpected behavior sorting strings

From

Tom Lane

Date:

08 April 2020, 15:47:51

"Jimmy Thrasher" <jimmy@jimmythrasher.com> writes:
> As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding
value.

Only if you're using C locale.  Other locales such as en_US have
completely different rules, which most hackers tend to find pretty
unintelligible and inconsistent :-(.  In your example, I think
the first-pass sort is on just the letters, and only if those are
the same will it consider the punctuation.

            regards, tom lane

Re: Unexpected behavior sorting strings

From

Adrian Klaver

Date:

08 April 2020, 15:49:34

On 4/8/20 7:35 AM, Jimmy Thrasher wrote:
> I'm seeing some unexpected behavior when sorting some strings, and it indicates I don't fully understand how
postgresqlstring sorting works.
 
> 
> As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding
value.
> 
> In particular, I'm seeing the following. I would expect "< S" to come first, because "<" (0x3c) is less than ">"
(0x3e).
> 
> ```
> supercatdev=# select unnest(array['> N', '< S']) as s order by s;
>    s
> -----
>   > N
>   < S
> (2 rows)
> ```
> 
> I've broken this down further:
> ```
> supercatdev=# select '> N' < '< S';
>   ?column?
> ----------
>   t
> (1 row)
> ```
> 
> Am I missing something about how sorting works?

I believe you are looking for 'C' collation:

test=# select unnest(array[('> N' collate "C") , ('< S' COLLATE "C")]) 
as s order by s;

   s
-----
  < S
  > N
(2 rows)


For more information see:

https://www.postgresql.org/docs/12/collation.html

> 
> Metadata:
> - postgresql 9.5.19, running on Ubuntu 16LTS
> - encoding, collate, and ctype are all UTF8 or en_US.UTF-8, as appropriate
> 
> Thanks!
> 
> Jimmy
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Unexpected behavior sorting strings

From

"Jimmy Thrasher"

Date:

08 April 2020, 16:08:27

Many thanks! That clarifies things well.

Jimmy

On Wed, Apr 8, 2020, at 11:49 AM, Adrian Klaver wrote:
> On 4/8/20 7:35 AM, Jimmy Thrasher wrote:
<snip>
> > Am I missing something about how sorting works?
> 
> I believe you are looking for 'C' collation:
> 
> test=# select unnest(array[('> N' collate "C") , ('< S' COLLATE "C")]) 
> as s order by s;
> 
>    s
> -----
>   < S
>   > N
> (2 rows)
> 
> 
> For more information see:
> 
> https://www.postgresql.org/docs/12/collation.html