Thread: Unexpected behavior sorting strings
I'm seeing some unexpected behavior when sorting some strings, and it indicates I don't fully understand how postgresql stringsorting works. As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value. In particular, I'm seeing the following. I would expect "< S" to come first, because "<" (0x3c) is less than ">" (0x3e). ``` supercatdev=# select unnest(array['> N', '< S']) as s order by s; s ----- > N < S (2 rows) ``` I've broken this down further: ``` supercatdev=# select '> N' < '< S'; ?column? ---------- t (1 row) ``` Am I missing something about how sorting works? Metadata: - postgresql 9.5.19, running on Ubuntu 16LTS - encoding, collate, and ctype are all UTF8 or en_US.UTF-8, as appropriate Thanks! Jimmy
"Jimmy Thrasher" <jimmy@jimmythrasher.com> writes: > As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value. Only if you're using C locale. Other locales such as en_US have completely different rules, which most hackers tend to find pretty unintelligible and inconsistent :-(. In your example, I think the first-pass sort is on just the letters, and only if those are the same will it consider the punctuation. regards, tom lane
On 4/8/20 7:35 AM, Jimmy Thrasher wrote: > I'm seeing some unexpected behavior when sorting some strings, and it indicates I don't fully understand how postgresqlstring sorting works. > > As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value. > > In particular, I'm seeing the following. I would expect "< S" to come first, because "<" (0x3c) is less than ">" (0x3e). > > ``` > supercatdev=# select unnest(array['> N', '< S']) as s order by s; > s > ----- > > N > < S > (2 rows) > ``` > > I've broken this down further: > ``` > supercatdev=# select '> N' < '< S'; > ?column? > ---------- > t > (1 row) > ``` > > Am I missing something about how sorting works? I believe you are looking for 'C' collation: test=# select unnest(array[('> N' collate "C") , ('< S' COLLATE "C")]) as s order by s; s ----- < S > N (2 rows) For more information see: https://www.postgresql.org/docs/12/collation.html > > Metadata: > - postgresql 9.5.19, running on Ubuntu 16LTS > - encoding, collate, and ctype are all UTF8 or en_US.UTF-8, as appropriate > > Thanks! > > Jimmy > > -- Adrian Klaver adrian.klaver@aklaver.com
Many thanks! That clarifies things well. Jimmy On Wed, Apr 8, 2020, at 11:49 AM, Adrian Klaver wrote: > On 4/8/20 7:35 AM, Jimmy Thrasher wrote: <snip> > > Am I missing something about how sorting works? > > I believe you are looking for 'C' collation: > > test=# select unnest(array[('> N' collate "C") , ('< S' COLLATE "C")]) > as s order by s; > > s > ----- > < S > > N > (2 rows) > > > For more information see: > > https://www.postgresql.org/docs/12/collation.html