Re: insensitive collations - Mailing list pgsql-hackers

From	Daniel Verite
Subject	Re: insensitive collations
Date	January 10, 2019 00:01:36
Msg-id	bf0e38ff-691d-404f-a7a0-b88b22300cb8@manitou-mail.org
In response to	Re: insensitive collations (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses	Re: insensitive collations (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List	pgsql-hackers

Peter Eisentraut wrote:

> > =# select n from (values ('été' collate "myfr"), ('ete')) x(n)
> >   group by 1 order by 1 ;
> >   n
> > -----
> >  ete
> > (1 row)
> >
> > =#  select n from (values ('été' collate "myfr"), ('ete')) x(n)
> >   group by 1 order by 1 desc;
> >   n
> > -----
> >  été
> > (1 row)
>
> I don't see anything wrong here.  The collation says that both values
> are equal, so which one is returned is implementation-dependent.

It is, but it's impractical if the output of seemingly the same GROUP BY
flip-flops between its different valid results. If that can't be avoided, then
okay. If it can be avoided at little cost, it would be better to do so.

As a different example, the regression tests already rely to some extent
on the returned value being stable. Consider this part:

+CREATE TABLE test3ci (x text COLLATE case_insensitive);
+INSERT INTO test1ci VALUES ('abc'), ('def'), ('ghi');
+INSERT INTO test2ci VALUES ('ABC'), ('ghi');
+INSERT INTO test3ci VALUES ('abc'), ('ABC'), ('def'), ('ghi');
...
+SELECT x, count(*) FROM test3ci GROUP BY x ORDER BY x;
+  x  | count
+-----+-------
+ abc |     2
+ def |     1
+ ghi |     1
+(3 rows)

If ABC were returned here instead of abc for whatever reason,
that would be correct strictly speaking, yet "make check" would fail.
That's impractical.
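
To make the concern concrete, here is a toy model in Python. It is not
PostgreSQL's executor and says nothing about how the patch is implemented.
The helper names (ci_key, group_first_seen, group_tie_break) are hypothetical,
and str.casefold() merely stands in for a case-insensitive collation. The
sketch shows that keeping the first spelling encountered makes the group's
representative depend on input order, whereas a byte-order tie-break among
equal values would make it stable.

```python
def ci_key(s):
    # Assumption: casefold() approximates a case-insensitive collation key.
    return s.casefold()


def group_first_seen(values):
    """Group by ci_key, keeping whichever spelling is encountered first."""
    groups = {}
    for v in values:
        rep, count = groups.get(ci_key(v), (v, 0))
        groups[ci_key(v)] = (rep, count + 1)
    return sorted(groups.values(), key=lambda rc: ci_key(rc[0]))


def group_tie_break(values):
    """Same grouping, but the representative is the smallest spelling
    in code point order, so input order no longer matters."""
    groups = {}
    for v in values:
        rep, count = groups.get(ci_key(v), (v, 0))
        groups[ci_key(v)] = (min(rep, v), count + 1)
    return sorted(groups.values(), key=lambda rc: ci_key(rc[0]))


rows = ["abc", "ABC", "def", "ghi"]
print(group_first_seen(rows))        # representative of the first group: 'abc'
print(group_first_seen(rows[::-1]))  # same data, reversed: 'ABC'
print(group_tie_break(rows))         # 'ABC' regardless of input order
print(group_tie_break(rows[::-1]))   # 'ABC' again
```

Both variants return a valid result for the grouping. Only the second one
would keep an expected-output file stable if the input order changed.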

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
