On Wed, May 03, 2006 at 17:58:07 -0400, Greg Stark <gsstark@mit.edu> wrote:
> 
> Though it's optimized poorly and does a superfluous sort step:
> 
> stark=> explain select col1 from test group by col1 order by upper(col1);
>                                 QUERY PLAN                                 
> ---------------------------------------------------------------------------
>  Sort  (cost=99.72..100.22 rows=200 width=32)
>    Sort Key: upper(col1)
>    ->  Group  (cost=85.43..92.08 rows=200 width=32)
>          ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
>                Sort Key: col1
>                ->  Seq Scan on test  (cost=0.00..22.30 rows=1230 width=32)
> (6 rows)
> 
> 
> Whereas it shouldn't be hard to prove that this is equivalent:
> 
> stark=> explain select col1 from test group by upper(col1),col1 order by upper(col1);
>                              QUERY PLAN                              
> ---------------------------------------------------------------------
>  Group  (cost=88.50..98.23 rows=200 width=32)
>    ->  Sort  (cost=88.50..91.58 rows=1230 width=32)
>          Sort Key: upper(col1), col1
>          ->  Seq Scan on test  (cost=0.00..25.38 rows=1230 width=32)
> (4 rows)
I don't think you can assume that that will be true for any locale. If there
are two different characters that both have the same uppercase version, this
will break things.
And while you would expect that x = y => upper(x) = upper(y) I am not sure
that is guarenteed for locales. I can imagine having two different characters
that are treated the same for ordering purposes, but have uppercase versions
that are considered different for ordering purposes.