Re: sortsupport for text - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: sortsupport for text
Date	June 19, 2012 13:17:50
Msg-id	CAEYLb_XhhPopKcW5MYtLpoAS1zXry0GRqf0KqYP7tRZAq6bd5w@mail.gmail.com
In response to	Re: sortsupport for text ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: sortsupport for text
List	pgsql-hackers

On 19 June 2012 16:17, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> wrote:
>
>> So, just to give a bit more weight to my argument that we should
>> recognise that equivalent strings ought to be treated identically
>
> Since we appear to be questioning everything in this area, I'll
> raise something which has been bugging me for a while: in some other
> systems I've used, the "tie-breaker" comparison for equivalent
> values comes after equivalence sorting on *all* sort keys, rather
> than *each* sort key.

Are you sure that they actually have a tie-breaker, and don't just
make the distinction between equality and equivalence (if only
internally)? I would have checked that myself already, but I don't
have access to any other RDBMS that I'd expect to care about these
kinds of distinctions. They make sense for ensuring that the text
comparator's notion of equality is consistent with text's general
notion (if that's bitwise equality, which I suspect it is in these
other products too for the same reasons it is for us). I don't see why
you'd want a tie-breaker across multiple keys. I mean, you could, I
just don't see any reason to.

> test=# select * from c order by 2;
>  last_name | first_name
> -----------+------------
>  smith     | bob
>  SMITH     | EDWARD
>  smith     | peter
> (3 rows)
>
> This seems completely wrong:
>
> test=# select * from c order by 1,2;
>  last_name | first_name
> -----------+------------
>  smith     | bob
>  smith     | peter
>  SMITH     | EDWARD
> (3 rows)

Agreed. Definitely a violation of the principle of least astonishment
(POLA).
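
The difference between the two orderings quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PostgreSQL implements sorting: `equiv()` and `tiebreak()` are hypothetical stand-ins (case folding for collation equivalence, and `swapcase()` so that lowercase sorts first, as in the quoted output), and the rows are the three from the example.

```python
# Rows from the quoted example: (last_name, first_name).
rows = [("smith", "peter"), ("SMITH", "EDWARD"), ("smith", "bob")]

def equiv(s):
    # Hypothetical stand-in for collation equivalence: ignore case.
    return s.casefold()

def tiebreak(s):
    # Hypothetical stand-in for the tie-breaker; swapcase() makes
    # lowercase sort before uppercase, as in the quoted output.
    return s.swapcase()

def per_key(row):
    # Tie-breaker applied after *each* sort key.
    last, first = row
    return (equiv(last), tiebreak(last), equiv(first), tiebreak(first))

def all_keys(row):
    # Tie-breaker applied only after *all* keys compare as equivalent.
    last, first = row
    return (equiv(last), equiv(first), tiebreak(last), tiebreak(first))

by_each = sorted(rows, key=per_key)
by_all = sorted(rows, key=all_keys)

for label, result in (("per key", by_each), ("all keys", by_all)):
    print(label, result)
```

With the per-key tie-breaker, the case difference in last_name splits the three rows into two groups before first_name is ever consulted, so EDWARD ends up last. With the tie-breaker deferred until all keys have compared as equivalent, first_name decides the order and the result matches the "order by 2" output.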

> I'm sure the latter is harder to do and slower to execute; but the
> former just doesn't seem defensible as correct.

The author of that sorting document from the Unicode consortium that I
linked to has the same gripe, and gives a very similar example. So
it seems like this could be a win from several perspectives, as it
would enable the strxfrm() optimisation. I'm pretty sure that
pg_upgrade wouldn't be very happy about this, so we'd have to have a
legacy compatibility mode.
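
For readers unfamiliar with the strxfrm() approach mentioned above, here is a minimal sketch using Python's `locale` module, whose `strcoll` and `strxfrm` wrap the C library functions of the same names. The word list is made up for illustration, and no locale is set, so the default "C" collation applies; this shows only the general technique, not the proposed PostgreSQL change.

```python
import functools
import locale

# Illustrative input only; not taken from the original message.
words = ["peter", "bob", "EDWARD", "smith", "SMITH"]

# Comparison-based sort: one strcoll() call per comparison.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Transform-based sort: one strxfrm() call per string, then plain
# comparisons of the transformed keys.
by_strxfrm = sorted(words, key=locale.strxfrm)

print(by_strcoll == by_strxfrm)
```

The transformed keys are built so that comparing them directly gives the same result as strcoll() on the original strings, which moves the locale-aware work from every comparison to a single pass over the input.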

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
