Re: sortsupport for text - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: sortsupport for text
Msg-id CAEYLb_XhhPopKcW5MYtLpoAS1zXry0GRqf0KqYP7tRZAq6bd5w@mail.gmail.com
In response to Re: sortsupport for text  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: sortsupport for text
List pgsql-hackers
On 19 June 2012 16:17, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> wrote:
>
>> So, just to give a bit more weight to my argument that we should
>> recognise that equivalent strings ought to be treated identically
>
> Since we appear to be questioning everything in this area, I'll
> raise something which has been bugging me for a while: in some other
> systems I've used, the "tie-breaker" comparison for equivalent
> values comes after equivalence sorting on *all* sort keys, rather
> than *each* sort key.

Are you sure that they actually have a tie-breaker, and don't just
make the distinction between equality and equivalence (if only
internally)? I would have checked that myself already, but I don't
have access to any other RDBMS that I'd expect to care about these
kinds of distinctions. They make sense for ensuring that the text
comparator's notion of equality is consistent with text's general
notion (if that's bitwise equality, which I suspect it is in these
other products too for the same reasons it is for us). I don't see why
you'd want a tie-breaker across multiple keys. I mean, you could, I
just don't see any reason to.
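
To make the equality/equivalence distinction concrete, here is a minimal sketch of a comparator that falls back to a bytewise tie-breaker whenever the collation calls two strings equivalent. It is not PostgreSQL code: collation_key() is a hypothetical stand-in that uses case folding in place of a real locale's collation, and text_cmp() and equivalent() are names made up for this illustration.

```python
def collation_key(s: str) -> str:
    # Hypothetical stand-in for a locale's collation: case folding only.
    return s.casefold()


def equivalent(a: str, b: str) -> bool:
    # Equivalence: the collation alone cannot tell the two strings apart.
    return collation_key(a) == collation_key(b)


def text_cmp(a: str, b: str) -> int:
    # Compare by collation first.
    ka, kb = collation_key(a), collation_key(b)
    if ka != kb:
        return -1 if ka < kb else 1
    # Equivalent strings: break the tie on the raw bytes, so the
    # comparator returns 0 only for bitwise-identical strings.
    ba, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ba > bb) - (ba < bb)


print(equivalent("smith", "SMITH"))     # the collation sees no difference
print(text_cmp("smith", "SMITH") == 0)  # the comparator does
```

With the tie-breaker in place, the comparator's notion of equality coincides with bitwise equality, which is the consistency property described above.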

> test=# select * from c order by 2;
>  last_name | first_name
> -----------+------------
>  smith     | bob
>  SMITH     | EDWARD
>  smith     | peter
> (3 rows)
>
> This seems completely wrong:
>
> test=# select * from c order by 1,2;
>  last_name | first_name
> -----------+------------
>  smith     | bob
>  smith     | peter
>  SMITH     | EDWARD
> (3 rows)

Agreed. Definitely a POLA (principle of least astonishment) violation.
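
The two behaviours can be sketched with sort keys, using the three rows from the quoted example. This is an illustration under stated assumptions, not how PostgreSQL implements either ordering: equiv() stands in for the collation by case folding, and tiebreak() uses swapcase() purely to mimic the lowercase-first order seen in the quoted output. Both helper names are hypothetical.

```python
rows = [("smith", "bob"), ("SMITH", "EDWARD"), ("smith", "peter")]


def equiv(s: str) -> str:
    # Hypothetical stand-in for the collation: case-insensitive ordering.
    return s.casefold()


def tiebreak(s: str) -> str:
    # Assumption: order equivalent strings lowercase-first, as in the
    # quoted output. Swapping case achieves that under code-point order.
    return s.swapcase()


def per_key(row):
    # Tie-break immediately after *each* sort key.
    last, first = row
    return (equiv(last), tiebreak(last), equiv(first), tiebreak(first))


def after_all_keys(row):
    # Tie-break only after equivalence ordering on *all* sort keys.
    last, first = row
    return (equiv(last), equiv(first), tiebreak(last), tiebreak(first))


per_key_order = sorted(rows, key=per_key)
after_all_order = sorted(rows, key=after_all_keys)

for label, ordered in (("per key", per_key_order),
                       ("after all keys", after_all_order)):
    print(label, [first for _, first in ordered])
```

In the per-key variant, SMITH and smith are separated before first_name is ever consulted, so EDWARD ends up after peter. Deferring the tie-breaker until all keys have been compared gives bob, EDWARD, peter.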

> I'm sure the latter is harder to do and slower to execute; but the
> former just doesn't seem defensible as correct.

The author of the Unicode Consortium sorting document I linked to
raises this same gripe, with a very similar example. So
it seems like this could be a win from several perspectives, as it
would enable the strxfrm() optimisation. I'm pretty sure that
pg_upgrade wouldn't be very happy about this, so we'd have to have a
legacy compatibility mode.
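
As a rough illustration of the general idea behind a strxfrm()-based sort (not the proposed patch itself), the sketch below transforms each string once and then orders the transformed keys with plain comparisons, instead of calling strcoll() on every comparison. It uses Python's locale module as a stand-in for the C library calls and runs under whatever LC_COLLATE setting is active; it does not call setlocale(), so that will normally be the "C" locale. The word list is made up for the example.

```python
import functools
import locale

words = ["smith", "SMITH", "Smith", "bob", "EDWARD", "peter"]

# Comparator-based sort: one strcoll() call per comparison.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Key-based sort: strxfrm() is called once per string, and the
# transformed keys are then compared with ordinary string comparison.
keys = {w: locale.strxfrm(w) for w in words}
by_strxfrm = sorted(words, key=keys.__getitem__)

print(by_strcoll == by_strxfrm)
```

strxfrm() is specified so that comparing two transformed strings gives the same result as strcoll() on the originals, which is why both sorts agree. Note that the transformed key alone cannot distinguish strings the collation treats as equivalent.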

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

