Re: Super PathKeys (Allowing sort order through precision loss functions) - Mailing list pgsql-hackers

From David Rowley
Subject Re: Super PathKeys (Allowing sort order through precision loss functions)
Date
Msg-id CAKJS1f8BLAWNUevJEMX+sbj58QBpyyKJLwTkOM9A3fs1Ltf7SA@mail.gmail.com
Whole thread Raw
In response to Re: Super PathKeys (Allowing sort order through precision lossfunctions)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Super PathKeys (Allowing sort order through precision lossfunctions)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 31 October 2018 at 14:23, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> The other thing likely affecting this is locale / collation. Probably
> not for date_trunc, but certainly for things like substr()/trim(),
> mentioned by Simon upthread.
>
> In some languages the rules are pretty complex, and there's no chance
> it'll survive arbitrary substr() applied to the string. For example, in
> Czech we mostly sort character-by-character, but "ch" is an exception
> sorted in between "h" and "i".
>
> So essentially "hhhh <= hchh <= hihh". Obviously, substr($1,0,3) cuts
> the "ch" in half, changing the ordering:
>
> create table x (y text collate "cs_CZ");
> insert into x values ('hhhh'), ('hchh'), ('hihh');
>
> test=# select y from x order by 1;
>   y
> ------
>  hhhh
>  hchh
>  hihh
> (3 rows)
>
> test=# select substr(y,0,3) from x order by 1;
>  substr
> --------
>  hc
>  hh
>  hi
> (3 rows)
>
> I'm preeeeeeetty sure other languages have even funnier rules ...

That's pretty interesting, but as mentioned in my initial email...
More careful thought would be needed for other numerical types and
text types, I imagine, though I've not thought much about that.

I don't really think trim() or substr() would ever work for the reason
that they don't always operate on a prefix of the string. What you've
mentioned seems to rule out LEFT().

> I'm mildly suspicious of this data set, because it's essentially
> perfectly sorted/deterministic, and the Sort node won't have to do
> anything. So perhaps it just works by chance.
>
> Consider this instead:
>
> create table dt (ts timestamp, x text);
>
> insert into dt select * from (select d, (case when random() < 0.5 then
> 'month' else 'hour' end) from generate_series('2018-01-01'::timestamp,
> '2018-12-31'::timestamp, '1 hour'::interval) d) as foo order by random();
>

[...]

>  2018-01-01 00:00:00
>  2018-01-01 00:00:00
>  2018-01-01 00:00:00
>  2018-01-02 13:00:00
>  ... kaboooooom! ...

Yeah. This is an issue. Thanks for the test case. However, I
acknowledged that in my reply to Tom. I did overlook it, which was
completely down to lack of imagination on my part. I'd never
considered using date_trunc() without a const 1st argument before.  It
seems simple enough to disable the optimisation in that case. I've
attached a patch which does that. Maybe that'll help us look beyond
this and focus on any other reasons why this is not possible.

It's also true that this diminishes the usefulness of the idea, but
part of the reason I've posting the idea so early after having thought
about it is precisely to see if this is going to float or sink.
Maybe we'll decide the scope of usefulness is so small that it's not
worth it, or that each function has such different requirements that
we can't reasonably make it work by adding a few columns to pg_proc.
I'm personally more interested in the cases that can work. I
understand there is no shortage of cases where it can't.

Giving that we require const arguments away from the orderkey, perhaps
it could be made to work for simple arithmetic OpExprs. I'm not sure
if the use cases are actually there for that sort of thing and I've
seen WHERE indexcol+0 = <n> used many times to disable the use of
indexes, so making pathkeys see through those might be more annoying
than useful... But it's a thought...

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: ToDo: show size of partitioned table
Next
From: Amit Langote
Date:
Subject: Re: FDW Parallel Append