Thread: [HACKERS] Order-preserving function transforms and EquivalenceClass

[HACKERS] Order-preserving function transforms and EquivalenceClass

From
Mat Arye
Date:
Hi,

I am on a team developing an open-source extension for time-series data storage in PostgreSQL (https://github.com/timescaledb/timescaledb). 

We are trying to extend/hook into the planner so that it understands that date_trunc('minute', time) has the same ordering as time (or rather that a sort ordering on the latter is always a valid sort ordering on the former). But this question really applies to any order-preserving transform such as (time+1) vs (time). 

Let's assume we can detect such cases. How do we tell the planner about it. 

I see two ways of doing this:

(1) Having an EquivalenceClass with two members - the "time" and "date_trunc" member. Then the pathkey for the sort would reference this member and understand the equivalence. I think this is a pretty clean way of doing things. But this equivalence between "time" and "date_trunc" only applies to sorts. It should not apply to joins (for which EquivalenceClass is also used) because these values are not actually equivalent. They are only equivalent for sorting purposes. I know an EquivalenceClass has a ec_has_volatile field which marks the EquivalenceClass as only for sorts and not for joins. But is it an incorrect hack to set this for cases where the EquivalenceMembers are not volatile? Is there some other way as marking an EquivalenceClass as only for sorts that I am missing? Another wrinkle is that while a date_trunc sort can be safely represented by a time sort the reverse isn't true. Has there been any discussion on supporting such cases in EquivalenceClasses? I'd be happy to submit a patch to the core if people think this is worthwhile. 

(2) I can create new EquivalenceClass objects and pathkeys and use planner hooks to substitute the appropriate pathkeys for doing things like finding indexes etc. I have a prototype where this works, but I just think approach 1 is much cleaner. 

Any thoughts would be greatly appreciated. I am pretty new to the planner codebase and just want to avoid any obvious pitfalls.

Thanks,
Matvey Arye

Re: Order-preserving function transforms and EquivalenceClass

From
Tom Lane
Date:
Mat Arye <mat@timescaledb.com> writes:
> We are trying to extend/hook into the planner so that it understands that
> date_trunc('minute', time) has the same ordering as time (or rather that a
> sort ordering on the latter is always a valid sort ordering on the former).
> But this question really applies to any order-preserving transform such as
> (time+1) vs (time).

Hmm ... it seems like the thing that is missing from your statement of the
problem is the notion of one ordering being a refinement of another.  You
can't just say that "date_trunc('minute', time) has the same ordering as
time", or that for a float value x "floor(x) has the same ordering as x".
You can say that a data series ordered by x is also ordered by floor(x),
*but not vice versa*.  So "same" seems like the wrong word.  A related
concept that the planner does understand today is that a data series
ordered by x,y is also ordered by x ... but not vice versa.

The EquivalenceClass mechanism doesn't seem to me to be up to representing
such considerations.  Maybe it could be extended, but I think there's some
pretty fundamental design work needed here.  And I'm not sure it should be
that mechanism in particular: ECs are mostly about making transitivity
deductions, ie a=b and b=c implies a=c.  I'm not very sure what we'd want
an ordering-refinements notion to act like, but I'm pretty sure it's not
like equality.

Thinking about the "ORDER BY x,y" vs "ORDER BY x" precedent, I wonder
whether you could do something in the PathKey mechanism.  But PathKeys
currently depend on per-column EquivalenceClasses, so it's not real
obvious how to proceed there either.

If you've got ideas about what this should look like, I'm all ears.
But it's going to take some pretty basic work, you're not going to
just plug into already-existing mechanism.
        regards, tom lane