Thread: [HACKERS] Order-preserving function transforms and EquivalenceClass
Hi,
I am on a team developing an open-source extension for time-series data storage in PostgreSQL (https://github.com/ timescaledb/timescaledb).
We are trying to extend/hook into the planner so that it understands that date_trunc('minute', time) has the same ordering as time (or rather that a sort ordering on the latter is always a valid sort ordering on the former). But this question really applies to any order-preserving transform such as (time+1) vs (time).
Let's assume we can detect such cases. How do we tell the planner about it.
I see two ways of doing this:
(1) Having an EquivalenceClass with two members - the "time" and "date_trunc" member. Then the pathkey for the sort would reference this member and understand the equivalence. I think this is a pretty clean way of doing things. But this equivalence between "time" and "date_trunc" only applies to sorts. It should not apply to joins (for which EquivalenceClass is also used) because these values are not actually equivalent. They are only equivalent for sorting purposes. I know an EquivalenceClass has a ec_has_volatile field which marks the EquivalenceClass as only for sorts and not for joins. But is it an incorrect hack to set this for cases where the EquivalenceMembers are not volatile? Is there some other way as marking an EquivalenceClass as only for sorts that I am missing? Another wrinkle is that while a date_trunc sort can be safely represented by a time sort the reverse isn't true. Has there been any discussion on supporting such cases in EquivalenceClasses? I'd be happy to submit a patch to the core if people think this is worthwhile.
(2) I can create new EquivalenceClass objects and pathkeys and use planner hooks to substitute the appropriate pathkeys for doing things like finding indexes etc. I have a prototype where this works, but I just think approach 1 is much cleaner.
Any thoughts would be greatly appreciated. I am pretty new to the planner codebase and just want to avoid any obvious pitfalls.
Thanks,
Matvey Arye
Mat Arye <mat@timescaledb.com> writes: > We are trying to extend/hook into the planner so that it understands that > date_trunc('minute', time) has the same ordering as time (or rather that a > sort ordering on the latter is always a valid sort ordering on the former). > But this question really applies to any order-preserving transform such as > (time+1) vs (time). Hmm ... it seems like the thing that is missing from your statement of the problem is the notion of one ordering being a refinement of another. You can't just say that "date_trunc('minute', time) has the same ordering as time", or that for a float value x "floor(x) has the same ordering as x". You can say that a data series ordered by x is also ordered by floor(x), *but not vice versa*. So "same" seems like the wrong word. A related concept that the planner does understand today is that a data series ordered by x,y is also ordered by x ... but not vice versa. The EquivalenceClass mechanism doesn't seem to me to be up to representing such considerations. Maybe it could be extended, but I think there's some pretty fundamental design work needed here. And I'm not sure it should be that mechanism in particular: ECs are mostly about making transitivity deductions, ie a=b and b=c implies a=c. I'm not very sure what we'd want an ordering-refinements notion to act like, but I'm pretty sure it's not like equality. Thinking about the "ORDER BY x,y" vs "ORDER BY x" precedent, I wonder whether you could do something in the PathKey mechanism. But PathKeys currently depend on per-column EquivalenceClasses, so it's not real obvious how to proceed there either. If you've got ideas about what this should look like, I'm all ears. But it's going to take some pretty basic work, you're not going to just plug into already-existing mechanism. regards, tom lane