2018-03-12 14:03 GMT-03:00 Mat Arye <mat@timescale.com>: > I have a question about statistics hooks. I am trying to teach the planner > that when grouping by something like date_trunc('1 day', time) will produce > a lot less rows than the number of distinct time values. I want to do that > in an extension. The problem is that I don't see a way to make the > get_relation_stats_hook work well fo that since by the time it's called you > only see the `time` var and not the full expression. None of the other hooks > seem appropriate either. So 2 questions: > Isn't it the case to extend the available hook? > 1) Would people be opposed to adding a code hook somewhere at the start of > `examine_variable` (selfuncs.c) to allow creating statistics on complete > expressions? I can submit a patch if this seems reasonable. > If you explain the use case maybe it could be considered.
So the use-case is an analytical query like
SELECT date_trunc('hour', time) AS MetricMinuteTs, AVG(value) as avg
FROM hyper
WHERE time >= '2001-01-04T00:00:00' AND time <= '2001-01-05T01:00:00'
GROUP BY MetricMinuteTs
ORDER BY MetricMinuteTs DESC;
Right now this query will choose a much-less-efficient GroupAggregate plan
instead of a HashAggregate. It will choose this because it thinks the number of groups
produced here is 9,000,000 because that's the number of distinct time values there are.
But, because date_trunc "buckets" the values there will be about 24 groups (1 for each hour).
> 2) Do patches that add code hooks (and are probably under 10 lines) need to > go through the entire commitfest process. I guess what I am really asking is > if PG12 would be the first version such a patch could appear in or is PG11 > still a possibility? Just wondering what the policy on such stuff is. > If it is a new feature and is not in the last CF, it won't be considered for v11 (even small patches).