Re: Additional Statistics Hooks - Mailing list pgsql-hackers
From | Mat Arye |
---|---|
Subject | Re: Additional Statistics Hooks |
Date | |
Msg-id | CADsUR0BC4E_n=msYGzcBa4M_crvpO6qqyx5eQuEveOA3BD+PjA@mail.gmail.com |
In response to | Re: Additional Statistics Hooks (David Rowley <david.rowley@2ndquadrant.com>) |
Responses | Re: Additional Statistics Hooks |
List | pgsql-hackers |
On Tue, Mar 13, 2018 at 6:31 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 13 March 2018 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While it would certainly be nice to have better behavior for that,
> "add a hook so users who can write C can fix it by hand" doesn't seem
> like a great solution. On top of the sheer difficulty of writing a
> hook function, you'd have the problem that no pre-written hook could
> know about all available functions. I think somehow we'd need a way
> to add per-function knowledge, perhaps roughly like the protransform
> feature.
I don't think this is either-or. A general hook can be useful for extensions
that want to optimize particular data distributions or workloads using domain knowledge about the functions common to those workloads.
That way, users working with that data can use extensions to optimize their workloads without writing C themselves. I also think a
protransform-like feature would add a lot of power to the native planner, but it could take a while
to get into core properly and may not handle every kind of data distribution or case.
One example of a case that a protransform-type system would not be able to optimize is a mathematical operator expression, such as bucketing integers by decile: (integer / 10) * 10.
This is somewhat analogous to date_trunc in the integer space, and it would also change the number of resulting distinct rows.
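To make the bucketing case concrete, here is a rough sketch (in Python rather than planner C, purely for illustration) of the kind of adjustment a statistics hook might apply to such an expression. The function names and the clamping rule are assumptions made for this example, not anything that exists in PostgreSQL: the idea is only that the number of distinct buckets cannot exceed either the input's distinct count or the number of buckets spanning the column's value range.

```python
def bucket(value: int, width: int = 10) -> int:
    """Mirror the SQL expression (value / width) * width.

    Assumes non-negative integers: SQL integer division truncates toward
    zero while Python's // floors, and the two agree only for values >= 0.
    """
    return (value // width) * width


def estimate_bucketed_ndistinct(ndistinct: int, min_value: int,
                                max_value: int, width: int = 10) -> int:
    """Hypothetical estimate of distinct values after bucketing.

    The result is bounded by the distinct count of the input column and by
    the number of buckets between the column's minimum and maximum.
    """
    buckets_in_range = (bucket(max_value, width) - bucket(min_value, width)) // width + 1
    return min(ndistinct, buckets_in_range)


# A column holding every integer in [0, 1000): 1000 distinct inputs.
values = list(range(0, 1000))
actual = len({bucket(v) for v in values})
estimate = estimate_bucketed_ndistinct(len(set(values)), min(values), max(values))
print(actual, estimate)
```

Without some adjustment of this sort, an estimate that treats the expression like the raw column would assume roughly as many groups as there are distinct input values, which overstates the row count of a GROUP BY on the bucketed expression.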
I always imagined that extended statistics could be used for this.
Right now the estimates are much better when you create an index on
the function, but there's no real reason to limit the stats that are
gathered to just plain columns + expression indexes.
I believe I'm not the only person to have considered this. Originally
extended statistics were named multivariate statistics. I think it was
Dean and I (maybe others too) that suggested to Tomas to give the
feature a more generic name so that it can be used for a more general
purpose later.
I also think the point about extended statistics is a good one, and it points to the need for more experimentation and experience, which I think
a C hook is better suited for. Putting in a hook will allow extension writers like us to experiment and figure out which kinds of transforms on statistics are useful, while keeping
a small footprint in core. I think designing a protransform-like system would benefit from more experience with the kinds of transformations that are useful.
For example, can anything be done if the interval passed to date_trunc is not constant, or is that case not even worth bothering with? Maybe extended
statistics is a better approach, etc.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services