Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | 4a2918bb-49b0-ccd5-3af7-f7795941a6ad@enterprisedb.com Whole thread Raw |
In response to | Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
On 12/7/20 5:02 PM, Dean Rasheed wrote: > On Mon, 7 Dec 2020 at 14:15, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> >> On 12/7/20 10:56 AM, Dean Rasheed wrote: >>> it might actually be >>> neater to have separate documented syntaxes for single- and >>> multi-column statistics: >>> >>> CREATE STATISTICS [ IF NOT EXISTS ] statistics_name >>> ON (expression) >>> FROM table_name >>> >>> CREATE STATISTICS [ IF NOT EXISTS ] statistics_name >>> [ ( statistics_kind [, ... ] ) ] >>> ON { column_name | (expression) } , { column_name | (expression) } [, ...] >>> FROM table_name >> >> I think it makes sense in general. I see two issues with this approach, >> though: >> >> * By adding expression/standard stats for individual statistics, it >> makes the list of statistics longer - I wonder if this might have >> measurable impact on lookups in this list. >> >> * I'm not sure it's a good idea that the second syntax would always >> build the per-expression stats. Firstly, it seems a bit strange that it >> behaves differently than the other kinds. Secondly, I wonder if there >> are cases where it'd be desirable to explicitly disable building these >> per-expression stats. For example, what if we have multiple extended >> statistics objects, overlapping on a couple expressions. It seems >> pointless to build the stats for all of them. >> > > Hmm, I'm not sure it would really be a good idea to build MCV stats on > expressions without also building the standard stats for those > expressions, otherwise the assumptions that > mcv_combine_selectivities() makes about simple_sel and mcv_basesel > wouldn't really hold. But then, if multiple MCV stats shared the same > expression, it would be quite wasteful to build standard stats on the > expression more than once. > Yeah. You're right it'd be problematic to build MCV on expressions without having the per-expression stats. In fact, that's exactly the problem what forced me to add the per-expression stats to this patch. Originally I planned to address it in a later patch, but I had to move it forward. So I think you're right we need to ensure we have standard stats for each expression at least once, to make this work well. > It feels like it should build a single extended stats object for each > unique expression, with appropriate dependencies for any MCV stats > that used those expressions, but I'm not sure how complex that would > be. Dropping the last MCV stat object using a standard expression stat > object might reasonably drop the expression stats ... except if they > were explicitly created by the user, independently of any MCV stats. > That could get quite messy. > Possibly. But I don't think it's worth the extra complexity. I don't expect people to have a lot of overlapping stats, so the amount of wasted space and CPU time is expected to be fairly limited. So I don't think it's worth spending too much time on this now. Let's just do what you proposed, and revisit this later if needed. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: