Re: Additional improvements to extended statistics - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Additional improvements to extended statistics |
Date | |
Msg-id | 20200324010838.hbv5gzhexcv5cilg@development Whole thread Raw |
In response to | Re: Additional improvements to extended statistics (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: Additional improvements to extended statistics
|
List | pgsql-hackers |
On Mon, Mar 23, 2020 at 08:21:42AM +0000, Dean Rasheed wrote: >On Sat, 21 Mar 2020 at 21:59, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> Ah, right. Yeah, I think that should work. I thought there would be some >> volatility due to groups randomly not making it into the MCV list, but >> you're right it's possible to construct the data in a way to make it >> perfectly deterministic. So that's what I've done in the attached patch. >> > >Looking through those new tests, another issue occurred to me -- under >some circumstances this patch can lead to extended stats being applied >twice to the same clause, which is not great, because it involves >quite a lot of extra work, and also because it can lead to >overestimates because of the way that MCV stats are applied as a delta >correction to simple_sel. > >The way this comes about is as follows -- if we start with an OR >clause, that will be passed as a single-item implicitly AND'ed list to >clauselist_selectivity(), and from there to >statext_clauselist_selectivity() and then to >statext_mcv_clauselist_selectivity(). This will call >clauselist_selectivity_simple() to get simple_sel, before calling >mcv_clauselist_selectivity(), which will recursively compute all the >MCV corrections. However, the call to clauselist_selectivity_simple() >will call clause_selectivity() for each clause (just a single OR >clause in this example), which will now call >clauselist_selectivity_or(), which will go back into >statext_clauselist_selectivity() with "is_or = true", which will apply >the MCV corrections to the same set of clauses that the outer call was >about to process. > Hmmm. So let's consider a simple OR clause with two arguments, both covered by single statistics object. Something like this: CREATE TABLE t (a int, b int); INSERT INTO t SELECT mod(i, 10), mod(i, 10) FROM generate_series(1,100000); CREATE STATISTICS s (mcv) ON a,b FROM t; SELECT * FROM t WHERE a = 0 OR b = 0; Which is estimated correctly, but the calls graph looks like this: clauselist_selectivity statext_clauselist_selectivity statext_mcv_clauselist_selectivity <--- clauselist_selectivity_simple clause_selectivity clauselist_selectivity_or statext_clauselist_selectivity statext_mcv_clauselist_selectivity <--- clauselist_selectivity_simple_or clause_selectivity clause_selectivity mcv_clauselist_selectivity clauselist_selectivity_simple_or mcv_clauselist_selectivity clauselist_selectivity_simple (already estimated) IIUC the problem you have in mind is that we end up calling statext_mcv_clauselist_selectivity twice, once for the top-level AND clause with a single argument, and then recursively for the expanded OR clause. Indeed, that seems to be applying the correction twice. >I'm not sure what's the best way to resolve that. Perhaps >statext_mcv_clauselist_selectivity() / statext_is_compatible_clause() >should ignore OR clauses from an AND-list, on the basis that they will >get processed recursively later. Or perhaps estimatedclauses can >somehow be used to prevent this, though I'm not sure exactly how that >would work. I don't know. I feel uneasy about just ignoring some of the clauses, because what happens for complex clauses, where the OR is not directly in the AND clause, but is negated or something like that? Isn't it the case that clauselist_selectivity_simple (and the OR variant) should ignore extended stats entirely? That is, we'd need to add a flag (or _simple variant) to clause_selectivity, so that it calls causelist_selectivity_simple_or. So the calls would look like this: clauselist_selectivity statext_clauselist_selectivity statext_mcv_clauselist_selectivity clauselist_selectivity_simple <--- disable extended stats clause_selectivity clauselist_selectivity_simple_or clause_selectivity clause_selectivity mcv_clauselist_selectivity clauselist_selectivity_simple already estimated I've only quickly hacked clause_selectivity, but it does not seems very invasive (of course, it means disruption to clause_selectivity callers, but I suppose most call clauselist_selectivity). BTW do you have an example where this would actually cause an issue? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: