Re: Additional improvements to extended statistics - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Additional improvements to extended statistics |
Date | |
Msg-id | 56df91cd-c7dc-f46b-6bc3-cb36982ff9ae@enterprisedb.com Whole thread Raw |
In response to | Re: Additional improvements to extended statistics (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: Additional improvements to extended statistics
|
List | pgsql-hackers |
On 11/17/20 4:35 PM, Dean Rasheed wrote: > On Thu, 12 Nov 2020 at 14:18, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> Here is an improved WIP version of the patch series, modified to address >> the issue with repeatedly applying the extended statistics, as discussed >> with Dean in this thread. It's a bit rough and not committable, but I >> need some feedback so I'm posting it in this state. > > As it stands, it doesn't compile if 0003 is applied, because it missed > one of the callers of clauselist_selectivity_simple(), but that's > easily fixed. > >> 0001 is the original patch improving estimates of OR clauses >> >> 0002 adds thin wrappers for clause[list]_selectivity, with "internal" >> functions allowing to specify whether to keep considering extended stats >> >> 0003 does the same for the "simple" functions >> >> >> I've kept it like this to demonstrate that 0002 is not sufficient. In my >> response from March 24 I wrote this: >> >>> Isn't it the case that clauselist_selectivity_simple (and the OR >>> variant) should ignore extended stats entirely? That is, we'd need >>> to add a flag (or _simple variant) to clause_selectivity, so that it >>> calls causelist_selectivity_simple_or. >> But that's actually wrong, as 0002 shows (as it breaks a couple of >> regression tests), because of the way we handle OR clauses. At the top >> level, an OR-clause is actually just a single clause and it may get >> passed to clauselist_selectivity_simple. So entirely disabling extended >> stats for the "simple" functions would also mean disabling extended >> stats for a large number of OR clauses. Which is clearly wrong. >> >> So 0003 addresses that, by adding a flag to the two "simple" functions. >> Ultimately, this should probably do the same thing as 0002 and add thin >> wrappers, because the existing functions are part of the public API. > > I agree that, taken together, these patches fix the > multiple-extended-stats-evaluation issue. However: > > I think this has ended up with too many variants of these functions, > since we now have "_internal" and "_simple" variants, and you're > proposing adding more. The original purpose of the "_simple" variants > was to compute selectivities without looking at extended stats, and > now the "_internal" variants compute selectivities with an additional > "use_extended_stats" flag to control whether or not to look at > extended stats. Thus they're basically the same, and could be rolled > together. > Yeah, I agree there were far too many functions. Your patch looks much cleaner / saner than the one I shared last week. > Additionally, it's worth noting that the "_simple" variants expose the > "estimatedclauses" bitmap as an argument, which IMO is a bit messy as > an API. All callers of the "_simple" functions outside of clausesel.c > actually pass in estimatedclauses=NULL, so it's possible to refactor > and get rid of that, turning estimatedclauses into a purely internal > variable. > Hmmm. I think there were two reasons for exposing the estimatedclauses bitmap like that: (a) we used the function internally and (b) we wanted to allow cases where the user code might do something with the bitmap. The first item is not an issue - we can hide that. As for the second item, my guess is it was unnecessary future-proofing - we don't know about any use case that might need this, so +1 to get rid of it. > Also, it's quite messy that clauselist_selectivity_simple_or() needs > to be passed a Selectivity input (the final argument) that is the > selectivity of any already-estimated clauses, or the value to return > if no not-already-estimated clauses are found, and must be 0.0 when > called from the extended stats code. > True. > Attached is the kind of thing I had in mind (as a single patch, since > I don't think it's worth splitting up). This replaces the "_simple" > and "_internal" variants of these functions with "_opt_ext_stats" > variants whose signatures match the originals except for having the > single extra "use_extended_stats" boolean parameter. Additionally, the > "_simple" functions are merged into the originals (making them more > like they were in PG11) so that the "estimatedclauses" bitmap and > partial-OR-list Selectivity become internal details, no longer exposed > in the API. > Seems fine to me, although the "_opt_ext_stats" is rather cryptic. AFAICS we use "_internal" for similar functions. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: