Re: Additional improvements to extended statistics - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Additional improvements to extended statistics
Date
Msg-id CAEZATCVE2Jx1QB0FFazvgjKAY25tv-EWX9C5XSefEZY-gxn75A@mail.gmail.com
Whole thread Raw
In response to Re: Additional improvements to extended statistics  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Additional improvements to extended statistics  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Here is an improved WIP version of the patch series, modified to address
> the issue with repeatedly applying the extended statistics, as discussed
> with Dean in this thread. It's a bit rough and not committable, but I
> need some feedback so I'm posting it in this state.

As it stands, it doesn't compile if 0003 is applied, because it missed
one of the callers of clauselist_selectivity_simple(), but that's
easily fixed.

> 0001 is the original patch improving estimates of OR clauses
>
> 0002 adds thin wrappers for clause[list]_selectivity, with "internal"
> functions allowing to specify whether to keep considering extended stats
>
> 0003 does the same for the "simple" functions
>
>
> I've kept it like this to demonstrate that 0002 is not sufficient. In my
> response from March 24 I wrote this:
>
> > Isn't it the case that clauselist_selectivity_simple (and the OR
> > variant) should ignore extended stats entirely? That is, we'd need
> > to add a flag (or _simple variant) to clause_selectivity, so that it
> > calls causelist_selectivity_simple_or.
> But that's actually wrong, as 0002 shows (as it breaks a couple of
> regression tests), because of the way we handle OR clauses. At the top
> level, an OR-clause is actually just a single clause and it may get
> passed to clauselist_selectivity_simple. So entirely disabling extended
> stats for the "simple" functions would also mean disabling extended
> stats for a large number of OR clauses. Which is clearly wrong.
>
> So 0003 addresses that, by adding a flag to the two "simple" functions.
> Ultimately, this should probably do the same thing as 0002 and add thin
> wrappers, because the existing functions are part of the public API.

I agree that, taken together, these patches fix the
multiple-extended-stats-evaluation issue. However:

I think this has ended up with too many variants of these functions,
since we now have "_internal" and "_simple" variants, and you're
proposing adding more. The original purpose of the "_simple" variants
was to compute selectivities without looking at extended stats, and
now the "_internal" variants compute selectivities with an additional
"use_extended_stats" flag to control whether or not to look at
extended stats. Thus they're basically the same, and could be rolled
together.

Additionally, it's worth noting that the "_simple" variants expose the
"estimatedclauses" bitmap as an argument, which IMO is a bit messy as
an API. All callers of the "_simple" functions outside of clausesel.c
actually pass in estimatedclauses=NULL, so it's possible to refactor
and get rid of that, turning estimatedclauses into a purely internal
variable.

Also, it's quite messy that clauselist_selectivity_simple_or() needs
to be passed a Selectivity input (the final argument) that is the
selectivity of any already-estimated clauses, or the value to return
if no not-already-estimated clauses are found, and must be 0.0 when
called from the extended stats code.

Attached is the kind of thing I had in mind (as a single patch, since
I don't think it's worth splitting up). This replaces the "_simple"
and "_internal" variants of these functions with "_opt_ext_stats"
variants whose signatures match the originals except for having the
single extra "use_extended_stats" boolean parameter. Additionally, the
"_simple" functions are merged into the originals (making them more
like they were in PG11) so that the "estimatedclauses" bitmap and
partial-OR-list Selectivity become internal details, no longer exposed
in the API.

Regards,
Dean

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_proc.dat "proargmodes is not a 1-D char array"
Next
From: 方徳輝
Date:
Subject: Is postgres ready for 2038?