Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id CAKJS1f9jKkjs5NB3MwOMYsGzWCX6ptsOX2NDHMhJ7oJfbfxEqg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
 Thanks for making those changes.

On Fri, 18 Jan 2019 at 10:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> A couple of items remains:
>
> > 15. I see you broke out the remainder of the code from
> > clauselist_selectivity() into clauselist_selectivity_simple().  The
> > comment looks like just a copy and paste from the original.  That
> > seems like quite a bit of duplication. Is it better to maybe trim down
> > the original one?
>
> I don't follow - where do you see the code duplication? Essentially, we
> have clauselist_selectivity and clauselist_selectivity_simple, but the
> first one calls the second one. The "simple" version is needed because
> in some cases we need to perform estimation without multivariate stats
> (e.g. to prevent infinite loop due to recursion).

It was the comment duplication that I was complaining about. I think
clauselist_selectivity()'s comment can be simplified to mention it
attempts to apply extended statistics and applies
clauselist_selectivity_simple on any stats that remain. Plus any
details that are specific to extended statistics.  That way if anyone
wants further detail on what happens to the remaining clauses they can
look at the comment above clauselist_selectivity_simple().

> > 18. In dependencies_clauselist_selectivity() there seem to be a new
> > bug introduced.  We do:
> >
> > /* mark this one as done, so we don't touch it again. */
> > *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
> >
> > but the bms_is_member() check that skipped these has been removed.
> >
> > It might be easier to document if we just always do:
> >
> >  if (bms_is_member(listidx, *estimatedclauses))
> > continue;
> >
> > at the start of both loops. list_attnums can just be left unset for
> > the originally already estimatedclauses.
>
> This was already discussed - I don't think there's any bug, but I'll
> look into refactoring the code somehow to make it clear.

On looking at this a bit more it seems that since the estimated attr
is removed from the clauses_attnums Bitmapset that
find_strongest_dependency() will no longer find a dependency for that
clause and dependency_implies_attribute() will just return false where
the bms_is_member(listidx, *estimatedclauses) would have done
previously. I'll mean we could get more calls of
dependency_implies_attribute(), but that function is even cheaper than
bms_is_member() so I guess there's no harm in this change.

> > 25. Does statext_is_compatible_clause_internal)_ need to skip over
> > RelabelTypes?
>
> I don't think it should, because while RelabelType nodes represent casts
> to binary-compatible types, there's no guarantee the semantics actually
> is compatible.

The code that looks through RelabelTypes for normal stats is in
examine_variable(). This code allows the following to estimate 4 rows.
I guess if we didn't use that then we'd just need to treat it like
some unknown expression and use DEFAULT_NUM_DISTINCT.

create table a (t varchar);
insert into a select v.v from (values('One'),('Two'),('Three')) as
v(v), generate_Series(1,4);
analyze a;
explain (summary off, timing off, analyze) select * from a where t = 'One';
                               QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1.15 rows=4 width=4) (actual rows=4 loops=1)
   Filter: ((t)::text = 'One'::text)
   Rows Removed by Filter: 8
(3 rows)

Why do you think its okay for the normal stats to look through
RelabelTypes but not the new code you're adding?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Hubert Zhang
Date:
Subject: Re: Control your disk usage in PG: Introduction to Disk Quota Extension
Next
From: Amit Langote
Date:
Subject: Re: problems with foreign keys on partitioned tables