Home > mailing lists

Re: Selectivity estimation for inet operators - Mailing list pgsql-hackers

From	Emre Hasegeli
Subject	Re: Selectivity estimation for inet operators
Date	August 31, 2014 17:59:42
Msg-id	20140831175918.GB8990@hasegeli-2.local Whole thread Raw
In response to	Re: Selectivity estimation for inet operators (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > * inet_mcv_join_selec() is O(n^2) where n is the number of entries in
> > the MCV lists. With the max statistics target of 10000, a worst case
> > query on my laptop took about 15 seconds to plan. Maybe that's
> > acceptable, but you went through some trouble to make planning of MCV vs
> > histogram faster, by the log2 method to compare only some values, so I
> > wonder why you didn't do the same for the MCV vs MCV case?
>
> Actually, what I think needs to be asked is the opposite question: why is
> the other code ignoring some of the statistical data?  If the user asked
> us to collect a lot of stats detail it seems reasonable that he's
> expecting us to use it to get more accurate estimates.  It's for sure
> not obvious why these estimators should take shortcuts that are not being
> taken in the much-longer-established code for scalar comparison estimates.

It will still use more statistical data, when statistics_target is
higher.  It was not sure that the user wants to spent O(n^2) amount
of time based on statistics_target.  Attached version is without
this optimization.  Estimates are better without it, but planning
takes more time.

> I'm not exactly convinced that the math adds up in this logic, either.
> The way in which it combines results from looking at the MCV lists and
> at the histograms seems pretty arbitrary.

I taught the product of the join will be

    (left_mcv + left_histogram) * (right_mcv + right_histogram) * selectivity

and tried to calculate it as in the following:

    (left_mcv * right_mcv * selectivity) +
    (right_mcv * left_histogram * selectivity) +
    (left_mcv * right_histogram * selectivity) +
    (left_histogram * right_histogram * selectivity)

where left_histogram is

    1.0 - left_nullfrac - left_mcv

I fixed calculation for the MCV vs histogram part.  The estimates of
inner join are very close to the actual rows with statistics_target = 1000.
I think the calculation should be right.

Attachment

inet-selfuncs-v10.patch

pgsql-hackers by date:

From: Emre Hasegeli
Date: 31 August 2014, 17:57:37
Subject: Re: Selectivity estimation for inet operators

From: Tom Lane
Date: 31 August 2014, 18:00:46
Subject: Re: Built-in binning functions

Re: Selectivity estimation for inet operators - Mailing list pgsql-hackers

Attachment

Previous

Next