Re: Using multiple extended statistics for estimates - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Using multiple extended statistics for estimates
Date
Msg-id 20191114200420.c3x2jf7ywzs23uzo@development
Whole thread Raw
In response to Re: Using multiple extended statistics for estimates  (Mark Dilger <hornschnorter@gmail.com>)
Responses Re: Using multiple extended statistics for estimates
Re: Using multiple extended statistics for estimates
Re: Using multiple extended statistics for estimates
List pgsql-hackers
On Thu, Nov 14, 2019 at 10:23:44AM -0800, Mark Dilger wrote:
>
>
>On 11/14/19 7:55 AM, Tomas Vondra wrote:
>>On Wed, Nov 13, 2019 at 10:04:36AM -0800, Mark Dilger wrote:
>>>
>>>
>>>On 11/13/19 7:28 AM, Tomas Vondra wrote:
>>>>Hi,
>>>>
>>>>here's an updated patch, with some minor tweaks based on the review and
>>>>added tests (I ended up reworking those a bit, to make them more like
>>>>the existing ones).
>>>
>>>Thanks, Tomas, for the new patch set!
>>>
>>>Attached are my review comments so far, in the form of a patch 
>>>applied on top of yours.
>>>
>>
>>Thanks.
>>
>>1) It's not clear to me why adding 'const' to the List parameters would
>>   be useful? Can you explain?
>
>When I first started reviewing the functions, I didn't know if those 
>lists were intended to be modified by the function.  Adding 'const' 
>helps document that the function does not intend to change them.
>

Hmmm, ok. I'll think about it, but we're not really using const* in this
way very much I think - at least not in the surrounding code.

>>2) I think you're right we can change find_strongest_dependency to do
>>
>>    /* also skip weaker dependencies when attribute count matches */
>>    if (strongest->nattributes == dependency->nattributes &&
>>        strongest->degree >= dependency->degree)
>>        continue;
>>
>>   That'll skip some additional dependencies, which seems OK.
>>
>>3) It's not clear to me what you mean by
>>
>>     * TODO: Improve this code comment.  Specifically, why would we
>>     * ignore that no rows will match?  It seems that such a discovery
>>     * would allow us to return an estimate of 0 rows, and that would
>>     * be useful.
>>
>>   added to dependencies_clauselist_selectivity. Are you saying we
>>   should also compute selectivity estimates for individual clauses and
>>   use Min() as a limit? Maybe, but that seems unrelated to the patch.
>
>I mean that the comment right above that TODO is hard to understand. 
>You seem to be saying that it is good and proper to only take the 
>selectivity estimate from the final clause in the list, but then go on 
>to say that other clauses might prove that no rows will match.  So 
>that implies that by ignoring all but the last clause, we're ignoring 
>such other clauses that prove no rows can match.  But why would we be 
>ignoring those?
>
>I am not arguing that your code is wrong.  I'm just critiquing the 
>hard-to-understand phrasing of that code comment.
>

Aha, I think I understand now - thanks for the explanation. You're right
the comment is trying to explain why just taking the last clause for a
given attnum is fine. I'll try to make the comment clearer.

For the case with equal Const values that should be mostly obvious, i.e.
"a=1 AND a=1 AND a=1" has the same selectivity as "a=1".

The case with different Const values is harder, unfortunately. It might
seem obvious that "a=1 AND a=2" means there are no matching rows, but
that heavily relies on the semantics of the equality operator. And we
can't simply compare the Const values either, I'm afraid, because there
are cases with cross-type operators like

  a = 1::int AND a = 1.0::numeric

where the Consts are of different type, yet both conditions can be true.

So it would be pretty tricky to do this, and the current code does not
even try to do that.

Instead, it just assumes that it's mostly fine to overestimate, because
then at runtime we'll simply end up with 0 rows here.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: format of pg_upgrade loadable_libraries warning
Next
From: Tom Lane
Date:
Subject: Re: Using multiple extended statistics for estimates