Re: Using multiple extended statistics for estimates - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Using multiple extended statistics for estimates
Date
Msg-id e41167d8-53cd-0f18-9301-85fd9a70d8d8@gmail.com
Whole thread Raw
In response to Re: Using multiple extended statistics for estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Using multiple extended statistics for estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers

On 11/14/19 7:55 AM, Tomas Vondra wrote:
> On Wed, Nov 13, 2019 at 10:04:36AM -0800, Mark Dilger wrote:
>>
>>
>> On 11/13/19 7:28 AM, Tomas Vondra wrote:
>>> Hi,
>>>
>>> here's an updated patch, with some minor tweaks based on the review and
>>> added tests (I ended up reworking those a bit, to make them more like
>>> the existing ones).
>>
>> Thanks, Tomas, for the new patch set!
>>
>> Attached are my review comments so far, in the form of a patch applied 
>> on top of yours.
>>
> 
> Thanks.
> 
> 1) It's not clear to me why adding 'const' to the List parameters would
>    be useful? Can you explain?

When I first started reviewing the functions, I didn't know if those 
lists were intended to be modified by the function.  Adding 'const' 
helps document that the function does not intend to change them.

> 2) I think you're right we can change find_strongest_dependency to do
> 
>     /* also skip weaker dependencies when attribute count matches */
>     if (strongest->nattributes == dependency->nattributes &&
>         strongest->degree >= dependency->degree)
>         continue;
> 
>    That'll skip some additional dependencies, which seems OK.
> 
> 3) It's not clear to me what you mean by
> 
>      * TODO: Improve this code comment.  Specifically, why would we
>      * ignore that no rows will match?  It seems that such a discovery
>      * would allow us to return an estimate of 0 rows, and that would
>      * be useful.
> 
>    added to dependencies_clauselist_selectivity. Are you saying we
>    should also compute selectivity estimates for individual clauses and
>    use Min() as a limit? Maybe, but that seems unrelated to the patch.

I mean that the comment right above that TODO is hard to understand. You 
seem to be saying that it is good and proper to only take the 
selectivity estimate from the final clause in the list, but then go on 
to say that other clauses might prove that no rows will match.  So that 
implies that by ignoring all but the last clause, we're ignoring such 
other clauses that prove no rows can match.  But why would we be 
ignoring those?

I am not arguing that your code is wrong.  I'm just critiquing the 
hard-to-understand phrasing of that code comment.

-- 
Mark Dilger



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Proposal: Add more compile-time asserts to exposeinconsistencies.
Next
From: Tom Lane
Date:
Subject: Re: SKIP_LOCKED test causes random buildfarm failures