On Thu, Mar 19, 2020 at 07:53:39PM +0000, Dean Rasheed wrote:
>On Wed, 18 Mar 2020 at 00:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> OK, I took a look. I think from the correctness POV the patch is OK, but
>> I think the dependencies_clauselist_selectivity() function now got a bit
>> too complex. I've been able to parse it now, but I'm sure I'll have
>> trouble in the future :-(
>>
>> Can we refactor / split it somehow and move bits of the logic to smaller
>> functions, or something like that?
>>
>
>Yeah, it has gotten a bit long. It's somewhat tricky splitting it up,
>because of the number of shared variables used throughout the
>function, but here's an updated patch splitting it into what seemed
>like the 2 most logical pieces. The first piece (still in
>dependencies_clauselist_selectivity()) works out what dependencies
>can/should be applied, and the second piece in a new function does the
>actual work of applying the list of functional dependencies to the
>clause list.
>
>I think that has made it easier to follow, and it has also reduced the
>complexity of the final "no applicable stats" branch.
>
Seems OK to me.
I'd perhaps name deps_clauselist_selectivity differently, it's a bit too
similar to dependencies_clauselist_selectivity. Perhaps something like
clauselist_apply_dependencies? But that's a minor detail.
>> Another thing I'd like to suggest is keeping the "old" formula, and
>> instead of just replacing it with
>>
>> P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
>>
>> but explaining how the old formula may produce nonsensical selectivity,
>> and how the new formula addresses that issue.
>>
>
>I think this is purely a comment issue? I've added some more extensive
>comments attempting to justify the formulae.
>
Yes, it was purely a comment issue. Seems fine now.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services