On Tue, Mar 17, 2020 at 06:05:17PM +0100, Tomas Vondra wrote:
>On Tue, Mar 17, 2020 at 04:14:26PM +0000, Dean Rasheed wrote:
>>On Tue, 17 Mar 2020 at 15:37, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>>On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote:
>>>
>>>>The other thing that I'm still concerned about is the possibility of
>>>>returning estimates with P(a,b) > P(a) or P(b). I think that such a
>>>>thing becomes much more likely with the new types of clause supported
>>>>here, because they now allow multiple values from each column, where
>>>>before we only allowed one. I took another look at the patch I posted
>>>>on the other thread, and I've convinced myself that it's correct.
>>>>Attached is an updated version, with some cosmetic tidying up and now
>>>>with some additional regression tests.
>>>
>>>Yeah, I agree that's something we need to fix. Do you plan to push the
>>>fix, or do you want me to do it?
>>>
>>
>>I can push it. Have you had a chance to review it?
>>
>
>Not yet, I'll take a look today.
>
OK, I took a look. I think from the correctness POV the patch is OK, but
I think the dependencies_clauselist_selectivity() function now got a bit
too complex. I've been able to parse it now, but I'm sure I'll have
trouble in the future :-(
Can we refactor / split it somehow and move bits of the logic to smaller
functions, or something like that?
Another thing I'd like to suggest is keeping the "old" formula, and
instead of just replacing it with
P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
but explaining how the old formula may produce nonsensical selectivity,
and how the new formula addresses that issue.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services