On Tue, Nov 26, 2019 at 10:56:33AM +0000, Dean Rasheed wrote:
>On Sun, 24 Nov 2019 at 23:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Hi,
>>
>> Attached are two patched, related to this bug report.
>>
>>
>> 0001 - Fix choose_best_statistics to check clauses individually
>> ---------------------------------------------------------------
>>
>> This modifies the choose_best_statistics function to properly check
>> which clauses are actually covered by each statistic object, and only
>> use attnums from those.
>>
>> The patch ended up pretty small, because we already have all the
>> necessary info (per-clause attnums) precalculated. Which means this
>> should not be much more expensive than before.
>>
>> The main drawback is that this does change signature of a function
>> defined in statistics.h - we have to pass more info (per-clause bitmaps
>> and info which clauses are already estimated). Which means ABI break.
>>
>> I'm not sure how likely it is that external code is calling this
>> function, but the probability is non-zero. So maybe even if the patch is
>> fairly small, in backbranches we should use the simple fix with just
>> returning if the list is NIL.
>>
>
>On a quick read-through that algorithm makes a lot more sense. It
>seems pretty unlikely that anyone would be using
>choose_best_statistics() anywhere else, so I think maybe it's fine to
>change that in back-branches.
>
>A couple of comments:
>
>1). I think you should pass estimatedClauses to
>choose_best_statistics() as a pure input parameter (i.e., remove one
>"*"), since it doesn't (and must not) modify that set.
Yeah, good point.
>In fact, on closer inspection, I don't think you need to pass it to
>choose_best_statistics() at all, since its callers already check
>clauses against estimatedClauses. Therefore, in
>choose_best_statistics(), incompatible and already-estimated clauses
>both appear the same (as NULL/empty attribute sets), and therefore the
>estimatedClauses check will never be tripped.
>
Right, but I'm thinking about the patch that allows applying multiple
statistics. With that applied, this changes to a while loop - and we'll
either have to rebuild the list_attnums or pass the bitmap.
>2). The new parameter "clauses" should probably be called something
>like "clause_attnums" or some such, to better reflect what it actually
>is. And it should be documented that NULL values represent
>incompatible/already-estimated clauses.
>
Yes, agreed.
>3). In statext_mcv_clauselist_selectivity(), the check for at least 2
>attributes is no longer needed, because choose_best_statistics() now
>does that, so there's also no need to compute clauses_attnums there.
>
Good point. I'll replace that with an assert.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services