Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
Date
Msg-id 53950f93-8cce-4440-52f1-5879653c6803@enterprisedb.com
Whole thread Raw
In response to Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes  (David Geier <geidav.pg@gmail.com>)
Responses Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
List pgsql-hackers
On 11/18/22 09:54, David Geier wrote:
> Thanks everyone for the great feedback and suggestions.
> 
>>
>>> Yes, it is.  Using zero flag would short-cut get_attstatsslot() to just
>>> return whether the slot type exists without loading it.  Do you think we
>>> need to emphasize this use case in the comments for 'flags'?
>> Perhaps, it's not really obvious now.
> 
> Comment added.
> 
> 
>> I wonder whether we need to also check statistic_proc_security_check()
>>> when determining if MCVs exists in both sides.
>> Yeah, I thought about hoisting the statistic_proc_security_check
>> tests up into get_mcv_stats.  I don't think it's a great idea
>> though.  Again, it'd complicate untangling this if we ever
>> generalize the use of MCVs in this function.  Also, I don't
>> think we should be micro-optimizing the case where the security
>> check doesn't pass --- if it doesn't, you're going to be hurting
>> from bad plans a lot more than you are from some wasted cycles
>> here.
> 
> Sounds reasonable.
> 
> Attached is v2 of the patch.
> This is basically Tom's version plus a comment for the flags of
> get_attstatslot() as suggested by Richard.
> 

Seems fine. I wonder if we could/could introduce a new constant for 0,
similar to ATTSTATSSLOT_NUMBERS/ATTSTATSSLOT_VALUES, instead of using a
magic constant. Say, ATTSTATSSLOT_NONE or ATTSTATSSLOT_CHECK.

> I couldn't come up with any reasonable way of writing an automated test
> for that.
> Any ideas?
> 

I don't think you can write a test for this, because there is no change
to behavior that can be observed by the user. If one side has no MCV,
the only difference is whether we try to load the other MCV or not.
There's no impact on estimates, because we won't use it.

IMO the best thing we can do is check coverage, that the new code is
exercised in regression tests. And I think that's fine.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: David Geier
Date:
Subject: Re: CREATE UNLOGGED TABLE seq faults when debug_discard_caches=1
Next
From: Masahiko Sawada
Date:
Subject: Re: Add index scan progress to pg_stat_progress_vacuum