Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
Date
Msg-id 2f30ba9c-b707-55ae-0afd-68eb6a8a7708@enterprisedb.com
Whole thread Raw
In response to Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes  (David Geier <geidav.pg@gmail.com>)
Responses Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
List pgsql-hackers

On 11/14/22 10:19, David Geier wrote:
> Hi Tom,
>> There won't *be* any MCV stats for a column that ANALYZE perceives to
>> be unique, so I'm not quite sure where the claimed savings comes from.
>
> We save if one join attribute is unique while the other isn't. In that
> case stored MCV stats are read for the non-unique attribute but then
> never used. This is because MCV stats in join selectivity estimation are
> only used if they're present on both columns
>

Right - if we only have MCV on one side of the join, we currently end up
loading the MCV we have only to not use it anyway. The uniqueness is a
simple way to detect some of those cases. I'd bet the savings can be
quite significant for small joins and/or cases with large MCV.

I wonder if we might be yet a bit smarter, though.

For example, assume the first attribute is not defined as "unique" but
we still don't have a MCV (it may be unique - or close to unique - in
practice, or maybe it's just uniform distribution). We end up with

  have_mcvs1 = false

Can't we just skip trying to load the second MCV? So we could do

    if (have_mcvs1 && HeapTupleIsValid(vardata2.statsTuple))
    { ... try loading mcv2 ... }

Or perhaps what if we have a function that quickly determines if the
attribute has MCV, without loading it? I'd bet the expensive part of
get_attstatslot() is the deconstruct_array().

We could have a function that only does the first small loop over slots,
and returns true/false if we have a slot of the requested stakind. It
might even check the isunique flag first, to make it more convenient.

And only if both sides return "true" we'd load the MCV, deconstruct the
array and all that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: allowing for control over SET ROLE
Next
From: Masahiko Sawada
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply