Re: Use zero for nullness estimates of system attributes - Mailing list pgsql-hackers

From Jim Finnerty
Subject Re: Use zero for nullness estimates of system attributes
Date
Msg-id 1548516385152-0.post@n3.nabble.com
Whole thread Raw
In response to Re: Use zero for nullness estimates of system attributes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Use zero for nullness estimates of system attributes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
It's related, but what I was referring to applies even to the uncorrelated
case: suppose you have something like:

select x, sum(z) 
from t
where
    x > 5 and y in ('a', 'b', 'c')
group by x;

let's say that 'a', 'b', and 'c' are not frequent values of y, so the
estimated selectivity is based on the n_distinct of y and the 3 values.  Now
imagine that x > 5 is applied first.  That reduces the number of qualifying
rows by the selectivity of (x > 5), but it may also reduce the number of
distinct values of y.  If it reduces the n_distinct of y, then the IN
predicate selectivity should be adjusted also.

The negative n_distinct representation (for n_distinct a large fraction of
numrows) already accounts for an automatic scaling of n_distinct when the
table size grows or shrinks.  Adjusting the n_distinct of y after applying
predicate (x > 5) is conceptually the same thing, except that the scaling is
not linear, but would be based on a (fairly cheap) probabilistic formula of
how many distinct y's can be expected to survive after applying predicate x,
rather than assuming that all of them do.

If you have correlation information for columns (x, y) you'd want to use
that information instead, but in the absence of that information you could
make this adjustment.

It might make a good intern project if somebody wants to investigate it.



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Thread-unsafe coding in ecpg
Next
From: Tom Lane
Date:
Subject: Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance