Re: Adjust ndistinct for eqjoinsel - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Adjust ndistinct for eqjoinsel
Date
Msg-id 3394045.1657900601@sss.pgh.pa.us
Whole thread Raw
In response to Adjust ndistinct for eqjoinsel  (Zhenghua Lyu <zlyu@vmware.com>)
List pgsql-hackers
Zhenghua Lyu <zlyu@vmware.com> writes:
>     I run TPC-DS benchmark for Postgres and find the join size estimation has several problems.
>     For example, Ndistinct is key to join selectivity's estimation, this value does not take restrictions
>     of the rel, I hit some cases in the function eqjoinsel, nd is much larger than vardata.rel->rows.

>     Accurate estimation need good math model that considering dependency of join var and vars in restriction.
>     But at least, indistinct should not be greater than the number of rows.

>     See the attached patch to adjust nd in eqjoinsel.

We're very unlikely to accept this with no test case and no explanation
of why it's not an overcorrection.  get_variable_numdistinct already
clamps its result to rel->tuples, and I think that by using rel->rows
instead you are probably double-counting the selectivity of the rel's
restriction clauses.

See the sad history of commit 7f3eba30c, which did something
pretty close to this and eventually got almost entirely reverted
(97930cf57, 0d3b231ee).  I'd be the first to agree that better
estimates here would be great, but it's not as simple as it looks.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Add function to return backup_label and tablespace_map
Next
From: Melanie Plageman
Date:
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)