Re: [HACKERS] <> join selectivity estimate question - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] <> join selectivity estimate question
Date
Msg-id 14543.1489769980@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] <> join selectivity estimate question  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] <> join selectivity estimate question  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> The relevant code is in neqsel().  It estimates the fraction of rows
> that will be equal, and then does 1 - that number.  Evidently, the
> query planner thinks that l1.l_suppkey = l2.l_suppkey would almost
> always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost
> always be false.  I think the presumed selectivity of l1.l_suppkey =
> l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little
> puzzled by that function is managing to produce a selectivity estimate
> of, essentially, 1.

No, I believe it's going through neqjoinsel and thence to eqjoinsel_semi.
This query will have been flattened into a semijoin.

I can reproduce a similarly bad estimate in the regression database:

regression=# explain select * from tenk1 a where exists(select * from tenk1 b where a.thousand = b.thousand and
a.twothousand<> b.twothousand);                              QUERY PLAN                                 
-------------------------------------------------------------------------Hash Semi Join  (cost=583.00..1067.25 rows=1
width=244) Hash Cond: (a.thousand = b.thousand)  Join Filter: (a.twothousand <> b.twothousand)  ->  Seq Scan on tenk1 a
(cost=0.00..458.00 rows=10000 width=244)  ->  Hash  (cost=458.00..458.00 rows=10000 width=8)        ->  Seq Scan on
tenk1b  (cost=0.00..458.00 rows=10000 width=8) 
(6 rows)

The problem here appears to be that we don't have any MCV list for
the "twothousand" column (because it has a perfectly flat distribution),
and the heuristic that eqjoinsel_semi is using for the no-MCVs case
is falling down badly.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands:\quit_if, \quit_unless)
Next
From: Michael Banck
Date:
Subject: Re: [HACKERS] [patch] reorder tablespaces in basebackup tar streamfor backup_label