[PERFORM] join estimate of subqueries with range conditions and constraintexclusion - Mailing list pgsql-performance

From Justin Pryzby
Subject [PERFORM] join estimate of subqueries with range conditions and constraintexclusion
Date
Msg-id 20170524211730.GM31097@telsasoft.com
Whole thread Raw
In response to [PERFORM] self join estimate and constraint exclusion  (Justin Pryzby <pryzby@telsasoft.com>)
Responses [PERFORM] Re: join estimate of subqueries with range conditions and constraintexclusion  (Justin Pryzby <pryzby@telsasoft.com>)
Re: [PERFORM] join estimate of subqueries with range conditions andconstraint exclusion  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-performance
We got bitten again by what appears to be the same issue I reported (perhaps
poorly) here:
https://www.postgresql.org/message-id/20170326193344.GS31628%40telsasoft.com

We have PG9.6.3 table heirarchies partitioned by time.  Our reports use
subqueries each with their own copies of a range clauses on time column, as
needed to get constraint exclusion reference:
https://www.postgresql.org/message-id/25076.1366321335%40sss.pgh.pa.us

    SELECT * FROM
    (SELECT * FROM t WHERE col>const) a JOIN
    (SELECT * FROM t WHERE col>const) b USING (col)

I'm diagnosing a bad estimate/plan due to excessively high n_distinct leading
to underestimated rowcount when selecting from a small fraction of the table
heirarchy.  This leads intermittently to bad things, specifically a cascade of
misestimates and associated nested loops around millions of rows.

Artificial/generated/contrived test case, involving table with 99 instances
each of 99 values:

postgres=# CREATE TABLE t(i INT);
postgres=# TRUNCATE t;INSERT INTO t SELECT i FROM generate_series(1,99) i,generate_series(1,99);ANALYZE t;
postgres=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, n_distinct,
array_length(most_common_vals,1)n_mcv, array_length(histogram_bounds,1) n_hist, (SELECT MAX(x) FROM
unnest(most_common_vals::text::text[])x) maxmcv, (histogram_bounds::text::text[])[array_length(histogram_bounds,1)]
maxhistFROM pg_stats WHERE attname~'i' AND tablename='t' GROUP BY 1,2,3,4,5,6,7,8 ORDER BY 1 DESC; 
-[ RECORD 1 ]--
frac_mcv   | 1
tablename  | t
attname    | i
n_distinct | 99
n_mcv      | 99
n_hist     |
maxmcv     | 99
maxhist    |

range query (which could use constraint exclusion), but bad estimate:
postgres=# explain ANALYZE SELECT * FROM (SELECT * FROM t WHERE i<2) AS a JOIN (SELECT * FROM t WHERE i<2) AS b USING
(i);
 Merge Join  (cost=339.59..341.57 rows=99 width=4) (actual time=8.272..16.892 rows=9801 loops=1)

range query which could NOT use constraint exclusion, good estimate:
postgres=# explain ANALYZE SELECT * FROM (SELECT * FROM t) AS a JOIN (SELECT * FROM t) AS b USING (i) WHERE i<2;
 Hash Join  (cost=264.52..541.54 rows=9801 width=4) (actual time=12.688..22.325 rows=9801 loops=1)

non-range query, good estimate:
postgres=# explain ANALYZE SELECT * FROM (SELECT * FROM t WHERE i=3) AS a JOIN (SELECT * FROM t WHERE i=3) AS b USING
(i);
 Nested Loop  (cost=0.00..455.78 rows=9801 width=4) (actual time=0.482..15.820 rows=9801 loops=1)

My understanding:
Postgres estimates join selectivity using number of distinct values of
underlying.  For the subqueries "a" and "b", the estimate is same as for
underlying table "t", even when selecting only a small fraction of the table...
This is adt/selfuncs:eqjoinsel_inner().

Note, in my tests, report queries on the child table have correct estimates;
and, queries with only "push down" WHERE clause outside the subquery have
correct estimate (but not constraint exclusion), apparently due to
calc_joinrel_size_estimate() returning the size of the parent table, planning
an join without restriction clause, following by filtering the join result, at
which point I guess the MCV list becomes useful and estimate is perfect..

    SELECT * FROM
    (SELECT * FROM t)a JOIN(SELECT * FROM t)b
    USING (col) WHERE col>const

So my original question is basically still opened ... is it possible to get
both good estimates/plans AND constraint exclusion ??

Thanks
Justin


pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PERFORM] Query is running very slow......
Next
From: Daulat Ram
Date:
Subject: [PERFORM] Query is very much slow