Re: Incorrect choice of Nested Loop for a skewed distribution - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Incorrect choice of Nested Loop for a skewed distribution
Date
Msg-id 20190905001832.GJ7201@telsasoft.com
In response to Incorrect choice of Nested Loop for a skewed distribution  (Oleg Kharin <ok@uvadrev.ru>)
Responses Re: Incorrect choice of Nested Loop for a skewed distribution  (Oleg Kharin <ok@uvadrev.ru>)
List pgsql-performance
On Tue, Sep 03, 2019 at 09:47:42PM +0400, Oleg Kharin wrote:
> Hi All,
> 
> With PostgreSQL 10 and 11, the planner doesn't use the lists of most common
> values to determine the selectivity of "=" for Nested Loop as it does for a
> normal inner join in eqjoinsel_inner(). Incorrect choice of a nested loops
> join strategy causes poor query performance.
> To demonstrate it one can make the following test case:
...
> The columns f and g of both tables t and u have a skewed distribution: 10010
> values of 0 and 10010 unique values starting from 1.
> Let's see query plan for the join of t and u:
> 
>   explain analyze select * from t,u where t.f=u.f and t.g=u.g;
>  Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual time=0.042..22540.629 rows=20020 loops=1)
>    ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.011..3.025 rows=20020 loops=1)
>    ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual time=0.565..1.125 rows=1 loops=20020)
>          Index Cond: (f = u.f)
>          Filter: (u.g = g)
>          Rows Removed by Filter: 5004
>  Planning Time: 0.394 ms
>  Execution Time: 22542.639 ms
> 
> After dropping the index
>   drop index t_f;
> we obtain much better query plan (without Nested Loop):
>  Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual time=15.708..32.735 rows=20020 loops=1)
>    Merge Cond: ((t.f = u.f) AND (t.g = u.g))
>    ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual time=8.189..10.189 rows=20020 loops=1)
>          Sort Key: t.f, t.g
>          Sort Method: quicksort  Memory: 1707kB
>          ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.012..2.958 rows=20020 loops=1)
>    ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual time=7.510..9.459 rows=20020 loops=1)
>          Sort Key: u.f, u.g
>          Sort Method: quicksort  Memory: 1707kB
>          ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.008..2.748 rows=20020 loops=1)
>  Planning Time: 0.239 ms
>  Execution Time: 34.585 ms
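
(The test-case DDL was elided from the quote above; for anyone trying to
reproduce this, a setup consistent with the described distribution and with
the row counts and index names in the plans might look roughly like the
following.  This is a guess, not the original script.)

-- Hypothetical reconstruction: 20020 rows per table; each of f and g has
-- 10010 zeroes plus the distinct values 1..10010, arranged so the join on
-- (f,g) actually returns 20020 rows.
create table t (f int, g int);
create table u (f int, g int);
insert into t
  select 0, i from generate_series(1, 10010) i
  union all
  select i, 0 from generate_series(1, 10010) i;
insert into u
  select 0, i from generate_series(1, 10010) i
  union all
  select i, 0 from generate_series(1, 10010) i;
create index t_f on t (f);       -- name taken from the plan above
create index u_f_idx on u (f);   -- name taken from the BUFFERS example below
analyze t;
analyze u;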

>  Nested Loop  (cost=0.29..     7629.95 rows=25055030 width=16) (actual...
>  Merge Join   (cost=3439.09..442052.26 rows=25055030 width=16) (actual...

When you dropped the index, you necessarily ruled out the plan involving an index
scan.  So it did a merge join instead (which it thinks of as expensive because it
has to sort both sides).

As you saw, the rowcount estimate of the join result is badly off.  But that's
true of both plans.

The choice of join type is affected by the size of its *inputs*, but it doesn't
and shouldn't have any effect on the result rowcount (or its estimate).  The
rowcount *output* of the join would only affect any *following* plan nodes (of
which there are none in this case).
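
To make that concrete, the overestimate would only start to matter if something
sat on top of the join; purely as an illustration:

-- Hypothetical example: the aggregate above the join is sized from the join's
-- ~25M-row estimate, even though only 20020 rows actually come out of it.
explain
select t.f, count(*)
from t join u on t.f = u.f and t.g = u.g
group by t.f;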

You suggested using the MCV lists, but I don't think that's possible, since the
nested loop evaluates its "inner" table multiple times, in a "loop":

>    ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual time=0.565..1.125 rows=1 loops=20020)

Hypothetically, one could plan the query 20k times, once for each value of u.f
and u.g, but that's tantamount to actually executing the query, which is why
(I gather) it uses var_eq_non_const.
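
For what it's worth, the per-column statistics the planner does have, including
the MCV lists, can be inspected from pg_stats; something like:

select tablename, attname, n_distinct, most_common_vals, most_common_freqs
from pg_stats
where tablename in ('t', 'u') and attname in ('f', 'g');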

If you enable BUFFERS, you can see:

postgres=# explain (analyze on,buffers) select * from t,u WHERE t.g=u.g AND t.f=u.f ;
 Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual time=0.031..22634.482 rows=20020 loops=1)
   Buffers: shared hit=770913
   ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.011..22.883 rows=20020 loops=1)
         Buffers: shared hit=89
   ->  Index Scan using u_f_idx on u  (cost=0.29..0.36 rows=1 width=8) (actual time=0.596..1.125 rows=1 loops=20020)
         ...
         Buffers: shared hit=770824

vs.

postgres=# SET enable_nestloop=off;
postgres=# explain (analyze on,buffers) select * from t,u WHERE t.g=u.g AND t.f=u.f ;
 Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual time=74.262..187.454 rows=20020 loops=1)
   Merge Cond: ((t.g = u.g) AND (t.f = u.f))
   Buffers: shared hit=178
   ...

So the nested loop plan is hitting 770k buffer pages (about 5GB of buffer
accesses, most of them repeat hits on the same pages) while looping 20k times,
rather than 178 buffers when each page is read once (to sort).

Perhaps you could get a good plan by experimenting with the settings below (a
sketch follows the list), but it's unclear why that should be necessary.

 effective_cache_size         | 1280
 cpu_index_tuple_cost         | 0.005
 cpu_operator_cost            | 0.0025
 cpu_tuple_cost               | 0.01
 random_page_cost             | 4
 seq_page_cost                | 1
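
The kind of per-session experiment I mean would look roughly like this (the
values are illustrative guesses, not recommendations):

-- Illustrative only: nudge the cost model, or disable the join type outright,
-- and compare the resulting plans.
set random_page_cost = 1.1;           -- if the data is mostly cached / on SSD
set effective_cache_size = '4GB';     -- closer to the memory actually available
explain (analyze, buffers) select * from t, u where t.g = u.g and t.f = u.f;
set enable_nestloop = off;
explain (analyze, buffers) select * from t, u where t.g = u.g and t.f = u.f;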

Also, since PG 9.6, FKs can improve join estimates (the planner can use the FK
to know that each row in the referencing table has exactly one match):

postgres=# CREATE UNIQUE INDEX ON u(f,g);
postgres=# CREATE UNIQUE INDEX ON t(f,g);
postgres=# ALTER TABLE t ADD CONSTRAINT fk_f FOREIGN KEY (f,g) REFERENCES u(f,g);
postgres=# ALTER TABLE u ADD CONSTRAINT fk_u FOREIGN KEY (f,g) REFERENCES t(f,g);
postgres=# explain analyze select * from t,u WHERE t.g=u.g AND t.f=u.f ;
 Hash Join  (cost=589.50..999.14 rows=20020 width=16) (actual time=29.054..69.296 rows=20020 loops=1)
   Hash Cond: ((t.g = u.g) AND (t.f = u.f))
   ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.016..11.331 rows=20020 loops=1)
   ->  Hash  (cost=289.20..289.20 rows=20020 width=8) (actual time=28.980..28.981 rows=20020 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 1039kB
         ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.010..12.730 rows=20020 loops=1)

Justin


