Re: Bad plan after vacuum analyze - Mailing list pgsql-performance

From Tom Lane
Subject Re: Bad plan after vacuum analyze
Date
Msg-id 17017.1115843555@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bad plan after vacuum analyze  (Guillaume Smet <guillaume_ml@smet.org>)
Responses Re: Bad plan after vacuum analyze
List pgsql-performance
Ah-ha, I can replicate the problem.  This example uses tenk1 from the
regression database, which has a column unique2 containing just the
integers 0..9999.

regression=# create table t1(f1 int);
CREATE TABLE
regression=# insert into t1 values(5);
INSERT 154632 1
regression=# insert into t1 values(7);
INSERT 154633 1
regression=# analyze t1;
ANALYZE
regression=# explain analyze select * from tenk1 right join t1 on (unique2=f1);
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Merge Right Join  (cost=1.03..1.37 rows=2 width=248) (actual time=0.507..0.617 rows=2 loops=1)
   Merge Cond: ("outer".unique2 = "inner".f1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..498.24 rows=10024 width=244) (actual time=0.126..0.242
rows=9loops=1) 
   ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.145..0.153 rows=2 loops=1)
         Sort Key: t1.f1
         ->  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=4) (actual time=0.029..0.049 rows=2 loops=1)
 Total runtime: 1.497 ms
(7 rows)

The planner correctly perceives that only a small part of the unique2
index will need to be scanned, and hence thinks the merge is cheap ---
much cheaper than if the whole index had to be scanned.  And it is.
Notice that only 9 rows were actually pulled from the index.  Once
we got to unique2 = 8, nodeMergejoin.c could see that no more matches
to f1 were possible.

But watch this:

regression=# insert into t1 values(null);
INSERT 154634 1
regression=# explain analyze select * from tenk1 right join t1 on (unique2=f1);
                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
 Merge Right Join  (cost=1.03..1.37 rows=2 width=248) (actual time=0.560..290.874 rows=3 loops=1)
   Merge Cond: ("outer".unique2 = "inner".f1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..498.24 rows=10024 width=244) (actual time=0.139..106.982
rows=10000loops=1) 
   ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.181..0.194 rows=3 loops=1)
         Sort Key: t1.f1
         ->  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=4) (actual time=0.032..0.067 rows=3 loops=1)
 Total runtime: 291.670 ms
(7 rows)

See what happened to the actual costs of the indexscan?  All of a sudden
we had to scan the whole index because there was a null in the other
input, and nulls sort high.

I wonder if it is worth fixing nodeMergejoin.c to not even try to match
nulls to the other input.  We'd have to add a check to see if the join
operator is strict or not, but it nearly always will be.

The alternative would be to make the planner only believe in the
short-circuit path occuring if it thinks that the other input is
entirely non-null ... but this seems pretty fragile, since it only
takes one null to mess things up, and ANALYZE can hardly be counted
on to detect one null in a table.

In the meantime it seems like the quickest answer for Guillaume might
be to try to avoid keeping any NULLs in parent_application_id.

            regards, tom lane

pgsql-performance by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Partitioning / Clustering
Next
From: Guillaume Smet
Date:
Subject: Re: Bad plan after vacuum analyze