pg 9.3 exists subselect with limit brakes query plan - Mailing list pgsql-bugs

From Kószó József
Subject pg 9.3 exists subselect with limit brakes query plan
Date
Msg-id 53279529.2070902@freemail.hu
Whole thread Raw
Responses Re: pg 9.3 exists subselect with limit brakes query plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello,

We plan to upgrade from PostgreSQL 8.3 to 9.3 and experience some
planner-related problems in our applications with subselects using
EXISTS and LIMIT keywords. We have two similar environment with these
postgres versions, in 8.3 does not matter, if the subselect using LIMIT
or not, but in 9.3 the LIMIT keyword causes odd behavior, enforces seq
scans. In 8.3 always get similar plans to both case, but in 9.3 the
LIMIT keyword slows down the query. Additionally without index to
subselect id field the plan and running time worse with one magnitude in
9.3 than 8.3.

The two test case:
  SELECT ... WHERE EXISTS (SELECT ...);
  SELECT ... WHERE EXISTS (SELECT ... LIMIT 1);

We try to make some application-independent, reproducible tests to show
the difference between versions:


===== Initializations =====


# createdb limit1test -U postgres

# pgbench -i -s 10 limit1test -U postgres

limit1test=# select count(*) from pgbench_branches; count
-------    10

limit1test=# select count(*) from pgbench_accounts;  count
--------- 1000000

limit1test=# select B.bid, count(*) from pgbench_branches B join
pgbench_accounts using (bid) group by 1 order by 1; bid | count
-----+--------   1 | 100000   2 | 100000   3 | 100000   4 | 100000   5 | 100000   6 | 100000   7 | 100000   8 | 100000
9 | 100000  10 | 100000 



===== PostgreSQL 9.3 =====


# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid);
--------------------------------------------------------------------- Nested Loop Semi Join  (cost=0.00..26402.35
rows=10width=4) (actual  
time=0.012..4972.868 rows=10 loops=1)   Join Filter: (b.bid = a.bid)   Rows Removed by Join Filter: 4500000   ->  Seq
Scanon pgbench_branches b  (cost=0.00..1.10 rows=10  
width=4) (actual time=0.003..0.013 rows=10 loops=1)   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00
rows=1000000 width=4) (actual time=0.002..242.321 rows=450001 loops=10) Total runtime: 4972.904 ms



limit1test=# explain analyze select bid from pgbench_branches B where
exists (select bid from pgbench_accounts A where A.bid=B.bid limit 1);
--------------------------------------------------------------------- Seq Scan on pgbench_branches b  (cost=0.00..3.99
rows=5width=4)  
(actual time=0.018..382.774 rows=10 loops=1)   Filter: (SubPlan 1)   SubPlan 1     ->  Limit  (cost=0.00..0.29 rows=1
width=4)(actual  
time=38.269..38.269 rows=1 loops=10)           ->  Seq Scan on pgbench_accounts a  (cost=0.00..28894.00
rows=100000 width=4) (actual time=38.261..38.261 rows=1 loops=10)                 Filter: (bid = b.bid)
RowsRemoved by Filter: 450000 Total runtime: 382.816 ms 



# create index pgbench_accounts_bid on pgbench_accounts(bid);



# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid);
-------------------------------------------------------------------- Nested Loop Semi Join  (cost=0.42..2739.24 rows=10
width=4)(actual  
time=0.030..0.131 rows=10 loops=1)   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10
width=4) (actual time=0.003..0.011 rows=10 loops=1)   ->  Index Only Scan using pgbench_accounts_bid on
pgbench_accountsa  (cost=0.42..2733.62 rows=100000 width=4) (actual time=0.010..0.010  
rows=1 loops=10)         Index Cond: (bid = b.bid)         Heap Fetches: 0 Total runtime: 0.169 ms



# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid limit 1);
-------------------------------------------------------------------- Seq Scan on pgbench_branches b  (cost=0.00..3.99
rows=5width=4)  
(actual time=0.016..384.974 rows=10 loops=1)   Filter: (SubPlan 1)   SubPlan 1     ->  Limit  (cost=0.00..0.29 rows=1
width=4)(actual  
time=38.490..38.490 rows=1 loops=10)           ->  Seq Scan on pgbench_accounts a  (cost=0.00..28894.00
rows=100000 width=4) (actual time=38.482..38.482 rows=1 loops=10)                 Filter: (bid = b.bid)
RowsRemoved by Filter: 450000 Total runtime: 385.015 ms 



# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid*100);
-------------------------------------------------------------------- Nested Loop Semi Join  (cost=0.43..2989.31 rows=10
width=4)(actual  
time=0.055..0.055 rows=0 loops=1)   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10
width=4) (actual time=0.004..0.009 rows=10 loops=1)   ->  Index Only Scan using pgbench_accounts_bid on
pgbench_accountsa  (cost=0.43..2983.63 rows=100000 width=4) (actual time=0.002..0.002  
rows=0 loops=10)         Index Cond: (bid = (b.bid * 100))         Heap Fetches: 0 Total runtime: 0.087 ms



# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid*100 limit 1);
------------------------------------------------------------------- Seq Scan on pgbench_branches b  (cost=0.00..4.24
rows=5width=4)  
(actual time=1019.139..1019.139 rows=0 loops=1)   Filter: (SubPlan 1)   Rows Removed by Filter: 10   SubPlan 1     ->
Limit (cost=0.00..0.31 rows=1 width=4) (actual  
time=101.907..101.907 rows=0 loops=10)           ->  Seq Scan on pgbench_accounts a  (cost=0.00..31394.00
rows=100000 width=4) (actual time=101.898..101.898 rows=0 loops=10)                 Filter: (bid = (b.bid * 100))
         Rows Removed by Filter: 1000000 Total runtime: 1019.166 ms 



# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid offset 1);
------------------------------------------------------------------ Seq Scan on pgbench_branches b  (cost=0.00..6.06
rows=5width=4)  
(actual time=0.042..0.166 rows=10 loops=1)   Filter: (SubPlan 1)   SubPlan 1     ->  Limit  (cost=0.46..3542.39
rows=99997width=4) (actual  
time=0.014..0.014 rows=1 loops=10)           ->  Index Only Scan using pgbench_accounts_bid on
pgbench_accounts a  (cost=0.42..3542.39 rows=99998 width=4) (actual
time=0.011..0.012 rows=2 loops=10)                 Index Cond: (bid = b.bid)                 Heap Fetches: 0 Total
runtime:0.200 ms 



===== PostgreSQL 8.3 =====


# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid);
--------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..3.99 rows=5
width=4)(actual  
time=0.020..436.306 rows=10 loops=1)   Filter: (subplan)   SubPlan     ->  Seq Scan on accounts a  (cost=0.00..28894.00
rows=100000 
width=4) (actual time=43.618..43.618 rows=1 loops=10)           Filter: (bid = $0) Total runtime: 436.348 ms



# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid limit 1);
---------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..3.99 rows=5
width=4)(actual  
time=0.019..442.392 rows=10 loops=1)   Filter: (subplan)   SubPlan     ->  Limit  (cost=0.00..0.29 rows=1 width=4)
(actual 
time=44.231..44.231 rows=1 loops=10)           ->  Seq Scan on accounts a  (cost=0.00..28894.00 rows=100000
width=4) (actual time=44.223..44.223 rows=1 loops=10)                 Filter: (bid = $0) Total runtime: 442.440 ms



# create index accounts_bid on accounts(bid);



# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid);
---------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..1.55 rows=5
width=4)(actual  
time=0.049..0.211 rows=10 loops=1)   Filter: (subplan)   SubPlan     ->  Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.017..0.017
rows=1 loops=10)           Index Cond: (bid = $0) Total runtime: 0.258 ms



# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid limit 1);
--------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..1.55 rows=5
width=4)(actual  
time=0.059..0.180 rows=10 loops=1)   Filter: (subplan)   SubPlan     ->  Limit  (cost=0.00..0.04 rows=1 width=4)
(actual 
time=0.015..0.015 rows=1 loops=10)           ->  Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.014..0.014
rows=1 loops=10)                 Index Cond: (bid = $0) Total runtime: 0.211 ms



# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid*100);
--------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..1.57 rows=5
width=4)(actual  
time=0.050..0.050 rows=0 loops=1)   Filter: (subplan)   SubPlan     ->  Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.003..0.003
rows=0 loops=10)           Index Cond: (bid = ($0 * 100)) Total runtime: 0.082 ms



# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid*100 limit 1);
--------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..1.57 rows=5
width=4)(actual  
time=0.066..0.066 rows=0 loops=1)   Filter: (subplan)   SubPlan     ->  Limit  (cost=0.00..0.05 rows=1 width=4) (actual

time=0.005..0.005 rows=0 loops=10)           ->  Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.003..0.003
rows=0 loops=10)                 Index Cond: (bid = ($0 * 100)) Total runtime: 0.095 ms



# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid offset 1);
---------------------------------------------------------------------- Seq Scan on branches b  (cost=0.00..2.00 rows=5
width=4)(actual  
time=0.038..0.181 rows=10 loops=1)   Filter: (subplan)   SubPlan     ->  Limit  (cost=0.04..4493.36 rows=99999 width=4)
(actual 
time=0.016..0.016 rows=1 loops=10)           ->  Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.013..0.014
rows=2 loops=10)                 Index Cond: (bid = $0) Total runtime: 0.213 ms



===== Postgres settings =====


PostgreSQL 9.3:

version
---------------------------------------------------------------------- PostgreSQL 9.3.3 on x86_64-iso-8859-1-linux-gnu,
compiledby gcc (Debian  
4.7.2-5) 4.7.2, 64-bit
 work_mem
---------- 128MB
 shared_buffers
---------------- 1536MB
 effective_cache_size
---------------------- 3GB
 default_statistics_target
--------------------------- 100



PostgreSQL 8.3:

version
----------------------------------------------------------------------- PostgreSQL 8.3.14 on x86_64-pc-linux-gnu,
compiledby GCC gcc-4.3.real  
(Debian 4.3.2-1.1) 4.3.2
 work_mem
---------- 128MB
 shared_buffers
---------------- 1GB
 effective_cache_size
---------------------- 2GB
 default_statistics_target
--------------------------- 100


*_cost settings are all the same (default values), and vacuum, analyze
or reindex does not make any difference.


Any suggestions, what is causes this difference, and how to resolve it
globally? Or are these examples are too special/simple, and seqscan plan
wins in 9.3?



# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid limit 1);
------------------------------------------------------------------- Seq Scan on pgbench_branches b  (cost=0.00..3.99
rows=5width=4)  
(actual time=0.028..379.426 rows=10 loops=1)   Filter: (SubPlan 1)   SubPlan 1     ->  Limit  (cost=0.00..0.29 rows=1
width=4)(actual  
time=37.934..37.934 rows=1 loops=10)           ->  Seq Scan on pgbench_accounts a  (cost=0.00..28894.00
rows=100000 width=4) (actual time=37.927..37.927 rows=1 loops=10)                 Filter: (bid = b.bid)
RowsRemoved by Filter: 450000 Total runtime: 379.464 ms 


# set enable_seqscan=off;


# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid limit 1);
------------------------------------------------------------------- Index Only Scan using pgbench_branches_pkey on
pgbench_branchesb  
(cost=0.14..12.82 rows=5 width=4) (actual time=0.039..0.145 rows=10 loops=1)   Filter: (SubPlan 1)   Heap Fetches: 0
SubPlan1     ->  Limit  (cost=0.42..0.45 rows=1 width=4) (actual  
time=0.012..0.012 rows=1 loops=10)           ->  Index Only Scan using pgbench_accounts_bid on
pgbench_accounts a  (cost=0.42..2850.43 rows=100000 width=4) (actual
time=0.011..0.011 rows=1 loops=10)                 Index Cond: (bid = b.bid)                 Heap Fetches: 0 Total
runtime:0.178 ms 



Thanks,
József Kószó



pgsql-bugs by date:

Previous
From: Venkata Balaji Nagothi
Date:
Subject: Re: BUG #9604: Unable to access table remotely
Next
From: Jeff Frost
Date:
Subject: Re: BUG #9606: pg_class relhaspkey column not updated on removal of primary key