Thread: Postgresql optimisator deoptimise queries sometime...

Postgresql optimisator deoptimise queries sometime...

From

Maxim Boguk

Date:

03 September 2008, 11:20:04

postgresql version 8.3:

I found issue when optimisator had tried rollup subrequest (without attempt compare final cost with direct plan) and
finishedwith bad plan. 
The simplest test is below:

Preparing data:

testdb=# INSERT INTO table2 select (random()*99+1)::integer from generate_series(1,100000);
testdb=# drop table table2;
DROP TABLE
testdb=# drop table table1;
DROP TABLE
testdb=# CREATE TABLE table1 (id serial primary key);
CREATE TABLE
testdb=# INSERT INTO table1 select generate_series(1,50);
INSERT 0 50
testdb=# ANALYZE table1;
ANALYZE
testdb=# CREATE TABLE table2 (fk integer not null references table1(id));
CREATE TABLE
testdb=# INSERT INTO table2 select (random()*49+1)::integer from generate_series(1,50000);
INSERT 0 50000
testdb=# ANALYZE table2;
ANALYZE

Now lets try execute next query:
SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) from table1 where (select count(*) from table2
wheretable2.fk=table1.id)>1000 

testdb=# EXPLAIN ANALYZE SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) from table1 where
(selectcount(*) from table2 where table2.fk=table1.id)>1000; 
                                                        QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
  Seq Scan on table1  (cost=0.00..56918.96 rows=17 width=4) (actual time=52.576..1450.798 rows=33 loops=1)
    Filter: ((subplan) > 1000)
    SubPlan
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=17.459..17.460 rows=1 loops=50)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.029..16.022 rows=1000
loops=50)
                  Filter: (fk = $0)
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=17.484..17.486 rows=1 loops=33)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.029..16.002 rows=1037
loops=33)
                  Filter: (fk = $0)
  Total runtime: 1453.577 ms
(10 rows)

oops... grouping query executing twice per row... but this is ok i think.


Lets try more optimal query form (eliminate twice calling count(*) query...):
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as
countfrom table1) as t1 where count>1000; 
                                                        QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
  Seq Scan on table1  (cost=0.00..56918.96 rows=17 width=4) (actual time=53.473..1525.859 rows=33 loops=1)
    Filter: ((subplan) > 1000)
    SubPlan
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=18.445..18.446 rows=1 loops=50)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.033..17.001 rows=1000
loops=50)
                  Filter: (fk = $0)
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=18.262..18.263 rows=1 loops=33)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.033..16.777 rows=1037
loops=33)
                  Filter: (fk = $0)
  Total runtime: 1526.027 ms
(10 rows)

Hey... i dont asked rollup subrequest... and calculate subplan twice again per row...


Workaround ofcource easy (add offset 0 to subquery) (And lead to lower estimated cost!):
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as
countfrom table1 offset 0) as t1 where count>1000; 
                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan t1  (cost=0.00..42477.75 rows=17 width=12) (actual time=35.393..908.265 rows=33 loops=1)
    Filter: (t1.count > 1000)
    ->  Limit  (cost=0.00..42477.12 rows=50 width=4) (actual time=16.925..908.092 rows=50 loops=1)
          ->  Seq Scan on table1  (cost=0.00..42477.12 rows=50 width=4) (actual time=16.921..907.950 rows=50 loops=1)
                SubPlan
                  ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=18.148..18.149 rows=1 loops=50)
                        ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.032..16.719
rows=1000loops=50) 
                              Filter: (fk = $0)
  Total runtime: 908.421 ms
(9 rows)


So i think it is bug if planner rollup internal query without even try compare cost with direct plan.
Writing such not easy to understand workarounds as 'offset 0' i think bad style.


==============================================================================================================================================================================

Even worse situation when internal subrequest can produce volatile results:

testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id and
random()>0.1)as count from table1) as t1 where count>900; 
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
  Seq Scan on table1  (cost=0.00..73557.24 rows=17 width=4) (actual time=47.385..1346.824 rows=32 loops=1)
    Filter: ((subplan) > 900)
    SubPlan
      ->  Aggregate  (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.333..16.334 rows=1 loops=50)
            ->  Seq Scan on table2  (cost=0.00..1097.00 rows=333 width=0) (actual time=0.025..14.995 rows=900 loops=50)
                  Filter: ((fk = $0) AND (random() > 0.1::double precision))
      ->  Aggregate  (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.537..16.539 rows=1 loops=32)
            ->  Seq Scan on table2  (cost=0.00..1097.00 rows=333 width=0) (actual time=0.028..15.141 rows=934 loops=32)
                  Filter: ((fk = $0) AND (random() > 0.1::double precision))
  Total runtime: 1346.972 ms
(10 rows)
This plan can produce just wrong result!!!! (wich is clear bug).

VS right plan:

testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id and
random()>0.1)as count from table1 offset 0) as t1 where count>900; 
                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan t1  (cost=0.00..54894.38 rows=17 width=12) (actual time=31.181..800.898 rows=35 loops=1)
    Filter: (t1.count > 900)
    ->  Limit  (cost=0.00..54893.75 rows=50 width=4) (actual time=14.992..800.754 rows=50 loops=1)
          ->  Seq Scan on table1  (cost=0.00..54893.75 rows=50 width=4) (actual time=14.988..800.602 rows=50 loops=1)
                SubPlan
                  ->  Aggregate  (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.003..16.004 rows=1 loops=50)
                        ->  Seq Scan on table2  (cost=0.00..1097.00 rows=333 width=0) (actual time=0.025..14.725
rows=898loops=50) 
                              Filter: ((fk = $0) AND (random() > 0.1::double precision))
  Total runtime: 801.021 ms
(9 rows)


--
Maxim Boguk

Re: Postgresql optimisator deoptimise queries sometime...

From

Maxim Boguk

Date:

05 September 2008, 07:13:23

Anyone can commet that issue?

More extremal sample (simplified version of what i get in real world situation):
same table data...

Query:
select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as total from table1) as t1
wheretotal=990 or total=991 or total=992 or total=993 or  
total=994 or total=995 or total=996 or total=997 or total=998 or total=999 or total=1000 or total=1001 or total=1002 or
total=1003or total=1004 or total=1005 or total=1006 or  
total=1007 or total=1008 or total=1009 or total=1010;

But postgres use bad bad plan for that query:
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as
totalfrom table1) as t1 where total=990 or total=991 or  
total=992 or total=993 or total=994 or total=995 or total=996 or total=997 or total=998 or total=999 or total=1000 or
total=1001or total=1002 or total=1003 or total=1004 or  
total=1005 or total=1006 or total=1007 or total=1008 or total=1009 or total=1010;

                                                QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on table1  (cost=0.00..906433.96 rows=17 width=4) (actual time=1035.481..15695.443 rows=12 loops=1)
    Filter: (((subplan) = 990) OR ((subplan) = 991) OR ((subplan) = 992) OR ((subplan) = 993) OR ((subplan) = 994) OR
((subplan)= 995) OR ((subplan) = 996) OR ((subplan) = 997) OR  
((subplan) = 998) OR ((subplan) = 999) OR ((subplan) = 1000) OR ((subplan) = 1001) OR ((subplan) = 1002) OR ((subplan)
=1003) OR ((subplan) = 1004) OR ((subplan) = 1005) OR  
((subplan) = 1006) OR ((subplan) = 1007) OR ((subplan) = 1008) OR ((subplan) = 1009) OR ((subplan) = 1010))
    SubPlan
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=16.308..16.309 rows=1 loops=39)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.021..14.839 rows=1000
loops=39)
                  Filter: (fk = $0)
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=16.286..16.288 rows=1 loops=39)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.021..14.817 rows=1000
loops=39)
                  Filter: (fk = $0)
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=16.434..16.436 rows=1 loops=39)
            ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.021..14.957 rows=1000
loops=39)
                  Filter: (fk = $0)
     ........
     17 more aggregate seq scans
     ........
      ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=16.316..16.317 rows=1 loops=12)
           ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.020..14.845 rows=1000 loops=12)
                 Filter: (fk = $0)
  Total runtime: 15696.295 ms
(70 rows)

vs right version:

testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as
totalfrom table1 offset 0) as t1 where total=990 or total=991  
or total=992 or total=993 or total=994 or total=995 or total=996 or total=997 or total=998 or total=999 or total=1000
ortotal=1001 or total=1002 or total=1003 or total=1004 or  
total=1005 or total=1006 or total=1007 or total=1008 or total=1009 or total=1010;

                                     QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan t1  (cost=0.00..42480.25 rows=17 width=12) (actual time=63.121..804.438 rows=12 loops=1)
    Filter: ((t1.total = 990) OR (t1.total = 991) OR (t1.total = 992) OR (t1.total = 993) OR (t1.total = 994) OR
(t1.total= 995) OR (t1.total = 996) OR (t1.total = 997) OR  
(t1.total = 998) OR (t1.total = 999) OR (t1.total = 1000) OR (t1.total = 1001) OR (t1.total = 1002) OR (t1.total =
1003)OR (t1.total = 1004) OR (t1.total = 1005) OR (t1.total =  
1006) OR (t1.total = 1007) OR (t1.total = 1008) OR (t1.total = 1009) OR (t1.total = 1010))
    ->  Limit  (cost=0.00..42477.12 rows=50 width=4) (actual time=15.029..804.190 rows=50 loops=1)
          ->  Seq Scan on table1  (cost=0.00..42477.12 rows=50 width=4) (actual time=15.027..804.053 rows=50 loops=1)
                SubPlan
                  ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual time=16.072..16.073 rows=1 loops=50)
                        ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 width=0) (actual time=0.020..14.599
rows=1000loops=50) 
                              Filter: (fk = $0)
  Total runtime: 804.552 ms
(9 rows)


performance difference 20 times... :(((

I think is is just missoptimisation from db side.

PS: in real world query work around view:
CREATE VIEW test_view as SELECT table1.id,(select count(*) from table2 where table2.fk=table1.id) as total from table1;

and i have no way put offset 0 into query
select * from test_view where total=990 or total=991 or total=992 or total=993 or total=994 or total=995 or total=996
ortotal=997 or total=998 or total=999 or total=1000 or  
total=1001 or total=1002 or total=1003 or total=1004 or total=1005 or total=1006 or total=1007 or total=1008 or
total=1009or total=1010; 


--
Maxim Boguk