potential performance gain by query planner optimization - Mailing list pgsql-performance

From Kneringer, Armin
Subject potential performance gain by query planner optimization
Date
Msg-id 12F45C8182386842930DCE2051B2AA601AD60AC1@fabamailserver.fabagl.fabasoft.com
Whole thread Raw
Responses Re: potential performance gain by query planner optimization
Re: potential performance gain by query planner optimization
List pgsql-performance
Hi there.

I think I found a potential performance gain if the query planner would be optimized. All Tests has been performed with
8.4.1(and earlier versions) on CentOS 5.3 (x64) 

The following query will run on my database (~250 GB) for ca. 1600 seconds and the sort will result in a disk merge
deployingca. 200 GB of data to the local disk (ca. 180.000 tmp-files) 

explain SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
 NOT EXISTS (
 WHERE t6.objid = t4.objid AND
 t4.fileresporgid = 573936067464397682 AND
   NOT EXISTS (
   SELECT 1
   FROM ataggval q1_1,
   atdateval t5
   WHERE q1_1.objid = t4.objid AND
   q1_1.attrid = 281479288456451 AND
   q1_1.aggrid = 0 AND
   t5.aggrid = q1_1.aggval AND
   t5.objid = t4.objid AND
   t5.attrid = 281479288456447 ) AND
 ((t6.objclassid IN (285774255832590,285774255764301))) AND
 ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
 ORDER BY t4.objid;

                                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2592137103.99..2592137104.00 rows=1 width=8)
   ->  Sort  (cost=2592137103.99..2592137104.00 rows=1 width=8)
         Sort Key: t4.objid
         ->  Nested Loop  (cost=1105592553.38..2592137103.98 rows=1 width=8)
               ->  Hash Anti Join  (cost=1105592553.38..2592137095.75 rows=1 width=8)
                     Hash Cond: ((t4.objid = q1_1.objid) AND (t4.objid = t5.objid))
                     ->  Bitmap Heap Scan on fscsubfile t4  (cost=154.42...14136.40 rows=5486 width=8)
                           Recheck Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid >
573936097512390656::bigint)AND (objid < 573936101807357952::bigint)) 
                           ->  Bitmap Index Scan on ind_fscsubfile_filerespons  (cost=0.00..153.05 rows=5486 width=0)
                                 Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid >
573936097512390656::bigint)AND (objid < 573936101807357952::bigint)) 
                     ->  Hash  (cost=11917516.57..11917516.57 rows=55006045159 width=16)
                           ->  Nested Loop  (cost=0.00..11917516.57 rows=55006045159 width=16)
                                 ->  Seq Scan on atdateval t5  (cost=0.00...294152.40 rows=1859934 width=12)
                                       Filter: (attrid = 281479288456447::bigint)
                                 ->  Index Scan using ind_ataggval on ataggval q1_1  (cost=0.00..6.20 rows=4 width=12)
                                       Index Cond: ((q1_1.attrid = 281479288456451::bigint) AND (q1_1.aggval =
t5.aggrid))
                                       Filter: (q1_1.aggrid = 0)
               ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8)
                     Index Cond: (t6.objid = t4.objid)
                     Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
(20 rows)


As the disks pace is limited on my test system I can't provide the "explain analyze" output
If I change the query as follows the query takes only 12 seconds and only needs 2 tmp files for sorting.
(Changed lines are marked with [!!!!!] as I don't know HTML-Mails will be delivered without conversion

explain SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
WHERE t6.objid = t4.objid AND
t4.fileresporgid = 573936067464397682 AND
   NOT EXISTS (
   SELECT 1
   FROM ataggval q1_1,
   atdateval t5
   WHERE q1_1.objid = t4.objid AND
   q1_1.attrid = 281479288456451 AND
   q1_1.aggrid = 0 AND
   t5.aggrid = q1_1.aggval AND
   t5.objid = q1_1.objid AND                 [!!!!!]
   t5.attrid = 281479288456447 ) AND
   ((t6.objclassid IN (285774255832590,285774255764301))) AND
   ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
 ORDER BY t4.objid;
                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=918320.29..971968.88 rows=1 width=8)
   ->  Nested Loop  (cost=918320.29..971968.88 rows=1 width=8)
         ->  Merge Anti Join  (cost=918320.29..971960.65 rows=1 width=8)
               Merge Cond: (t4.objid = q1_1.objid)
               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486
width=8)
                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint)
AND(objid < 573936101807357952::bigint)) 
               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8)
                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8)
                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12)
                                 Sort Key: t5.objid, t5.aggrid
                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12)
                                       Recheck Cond: (attrid = 281479288456447::bigint)
                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934
width=0)
                                             Index Cond: (attrid = 281479288456447::bigint)
                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12)
                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12)
                                       Sort Key: q1_1.objid, q1_1.aggval
                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738
width=12)
                                             Recheck Cond: (attrid = 281479288456451::bigint)
                                             Filter: (aggrid = 0)
                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698
width=0)
                                                   Index Cond: (attrid = 281479288456451::bigint)
         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8)
               Index Cond: (t6.objid = t4.objid)
               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
(26 rows)

explain analyze SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
WHERE t6.objid = t4.objid AND
t4.fileresporgid = 573936067464397682 AND
  NOT EXISTS (
  SELECT 1
  FROM ataggval q1_1,
  atdateval t5
  WHERE q1_1.objid = t4.objid AND
  q1_1.attrid = 281479288456451 AND
  q1_1.aggrid = 0 AND
  t5.aggrid = q1_1.aggval AND
  t5.objid = q1_1.objid AND                 [!!!!!]
  t5.attrid = 281479288456447 ) AND
((t6.objclassid IN (285774255832590,285774255764301))) AND
((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
ORDER BY t4.objid;
                                                                                     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=918320.29..971968.88 rows=1 width=8) (actual time=12079.598..12083.048 rows=64 loops=1)
   ->  Nested Loop  (cost=918320.29..971968.88 rows=1 width=8) (actual time=12079.594..12083.010 rows=64 loops=1)
         ->  Merge Anti Join  (cost=918320.29..971960.65 rows=1 width=8) (actual time=12037.524..12081.989 rows=108
loops=1)
               Merge Cond: (t4.objid = q1_1.objid)
               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486
width=8)(actual time=0.073..83.498 rows=63436 loops=1) 
                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint)
AND(objid < 573936101807357952::bigint)) 
               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8) (actual time=8866.253..11753.055
rows=1299685loops=1) 
                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8) (actual time=8866.246..11413.397
rows=1299685loops=1) 
                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12) (actual time=3133.362..3774.076
rows=1299685loops=1) 
                                 Sort Key: t5.objid, t5.aggrid
                                 Sort Method:  external merge  Disk: 47192kB
                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12)
(actualtime=282.454..1079.038 rows=1857906 loops=1) 
                                       Recheck Cond: (attrid = 281479288456447::bigint)
                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934
width=0)(actual time=258.749...258.749 rows=1857906 loops=1) 
                                             Index Cond: (attrid = 281479288456447::bigint)
                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12) (actual
time=5732.872..6683.784rows=1299685 loops=1) 
                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12) (actual
time=5732.866..6387.188rows=1299685 loops=1) 
                                      Sort Key: q1_1.objid, q1_1.aggval
                                       Sort Method:  external merge  Disk: 39920kB
                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738
width=12)(actual time=1644.983..3634.044 rows=1857906 loops=1) 
                                             Recheck Cond: (attrid = 281479288456451::bigint)
                                             Filter: (aggrid = 0)
                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698
width=0)(actual time=1606.325..1606.325 rows=1877336 loops=1) 
                                                   Index Cond: (attrid = 281479288456451::bigint)
         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8) (actual time=0.009..0.009
rows=1loops=108) 
               Index Cond: (t6.objid = t4.objid)
               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
Total runtime: 12108.663 ms
(29 rows)


Another way to optimize my query is to change it as follows:
(Once again changes are marked with [!!!!!]

explain SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
WHERE t6.objid = t4.objid AND
t4.fileresporgid = 573936067464397682 AND
   NOT EXISTS (
   SELECT 1
   FROM ataggval q1_1,
   atdateval t5
   WHERE q1_1.objid = t5.objid AND                 [!!!!!]
   q1_1.attrid = 281479288456451 AND
   q1_1.aggrid = 0 AND
   t5.aggrid = q1_1.aggval AND
   t5.objid = t4.objid AND
   t5.attrid = 281479288456447 ) AND
 ((t6.objclassid IN (285774255832590,285774255764301))) AND
 ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
 ORDER BY t4.objid;
                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=916978.86..969139.72 rows=1 width=8)
   ->  Nested Loop  (cost=916978.86..969139.72 rows=1 width=8)
         ->  Merge Anti Join  (cost=916978.86..969131.49 rows=1 width=8)
               Merge Cond: (t4.objid = t5.objid)
               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486
width=8)
                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint)
AND(objid < 573936101807357952::bigint)) 
               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8)
                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8)
                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12)
                                 Sort Key: t5.objid, t5.aggrid
                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12)
                                       Recheck Cond: (attrid = 281479288456447::bigint)
                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934
width=0)
                                             Index Cond: (attrid = 281479288456447::bigint)
                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12)
                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12)
                                       Sort Key: q1_1.objid, q1_1.aggval
                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738
width=12)
                                             Recheck Cond: (attrid = 281479288456451::bigint)
                                             Filter: (aggrid = 0)
                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698
width=0)
                                                   Index Cond: (attrid = 281479288456451::bigint)
         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8)
               Index Cond: (t6.objid = t4.objid)
               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
(26 rows)


explain analyze SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
WHERE t6.objid = t4.objid AND
t4.fileresporgid = 573936067464397682 AND
  NOT EXISTS (
  SELECT 1
  FROM ataggval q1_1,
  atdateval t5
  WHERE q1_1.objid = t5.objid AND                 [!!!!!]
  q1_1.attrid = 281479288456451 AND
  q1_1.aggrid = 0 AND
  t5.aggrid = q1_1.aggval AND
  t5.objid = t4.objid AND
  t5.attrid = 281479288456447 ) AND
((t6.objclassid IN (285774255832590,285774255764301))) AND
((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
ORDER BY t4.objid;
                                                                                     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Unique  (cost=916978.86..969139.72 rows=1 width=8) (actual time=12102.964..12106.409 rows=64 loops=1)
   ->  Nested Loop  (cost=916978.86..969139.72 rows=1 width=8) (actual time=12102.959..12106.375 rows=64 loops=1)
         ->  Merge Anti Join  (cost=916978.86..969131.49 rows=1 width=8) (actual time=12060.916..12105.374 rows=108
loops=1)
               Merge Cond: (t4.objid = t5.objid)
               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486
width=8)(actual time=0.080..81.397 rows=63436 loops=1) 
                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint)
AND(objid < 573936101807357952::bigint)) 
               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8) (actual time=8874.492..11778.254
rows=1299685loops=1) 
                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8) (actual time=8874.484..11437.175
rows=1299685loops=1) 
                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12) (actual time=3117.555..3756.062
rows=1299685loops=1) 
                                 Sort Key: t5.objid, t5.aggrid
                                 Sort Method:  external merge  Disk: 39920kB
                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12)
(actualtime=289.475..1079.624 rows=1857906 loops=1) 
                                       Recheck Cond: (attrid = 281479288456447::bigint)
                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934
width=0)(actual time=265.720...265.720 rows=1857906 loops=1) 
                                             Index Cond: (attrid = 281479288456447::bigint)
                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12) (actual
time=5756.915..6707.864rows=1299685 loops=1) 
                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12) (actual
time=5756.909..6409.819rows=1299685 loops=1) 
                                       Sort Key: q1_1.objid, q1_1.aggval
                                       Sort Method:  external merge  Disk: 39920kB
                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738
width=12)(actual time=1646.955..3628.918 rows=1857906 loops=1) 
                                             Recheck Cond: (attrid = 281479288456451::bigint)
                                             Filter: (aggrid = 0)
                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698
width=0)(actual time=1608.233..1608.233 rows=1877336 loops=1) 
                                                   Index Cond: (attrid = 281479288456451::bigint)
         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8) (actual time=0.008..0.009
rows=1loops=108) 
               Index Cond: (t6.objid = t4.objid)
               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
Total runtime: 12129.613 ms
(29 rows)



As the query performs in roughly 12 seconds in both (changed) cases you might advise to change my queries :-)
(In fact we are working on this)
As the primary performance impact can also be reproduced in a small database (querytime > 1 minute) I checked this
issueon MS-SQL server and Oracle. On MSSQL server there is no difference in the execution plan if you change the query
anthe performance is well. Oralce shows a slightly difference but the performance is also well. 
As I mentioned we are looking forward to change our query but in my opinion there could be a general performance gain
ifthis issue is addressed. (especially if you don't know you run into this issue on the query performance is sufficient
enough)

greets
Armin

pgsql-performance by date:

Previous
From: Jochen Erwied
Date:
Subject: Re: dates and partitioning
Next
From: Pavel Stehule
Date:
Subject: Re: potential performance gain by query planner optimization