performance regression in 9.2 CTE with SRF function - Mailing list pgsql-hackers

From Pavel Stehule
Subject performance regression in 9.2 CTE with SRF function
Date
Msg-id CAFj8pRAfaCBEPfo4Zpyck1wVgShboXWA2VEY+=rGfeLDgUQCBw@mail.gmail.com
Whole thread Raw
Responses Re: performance regression in 9.2 CTE with SRF function  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello

In Czech discussion group was reported performance regression of CTE
query. I wrote a test, when I can show it.

Execution on 9.2, 9.3 is about 9 sec, on 8.4, 9.1 is about 30ms with
default work_mem ='1MB'


optimal plan (8.4)
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=45.54..69.32 rows=441 width=4) (actual
time=29.943..29.943 rows=0 loops=1)
   Output: x2.a
   Hash Cond: (x2.a = pl.a)
   CTE pl
     ->  HashAggregate  (cost=17.50..18.39 rows=89 width=4) (actual
time=6.723..7.016 rows=89 loops=1)
           Output: x.a
           ->  Seq Scan on x  (cost=0.00..15.00 rows=1000 width=4)
(actual time=0.026..3.390 rows=1000 loops=1)
                 Output: x.a
   ->  Seq Scan on x2  (cost=24.25..41.75 rows=500 width=4) (actual
time=21.178..21.757 rows=131 loops=1)
         Output: x2.a, x2.b
         Filter: (NOT (hashed SubPlan 2))
         SubPlan 2
           ->  CTE Scan on pl  (cost=0.00..24.03 rows=89 width=4)
(actual time=0.208..18.821 rows=1000 loops=1)
                 Output: foo(pl.a)
   ->  Hash  (cost=1.78..1.78 rows=89 width=4) (actual
time=7.895..7.895 rows=89 loops=1)
         Output: pl.a
         ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual
time=6.733..7.592 rows=89 loops=1)
               Output: pl.a
 Total runtime: 30.046 ms

suboptimal plan (9.3)
                                                          QUERY PLAN

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=18.39..346248.92 rows=441 width=4) (actual
time=9373.295..9373.295 rows=0 loops=1)
   Output: x2.a
   Join Filter: (x2.a = pl.a)
   Rows Removed by Join Filter: 11837
   CTE pl
     ->  HashAggregate  (cost=17.50..18.39 rows=89 width=4) (actual
time=4.363..4.539 rows=89 loops=1)
           Output: x.a
           ->  Seq Scan on public.x  (cost=0.00..15.00 rows=1000
width=4) (actual time=0.031..2.133 rows=1000 loops=1)
                 Output: x.a
   ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual
time=4.371..4.519 rows=89 loops=1)
         Output: pl.a
   ->  Materialize  (cost=0.00..345562.50 rows=500 width=4) (actual
time=0.203..105.051 rows=133 loops=89)
         Output: x2.a
         ->  Seq Scan on public.x2  (cost=0.00..345560.00 rows=500
width=4) (actual time=17.914..9330.645 rows=133 loops=1)
               Output: x2.a
               Filter: (NOT (SubPlan 2))
               Rows Removed by Filter: 867
               SubPlan 2
                 ->  CTE Scan on pl pl_1  (cost=0.00..468.59
rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
                       Output: foo(pl_1.a)
 Total runtime: 9373.444 ms

Higher work_mem 3MB fixes this issue
                                                       QUERY PLAN

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Join  (cost=712.37..736.15 rows=441 width=4) (actual
time=38.024..38.024 rows=0 loops=1)
   Output: x2.a
   Hash Cond: (x2.a = pl.a)
   CTE pl
     ->  HashAggregate  (cost=17.50..18.39 rows=89 width=4) (actual
time=8.457..8.759 rows=89 loops=1)
           Output: x.a
           ->  Seq Scan on public.x  (cost=0.00..15.00 rows=1000
width=4) (actual time=0.029..5.073 rows=1000 loops=1)
                 Output: x.a
   ->  Seq Scan on public.x2  (cost=691.09..708.59 rows=500 width=4)
(actual time=27.629..28.152 rows=105 loops=1)
         Output: x2.a, x2.b
         Filter: (NOT (hashed SubPlan 2))
         Rows Removed by Filter: 895
         SubPlan 2
           ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000
width=4) (actual time=0.328..24.224 rows=1000 loops=1)
                 Output: foo(pl_1.a)
   ->  Hash  (cost=1.78..1.78 rows=89 width=4) (actual
time=9.650..9.650 rows=89 loops=1)
         Output: pl.a
         Buckets: 1024  Batches: 1  Memory Usage: 3kB
         ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual
time=8.468..9.346 rows=89 loops=1)
               Output: pl.a
 Total runtime: 38.153 ms

There is terribly wrong estimation when work_mem is lower

               SubPlan 2
                 ->  CTE Scan on pl pl_1  (cost=0.00..468.59
rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
                       Output: foo(pl_1.a)

tables x and x2 are wery small - 64KB

link to original thread - sorry it is in Czech language (but attached
plans are in English :))
http://groups.google.com/group/postgresql-cz/browse_thread/thread/75e0e34e88950802?hl=cs

Regards
Pavel Stehule

Attachment

pgsql-hackers by date:

Previous
From: "Erik Rijkers"
Date:
Subject: Re: [COMMITTERS] pgsql: psql: Improve expanded print output in tuples-only mode
Next
From: Miroslav Šimulčík
Date:
Subject: Re: function for setting/getting same timestamp during whole transaction