Thread: performance regression in 9.2 CTE with SRF function
Hello

A performance regression in a CTE query was reported in the Czech discussion group. I wrote a test case that demonstrates it. Execution on 9.2 and 9.3 takes about 9 seconds; on 8.4 and 9.1 it takes about 30 ms with the default work_mem = '1MB'.

optimal plan (8.4):

                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=45.54..69.32 rows=441 width=4) (actual time=29.943..29.943 rows=0 loops=1)
   Output: x2.a
   Hash Cond: (x2.a = pl.a)
   CTE pl
     ->  HashAggregate  (cost=17.50..18.39 rows=89 width=4) (actual time=6.723..7.016 rows=89 loops=1)
           Output: x.a
           ->  Seq Scan on x  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.026..3.390 rows=1000 loops=1)
                 Output: x.a
   ->  Seq Scan on x2  (cost=24.25..41.75 rows=500 width=4) (actual time=21.178..21.757 rows=131 loops=1)
         Output: x2.a, x2.b
         Filter: (NOT (hashed SubPlan 2))
         SubPlan 2
           ->  CTE Scan on pl  (cost=0.00..24.03 rows=89 width=4) (actual time=0.208..18.821 rows=1000 loops=1)
                 Output: foo(pl.a)
   ->  Hash  (cost=1.78..1.78 rows=89 width=4) (actual time=7.895..7.895 rows=89 loops=1)
         Output: pl.a
         ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual time=6.733..7.592 rows=89 loops=1)
               Output: pl.a
 Total runtime: 30.046 ms

suboptimal plan (9.3):

                                                    QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=18.39..346248.92 rows=441 width=4) (actual time=9373.295..9373.295 rows=0 loops=1)
   Output: x2.a
   Join Filter: (x2.a = pl.a)
   Rows Removed by Join Filter: 11837
   CTE pl
     ->  HashAggregate  (cost=17.50..18.39 rows=89 width=4) (actual time=4.363..4.539 rows=89 loops=1)
           Output: x.a
           ->  Seq Scan on public.x  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.031..2.133 rows=1000 loops=1)
                 Output: x.a
   ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual time=4.371..4.519 rows=89 loops=1)
         Output: pl.a
   ->  Materialize  (cost=0.00..345562.50 rows=500 width=4) (actual time=0.203..105.051 rows=133 loops=89)
         Output: x2.a
         ->  Seq Scan on public.x2  (cost=0.00..345560.00 rows=500 width=4) (actual time=17.914..9330.645 rows=133 loops=1)
               Output: x2.a
               Filter: (NOT (SubPlan 2))
               Rows Removed by Filter: 867
               SubPlan 2
                 ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
                       Output: foo(pl_1.a)
 Total runtime: 9373.444 ms

Increasing work_mem to 3MB fixes the issue:

                                                    QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Join  (cost=712.37..736.15 rows=441 width=4) (actual time=38.024..38.024 rows=0 loops=1)
   Output: x2.a
   Hash Cond: (x2.a = pl.a)
   CTE pl
     ->  HashAggregate  (cost=17.50..18.39 rows=89 width=4) (actual time=8.457..8.759 rows=89 loops=1)
           Output: x.a
           ->  Seq Scan on public.x  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.029..5.073 rows=1000 loops=1)
                 Output: x.a
   ->  Seq Scan on public.x2  (cost=691.09..708.59 rows=500 width=4) (actual time=27.629..28.152 rows=105 loops=1)
         Output: x2.a, x2.b
         Filter: (NOT (hashed SubPlan 2))
         Rows Removed by Filter: 895
         SubPlan 2
           ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.328..24.224 rows=1000 loops=1)
                 Output: foo(pl_1.a)
   ->  Hash  (cost=1.78..1.78 rows=89 width=4) (actual time=9.650..9.650 rows=89 loops=1)
         Output: pl.a
         Buckets: 1024  Batches: 1  Memory Usage: 3kB
         ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual time=8.468..9.346 rows=89 loops=1)
               Output: pl.a
 Total runtime: 38.153 ms

There is a terribly wrong estimate when work_mem is lower:
               SubPlan 2
                 ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
                       Output: foo(pl_1.a)

Tables x and x2 are very small - 64 kB.

Link to the original thread - sorry, it is in Czech (but the attached plans are in English :))
http://groups.google.com/group/postgresql-cz/browse_thread/thread/75e0e34e88950802?hl=cs

Regards

Pavel Stehule
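A minimal self-contained reproduction in the spirit of the test case might look like the following. The table contents and the body of foo() are only guesses, since the actual definitions were in the attachment; the query shape is taken from the plans above.

    -- tables with roughly 89 distinct values spread over 1000 rows, as in the plans (assumed data)
    CREATE TABLE x  (a int);
    CREATE TABLE x2 (a int, b int);
    INSERT INTO x  SELECT (random() * 88)::int + 1 FROM generate_series(1, 1000);
    INSERT INTO x2 SELECT (random() * 88)::int + 1, 0 FROM generate_series(1, 1000);
    ANALYZE x;
    ANALYZE x2;

    -- a set-returning function that yields on the order of ten rows per call (body is hypothetical)
    CREATE OR REPLACE FUNCTION foo(int) RETURNS SETOF int
    LANGUAGE sql AS $$ SELECT generate_series(1, $1 % 20) $$;

    -- the shape of the problem query, per the plans
    EXPLAIN ANALYZE VERBOSE
    WITH pl AS (SELECT DISTINCT a FROM x)
    SELECT x2.a
      FROM x2
      JOIN pl ON x2.a = pl.a
     WHERE x2.a NOT IN (SELECT foo(a) FROM pl);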
Pavel Stehule <pavel.stehule@gmail.com> writes:
> A performance regression in a CTE query was reported in the Czech
> discussion group. I wrote a test case that demonstrates it.

I don't see anything tremendously wrong here.  The older branches are
choosing the right plan for entirely wrong reasons: they don't notice
that "select foo(a) from pl" has a set-returning function in the
targetlist and so don't adjust the estimated number of result rows
for that.  In this particular example, foo() seems to return an average
of about 11 rows per call, versus the default estimate of 1000 rows per
call, so the size of the result is overestimated and that dissuades
the planner from using a hashed subplan.  But the error could easily
have gone the other way, causing the planner to use a hashed subplan
when it shouldn't, and the consequences of that are even worse.  So
I don't think that ignoring SRFs in the targetlist is better.

If you add "ROWS 10" or so to the declaration of the function, you
get a better row estimate and it goes back to the hashed subplan.

			regards, tom lane
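For reference, the ROWS clause goes into the function's declaration. With the hypothetical foo() from the sketch above (the body is still only a guess), it would look like this:

    CREATE OR REPLACE FUNCTION foo(int) RETURNS SETOF int
    LANGUAGE sql
    ROWS 10  -- planner now assumes ~10 result rows per call instead of the default 1000
    AS $$ SELECT generate_series(1, $1 % 20) $$;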
2013/2/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> A performance regression in a CTE query was reported in the Czech
>> discussion group. I wrote a test case that demonstrates it.
>
> I don't see anything tremendously wrong here.  The older branches are
> choosing the right plan for entirely wrong reasons: they don't notice
> that "select foo(a) from pl" has a set-returning function in the
> targetlist and so don't adjust the estimated number of result rows
> for that.  In this particular example, foo() seems to return an average
> of about 11 rows per call, versus the default estimate of 1000 rows per
> call, so the size of the result is overestimated and that dissuades
> the planner from using a hashed subplan.  But the error could easily
> have gone the other way, causing the planner to use a hashed subplan
> when it shouldn't, and the consequences of that are even worse.  So
> I don't think that ignoring SRFs in the targetlist is better.

No, there is a strange estimate:

         ->  Seq Scan on public.x2  (cost=0.00..345560.00 rows=500 width=4) (actual time=17.914..9330.645 rows=133 loops=1)
               Output: x2.a
               Filter: (NOT (SubPlan 2))
               Rows Removed by Filter: 867
               SubPlan 2
                 ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
                       Output: foo(pl_1.a)

The CTE Scan expects rows=89000. I don't understand how it arrives at such a high number.

Regards

Pavel

>
> If you add "ROWS 10" or so to the declaration of the function, you
> get a better row estimate and it goes back to the hashed subplan.
>
>			regards, tom lane
>
> No, there is a strange estimate:
>
>          ->  Seq Scan on public.x2  (cost=0.00..345560.00 rows=500 width=4) (actual time=17.914..9330.645 rows=133 loops=1)
>                Output: x2.a
>                Filter: (NOT (SubPlan 2))
>                Rows Removed by Filter: 867
>                SubPlan 2
>                  ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
>                        Output: foo(pl_1.a)
>
> The CTE Scan expects rows=89000. I don't understand how it arrives at
> such a high number.
>

Or rather, why is the estimate unstable?

First (1MB work_mem):

                 ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
                       Output: foo(pl_1.a)

Second (3MB work_mem):

   ->  Hash  (cost=1.78..1.78 rows=89 width=4) (actual time=9.650..9.650 rows=89 loops=1)
         Output: pl.a
         Buckets: 1024  Batches: 1  Memory Usage: 3kB
         ->  CTE Scan on pl  (cost=0.00..1.78 rows=89 width=4) (actual time=8.468..9.346 rows=89 loops=1)
               Output: pl.a

I would expect the estimate not to depend on work_mem.

Best regards

Pavel

> Regards
>
> Pavel
>
>>
>> If you add "ROWS 10" or so to the declaration of the function, you
>> get a better row estimate and it goes back to the hashed subplan.
>>
>>			regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
>> No, there is a strange estimate:
>>
>>                SubPlan 2
>>                  ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
>>                        Output: foo(pl_1.a)

Nothing strange about it.  89 rows out from the underlying CTE (which
appears to be dead accurate) times 1000 for the SRF's expansion.

> Or rather, why is the estimate unstable?

It isn't --- you are looking at two different subplans there, one with
a SRF in it and one without.

			regards, tom lane
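The arithmetic checks out: 89 CTE rows times the default estimate of 1000 rows per set-returning-function call gives the rows=89000 seen in the plan. The per-function estimate (the 1000 default, or whatever a ROWS clause sets) can be inspected in the catalog; assuming the function is named foo as in the plans:

    SELECT proname, proretset, prorows
      FROM pg_proc
     WHERE proname = 'foo';
    -- proretset = t, prorows = 1000 unless a ROWS clause was given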
2013/2/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> No, there is a strange estimate:
>>>
>>>                SubPlan 2
>>>                  ->  CTE Scan on pl pl_1  (cost=0.00..468.59 rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
>>>                        Output: foo(pl_1.a)
>
> Nothing strange about it.  89 rows out from the underlying CTE (which
> appears to be dead accurate) times 1000 for the SRF's expansion.
>
>> Or rather, why is the estimate unstable?
>
> It isn't --- you are looking at two different subplans there, one with
> a SRF in it and one without.

OK, I understand now. Thank you.

Regards

Pavel

>
>			regards, tom lane