Thread: Singleton SELECT inside cursor loop
AWS RDS PostgreSQL 12.11

It runs quickly, no matter what the fetched cursor values are, as long as the cursor returns fewer than 8 rows. After that, the singleton SELECT's performance tanks.

We've got plenty of other procedures which have a similar "singleton SELECT inside a cursor" design (many even query the same table that the problematic query does), but only this procedure's singleton SELECT tanks if the cursor returns more than 8 rows.

It only runs once a month, so we've only just noticed the problem.

Why would that be? (We recently updated from 12.10 to 12.11, if that's relevant.)

--
Angular momentum makes the world go 'round.
I extracted the singleton into a function, and that helped me isolate the problem.

Note how quickly it runs the first five times, but takes 780x longer the sixth time I run it. Exiting psql and entering again causes the same slowness the sixth time it's run.

sides=> select strans.TASK001785639(1, 507650, '2022-08-01', '2022-08-31');
 task001785639
---------------
           498
(1 row)

sides=> select strans.TASK001785639(1, 507650, '2022-08-01', '2022-08-31');
 task001785639
---------------
           498
(1 row)

sides=> \timing
Timing is on.
sides=> select strans.TASK001785639(1, 507650, '2022-08-01', '2022-08-31');
 task001785639
---------------
           498
(1 row)

Time: 457.325 ms
sides=> select strans.TASK001785639(1, 507650, '2022-08-01', '2022-08-31');
 task001785639
---------------
           498
(1 row)

Time: 459.724 ms
sides=> select strans.TASK001785639(1, 507650, '2022-08-01', '2022-08-31');
 task001785639
---------------
           498
(1 row)

Time: 461.155 ms
sides=> select strans.TASK001785639(1, 507650, '2022-08-01', '2022-08-31');
 task001785639
---------------
           498
(1 row)

Time: 358426.159 ms (05:58.426)

On 10/1/22 01:29, Ron wrote:
> AWS RDS Postgresql 12.11
>
> It runs quickly, no matter what the fetched cursor values are, as long as
> the cursor returns less than 8 rows. After that, the singleton SELECT's
> performance tanks.
>
> We've got plenty of other procedures which have a similar "Singleton
> SELECT inside a cursor" design (many even query the same table that the
> problematic query does), but only this procedure's singleton SELECT tanks
> if the cursor returns more than 8 rows.
>
> It only runs once a month, so we've only just noticed the problem.
>
> Why would that be? (We recently updated from 12.10 to 12.11, if that's
> relevant.)
>
> --
> Angular momentum makes the world go 'round.
On 2022-10-01 02:22:22 -0500, Ron wrote:
> I extracted the singleton into a function, and that helped me isolate the
> problem.

I don't know what a singleton select is (duckduckgo tells me it has something to do with DB2 and/or COBOL, or maybe it's a whisky), but:

> Note how quickly it runs the first five times, but takes 780x longer the
> sixth time I run it. Exiting psql and entering again causes the same
> slowness the sixth time it's run.

PostgreSQL plans the first five executions of the same prepared query with custom plans built for the exact parameter values. After that it switches to a generic plan if it thinks that the generic plan isn't worse than the specialized plans. If the plan suddenly gets worse after 5 executions, you've probably run into a case where the generic plan is worse although the cost computed by the planner isn't.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
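The rule Peter describes can be watched directly in psql. A minimal sketch with a hypothetical prepared statement; the table and column names are stand-ins, not the poster's actual schema:

```sql
-- Hypothetical statement mimicking the function's singleton SELECT.
PREPARE q(int, int, date, date) AS
    SELECT count(*)
    FROM strans.some_table            -- stand-in table name
    WHERE client_id  = $1             -- stand-in columns
      AND account_id = $2
      AND posted_on BETWEEN $3 AND $4;

-- Executions 1 through 5 are planned with custom plans built for the
-- actual parameter values:
EXPLAIN EXECUTE q(1, 507650, '2022-08-01', '2022-08-31');

-- Run it five times, then EXPLAIN again: from the sixth execution on,
-- the planner may cache a generic plan, recognizable by the parameter
-- symbols ($1 .. $4) appearing in the plan's filter conditions.
```

If the sixth EXPLAIN suddenly shows $1-style symbols and a different scan or join strategy, the custom-to-generic switch is the culprit.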
## Ron (ronljohnsonjr@gmail.com):

> Note how quickly it runs the first five times, but takes 780x longer the
> sixth time I run it. Exiting psql and entering again causes the same
> slowness the sixth time it's run.

Tanks at the sixth time? That rings a bell: "The current rule for this is that the first five executions are done with custom plans[...]" from
https://www.postgresql.org/docs/12/sql-prepare.html

And your functions create prepared statements under the hood:
https://www.postgresql.org/docs/12/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

I guess you should visit
https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE
and try whether plan_cache_mode = force_custom_plan helps here.

Regards,
Christoph

--
Spare Space
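For reference, the setting Christoph points at can be tried per session, or pinned to just the one function so the rest of the workload keeps the default planner behavior. A sketch; the function signature is assumed from the earlier transcript, not confirmed:

```sql
-- Per-session test:
SET plan_cache_mode = force_custom_plan;

-- Or attach it to the problematic function only (signature assumed):
ALTER FUNCTION strans.TASK001785639(int, int, date, date)
    SET plan_cache_mode = force_custom_plan;

-- Revert when done experimenting:
RESET plan_cache_mode;
ALTER FUNCTION strans.TASK001785639(int, int, date, date)
    RESET plan_cache_mode;
```

The per-function form is usually preferable in production, since forcing custom plans everywhere gives up the re-planning savings that the generic-plan cache exists to provide.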
On 10/1/22 07:20, Peter J. Holzer wrote:
> On 2022-10-01 02:22:22 -0500, Ron wrote:
>> I extracted the singleton into a function, and that helped me isolate the
>> problem.
> I don't know what a singleton select is (duckduckgo tells me it has
> something to do with DB2 and/or COBOL, or maybe it's a whisky), but:

A query which returns one row. Typically SELECT COUNT/MAX/etc. without a GROUP BY clause.

--
Angular momentum makes the world go 'round.
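For example, an aggregate without GROUP BY always yields exactly one row, even when no rows match the predicate (the table and column names below are illustrative only):

```sql
-- A "singleton SELECT" in the sense used in this thread:
-- always returns exactly one row.
SELECT count(*), max(posted_on)
FROM strans.some_table
WHERE account_id = 507650;
```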
On 10/1/22 07:21, Christoph Moench-Tegeder wrote:
> ## Ron (ronljohnsonjr@gmail.com):
>
>> Note how quickly it runs the first five times, but takes 780x longer the
>> sixth time I run it. Exiting psql and entering again causes the same
>> slowness the sixth time it's run.
> Tanks at the sixth time? That rings a bell: "The current rule for this
> is that the first five executions are done with custom plans[...]" from
> https://www.postgresql.org/docs/12/sql-prepare.html

The question then is "why am I just now seeing the problem?" We've been using v12 for two years, and it just happened. The only recent change is that I upgraded it from RDS 12.10 to 12.11 a couple of weeks ago.

> And your functions create prepared statements under the hood:
> https://www.postgresql.org/docs/12/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

The same thing happens when I put the SELECT in a prepared statement, so that seems the likely cause.

> I guess you should visit
> https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE
> and try if plan_cache_mode = force_custom_plan helps here.

That solved the problem. Thank you!

--
Angular momentum makes the world go 'round.
## Ron (ronljohnsonjr@gmail.com):

> The question then is "why am I just now seeing the problem?" We've been
> using v12 for two years, and it just happened.
>
> The only recent change is that I upgraded it from RDS 12.10 to 12.11 a
> couple of weeks ago.

That's correlation, but no proof of causality.

Now that you've confirmed that you indeed have a mismatch between the generic and custom plans, you could compare those (EXPLAIN EXECUTE) and see where the generic plan goes wrong. Otherwise, the prime suspects are bad statistics. (Were stats accurate before the update? How about now? Ran VACUUM ANALYZE recently? If not, does it improve matters?)

It would help if you could compare execution plans with plans from before the update, but I guess you didn't document those?

Regards,
Christoph

--
Spare Space
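On PostgreSQL 12, the two plans can also be forced explicitly with plan_cache_mode, which is more deterministic than counting executions. A sketch, reusing a hypothetical prepared statement q with the parameters from the transcript:

```sql
-- Force and inspect the generic plan (takes effect even on the first
-- EXECUTE, no need to run it five times first):
SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(1, 507650, '2022-08-01', '2022-08-31');

-- Force and inspect the custom plan for the same parameter values:
SET plan_cache_mode = force_custom_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(1, 507650, '2022-08-01', '2022-08-31');

SET plan_cache_mode = auto;  -- back to the default behavior
```

Diffing the two EXPLAIN ANALYZE outputs should show exactly which node's row estimates go wrong once the literal parameter values are replaced by $n symbols.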
On 10/1/22 14:54, Christoph Moench-Tegeder wrote:
> ## Ron (ronljohnsonjr@gmail.com):
>
>> The question then is "why am I just now seeing the problem?" We've been
>> using v12 for two years, and it just happened.
>>
>> The only recent change is that I upgraded it from RDS 12.10 to 12.11 a
>> couple of weeks ago.
> That's correlation, but no proof of causality.

Right. But it is an important change which happened between job runs (the 22nd of each month).

> Now that you've confirmed that you indeed have a mismatch between the
> generic and custom plans, you could compare those (EXPLAIN EXECUTE) and
> see where the generic plan goes wrong.

I'll rerun the EXPLAIN EXECUTE with and without "plan_cache_mode = force_custom_plan", and attach them in a reply some time soon.

> Otherwise, the prime suspects are bad statistics. (Were stats accurate
> before the update? How about now? Ran VACUUM ANALYZE recently?

VACUUM yesterday at 04:00, and ANALYZE at 09:00, both from a cron job. Manually ran ANALYZE last night during the dig into the problem.

> If not, does it improve matters?)

Nope. Not one bit.

> It would help if you could compare execution plans with plans from
> before the update, but I guess you didn't document those?

So, since ANALYZE did not help.

--
Angular momentum makes the world go 'round.
On 10/1/22 15:42, Ron wrote:
Attached are explain outputs from the following runs:
TASK001785639_explain_output_custom.log: a "first five" fast execution
TASK001785639_explain_output_generic.log: "the sixth" (generic) plan when it took 6 minutes.
Next are similar plans, except with default_statistics_target = 1000 and the table reanalyzed. It didn't help with this query.
TASK001785639_explain_output_custom_def_stats_1000.log
TASK001785639_explain_output_generic_def_stats_1000.log
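For anyone following along, raising the statistics target and re-gathering stats looks roughly like this; the per-column form limits the change to where it matters (table and column names are placeholders, not the actual schema):

```sql
-- Session-wide, for a one-off experiment: ANALYZE in the same session
-- picks up the raised target.
SET default_statistics_target = 1000;
ANALYZE strans.some_table;

-- Or persistently, per column, so other tables keep the default:
ALTER TABLE strans.some_table
    ALTER COLUMN posted_on SET STATISTICS 1000;
ANALYZE strans.some_table;
```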
Finally, the explain output from "plan_cache_mode = force_custom_plan":
TASK001785639_explain_output_force_custom_def_stats_1000.log
According to meld diff, custom_def_stats_1000 and force_custom_def_stats_1000 have nearly identical plans (the only difference being that in the forced custom plan, 2 workers were launched, and so they filtered out some rows). Execution time was about 460 ms in both.
[snip]
EDIT: No, since ANALYZE did not help.
--
Angular momentum makes the world go 'round.