Thread: Performance regression on CVS head
I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch, but to my surprise the baseline run with CVS head, without the patch, behaved very differently than it did back in March.

I reran a shorter 1h test with CVS head from May 20th and from March 6th (which is when I ran the earlier tests), and something has clearly changed between those dates that affects the test. Test run 248 is with the CVS checkout from May 20th, and 249 is from March 6th:

http://community.enterprisedb.com/oldestxmin/

Vacuum on the stock table is started right after the rampup, and the drop in performance happens at the very moment that the vacuum finishes. Does anyone have an explanation for this?

One theory is that after VACUUM has populated the FSM, every update needs one extra I/O to read in a page with free space to insert into, instead of just extending the relation. But I don't think anything has changed recently in that area.

Another theory is that VACUUM updates some statistics, which switches the access plan to a much worse one. But the tables were analyzed before the test, and again I don't remember any recent changes there.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch,
> but to my surprise the baseline run with CVS head, without the patch,
> behaved very differently than it did back in March.

> I reran a shorter 1h test with CVS head from May 20th, and March 6th
> (which is when I ran the earlier tests), and something has clearly been
> changed between those dates that affects the test. Test run 248 is with
> CVS checkout from May 20th, and 249 is from March 6th:

May 20th is not quite my idea of "HEAD" ;-). It might be worth checking current code before investing any think-time on this. But having said that, it looks a bit like a planner problem --- if I'm reading the graphs correctly, I/O wait time goes through the roof, suggesting a change to a much less efficient plan.

			regards, tom lane
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch,
>> but to my surprise the baseline run with CVS head, without the patch,
>> behaved very differently than it did back in March.
>
>> I reran a shorter 1h test with CVS head from May 20th, and March 6th
>> (which is when I ran the earlier tests), and something has clearly been
>> changed between those dates that affects the test. Test run 248 is with
>> CVS checkout from May 20th, and 249 is from March 6th:
>
> May 20th is not quite my idea of "HEAD" ;-). It might be worth checking
> current code before investing any think-time on this.

:) Yeah, I did run it with real head at first. I suspected the n_live_tuples calculations, and that's why I ran it again with a checkout from May 20th.

> But having said
> that, it looks a bit like a planner problem --- if I'm reading the
> graphs correctly, I/O wait time goes through the roof, suggesting a
> change to a much less efficient plan.

Right. I'll do a "binary search" with checkouts from different dates to pin it down.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
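The "binary search" over checkout dates is a bisection: each benchmark run halves the window between a known-good date (March 6th) and a known-bad date (May 20th). Below is a minimal Python sketch, assuming a single change introduced the regression and that it persists afterwards. `bisect_dates` and `is_bad` are hypothetical names; `is_bad` stands in for "check out CVS as of that date, build, run DBT-2, and report whether throughput drops when the vacuum finishes". It does not reflect the actual scripts used for these runs.

```python
from datetime import date, timedelta
from typing import Callable


def bisect_dates(good: date, bad: date, is_bad: Callable[[date], bool]) -> date:
    """Return the first checkout date on which the regression shows up.

    Assumes `good` is known to be unaffected, `bad` is known to be affected,
    and that once the regression appears it stays (a single culprit change).
    """
    if good >= bad:
        raise ValueError("good date must precede bad date")
    lo, hi = good, bad  # invariant: lo is a good date, hi is a bad date
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        # Each probe is one full build plus benchmark run, so keep them few.
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    runs = []

    def fake_run(day: date) -> bool:
        # Hypothetical stand-in for a real DBT-2 run against a checkout
        # from `day`; the culprit date below is made up for the demo.
        runs.append(day)
        return day >= date(2007, 3, 14)

    first_bad = bisect_dates(date(2007, 3, 6), date(2007, 5, 20), fake_run)
    print(first_bad, "found after", len(runs), "runs")
```

With a 75-day window the search needs about log2(75), i.e. seven, benchmark runs instead of one per day, which matters when each run takes an hour.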
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch,
>> but to my surprise the baseline run with CVS head, without the patch,
>> behaved very differently than it did back in March.
>
>> I reran a shorter 1h test with CVS head from May 20th, and March 6th
>> (which is when I ran the earlier tests), and something has clearly been
>> changed between those dates that affects the test. Test run 248 is with
>> CVS checkout from May 20th, and 249 is from March 6th:
>
> May 20th is not quite my idea of "HEAD" ;-). It might be worth checking
> current code before investing any think-time on this. But having said
> that, it looks a bit like a planner problem --- if I'm reading the
> graphs correctly, I/O wait time goes through the roof, suggesting a
> change to a much less efficient plan.

I tracked this down to the patch to enable plan invalidation for SPI plans:

http://archives.postgresql.org/pgsql-committers/2007-03/msg00136.php

Apparently the vacuum causes a plan invalidation, and a worse plan is chosen. I'll dig deeper into which queries are affected and why, unless someone has a better idea.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch,
>>> but to my surprise the baseline run with CVS head, without the patch,
>>> behaved very differently than it did back in March.
>>
>>> I reran a shorter 1h test with CVS head from May 20th, and March
>>> 6th (which is when I ran the earlier tests), and something has
>>> clearly been changed between those dates that affects the test. Test
>>> run 248 is with CVS checkout from May 20th, and 249 is from March 6th:
>>
>> May 20th is not quite my idea of "HEAD" ;-). It might be worth checking
>> current code before investing any think-time on this. But having said
>> that, it looks a bit like a planner problem --- if I'm reading the
>> graphs correctly, I/O wait time goes through the roof, suggesting a
>> change to a much less efficient plan.
>
> I tracked this down to the patch to enable plan invalidation for SPI plans:
>
> http://archives.postgresql.org/pgsql-committers/2007-03/msg00136.php
>
> Apparently the vacuum causes a plan invalidation, and a worse plan is
> chosen. I'll dig deeper into which queries are affected and why,
> unless someone has a better idea.

Ok, found it. The plan for the stock-level transaction changed as a result of a lot of dead tuples in the district table. I turned autovacuum on for the small, frequently-updated tables, and that fixed the problem.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com