Thread: Performance regression on CVS head

Performance regression on CVS head

From
Heikki Linnakangas
Date:
I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch, 
but to my surprise the baseline run with CVS head, without the patch, 
behaved very differently than it did back in March.

I reran a shorter 1h test with CVS head from May 20th, and March 6th 
(which is when I ran the earlier tests), and something has clearly been 
changed between those dates that affects the test. Test run 248 is with 
CVS checkout from May 20th, and 249 is from March 6th:
http://community.enterprisedb.com/oldestxmin/

Vacuum on the stock table is started right after the rampup, and the 
drop in performance happens at the very moment that the vacuum finishes.

Anyone have an explanation for this?

One theory is that after VACUUM has populated the FSM, all updates need 
to do one extra I/O to read in a page with free space to insert to, 
instead of just extending the relation. But I don't think anything has 
changed recently in that area. Another theory is that the VACUUM updates 
some stats, which changes the access plan used to a much worse one. But 
the tables have been analyzed before the test, and again I don't 
remember any changes to that recently.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Performance regression on CVS head

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch, 
> but to my surprise the baseline run with CVS head, without the patch, 
> behaved very differently than it did back in March.

> I reran a shorter 1h test with CVS head from May 20th, and March 6th 
> (which is when I ran the earlier tests), and something has clearly been 
> changed between those dates that affects the test. Test run 248 is with 
> CVS checkout from May 20th, and 249 is from March 6th:

May 20th is not quite my idea of "HEAD" ;-).  It might be worth checking
current code before investing any think-time on this.  But having said
that, it looks a bit like a planner problem --- if I'm reading the
graphs correctly, I/O wait time goes through the roof, suggesting a
change to a much less efficient plan.
        regards, tom lane


Re: Performance regression on CVS head

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch, 
>> but to my surprise the baseline run with CVS head, without the patch, 
>> behaved very differently than it did back in March.
> 
>> I reran a shorter 1h test with CVS head from May 20th, and March 6th 
>> (which is when I ran the earlier tests), and something has clearly been 
>> changed between those dates that affects the test. Test run 248 is with 
>> CVS checkout from May 20th, and 249 is from March 6th:
> 
> May 20th is not quite my idea of "HEAD" ;-).  It might be worth checking
> current code before investing any think-time on this. 

:) Yeah, I did run it with real head at first. I suspected the 
n_live_tuples calculations, and that's why I ran it again with a 
checkout from May 20th.

> But having said
> that, it looks a bit like a planner problem --- if I'm reading the
> graphs correctly, I/O wait time goes through the roof, suggesting a
> change to a much less efficient plan.

Right.

I'll do a "binary search" with runs of checkouts from different dates to 
pin it down.
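
For readers who want to reproduce this kind of date-based bisection, here is a minimal sketch in Python. It is not the script used in this thread. The function name `first_bad_date` and the `is_bad` callback are hypothetical; in practice `is_bad` would check out the tree as of a given date, build it, and run the benchmark. The year 2007 is inferred from the archive URL cited later in the thread, and the cutoff date in the usage example is an arbitrary stand-in, not the real date of the offending commit. The sketch also assumes the regression persists in every checkout after the one that introduced it.

```python
from datetime import date, timedelta
from typing import Callable


def first_bad_date(good: date, bad: date, is_bad: Callable[[date], bool]) -> date:
    """Return the earliest date whose checkout shows the regression.

    Assumes `good` is known to be fine, `bad` is known to regress, and
    every date after the first bad one is also bad.
    """
    while (bad - good).days > 1:
        # Test the checkout halfway between the known-good and known-bad dates.
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid   # regression already present: look earlier
        else:
            good = mid  # still fine: look later
    return bad


if __name__ == "__main__":
    # Stand-in for "check out, build, run the test": pretend the
    # regression appeared on an arbitrary date chosen for illustration.
    pretend_cutoff = date(2007, 4, 1)
    found = first_bad_date(date(2007, 3, 6), date(2007, 5, 20),
                           lambda d: d >= pretend_cutoff)
    print(found)
```

Each call to `is_bad` costs a full benchmark run, so halving the interval matters: the roughly 75 days between March 6th and May 20th need about seven runs instead of one per day.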

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Performance regression on CVS head

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch, 
>> but to my surprise the baseline run with CVS head, without the patch, 
>> behaved very differently than it did back in March.
> 
>> I reran a shorter 1h test with CVS head from May 20th, and March 6th 
>> (which is when I ran the earlier tests), and something has clearly been 
>> changed between those dates that affects the test. Test run 248 is with 
>> CVS checkout from May 20th, and 249 is from March 6th:
> 
> May 20th is not quite my idea of "HEAD" ;-).  It might be worth checking
> current code before investing any think-time on this.  But having said
> that, it looks a bit like a planner problem --- if I'm reading the
> graphs correctly, I/O wait time goes through the roof, suggesting a
> change to a much less efficient plan.

I tracked this down to the patch to enable plan invalidation for SPI plans:

http://archives.postgresql.org/pgsql-committers/2007-03/msg00136.php

Apparently the vacuum causes a plan invalidation and a worse plan is 
chosen. I'll dig deeper into which queries are being affected and why. 
Unless someone has any better ideas.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Performance regression on CVS head

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> I tried to repeat the DBT-2 runs with the "oldestxmin refresh" patch, 
>>> but to my surprise the baseline run with CVS head, without the patch, 
>>> behaved very differently than it did back in March.
>>
>>> I reran a shorter 1h test with CVS head from May 20th, and March 
>>> 6th (which is when I ran the earlier tests), and something has 
>>> clearly been changed between those dates that affects the test. Test 
>>> run 248 is with CVS checkout from May 20th, and 249 is from March 6th:
>>
>> May 20th is not quite my idea of "HEAD" ;-).  It might be worth checking
>> current code before investing any think-time on this.  But having said
>> that, it looks a bit like a planner problem --- if I'm reading the
>> graphs correctly, I/O wait time goes through the roof, suggesting a
>> change to a much less efficient plan.
> 
> I tracked this down to the patch to enable plan invalidation for SPI plans:
> 
> http://archives.postgresql.org/pgsql-committers/2007-03/msg00136.php
> 
> Apparently the vacuum causes a plan invalidation and a worse plan is 
> chosen. I'll dig deeper into which queries are being affected and why. 
> Unless someone has any better ideas.

Ok, found it. The plan for the stock-level transaction changed as a result 
of a lot of dead tuples in the district table.

I turned autovacuum on for the small, frequently-updated tables, and 
that fixed the problem.
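
As an illustration only, the sketch below generates SQL that enables autovacuum per table. It uses the `autovacuum_enabled` storage parameter, which exists in PostgreSQL 8.4 and later; at the time of this thread per-table settings lived in the `pg_autovacuum` system catalog instead, so this is not the exact change made here. The helper names are hypothetical. Only `district` is named in the thread; `warehouse` is an assumed example of another small, frequently-updated DBT-2 table.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def autovacuum_statements(tables: Iterable[str], enabled: bool = True) -> List[str]:
    """Build one ALTER TABLE statement per table (PostgreSQL 8.4+ syntax)."""
    value = "true" if enabled else "false"
    return [
        f"ALTER TABLE {quote_ident(t)} SET (autovacuum_enabled = {value});"
        for t in tables
    ]


if __name__ == "__main__":
    # "district" comes from the thread; "warehouse" is an assumption.
    for stmt in autovacuum_statements(["district", "warehouse"]):
        print(stmt)
```

The statements can be fed to psql or any client. Keeping dead tuples low in such tables keeps their size estimates stable, so a re-plan after an invalidation is less likely to pick a different plan.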

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com