Re: Why is PostgreSQL 9.2 slower than 9.1 in my tests? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Why is PostgreSQL 9.2 slower than 9.1 in my tests?
Date
Msg-id CAMkU=1w=Z9KwJ6sB48f=GOehmp8sm0e0MgR2_4+ycs+qnJ55eg@mail.gmail.com
Whole thread Raw
In response to Re: Why is PostgreSQL 9.2 slower than 9.1 in my tests?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Tuesday, December 11, 2012, Tom Lane wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tue, Dec 11, 2012 at 2:50 AM, Patryk Sidzina
> <patryk.sidzina@gmail.com> wrote:
>> The differences come up when you change the "INSERT" to "EXECUTE 'INSERT'" (
>> and i checked this time on 3 machines, one of which was Windows):

>> FOR i IN 1..cnt LOOP
>> EXECUTE 'INSERT INTO test_table_md_speed(n) VALUES (' || i || ')';
>> END LOOP;

> The culprit is the commit below.  I don't know exactly why this slows
> down your case.  A preliminary oprofile analysis suggests that it most
> of the slowdown is that it calls AllocSetAlloc more often.  I suspect
> that this slow-down will be considered acceptable trade-off for
> getting good parameterized plans.

I'm having a hard time getting excited about optimizing the above case:
the user can do far more to make it fast than we can, simply by not
using EXECUTE, which is utterly unnecessary in this example.

I assumed his example was an intentionally simplified test-case, not a real world use-case.

For a more realistic use, see "[PERFORM] Performance on Bulk Insert to Partitioned Table".  There too it would probably be best to get rid of the EXECUTE, but doing so in that case would certainly have a high cost in trigger-code complexity and maintainability.  (In my test case of loading 1e7 narrow tuples to 100 partitions, the plan cache change lead to a 26% slow down)

 
Having said that, though, it's not real clear to me why the plancache
changes would have affected the speed of EXECUTE at all --- the whole
point of that command is we don't cache a plan for the query.


Doing a bottom level profile isn't helpful because all of the extra time is in very low level code that is called from everywhere.  Doing call-counts with gprof, I see that there is big increase in the calls to copyObject (which indirectly leads to a big increase in AllocSetAlloc).  Before the change, each EXECUTE had one top-level (i.e. nonrecursive) copyObject call, coming from _SPI_prepare_plan.

After the change, each EXECUTE has 4 such top-level copyObject calls, one each from CreateCachedPlan and CompleteCachedPlan and two from BuildCachedPlan.

Cheers,

Jeff
 

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Performance on Bulk Insert to Partitioned Table
Next
From: Charles Gomes
Date:
Subject: Re: Performance on Bulk Insert to Partitioned Table