Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Spread checkpoint sync
Msg-id 4D41C881.7090305@2ndquadrant.com
In response to Re: Spread checkpoint sync  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote:
> Based on what I saw looking at this, I'm thinking that the backend
> fsyncs probably happen in clusters - IOW, it's not 2504 backend fsyncs
> spread uniformly throughout the test, but clusters of 100 or more that
> happen in very quick succession, followed by relief when the
> background writer gets around to emptying the queue.

That's exactly the case.  You'll be running along fine, the queue will 
fill, and then hundreds of them can pile up in seconds.  Since the worst 
of that seemed to be during the sync phase of the checkpoint, adding 
queue management logic there is where we started.  I thought this 
compaction idea would be more difficult to implement than your patch 
proved to be, though, so doing this first instead is working out quite 
well.

This is what all the log messages from the patch look like here, at 
scale=500 and shared_buffers=256MB:

DEBUG:  compacted fsync request queue from 32768 entries to 11 entries

That's an 8GB database, and from looking at the relative sizes I'm 
guessing 7 entries refer to the 1GB segments of the accounts table, 2 to 
its main index, and the other 2 are likely branches/tellers data.  Since 
I know the production system I ran into this on has about 400 file 
segments on it regularly dirtied, with a higher shared_buffers than that, 
I expect this will demolish this class of problem on it, too.
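
To illustrate why a full queue can collapse that far, here is a minimal 
sketch in Python, not the patch's actual C code.  The FsyncRequest tuple, 
the relation names, and the round-robin way the queue gets filled are all 
assumptions made up for the example, and the 7/2/2 split is only my guess 
from above.  It simply drops duplicate requests for the same file segment 
and keeps the first one seen:

```python
from collections import namedtuple

# Hypothetical request shape: one entry per "please fsync this file segment".
FsyncRequest = namedtuple("FsyncRequest", ["relation", "segment"])


def compact_queue(queue):
    """Drop duplicate requests, keeping the first occurrence of each in order."""
    seen = set()
    compacted = []
    for request in queue:
        if request not in seen:
            seen.add(request)
            compacted.append(request)
    return compacted


# Assumed workload: 11 distinct segments (guessed split: 7 accounts,
# 2 index, 1 branches, 1 tellers), requested over and over.
targets = (
    [FsyncRequest("accounts", n) for n in range(7)]
    + [FsyncRequest("accounts_pkey", n) for n in range(2)]
    + [FsyncRequest("branches", 0), FsyncRequest("tellers", 0)]
)

# Fill a 32768-entry queue by cycling through those 11 segments.
queue = [targets[i % len(targets)] for i in range(32768)]
compacted = compact_queue(queue)

print(f"compacted fsync request queue from {len(queue)} entries "
      f"to {len(compacted)} entries")
```

However many times each segment is requested, only one entry per 
distinct segment survives, so the result depends on the number of 
segments being dirtied rather than on how full the queue was.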

I'll have all the TPS over time graphs available to publish by the end 
of my day here, including tests at a scale of 1000 as well.  Those 
should give a little more insight into how the patch is actually 
impacting high-level performance.  I don't dare disturb the ongoing 
tests by copying all that data out of there until they're finished, 
which will be a few hours yet.

My only potential concern over committing this is that I haven't done a 
sanity check over whether it impacts the fsync mechanics in a way that 
might cause an issue.  Your assumptions there are documented and look 
reasonable on quick review; I just haven't had much time yet to look for 
flaws in them.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


