
From Gregory Smith
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date
Msg-id 52E18578.9000700@gmail.com
In response to Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Mel Gorman <mgorman@suse.de>)
List pgsql-hackers
On 1/20/14 9:46 AM, Mel Gorman wrote:
> They could potentially be used to evaluate any IO scheduler changes.
> For example -- deadline scheduler with these parameters has X
> transactions/sec throughput with average latency of Y milliseconds
> and a maximum fsync latency of Z seconds. Evaluate how well the
> out-of-box behaviour compares against it with and without some set of
> patches. At the very least it would be useful for tracking historical
> kernel performance over time and bisecting any regressions that got
> introduced. Once we have a test case, I think many kernel developers
> (me at least) can run automated bisections.

That's the long-term goal.  What we used to get out of pgbench were 
things like >60 second latencies when a checkpoint hit with gigabytes of 
dirty memory.  That does happen in the real world, but it's not a 
realistic case you can tune for very well.  In fact, tuning for it can 
easily degrade performance on more realistic workloads.
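
To quantify that kind of spike instead of just eyeballing it, here's a 
rough Python sketch that summarizes the per-transaction log pgbench 
writes with -l.  It assumes the third whitespace-separated field on each 
line is the transaction latency in microseconds, which matches current 
pgbench; the log format has changed before, so verify against your 
version's docs.  The 10 second stall threshold is just a cutoff I picked 
for illustration:

    #!/usr/bin/env python
    # Rough sketch: summarize per-transaction latencies from a pgbench -l
    # log.  Assumes field 3 of each line is latency in microseconds; the
    # format varies across pgbench versions, so check yours first.
    import sys

    def latency_stats(path):
        lats = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 3:
                    continue
                lats.append(int(fields[2]) / 1000000.0)  # usec -> seconds
        if not lats:
            raise SystemExit("no transactions found in log")
        lats.sort()
        n = len(lats)
        pick = lambda q: lats[min(n - 1, int(q * n))]
        return {
            "count": n,
            "p50_s": pick(0.50),
            "p99_s": pick(0.99),
            "max_s": lats[-1],
            # Stalls past 10s are the checkpoint-style spikes above.
            "stalls_over_10s": sum(1 for l in lats if l > 10.0),
        }

    if __name__ == "__main__":
        for k, v in sorted(latency_stats(sys.argv[1]).items()):
            print("%s: %s" % (k, v))

The point of reporting percentiles alongside the max is exactly the 
distribution argument below: a clean p99 with a terrible max is the 
signature of a cache that absorbs everything until it fills.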

The main complexity I don't have a clear view of yet is how much 
unavoidable storage-level latency there is in all of the common 
deployment types.  For example, I can take a server with a 256MB 
battery-backed write cache and set dirty_background_bytes to be smaller 
than that.  So checkpoint spikes go away, right?  No.  Eventually you 
will see dirty_background_bytes of data going into an already full 256MB 
cache.  And when that happens, the latency will be based on how long it 
takes to write the cached 256MB out to the disks.  If you have a single 
disk or a RAID-1 pair, that random I/O could easily happen at 5MB/s or 
less, and that makes for a 51 second cache clearing time.  This is a lot 
better now than it used to be, because fsync hasn't flushed the whole 
cache in many years now.  (Only RHEL5 systems still in the field suffer 
much from that era of code.)  But you do need to look at the distribution 
of latency a bit because of how the cache impacts things; you can't just 
consider min/max values.
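
The arithmetic behind that 51 second figure is worth keeping as a 
one-liner when sizing these settings.  A trivial sketch, where the 5MB/s 
rate is just my example number for random I/O on a single disk or RAID-1 
pair:

    # How long a full battery-backed write cache takes to drain at a
    # given sustained random-write rate.  256MB at 5MB/s is the case
    # described above.
    def cache_drain_seconds(cache_mb, random_write_mb_per_s):
        return cache_mb / float(random_write_mb_per_s)

    print(cache_drain_seconds(256, 5))  # ~51.2 seconds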

Take the BBWC out of the equation, and you'll see latency proportional 
to how long it takes to clear out the disk's cache.  It's fun "upgrading" 
from a disk with 32MB of cache to 64MB only to watch worst-case latency 
double.  At least the kernel does the right thing now, using that cache 
when it can while forcing data out when fsync calls arrive.  (That's 
another important kernel optimization we'll never be able to teach the 
database.)
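
If you want to watch that cache behavior directly, a minimal fsync 
latency probe is only a few lines.  This is a crude sketch in the spirit 
of the pg_test_fsync utility, not a replacement for it; the 8K block 
size and iteration count are arbitrary choices, and the numbers only 
mean anything if you run it on the filesystem the database actually 
lives on:

    # Crude fsync latency probe: rewrite one 8K block, time each fsync.
    # Run it where the database lives; results are meaningless elsewhere.
    import os, time

    def fsync_latencies(path="fsync_probe.dat", iterations=200,
                        block=8192):
        lats = []
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            payload = b"\0" * block
            for _ in range(iterations):
                os.pwrite(fd, payload, 0)  # rewrite the same block
                start = time.time()
                os.fsync(fd)               # force it past the drive cache
                lats.append(time.time() - start)
        finally:
            os.close(fd)
            os.unlink(path)
        return sorted(lats)

    if __name__ == "__main__":
        l = fsync_latencies()
        print("min=%.2fms median=%.2fms max=%.2fms" % (
            l[0] * 1000, l[len(l) // 2] * 1000, l[-1] * 1000))

On a drive that lies about fsync you'll see implausibly low values 
across the board; on an honest drive with a big cache, the min/median 
stay small while the max grows with the cache, which is the distribution 
effect described above.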

-- 
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


