Re: [Testperf-general] Re: ExclusiveLock - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: [Testperf-general] Re: ExclusiveLock
Date
Msg-id 20041123024037.GA8735@it.is.rice.edu
In response to Re: [Testperf-general] Re: ExclusiveLock  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Tue, Nov 23, 2004 at 12:04:17AM +0000, Simon Riggs wrote:
> On Mon, 2004-11-22 at 23:37, Greg Stark wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > 
> > > - Find a way to reduce rotational delay when repeatedly writing last WAL
> > > page 
> > > 
> > > Currently fsync of WAL requires the disk platter to perform a full
> > > rotation to fsync again. One idea is to write the WAL to different
> > > offsets that might reduce the rotational delay. 
> > 
> > Once upon a time when you formatted hard drives you actually gave them an
> > interleave factor for a similar reason. These days you invariably use an
> > interleave of 1, ie, store the blocks continuously. Whether that's because
> > controllers have become fast enough to keep up with the burst rate or because
> > the firmware is smart enough to handle the block interleaving invisibly isn't
> > clear to me.
> > 
> > I wonder if formatting the drive to have an interleave >1 would actually
> > improve performance of the WAL log. 
> > 
> > It would depend a lot on the usage pattern though. A heavily used system might
> > be able to generate enough WAL traffic to keep up with the burst rate of the
> > drive. And a less used system might benefit but might lose.
> > 
> > Probably now the less than saturated system gets close to the average
> > half-rotation-time latency. This idea would only really help if you have a
> > system that happens to be triggering pessimal results worse than that due to
> > unfortunate timing.
> 
> I was asking whether that topic should be removed, since Tom had said it
> had been rejected....
> 
> If you could tell me how to instrument the system to (better) show
> whether such plans as you suggest are workable, I would be greatly
> interested. Anything we do needs to be able to be monitored for
> success/failure.
> 
> -- 
> Best Regards, Simon Riggs
> 

Disk performance has increased so much that the reasons for having
an interleave factor other than 1 (no interleaving) have all but disappeared.
CPU speed has also increased so much relative to disk speed that spending some
CPU cycles to improve I/O is a reasonable approach. I have been considering
how this might be accomplished. As Simon so aptly pointed out, we need to
show that it materially affects performance or it is not worth doing.

The simplest idea I had was to lay out the WAL files contiguously on the
disk in advance. Solaris has this ability given appropriate FS parameters, and
we should be able to get close on most other OSes. Once that has happened, use
something like the FSM to track the allocated blocks. The CPU can keep track
of the disk's current rotational position (approximate is okay); then, when we
need to write a WAL block, we start writing at the next area that the disk head
will be sweeping over. Give it a little leeway for latency in the system and we
should be able to get very low latency for the writes. Obviously, there would
be wasted space, but you could intersperse writes to whatever granularity of
space overhead you are willing to accept. As for implementation, I was reading
an interesting article that used a simple theoretical model to estimate disk
head position to avoid latency.
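
To make the idea concrete, here is a minimal sketch of the bookkeeping, not
a proposal for actual code. It assumes a single track with a fixed number of
sectors, a constant spindle speed, and a reference time t_ref at which sector 0
was known to be under the head. All names (pick_write_sector, leeway_sectors,
the free list standing in for an FSM-like map) are hypothetical, and a real
drive with zoned recording, remapped sectors and a write cache would not be
this simple.

```python
def pick_write_sector(now, t_ref, rpm, sectors_per_track, free, leeway_sectors):
    """Pick the first free sector the head will reach after a safety margin.

    now, t_ref        -- times in seconds; sector 0 was under the head at t_ref
    rpm               -- assumed constant spindle speed
    free              -- list of booleans, True where a sector is unallocated
    leeway_sectors    -- sectors to skip to absorb system latency

    Returns (sector, estimated_wait_seconds), or None if nothing is free.
    """
    period = 60.0 / rpm  # time for one full rotation
    # Estimated head position as a fractional sector number.
    pos = ((now - t_ref) % period) / period * sectors_per_track
    # First candidate: the sector after the current one, plus the margin.
    start = (int(pos) + 1 + leeway_sectors) % sectors_per_track
    for step in range(sectors_per_track):
        s = (start + step) % sectors_per_track
        if free[s]:
            # Sectors the platter must still turn before sector s arrives.
            dist = (s - pos) % sectors_per_track
            return s, dist / sectors_per_track * period
    return None


if __name__ == "__main__":
    free = [True] * 100
    sector, wait = pick_write_sector(0.00255, 0.0, 6000, 100, free, 2)
    print(sector, wait)
```

Under these assumptions (6000 rpm, so 10 ms per rotation), the estimated wait
is a few sectors' worth of rotation, compared with the 5 ms average
half-rotation latency Greg mentions for an unsynchronized write. Whether the
position estimate can be kept accurate enough on real hardware is exactly what
would have to be measured.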

Yours truly,
Ken Marshall

