Buffer Management - Mailing list pgsql-hackers
From | Curt Sampson |
---|---|
Subject | Buffer Management |
Date | |
Msg-id | Pine.NEB.4.43.0206251232130.17448-100000@angelic.cynic.net |
In response to | Re: Index Scans become Seq Scans after VACUUM ANALYSE ("J. R. Nield" <jrnield@usol.com>) |
Responses | Re: Buffer Management |
List | pgsql-hackers |
I'm splitting off this buffer management stuff into a separate thread.

On 24 Jun 2002, J. R. Nield wrote:

> I'll back off on that. I don't know if we want to use the OS buffer
> manager, but shouldn't we try to have our buffer manager group writes
> together by files, and pro-actively get them out to disk?

The only way the postgres buffer manager can "get [data] out to disk" is to do an fsync(). For data files (as opposed to log files), this can only slow down overall system throughput, as this would only disrupt the OS's write management.

> Right now, it
> looks like all our write requests are delayed as long as possible and
> the order in which they are written is pretty-much random, as is the
> backend that writes the block, so there is no locality of reference even
> when the blocks are adjacent on disk, and the write calls are spread-out
> over all the backends.

It doesn't matter. The OS will introduce locality of reference with its write algorithms. Take a look at

    http://www.cs.wisc.edu/~solomon/cs537/disksched.html

for an example. Most OSes use the elevator or one-way elevator algorithm. So it doesn't matter whether it's one back-end or many writing, and it doesn't matter in what order they do the write.

> Would it not be the case that things like read-ahead, grouping writes,
> and caching written data are probably best done by PostgreSQL, because
> only our buffer manager can understand when they will be useful or when
> they will thrash the cache?

Operating systems these days are not too bad at guessing what you're doing. Pretty much every OS I've seen will do read-ahead when it detects you're doing sequential reads, at least in the forward direction. And Solaris is even smart enough to mark the pages you've read as "not needed" so that they quickly get flushed from the cache, rather than blowing out your entire cache if you go through a large file.

> Would O_DSYNC|O_RSYNC turn off the cache?

No. I suppose there's nothing to stop it doing so, in some implementations, but the interface is not designed for direct I/O.

> Since you know a lot about NetBSD internals, I'd be interested in
> hearing about what postgresql looks like to the NetBSD buffer manager.

Well, looks like pretty much any program, or group of programs, doing a lot of I/O. :-)

> Am I right that strings of successive writes get randomized?

No; as I pointed out, they in fact get de-randomized as much as possible. The more processes you have throwing out requests, the better the throughput will be in fact.

> What do our cache-hit percentages look like? I'm going to do some
> experimenting with this.

Well, that depends on how much memory you have and what your working set is. :-)

cjs
-- 
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
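
To make the elevator point above concrete, here is a rough sketch of one-way elevator ordering. It is only an illustration, not any particular OS's scheduler: the function names, the head position, and the block numbers are all made up for the example. It shows that the order in which writes arrive does not change the order in which the disk services them.

```python
def one_way_elevator(pending, head):
    """Order pending block numbers the way a one-way elevator would.

    Hypothetical helper: sweep upward from the current head position,
    then jump back to the lowest pending block and sweep upward again.
    """
    ahead = sorted(b for b in pending if b >= head)
    behind = sorted(b for b in pending if b < head)
    return ahead + behind


def seek_distance(order, head):
    """Total head movement (in blocks) when servicing `order` from `head`."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


# Writes from several backends, interleaved in arrival order (made-up numbers).
arrival = [90, 12, 55, 13, 91, 54]
scheduled = one_way_elevator(arrival, head=50)

print(scheduled)
print(seek_distance(arrival, 50), seek_distance(scheduled, 50))
```

Feeding the same requests in any other arrival order gives the same schedule, so adjacent blocks end up written together no matter which backend issued them or when.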