Re: adding support for posix_fadvise() - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: adding support for posix_fadvise() |
Date | |
Msg-id | 1067851295.2580.12.camel@fuji.krosing.net |
In response to | adding support for posix_fadvise() (Neil Conway <neilc@samurai.com>) |
Responses | Re: adding support for posix_fadvise() |
List | pgsql-hackers |
Neil Conway wrote on Mon, 03.11.2003 at 08:07:
> A couple days ago, Manfred Spraul mentioned the posix_fadvise() API on
> -hackers:
>
> http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html
>
> I'm working on making use of posix_fadvise() where appropriate. I can
> think of the following places where this would be useful:
>
> (1) As Manfred originally noted, when we advance to a new XLOG segment,
> we can use POSIX_FADV_DONTNEED to let the kernel know we won't be
> accessing the old WAL segment anymore. I've attached a quick kludge of a
> patch that implements this. I haven't done any benchmarking of it yet,
> though (comments or benchmark results are welcome).
>
> (2) ISTM that we can set POSIX_FADV_RANDOM for *all* indexes, since the
> vast majority of the accesses to them shouldn't be sequential. Are there
> any situations in which this assumption doesn't hold? (Perhaps B+-tree
> bulk loading, or CLUSTER?) Should this be done per-index-AM, or
> globally?

Perhaps we could do it only for _leaf_ nodes; the root and intermediate
nodes are usually better kept in cache.

> (3) When doing VACUUM, ANALYZE, or large sequential scans (for some
> reasonable definition of "large"), we can use POSIX_FADV_SEQUENTIAL.

Perhaps just for all sequential scans, without the "large" qualifier?

> (4) Various other components, such as tuplestore, tuplesort, and any
> utility commands that need to scan through an entire user relation for
> some reason. Once we've got the APIs for doing this worked out, it
> should be relatively easy to add other uses of posix_fadvise().
>
> (5) I'm hesitant to make use of POSIX_FADV_DONTNEED in VACUUM, as has
> been suggested elsewhere. The problem is that it's all-or-nothing: if
> the VACUUM happens to look at hot pages, these will be flushed from the
> page cache, so the net result may be a loss.

True. POSIX_FADV_DONTNEED should only be used if the page was brought
into the cache by VACUUM itself.

> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise(). Given a Relation and an advice,
> this would:
>
> (a) propagate the advice for this relation to all the open FDs for the
> relation
>
> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?

Also, you may want to restore the old FADV* setting after you are done:
running a single seqscan should probably not leave the relation in
POSIX_FADV_SEQUENTIAL mode forever.

> Note that I'm assuming that we don't need to set advice on sub-sections
> of a relation, although the posix_fadvise() API allows it -- does anyone
> think that would be useful?
>
> One potential issue is that when one process calls posix_fadvise() on a
> particular FD, I'd expect that other processes accessing the same file
> will be affected. For example, enabling FADV_SEQUENTIAL while we're
> vacuuming a relation will mean that another client doing a concurrent
> SELECT on the relation will see different readahead behavior. That
> doesn't seem like a major problem though.
>
> BTW, posix_fadvise() is currently only supported on Linux 2.6 w/ a
> recent version of glibc (BSD hackers, if you're listening,
> posix_fadvise() would be a very cool thing to have :P). So we'll need to
> do the appropriate configure magic to ensure we only use it where it's
> available. Thankfully, it is a POSIX standard, so I would expect that in
> the years to come it will be available on more platforms.
>
> Any comments would be welcome.
>
> -Neil
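To make points (1) and (3) and the "restore the old advice" remark concrete, here is a minimal sketch in plain C. It is not the patch attached to the original message and does not use any PostgreSQL internals; the helper names advise_segment_done() and scan_with_sequential_advice() are hypothetical. One caveat worth keeping in mind: POSIX offers no call to query the advice currently in effect, so "restoring" here simply means resetting to POSIX_FADV_NORMAL. A caller that wants the previous value back would have to remember it itself.

```c
/* Ask for POSIX.1-2008 declarations if no feature-test macro is set yet. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper for point (1): tell the kernel that a finished
 * file (for example an old WAL segment) will not be accessed again.
 * posix_fadvise() returns 0 on success or an error number directly;
 * it does not set errno.
 */
int
advise_segment_done(int fd)
{
    /* offset 0 with len 0 means "from the start to the end of the file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

/*
 * Hypothetical helper for point (3): read the whole file under
 * sequential advice, then reset the advice so that one scan does not
 * leave the descriptor in sequential mode. The number of bytes read is
 * stored in *bytes_read. Returns 0 on success or an error number.
 */
int
scan_with_sequential_advice(int fd, long *bytes_read)
{
    char    buf[8192];
    ssize_t n;
    int     rc;
    int     read_err = 0;

    *bytes_read = 0;

    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (rc != 0)
        return rc;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        *bytes_read += (long) n;
    if (n < 0)
        read_err = errno;   /* remember the failure, but still reset advice */

    /* No getter exists for the old advice, so fall back to the default. */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_NORMAL);

    return read_err != 0 ? read_err : rc;
}
```

A caller would open the file, call scan_with_sequential_advice(fd, &total) for the scan, and call advise_segment_done(fd) once the file is known to be finished. Both hints are advisory only, so a real implementation would probably log a failure rather than treat it as fatal.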