Re: O_DIRECT in freebsd - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: O_DIRECT in freebsd |
Date | |
Msg-id | 200306222350.h5MNomr03736@candle.pha.pa.us |
In response to | Re: O_DIRECT in freebsd (Sean Chittenden <sean@chittenden.org>) |
Responses | Re: O_DIRECT in freebsd |
List | pgsql-hackers |
Basically, we don't know when we read a buffer whether this is a
read-only or read/write.  In fact, we could read it in, and another
backend could write it for us.

The big issue is that when we do a write, we don't wait for it to get to
disk.  It seems to use O_DIRECT, we would have to read the buffer in a
special way to mark it as read-only, which seems kind of strange.  I see
no reason we can't use free-behind in the PostgreSQL buffer cache to
handle most of the benefits of O_DIRECT, without the read-only buffer
restriction.

---------------------------------------------------------------------------

Sean Chittenden wrote:
> > _That_ is an excellent point.  However, do we know at the time we
> > open the file descriptor if we will be doing this?
>
> Doesn't matter, it's an option to fcntl().
>
> > What about cache coherency problems with other backends not opening
> > with O_DIRECT?
>
> That's a problem for the kernel VM, if you mean cache coherency in the
> VM.  If you mean inside of the backend, that could be a stickier
> issue, I think.  I don't know enough of the internals yet to know if
> this is a problem or not, but you're right, it's certainly something
> to consider.  Is the cache a write behind cache or is it a read
> through cache?  If it's a read through cache, which I think it is,
> then the backend would have to dirty all cache entries pertaining to
> the relations being opened with O_DIRECT.  The use case for that
> being:
>
> 1) a transaction begins
> 2) a few rows out of the huge table are read
> 3) a huge query is performed that triggers the use of O_DIRECT
> 4) the rows selected in step 2 are updated (this step should poison or
>    update the cache, actually, and act as a write through cache if the
>    data is in the cache)
> 5) the same few rows are read in again
> 6) transaction is committed
>
> Provided the cache is poisoned or updated in step 4, I can't see how
> or where this would be a problem.  Please enlighten if there's a
> different case that would need to be taken into account.  I can't
> imagine ever wanting to write out data using O_DIRECT and think that
> it's a read only optimization in an attempt to minimize the turn over
> in the OS's cache.  From fcntl(2):
>
> O_DIRECT     Minimize or eliminate the cache effects of reading and
>              writing.  The system will attempt to avoid caching the
>              data you read or write.  If it cannot avoid caching the
>              data, it will minimize the impact the data has on the
>              cache.  Use of this flag can drastically reduce
>              performance if not used with care.
>
> > And finally, how do we deal with the fact that writes to O_DIRECT
> > files will wait until the data hits the disk because there is no
> > kernel buffer cache?
>
> Well, two things.
>
> 1) O_DIRECT should never be used on writes... I can't think of a case
>    where you'd want it off, even when COPY'ing data and restoring a
>    DB, it just doesn't make sense to use it.  The write buffer is
>    emptied as soon as the pages hit the disk unless something is
>    reading those bits, but I'd imagine the write buffer would be used
>    to make sure that as much writing is done to the platter in a
>    single write by the OS as possible, circumventing that would be
>    insane (though useful possibly for embedded devices with low RAM,
>    solid state drives, or some super nice EMC fiber channel storage
>    device that basically has its own huge disk cache).
>
> 2) Last I checked PostgreSQL wasn't a threaded app and doesn't use
>    non-blocking IO.  The backend would block until the call returns,
>    where's the problem?  :)
>
> If anything O_DIRECT would shake out any bugs in PostgreSQL's caching
> code, if there are any.  -sc
>
> --
> Sean Chittenden
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
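
For readers following the thread, the six-step use case quoted above can be walked through with a toy model. This is only a sketch under loud assumptions: `Table`, `BufferCache`, `read_direct`, and `run_transaction` are hypothetical names invented for illustration, not PostgreSQL code, and the model writes straight through to the "table", which sidesteps the point made in the reply that a real backend does not wait for writes to reach disk.

```python
class Table:
    """Hypothetical stand-in for an on-disk relation: block number -> row."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)


class BufferCache:
    """Toy read-through cache with a poison-or-update path for writes."""

    def __init__(self, table):
        self.table = table
        self.pages = {}

    def read(self, blkno):
        # Read-through: a miss loads the block from the table and keeps it.
        if blkno not in self.pages:
            self.pages[blkno] = self.table.blocks[blkno]
        return self.pages[blkno]

    def read_direct(self, blkno):
        # Models an O_DIRECT-style read: return the data without caching it.
        return self.table.blocks[blkno]

    def update(self, blkno, value, poison=False):
        # Simplification: the write reaches the table immediately.
        self.table.blocks[blkno] = value
        if blkno in self.pages:
            if poison:
                del self.pages[blkno]      # poison: drop the stale copy
            else:
                self.pages[blkno] = value  # write-through: refresh in place


def run_transaction(poison):
    table = Table({n: f"row-{n}" for n in range(1000)})
    cache = BufferCache(table)
    hot = [3, 7]

    first = [cache.read(n) for n in hot]        # step 2: read a few rows
    for n in range(1000):                       # step 3: huge direct scan
        cache.read_direct(n)
    cached_after_scan = len(cache.pages)        # scan left the cache alone

    for n in hot:                               # step 4: update those rows
        cache.update(n, f"row-{n}-v2", poison=poison)
    second = [cache.read(n) for n in hot]       # step 5: read them again
    return first, second, cached_after_scan
```

Calling `run_transaction(poison=False)` and `run_transaction(poison=True)` should both return the updated rows in step 5, which matches the claim that either poisoning or updating the cache in step 4 is sufficient in this scenario. The sketch says nothing about the harder case raised in the reply, where another backend holds a modified buffer that has not yet been written out.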