Dirty Buffer Writing [was Proposed LogWriter Scheme] - Mailing list pgsql-hackers
From | Curtis Faith |
---|---|
Subject | Dirty Buffer Writing [was Proposed LogWriter Scheme] |
Date | |
Msg-id | DMEEJMCDOJAKPPFACMPMKEFCCEAA.curtis@galtair.com |
In response to | Re: Proposed LogWriter Scheme, WAS: Potential Large (Greg Copeland <greg@CopelandConsulting.Net>) |
Responses |
Re: Dirty Buffer Writing [was Proposed LogWriter Scheme]
|
List | pgsql-hackers |
> On Sun, 2002-10-06 at 11:46, Tom Lane wrote:
> > I can't personally get excited about something that only helps if your
> > server is starved for RAM --- who runs servers that aren't fat on RAM
> > anymore? But give it a shot if you like. Perhaps your analysis is
> > pessimistic.
>
> <snipped> I don't find it far fetched to
> imagine situations where people may commit large amounts of memory for
> the database yet marginally starve available memory for file system
> buffers. Especially so on heavily I/O bound systems or where sporadically
> other types of non-database file activity may occur.
>
> <snipped> Of course, that opens the door for simply adding more memory
> and/or slightly reducing the amount of memory available to the database
> (thus making it available elsewhere). Now, after all that's said and
> done, having something like aio in use would seemingly allow it to be
> somewhat more "self-tuning" from a potential performance perspective.

Good points.

Now for some surprising news (at least it surprised me).

I researched the file system source on my system (FreeBSD 4.6) and found that the behavior was optimized for non-database access, to eliminate unnecessary writes when temp files are created and deleted rapidly. It was not optimized to get data to the disk in the most efficient manner.

The syncer on FreeBSD appears to place dirtied filesystem buffers into work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer processes one of the queues and increments a counter, syncer_delayno.

On my system the setting for SYNCER_MAXDELAY is 32, so each second 1/32nd of the buffered writes are processed. If the syncer falls behind, and the writes for a given second take more than one second to process, the syncer does not wait but begins processing the next queue.

AFAICT this means that there is no opportunity to have writes combined by the disk, since they are processed in buckets based on the time the writes came in.
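The bucketing described above can be illustrated with a toy model. This is a minimal sketch in Python, not the FreeBSD code: only SYNCER_MAXDELAY = 32 and the name syncer_delayno come from the message, while the class ToySyncer, its methods, the slot arithmetic, and the 30-second delay are assumptions made for illustration.

```python
from collections import deque

SYNCER_MAXDELAY = 32  # value reported in the message; everything else is illustrative


class ToySyncer:
    """Hypothetical model of a delay wheel: one queue is drained per second."""

    def __init__(self, maxdelay=SYNCER_MAXDELAY):
        self.maxdelay = maxdelay
        self.queues = [deque() for _ in range(maxdelay)]
        self.delayno = 0  # plays the role of syncer_delayno

    def dirty(self, block, delay):
        """Queue a dirtied block to be written `delay` seconds from now."""
        if not 1 <= delay <= self.maxdelay:
            raise ValueError("delay must be between 1 and maxdelay")
        slot = (self.delayno + delay) % self.maxdelay
        self.queues[slot].append(block)

    def tick(self):
        """Advance one second and drain only the queue that is now due."""
        self.delayno = (self.delayno + 1) % self.maxdelay
        queue = self.queues[self.delayno]
        written = list(queue)
        queue.clear()
        return written


def demo():
    """Dirty two adjacent blocks one second apart and log when each is written."""
    syncer = ToySyncer()
    syncer.dirty(100, delay=30)   # assumed 30-second delay
    syncer.tick()                 # second 1 passes, nothing is due yet
    syncer.dirty(101, delay=30)   # adjacent block, dirtied one second later
    log = []
    for second in range(2, 33):
        written = syncer.tick()
        if written:
            log.append((second, written))
    return log


if __name__ == "__main__":
    print(demo())
```

In this model, demo() returns [(30, [100]), (31, [101])]: the two adjacent blocks are written in different seconds because they were dirtied in different seconds, so nothing downstream sees them together.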
Also, it seems very likely that many installations won't have enough buffers for 30 seconds' worth of changes, and that there would be some level of SYNCHRONOUS writing because of this delay and the syncer process getting backed up. This might happen once per second, as the buffers get full and the syncer has not yet started for that second's interval.

Linux might handle this better. I saw some emails exchanged a year or so ago about starting writes immediately in a low-priority way, but I'm not sure whether those patches got applied to the Linux kernel. The source I had access to seems to do something analogous to FreeBSD, but using fixed percentages of the dirty blocks or a minimum number of blocks. They appear to be handled in LRU order, however.

On-disk caches are much, much larger these days, so it seems that some way of getting the data out sooner would result in better write performance for the cache. My newer drive is a 10K RPM IBM Ultrastar SCSI and it has a 4M cache. I don't see these caches getting smaller over time, so not letting the disk see writes will become more and more of a performance drain.

- Curtis