Re: Advice: Where could I be of help? - Mailing list pgsql-hackers

From:           Tom Lane
Subject:        Re: Advice: Where could I be of help?
Msg-id:         15090.1033685237@sss.pgh.pa.us
In response to: Re: Advice: Where could I be of help? ("Curtis Faith" <curtis@galtair.com>)
Responses:      Re: Advice: Where could I be of help?
List:           pgsql-hackers
"Curtis Faith" <curtis@galtair.com> writes: > Then during execution if the planner turned out to be VERY wrong about > certain assumptions the execution system could update the stats that led to > those wrong assumptions. That way the system would seek the correct values > automatically. That has been suggested before, but I'm unsure how to make it work. There are a lot of parameters involved in any planning decision and it's not obvious which ones to tweak, or in which direction, if the plan turns out to be bad. But if you can come up with some ideas, go to it! > Everytime a query which requires the index scan runs it will blow out the > entire cache since the scan will load more blocks than the cache > holds. Right, that's the scenario that kills simple LRU ... > LRU-2 might be better but it seems like it still won't give enough priority > to the most frequently used blocks. Blocks touched more than once per query (like the upper-level index blocks) will survive under LRU-2. Blocks touched once per query won't. Seems to me that it should be a win. > My modification was to use access counts to increase the durability of the > more accessed blocks. You could do it that way too, but I'm unsure whether the extra complexity will buy anything. Ultimately, I think an LRU-anything algorithm is equivalent to a clock sweep for those pages that only get touched once per some-long-interval: the single-touch guys get recycled in order of last use, which seems just like a clock sweep around the cache. The guys with some amount of preference get excluded from the once-around sweep. To determine whether LRU-2 is better or worse than some other preference algorithm requires a finer grain of analysis than this. I'm not a fan of "more complex must be better", so I'd want to see why it's better before buying into it ... > The kinds of things I was thinking about should be very portable. I found > that simply writing the cache in order of the file system offset results in > very greatly improved performance since it lets the head seek in smaller > increments and much more smoothly, especially with modern disks. Shouldn't the OS be responsible for scheduling those writes appropriately? Ye good olde elevator algorithm ought to handle this; and it's at least one layer closer to the actual disk layout than we are, thus more likely to issue the writes in a good order. It's worth experimenting with, perhaps, but I'm pretty dubious about it. BTW, one other thing that Vadim kept saying we should do is alter the cache management strategy to retain dirty blocks in memory (ie, give some amount of preference to as-yet-unwritten dirty pages compared to clean pages). There is no reliability cost here since the WAL will let us reconstruct any dirty pages if we crash before they get written; and the periodic checkpoints will ensure that we eventually write a dirty block and thus it will become available for recycling. This seems like a promising line of thought that's orthogonal to the basic LRU-vs-whatever issue. Nobody's got round to looking at it yet though. I've got no idea how much preference should be given to a dirty block --- not infinite, probably, but some. regards, tom lane