Re: Enough RAM for entire Database.. cost aside, is this - Mailing list pgsql-general
From: Andy B
Subject: Re: Enough RAM for entire Database.. cost aside, is this
Msg-id: KpmFc.11407$HQ1.9258@fe2.news.blueyonder.co.uk
In response to: Enough RAM for entire Database.. cost aside, is this going to be fastest? ("Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk>)
List: pgsql-general
Hello again Mike,

Thanks for the replies! Here's my next salvo!

> Perhaps I'm a bit naive about complex data structure caching strategies, but
> it seems to me that the overhead of tracking tuples (which is what you
> would want if you are going to manage your own cache, as opposed to simply
> caching disk pages as the OS does) would be memory-prohibitive except in
> the most extreme cases of RAM>DB SIZE.

But doesn't postgresql already make a decision about what needs to be in its workspace prior to executing a plan? I'm not actually suggesting it do anything different. In fact, the only thing I'm suggesting is that the workspace need not be limited in size.

Remember - I'm only talking about the scenario where the entire database - every single tuple - can reside in the workspace (the shared buffer cache) at the same time. There is no extra 'managing your own cache', as you put it. Just let postgresql's postmaster do what it does, assuming it doesn't evict pages from its shared buffer cache unless it needs the space for something else. (OK, that may be an incorrect assumption - I need to find out for sure.)

I think the overall effort is less, not more. Once in the shared buffer cache, a tuple never needs to be kicked out again to make way for another. That's what is so nice: everything is just there. No need to calculate a least-damaging cache replacement strategy, no need to wait for blocks of data to work their way from the disk or the Linux buffer cache to the postgresql workspace - it all ends up there, and stays there, with only syncing activity needed when updates happen.

> I find that experience does not bear this out. There is a saying a coworker
> of mine has about apps that try to solve problems, in this case caching,
> that are well understood and generally handled well at other levels of the
> "software stack"... he calls them "too smart by half" :)

Well, I guess I'd agree and disagree with your friend. What a political animal I am!
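(As an aside, the 'workspace' I keep referring to is postgresql's shared buffer cache, controlled by shared_buffers in postgresql.conf. A hypothetical sizing for my scenario might look like the fragment below - the figures are made-up examples that assume 8 kB pages and a cluster that fits comfortably in RAM.)

```
# postgresql.conf - hypothetical all-in-RAM sizing (example values only)
# shared_buffers is counted in 8 kB pages: 262144 pages = 2 GB of workspace,
# chosen here to be larger than the entire on-disk cluster.
shared_buffers = 262144
```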
Here's why. Imagine this scenario: your application uses 5 bits of data - A, B, C, D and E - and only 3 can sit in memory at once. You want to do two transactions. The first combines A, B and C, so you load those, in order. Once you're finished with that first transaction, you then need to combine A, C and D. Now, an application-driven cache will see immediately that the optimal replacement is to load D into the slot taken by B. With the OS in charge, you'll probably end up with a cache thrash.

This is simplistic, but it is just this sort of smart caching that can lead to dramatic speedups, and I've got quite a bit of experience of seeing just that. I don't want to devalue the Linux cache strategy - it is very appropriate for almost all situations. I'm talking about a very specific scenario here, though. Does that make sense?

> The problem with this assumption, as Tom Lane has said, is that generally
> speaking, a kernel is much faster with respect to cache *lookup* than
> postgres is. Postgres assumes that only a subset of the data stored in a
> cluster will be needed at any one time. Under this assumption (that it
> will need to store and shovel through thousands, instead of millions or
> more, tuples) some optimizations have been made that would cause
> performance degradation if used over the entire (potentially enormous)
> dataset.

Again, in my scenario, postgres is *relieved* of the task of deciding what data to keep in the workspace, since ALL of the data is accessible there. The point about postgres having to shovel through millions of tuples is a different point: if I do a query which can only be executed by looking at a million tuples, then that is what has to happen, no matter what caching strategy you employ. Obviously, if you're shovelling data in and out of memory, then the *way* you traverse those tuples becomes important to efficiency, since loading sequential data off a disk is always faster.
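A quick way to see the gap between a generic replacement policy and an application-aware one is to simulate both. This is only an illustrative sketch of my own (the page names and the looping workload are invented, not anything postgresql does): it pits plain LRU against Belady's clairvoyant algorithm, which evicts the page used farthest in the future - the kind of knowledge an application-driven cache can actually have. With a cyclic scan slightly larger than the cache, LRU misses on every single access while the clairvoyant policy misses half as often.

```python
from collections import OrderedDict

def lru_misses(accesses, size):
    """Count misses for a least-recently-used cache of the given size."""
    cache = OrderedDict()
    misses = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)      # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= size:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses

def belady_misses(accesses, size):
    """Count misses for Belady's optimal policy (knows the future)."""
    cache = set()
    misses = 0
    for i, page in enumerate(accesses):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= size:
            def next_use(p):
                try:
                    return accesses.index(p, i + 1)
                except ValueError:
                    return float('inf')   # never used again: ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses

workload = list("ABCD") * 3   # cyclic scan of 4 pages, cache holds only 3
print(lru_misses(workload, 3))     # 12 - every access misses
print(belady_misses(workload, 3))  # 6
```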
If it's *all* in the workspace, well, it doesn't matter at all then - the planner can forget about disks and choose the fastest algorithmic option. Again, this is a simplification, but I hope you can see what I'm getting at.

> Come now, Nov. 2001? Most of the issues brought up here have been solved
> *in the kernel* since then. (In the general case, with the bevy of IO
> schedulers, and the interruptible kernel.)

The point is that if your data is sitting in two parts of memory at the same time, then that's a waste of memory, and it's extra time copying all that stuff around. The date of the article isn't relevant to this, I don't think. They make the point that no matter how efficient the OS kernel is, in the sort of scenario I am interested in, it just ends up being an unnecessary step.

> Well, Oracle isn't PG, and I can't speak to their buffering and caching
> implementation.

I'm just saying that these issues are generic for any high-performance database server, regardless of whether it is Oracle, Postgresql, Mysql etc. The things they highlight in that article are the very things I allude to here for my special scenario. Even if you're a postgresql fan (and I chose to use it too), it is worth poking your head over the parapet and seeing how others are dealing with these issues too.

> > (It would be great if you could just load the data directly into the
> > shared buffer cache with direct IO, since that OS file buffer->shared
> > buffer cache transfer is just a waste of effort in this scenario), and
>
> <rant> again: EVIDENCE, PLEASE </rant>

In this case, I can speak with a little authority (for once): this is just best practice for a high-performance system. If you want an application to get at its data quickly, you use block-aligned direct IO to do a single transfer from the disk subsystem (or logical volume, or whatever) to the address space where the app wants the data.
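For what it's worth, the block-alignment bookkeeping that direct IO demands can be sketched in a few lines. This is an illustrative sketch of my own, not postgresql code, and the 4096-byte block size is an assumption - the real figure depends on the device and filesystem:

```python
BLOCK = 4096  # assumed device/filesystem alignment requirement

def align_down(x, a=BLOCK):
    return x & ~(a - 1)

def align_up(x, a=BLOCK):
    return (x + a - 1) & ~(a - 1)

def direct_io_window(offset, length, a=BLOCK):
    """Widen an arbitrary (offset, length) request to block boundaries.

    O_DIRECT-style transfers require the file offset, the transfer length
    and the user buffer address all to be block-aligned; the caller reads
    the widened window and then copies out the slice it actually wanted.
    """
    start = align_down(offset, a)
    end = align_up(offset + length, a)
    return start, end - start

print(direct_io_window(5000, 100))  # (4096, 4096)
```

A misaligned request like the one above gets rounded out to a single aligned block; only the aligned version can go straight from the disk subsystem to the application's buffer without a kernel copy in the middle.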
In fact, the larger you can make a single direct IO transfer, the closer you can get to the theoretical IO bandwidth limit. Sticking a generic Linux-style buffer cache in the middle destroys any chance you will have of touching the performance you could otherwise get from a high-performance IO subsystem. You'll just have to trust me on this one.. :)

Of course, in my scenario, the only interest in high-speed loading is to populate the workspace when the server starts up initially, though optimising write performance is something that will benefit you from then on. I've spent years getting data on and off disks quickly, and even something as simple as a transfer that isn't aligned to a machine page boundary can slow things down by an order of magnitude. It's just a question of taking the straightest, most 'hardware-optimised' route.

> We (well, I) don't want you to shut up. In fact I enjoy discussions like
> this, because it freshens my memory on just why PG rocks so hard! ;)

Cool! I'm glad that you're really into PG, because I'm sort of counting on it too, so that makes me feel happier, and it's really good to see the newsgroups for the database being so active.

Thanks anyway for the replies. I'm going to do a lot more digging and see if I can find anyone who has got their hands dirty trying to do this 'everything in RAM' scenario. I'll come back with anything more interesting that I find.

Regards
Andy