Re: Enough RAM for entire Database.. cost aside, is this - Mailing list pgsql-general
From | Andy B |
---|---|
Subject | Re: Enough RAM for entire Database.. cost aside, is this |
Date | |
Msg-id | icvFc.18321$HQ1.1803@fe2.news.blueyonder.co.uk Whole thread Raw |
In response to | Enough RAM for entire Database.. cost aside, is this going to be fastest? ("Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk>) |
List | pgsql-general |
>If all the data fits into memory, then this cache thrashing won't occur, yes? No - it *can* occur in a 2 tier cache strategy. The critical question here is: *If* the data postgresql needs is in the linux buffer cache, what (if anything) does the OS have to do to make it available to the postmaster? The worst case would be an actual memory copy of the data from the linux cache to where postgresql actually asked for the file data to be loaded. I'll need to find out more about the linux buffer cache to answer this question for myself. If a memory copy is required, then this is a good way of visualising it: I live in London, but have to work in 5th avenue, NYC for a week. My company stupidly sets me up in a smelly hotel by JFK airport. London is 'The disk system' 5th Avenue is 'The postgresql workspace' My smelly hotel at JFK is the 'linux buffer cache'. Clearly, it's far quicker to commute to work each day from JFK, than it is to commute from London each day. But wouldn't it be faster still if I had a room in the swanky hotel on Fifth avenue above the office? Yes, I'd still have to arrive at JFK, but then I could stay in 5th avenue all week until I'm finished. And as far as direct IO is concerned, it gets better: that would be like taking a plane right straight from London to 5th avenue, avoiding the tedious immigration line at JFK. Having said all of that, I'll go and find more about the linux buffer cache before I pester the Postrgresql cache experts. <snipped your test plan> Thanks for that test plan... I'm in the same position with you w.r.t. machinery - hence my thread (I was hoping someone had tried this!). But I will get my hands on something quite soon and I'll do your test and post the results. I knew I'd have no option than to become very familiar with the behaviour of whatever DB I chose, so I'm looking forward to it (sort of) >That is true, and will be until PG implements it's own cache memory management AND on-disk storage. I'd be willing to bet that any patches that dramatically increase speed for the majority of use cases would be considered strongly. I have to find out for sure, but since linux is being used in lots of HPC clusters now and for some real time applications, I'd be very surprised if there wasn't already a direct IO strategy in the kernel, but that's one of the things I'm looking at now. That being the case, and assuming that the postgresql cache operates on arbitrary sized pages (it seems set at 8KB pages by default - which, depending on the platform may be good, assuming also that the address space is appropriately machine page aligned), then it might be 'very easy indeed' to upgrade postgresql's cache manager to fire the necessary IO calls at the OS. Again - this is really something I need to talk to a pg cache expert about, but I've got a lot of catching up to do first. By the way - Direct IO doesn't mean the application has to know about the disk configuration or anything messy like that. It is still a block IO operation directed through the kernel. From the programmer point of view the changes are trivial. It just bypasses the OS's own generic file cache, and does a super efficient single DMA memory transfer between the IO system and the host RAM - all of this managed has to be managed by the OS, or you'll end up with a platform specific implementation that is of no real use. I would run a mile if this work required Postgresql to know about actual storage. >Finding target data (complex lookup in the PG shared buffers) will take longer than transferring disk pages from the OS cache, based on index information, and then doing a simple, fast scan. This is the current working assumption for postres. This is what scares me a little. Why would the PG cache lookup be complex? That's a question for me to find the answer to... I'll wade through the developer resources and source code etc. before bothering the experts. >Your contention, correct me if I've gotten this wrong, is that transferring it from the OS cache will be slower than finding it using a more complex lookup, but without moving the data from the OS cache to the PG workspace. >In both scenarios all data is in RAM. Well if the PG cache is slower at dealing with large data, then yes, I concede that this changes the picture. But at this point in time, my gut reaction would be to assume that the PG cache is blazingly efficient. It is, after all, the heart of the DB, and all the DBs I've seen have a similar architecture. >I see what you mean about a waste of memory, but if the costing parameters are tuned well for the use of the specific DB then current evidence with postgres shows that the former strategy is faster. I would be interested to see if the latter scenario could be made to work with the current PG codebase, or even if there is a simple patch that could speed up the shared_buffers case. Well it's still not clear that the memory does get wasted, in which case this 'problem' doesn't exist (See my comment about whether a copy occurs between the linux buffer cache and the postmaster's address space.) If linux just returns instantly and it looks like the data was copied, but it just ended up being a vmem pointer reference change, then the only issue is that of making the IO much quicker (when the disk actually needs to be read from or written to). Anyway.. as I said. I'm in learning mode right now! >I guess, though, that the main problem I see with the big shared_buffers idea on linux specifically is that SHMEM is slow on linux. Rumor has it that the SHMEM implementation on Solaris is much faster than on linux, so PG on Solaris may very well benefit from using your approach, but I just don't see it on linux, which is where this discussion started IIRC. Hmm... again more reading for me. It occurs to me that I haven't *fully* decided on linux at all. Linux is great for developing on a shoestring. At the end of the day, I'm quite committed to postgresql in my project. I like a lot of its features and would be loathed to move away from it, so it might be that the final production DB server ends up running pgsql on an OS other than linux. Still, it would be nice if it did work under linux. >I will wait with baited breath ;) Hehe - I've just read back what I've written, and it seems I have to suddenly understand the depths of both the pg kernel and the linux kernel. And I didn't even know about the linux buffer cache until yesterday! So it'll take a wee while, but I'll be back! Thanks again for your comments, Regards, Andy
pgsql-general by date: