Thread: Is There Any Way ....
... to do the following:
(1) Make a table memory-resident only?
(2) Set up user variables in memory that are persistent across all sessions, for as long as the database is up and running?
(3) Assure that a disk-based table is always in memory (outside of keeping it in memory buffers as a result of frequent activity, which would prevent LRU operations from taking it out)?
1) AFAIK, no. Just in case you are thinking "there should be a way, because I know it will be used all the time", you should know that the PostgreSQL philosophy is "I'm smarter than you": if the table is used all the time, it will be in memory; if not, it won't waste memory.
2) Don't know.
3) See number 1. Of course, you could run into a pathological case where the table is queried just before being taken out of memory. But that would mean the table isn't queried all the time...

Greetings...
Quoting Lane Van Ingen <lvaningen@esncc.com>:
> (1) Make a table memory-resident only?

Put it on a RAM filesystem. On Linux, shmfs. On *BSD, mfs. On Solaris, tmpfs.

> (2) Set up user variables in memory that are persistent across all
> sessions, for as long as the database is up and running?

This sounds like a client thing? Dunno.

> (3) Assure that a disk-based table is always in memory (outside of
> keeping it in memory buffers as a result of frequent activity which
> would prevent LRU operations from taking it out)?

Put it on a RAM fs (as in question 1). Basically, RAM filesystems live in RAM, meaning you need enough physical memory to support them. And of course their contents completely disappear between reboots, so you'll need a way to populate them on bootup and make sure that your updates go to a real nonvolatile storage medium (like disks). You might also get swapping on some types of memory filesystems: Solaris' tmpfs is carved out of virtual memory, which means it will cause swapping if the tmpfs contents plus the rest of your applications exceed physical memory.
On Thu, Sep 29, 2005 at 07:21:08PM -0400, Lane Van Ingen wrote:
> (1) Make a table memory-resident only?

You might want to look into memcached, but it's impossible to say whether it will fit your needs or not without more details.

/* Steinar */
--
Homepage: http://www.sesse.net/
Lane Van Ingen wrote:
> (2) Set up user variables in memory that are persistent across all
> sessions, for as long as the database is up and running?

You could probably write "C" functions (or possibly Perl) to store data in shared memory. Of course you'd have to deal with concurrency issues yourself. Storing the values in a table and having cached access to them during the session is probably your best bet.

--
Richard Huxton
Archonet Ltd
On 2005-09-30 01:21, Lane Van Ingen wrote:
> (3) Assure that a disk-based table is always in memory (outside of
> keeping it in memory buffers as a result of frequent activity which
> would prevent LRU operations from taking it out)?

I was wondering about this too. IMO it would be useful to have a way to tell PG that some tables were needed frequently, and should be cached if possible. This would allow application developers to consider joins with these tables as "cheap", even when querying on columns that are not indexed. I'm thinking about smallish tables like users, groups, *types, etc which would be needed every 2-3 queries, but might be swept out of RAM by one large query in between. Keeping a table like "users" on a RAM fs would not be an option, because the information is not volatile.

cheers,
stefan
Yes, Stefan, the kind of usage you are mentioning is exactly why I was asking.
On Tue, Oct 04, 2005 at 12:31:42PM +0200, Stefan Weiss wrote:
> IMO it would be useful to have a way to tell PG that some tables were
> needed frequently, and should be cached if possible. [...] Keeping a
> table like "users" on a RAM fs would not be an option, because the
> information is not volatile.

Why do you think you'll know better than the database how frequently something is used? At best, your guess will be correct and PostgreSQL (or the kernel) will keep the table in memory. Or, your guess is wrong and you end up wasting memory that could have been used for something else.

It would probably be better if you describe why you want to force this table (or tables) into memory, so we can point you at more appropriate solutions.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Which version of PG are you using? One of the new features for 8.0 was an improved caching algorithm that was smart enough to avoid letting a single big query sweep everything else out of cache.

-- Mark Lewis

On Tue, 2005-10-04 at 10:45 -0400, Lane Van Ingen wrote:
> Yes, Stefan, the kind of usage you are mentioning is exactly why I was
> asking. [...]
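For the curious, the scan-resistance Mark mentions can be sketched as a toy two-queue replacement policy, in the spirit of the 2Q/ARC family of algorithms. This is an illustration of the general technique only, not PostgreSQL's actual buffer-manager code:

```python
# A toy scan-resistant cache: pages seen once go to a small
# probationary queue; only pages referenced again are promoted to the
# main queue. A single large sequential scan therefore churns the
# probationary queue but cannot evict the hot working set.
from collections import OrderedDict

class ScanResistantCache:
    def __init__(self, main_size, probation_size):
        self.main = OrderedDict()        # frequently used pages
        self.probation = OrderedDict()   # pages seen exactly once
        self.main_size = main_size
        self.probation_size = probation_size

    def access(self, page):
        if page in self.main:            # hot page: refresh recency
            self.main.move_to_end(page)
        elif page in self.probation:     # second touch: promote
            del self.probation[page]
            self.main[page] = True
            if len(self.main) > self.main_size:
                self.main.popitem(last=False)   # evict LRU of main
        else:                            # first touch: probation only
            self.probation[page] = True
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)

    def cached(self, page):
        return page in self.main or page in self.probation

cache = ScanResistantCache(main_size=4, probation_size=2)
for p in ["users", "groups", "users", "groups"]:   # hot lookup tables
    cache.access(p)
for p in range(10_000):                  # one big sequential scan
    cache.access(p)
print(cache.cached("users"), cache.cached("groups"))   # True True
```

With a plain single LRU of the same total size, the scan would have pushed "users" and "groups" out; here they survive because scan pages never earn promotion.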
pg is _very_ stupid about caching. Almost all of the caching is left to the OS, and it's that way by design (as post after post by TL has pointed out).

That means pg has almost no ability to take application-domain-specific knowledge into account when deciding what to cache. There are plenty of papers on caching out there showing that context-dependent knowledge leads to more effective caching algorithms than context-independent ones are capable of. (Which means said design choice is a Mistake, but unfortunately one with too much inertia behind it currently to change easily.)

Under these circumstances, it is quite possible that an expert-class human could optimize memory usage better than the OS + pg.

If one is _sure_ they know what they are doing, I'd suggest using tmpfs or the equivalent for critical read-only tables. For "hot" tables that are rarely written to and where data loss would not be a disaster, tmpfs can be combined with an asynchronous writer process to push updates to HD. Just remember that a power hit means that any updates not yet pushed to disk are lost.

The (much) more expensive alternative is to buy SSD(s) and put the critical tables on them at load time.

Ron
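Ron's "RAM-resident table plus asynchronous writer" scheme can be sketched as follows. This is a hedged illustration only (the class name, flush interval, and JSON persistence format are all invented for the example); the point is the shape of the trade-off: reads never touch the disk, and a crash loses whatever the background flusher hadn't written yet.

```python
# Sketch of an in-RAM "table" with an asynchronous writer: reads are
# served from memory, while a background thread periodically flushes
# dirty data to nonvolatile storage. Anything not yet flushed when
# power is lost is gone -- which is why this only suits data whose
# loss is tolerable.
import json, os, tempfile, threading, time

class AsyncPersistedTable:
    def __init__(self, path, flush_interval=0.05):
        self.path = path
        self.data = {}                    # the in-RAM "table"
        self.lock = threading.Lock()
        self.dirty = False
        self._stop = threading.Event()
        self._writer = threading.Thread(target=self._flush_loop,
                                        args=(flush_interval,), daemon=True)
        self._writer.start()

    def put(self, key, value):
        with self.lock:
            self.data[key] = value
            self.dirty = True

    def get(self, key):
        with self.lock:                   # reads never touch the disk
            return self.data[key]

    def _flush_loop(self, interval):
        while not self._stop.is_set():
            self._stop.wait(interval)
            with self.lock:
                if not self.dirty:
                    continue
                snapshot = dict(self.data)
                self.dirty = False
            tmp = self.path + ".tmp"      # write-then-rename: atomic flush
            with open(tmp, "w") as f:
                json.dump(snapshot, f)
            os.replace(tmp, self.path)

    def close(self):
        self._stop.set()
        self._writer.join()

path = os.path.join(tempfile.mkdtemp(), "users.json")
table = AsyncPersistedTable(path)
table.put("alice", {"group": "admins"})
for _ in range(200):                      # wait (bounded) for the flusher
    if os.path.exists(path):
        break
    time.sleep(0.01)
table.close()
with open(path) as f:
    on_disk = json.load(f)
print(on_disk["alice"]["group"])          # "admins"
```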
On Tue, Oct 04, 2005 at 07:33:47PM -0400, Ron Peacetree wrote:
> pg is _very_ stupid about caching. Almost all of the caching is left
> to the OS, and it's that way by design (as post after post by TL has
> pointed out).
> [...]
> Under these circumstances, it is quite possible that an expert class
> human could optimize memory usage better than the OS + pg.

Do you have any examples where this has actually happened? Especially with 8.x, which isn't all that 'stupid' about how it handles buffers?

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Ron Peacetree sounds like someone talking out of his _AZZ_. He can save his unreferenced flapdoodle for his SQL Server clients. Maybe he will post references so that we may all learn at the feet of Master Peacetree. :-)

douglas

On Oct 4, 2005, at 7:33 PM, Ron Peacetree wrote:
> pg is _very_ stupid about caching. Almost all of the caching is left
> to the OS, and it's that way by design (as post after post by TL has
> pointed out). [...]
Douglas J. Trainor wrote:
> Ron Peacetree sounds like someone talking out of his _AZZ_.
> He can save his unreferenced flapdoodle for his SQL Server
> clients. Maybe he will post references so that we may all
> learn at the feet of Master Peacetree. :-)

Although I agree that I would definitely like to see some test cases for what Ron is talking about, I don't think that resorting to insults is going to help the situation. Ron, if you would please provide some test cases for what you are describing, I am sure that anyone would love to see them. We are all for improving PostgreSQL.

Sincerely,
Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Unfortunately, no matter what I say or do, I'm not going to please or convince anyone who has already made their mind up to the extent that they post comments like Mr Trainor's below. His response style pretty much proves my earlier point that this is presently a religious issue within the pg community.

The absolute best proof would be to build a version of pg that does what Oracle and DB2 have done (implement its own DB-specific memory manager) and then compare the performance between the two versions on the same HW, OS, and schema.

The second best proof would be to set up either DB2 or Oracle so that they _don't_ use their memory managers and compare their performance to a setup that _does_ use said memory managers on the same HW, OS, and schema.

I don't currently have the resources for either experiment.

Some might even argue that IBM (where Codd and Date worked) and Oracle just _might_ have had justification for the huge effort they put into developing such infrastructure.

Then there's the large library of research on caching strategies in just about every HW and SW domain, including DB theory, which points out that the more context-dependent (i.e., application- or domain-specific) a caching strategy's awareness is, the better it performs.

Maybe after we do all we can about physical IO and sorting performance I'll take on the religious fanatics on this one. One problem set at a time.

Ron
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.

The OS and FS world is very, very different now than it was when the Oracle and DB2 architectures were being crafted. What may have been an excellent development effort then may not provide such good ROI now.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.
>
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.

Actually, the main "religious fanatic" I've seen recently is yourself. While I have a gut feel that some of the issues you raise could certainly do with further investigation, I'm not seeing much from you other than statements that much of what postgresql does is wrong (not "wrong for Ron's use case", but "wrong in every respect"). A little less arrogance and a little more "here are some possibilities for improvement", "here is an estimate of the amount of effort that might be needed", and "here are some rough benchmarks showing the potential return on that investment" would, at the very least, make the threads far less grating to read.

Cheers,
Steve
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:
> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already made their minds up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.

Religious for some. Conservative for others. Sometimes people need to see the way before they are willing to accept it merely on the say-so of another person. In some circles, it is called the scientific method... :-)

Also, there is a cost to complicated specific optimizations. They can be a real maintenance and portability headache. What is the value ratio of performance to maintenance or portability?

> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement its own DB
> specific memory manager and then compare the performance
> between the two versions on the same HW, OS, and schema.

Not necessarily. Even if a version of PostgreSQL were written to function in this new model, there would be no guarantee that it was written in the most efficient manner possible. Performance could show PostgreSQL using its own caching and disk space management implementation, and performing poorly. The only true and accurate way would be to implement it, and then invest time by those most competent to test and optimize the implementation. At this point, it would become a moving target, as those who believe otherwise would be free to pursue more efficient file systems, or modifications to the operating system to better cooperate with PostgreSQL.

I don't think there can be a true answer to this one. The more innovative and clever people will always be able to make their solution work better. If the difference in performance was really so obvious, there wouldn't be doubters on either side. It would be clear to all. The fact is, there is reason to doubt. Perhaps not doubt that the final solution would be more efficient, but rather reason to doubt that the difference in efficiency would be significant.

> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a set up that _does_ use said memory managers
> on the same HW, OS, and schema.

Same as above. If Oracle was designed to work with the functionality, then disabling the functionality wouldn't prove that an efficient design would perform equally poorly, or even poorly at all. I think it would be obvious that Oracle would have invested most of their dollars into the common execution paths, with the expected functionality present.

> I don't currently have the resources for either experiment.

This is the real problem. :-)

> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.

Or not. They might just have more money to throw at the problem, and be entrenched in their solution to the point that they need to innovate to ensure that their solution appears to be the best.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.

A lot of this is theory. It may be good theory, but there is no guarantee that the variables listed in these theories match, or properly estimate, the issues that would be found in a real implementation.

> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
> One problem set at a time.

In any case, I'm on your side -- in theory.
Intuitively, I don't understand how anybody could claim that a general solution could ever be faster than a specific solution. Anybody who claimed this would go down in my books as a fool. It should be obvious to these people that, as an extreme, the entire operating system caching layer and the entire file system layer could be inlined into PostgreSQL, avoiding many of the expenses involved in switching back and forth between user space and system space, leaving a more efficient, although significantly more complicated, solution.

Whether by luck or by the experience of those involved, I haven't seen any senior PostgreSQL developers actually stating that it couldn't be faster. Instead, I've seen it claimed that the PostgreSQL developers don't have the resources to attack this problem, as there are other far more pressing features, product defects, and more obviously beneficial optimization opportunities to work on. Memory management, or disk management, is "good enough" as provided by decent operating systems, and the itch just isn't bad enough to scratch yet. They remain unconvinced that the gain in performance would be worth the cost of maintaining this extra complexity in the code base.

If you believe the case can be made, it is up to you to make it.

Cheers!
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com ("Neighbourhood Coder")
Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...
http://mark.mielke.cc/
First off, Mr. Trainor's response proves nothing about anyone or anything except Mr. Trainor.

I'm going to offer an opinion on the caching topic. I don't have any benchmarks; I'm offering a general sense of the issue based on decades of experience, so I'll give a short summary of that.

I've been earning my living by working with computers since 1972, and am the architect and primary author of a little-known database product (developed in 1984) which saw tens of thousands of installations in various vertical markets. (Last I checked, a couple years ago, it was still being used statewide by one state government after a multi-million dollar attempt to replace it with a popular commercial database product failed.) I've installed and tuned many other database products over the years. I'm just getting to know PostgreSQL, and am pretty excited about it.

Now on to the meat of it. My experience is that a DBMS can improve performance by caching certain types of data. In the product I developed, we had a fairly small cache which used a weighting algorithm for what to keep (rather than simply relying on LRU). Index pages got higher weight than data pages; the higher in the index, the higher the weight. Recent access got higher weight than older access, although it took quite a while for the older access to age out entirely. This improved performance quite a bit over a generalized caching product alone.

However, there was a point of diminishing return. My sense is that every 10% you add to a "smart" cache yields less benefit at a higher cost, so beyond a certain point, taking RAM from the general cache to expand the smart cache degrades performance. Clever programming techniques can shift the break-even point, but I very much doubt it can be eliminated entirely, unless the ratio of performance between CPU+RAM and persistent storage is much more extreme than I've ever seen.
There is another issue, which has been raised from time to time in these lists, but not enunciated clearly enough in my view. These discussions about caching generally address maximum throughput, while there are times when it is important that certain tables can be queried very quickly, even if it hurts overall throughput.

As an example, there can be tables which are accessed as a user types in a window and tabs around from one GUI control to another. The user perception of the application's performance is going to depend PRIMARILY on how quickly the GUI renders the results of these queries; if the run time for a large report goes up by 10%, they will probably not notice. This is a situation where removing RAM from a generalized cache, or one based on database internals, to create an "application specific" cache can yield big rewards.

One client has addressed this in a commercial product by defining a named cache large enough to hold these tables, and binding those tables to the cache. One side benefit is that such caches can be defined as "relaxed LRU", meaning that they eliminate the overhead of tracking accesses, since they can assume that data will rarely, if ever, be discarded from the cache.

It seems to me that in the PostgreSQL world, this would currently be addressed by binding the tables to a tablespace where the file system, controller, or drive(s) would cache the data, although this is somewhat less flexible than the "named cache" approach -- unless there is a file system that can layer a cache on top of a reference to some other file system's space. (And let's not forget the many OS environments in which people use PostgreSQL.) So I do see that there would be benefit to adding a feature to PostgreSQL to define caches and bind tables or indexes to them.

So I do think that it is SMART of PostgreSQL to rely on the increasingly sophisticated file systems to provide the MAIN cache.
I suspect that a couple of types of smaller "smart" caches in front of this could boost performance, and it might be a significant boost. I'm not sure what the current shared memory is used for; perhaps this is already caching specific types of structures for the DBMS. I'm pretty sure that programmers of GUI apps would appreciate the named cache feature, so they could tune the database for snappy GUI response, even under heavy load.

I realize this is short on specifics -- I'm shooting for perspective. For the record, I don't consider myself particularly religious on the topic, but I do pull back a little at arguments which sound strictly academic; I've found that most of what I've drawn from those circles has needed adjustment in solving real-world problems. (Particularly when algorithms optimize for best worst-case performance. I've found users are much happier with best typical-case performance, as long as the product of worst-case performance and worst-case frequency is low.) Like many others who have posted on the topic, I am quite prepared to alter my views in the face of relevant evidence.

Feel free to laugh at the old fart who decided to sip his Bushmill's while reading through this thread and try to run with the young lions. As someone else recently requested, though, please don't point while you laugh -- that's just rude. :-)

-Kevin

>>> Ron Peacetree <rjpeace@earthlink.net> 10/04/05 10:06 PM >>>
Unfortunately, no matter what I say or do, I'm not going to please or convince anyone who has already made their minds up to the extent that they post comments like Mr Trainor's below. [...]
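The weighting scheme Kevin describes (index pages above data pages, higher index levels weighted more, and recency aging out slowly) can be sketched as a small eviction policy. All weights and the decay rate below are invented for illustration, not taken from any real product:

```python
# Sketch of weighted cache eviction: each cached page gets a score
# from its type (index pages above data pages, higher index levels
# heavier) plus a recency component that decays slowly, so old
# accesses age out gradually rather than being dropped LRU-style.
import itertools

TYPE_WEIGHT = {"data": 1.0, "index": 4.0}   # made-up illustrative weights

class WeightedCache:
    def __init__(self, capacity, decay=0.001):
        self.capacity = capacity
        self.decay = decay                   # small value => slow aging
        self.clock = itertools.count()       # logical access time
        self.pages = {}                      # page -> (kind, level, last_tick)

    def _score(self, kind, level, last_tick, now):
        base = TYPE_WEIGHT[kind] * (1 + level)   # higher in index = heavier
        age = now - last_tick
        return base - self.decay * age           # recency decays slowly

    def access(self, page, kind="data", level=0):
        now = next(self.clock)
        self.pages[page] = (kind, level, now)
        if len(self.pages) > self.capacity:
            # Evict the page with the lowest current score.
            victim = min(
                self.pages,
                key=lambda p: self._score(*self.pages[p], now),
            )
            del self.pages[victim]

cache = WeightedCache(capacity=3)
cache.access("idx-root", kind="index", level=2)  # heavily weighted
cache.access("idx-leaf", kind="index", level=0)
cache.access("data-1", kind="data")
cache.access("data-2", kind="data")              # forces an eviction
print(sorted(cache.pages))   # the index root survives; a data page goes
```

Even in this toy form, the policy shows the behavior Kevin reports: index pages near the root are nearly impossible to evict, while data pages come and go, and the tiny decay term means an old access fades out only after many newer ones.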
Hey, you can say what you want about my style, but you still haven't pointed to even one article from the vast literature that you claim supports your argument. And I did include a smiley. Your original email claiming that PostgreSQL is wrong and that you are right led me to believe that you, like others making such statements, would not post your references. You remind me of Ted Nelson, who wanted the computing center at the University of Illinois at Chicago to change their systems just for him.

BTW, I'm a scientist -- I haven't made my mind up about anything. I really am interested in what you say, if there is any real work backing up your claims such that it would impact average cases. Any app designer can conceive of many ways to game the server to their app's advantage -- I'm not interested in that potboiler.

douglas

On Oct 4, 2005, at 11:06 PM, Ron Peacetree wrote:

> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already made their mind up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.
>
> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement its own DB-
> specific memory manager, and then compare the performance
> between the two versions on the same HW, OS, and schema.
>
> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a setup that _does_ use said memory managers
> on the same HW, OS, and schema.
>
> I don't currently have the resources for either experiment.
>
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.
> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context-dependent (ie application-
> or domain-specific) caching strategies are, the better they are.
>
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
>
> One problem set at a time.
> Ron
I'm sure there will be cases where some human-assisted caching algorithm will perform better than a mathematical, statistics-based design, but it will also depend on the "human". And it will probably make things worse when the workload changes and the human doesn't notice.

It must be considered that, today, hardware is not the 90% of the budget that it used to be. Throwing hardware at the system can be as expensive as throwing certified "IT stuff" at it (just think of the coffee budget! :-) ).

If you need to improve "user perception", you can do other things. Like caching a table in your client: with a trigger on any change to table X updating a table called "timestamp_table_change", plus a small select against that table, you can easily know when you must refresh your client's copy. If it is an application server serving http requests, then "user perception" will be tied to bandwidth AND the application server (some of them have request caches).

FYI, I don't recall a mechanism in MSSQL to pin a table in the buffers. Oracle has some structures to allow that. (you know) It uses its own buffer. Since version 9i, you can set three different data buffers: one (recycle cache) for low-usage tables (I mean tables with blocks which don't have much chance of being queried again, like a very large historical table), one for high-usage tables (keep cache), and the regular one (the difference is in the algorithm). And you must also set a buffer cache size for tablespaces with a different block size. But there is no such thing as "create table x keep entirely in buffer".

And above all things, the Oracle docs always state: "first, tune the design, then tune the queries, then start tuning the engine".

greetings.
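Dario's trigger-based invalidation idea can be sketched in PostgreSQL roughly as follows; the table, function, and trigger names are made up for illustration:

```sql
-- One row per watched table, holding its last-modification time
CREATE TABLE timestamp_table_change (
    table_name text PRIMARY KEY,
    changed_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO timestamp_table_change (table_name) VALUES ('x');

-- Statement-level trigger function: stamp the watched table as changed
CREATE OR REPLACE FUNCTION note_table_change() RETURNS trigger AS $$
BEGIN
    UPDATE timestamp_table_change
       SET changed_at = now()
     WHERE table_name = TG_RELNAME;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER x_changed
    AFTER INSERT OR UPDATE OR DELETE ON x
    FOR EACH STATEMENT EXECUTE PROCEDURE note_table_change();
```

The client then caches table x locally and, before using the cache, runs the cheap one-row query `SELECT changed_at FROM timestamp_table_change WHERE table_name = 'x'` to decide whether a refresh is needed.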
** Low Priority **

Human feedback from testers and users has proven pretty effective at catching errors in the "human assisted" cache configuration. When the people setting up the servers have missed the named cache configuration, and all they had was the single general purpose cache, it has been caught because of user complaints about performance.

There was an attempt made to simulate database queries -- hitting a client side cache on some of the roughly 100 tables (out of 300 in the well normalized schema) which fit this pattern of usage. It didn't prove very cost effective. It just makes more sense to allow the DBAs to tweak database performance through database configuration changes than to jump through that many hoops in application code to try to achieve it where it becomes an issue.

As far as I know, you can't use this technique in Microsoft SQL Server or Oracle. They are using Sybase Adaptive Server Enterprise (ASE). I believe named caches were added in version 12.0, long after Microsoft split off with their separate code stream based on the Sybase effort.

-Kevin

>>> "Dario" <dario_d_s@unitech.com.ar> 10/05/05 6:16 AM >>>
[quoted message snipped]
On Tue, 4 Oct 2005 23:06:54 -0400 (EDT)
Ron Peacetree <rjpeace@earthlink.net> wrote:

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context-dependent (ie application-
> or domain-specific) caching strategies are, the better they are.

Isn't this also a very strong argument for putting your caching into your application and not at the database level? As you say, the more "application or domain specific" it is, the better. I don't see how PostgreSQL is going to magically determine what is perfect for everyone's differing needs and implement it for you. Even rudimentary controls such as "always keep this table/index/whatever in RAM" aren't fine-grained or specific enough to get the full benefit.

My suggestion is to use something like memcached to store your data in, based on the particular needs of your application. This puts all of the control in the hands of the programmer where, in my opinion, it belongs.

Just to clarify, I'm not entirely against the idea, but I certainly think there are other areas of PostgreSQL on which we should be focusing our efforts.

---------------------------------
Frank Wiles <frank@wiles.org>
http://www.wiles.org
---------------------------------
From: Kevin Grittner <Kevin.Grittner@wicourts.gov>
Sent: Oct 5, 2005 2:16 AM
Subject: Re: [PERFORM] Is There Any Way ....

> First off, Mr. Trainor's response proves nothing about anyone or
> anything except Mr. Trainor.

Fair enough. I apologize for the inappropriately general statement.

> I'm going to offer an opinion on the caching topic. I don't have
> any benchmarks; I'm offering a general sense of the issue based on
> decades of experience, so I'll give a short summary of that.
>
> I've been earning my living by working with computers since 1972,

~1978 for me. So to many on this list, I also would be an "old fart".

<description of qualifications snipped>

I've pretty much spent my entire career thinking about and making advances in real-world distributed computing and parallel processing, first as a programmer and then as a systems architect.

> Now on to the meat of it.

<excellent and fair-handed overall analysis snipped>

I agree with your comments just about across the board. I also agree with the poster(s) who noted that the "TLC factor" and the 2x-every-18-months pace of increasing HW performance and RAM capacity make this stuff a moving target. OTOH, there are some fundamentals that don't seem to change no matter how far or fast the computing field evolves. As usual, the proper answers involve finding a sometimes nontrivial balance between building on known precedent and not being trapped by doctrine.

Ron
A blast from the past is forwarded below.

douglas

Begin forwarded message:

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: August 23, 2005 3:23:43 PM EDT
> To: Donald Courtney <Donald.Courtney@sun.com>
> Cc: pgsql-performance@postgresql.org, Frank Wiles <frank@wiles.org>,
> gokulnathbabu manoharan <gokulnathbabu@yahoo.com>
> Subject: Re: [PERFORM] Caching by Postgres
>
> Donald Courtney <Donald.Courtney@Sun.COM> writes:
>> I am not alone in having the *expectation* that a database should have
>> some cache size parameter and the option to skip the file system. If
>> I use oracle, sybase, mysql and maxdb they all have the ability to
>> size a data cache and move to 64 bits.
>
> And you're not alone in holding that opinion despite having no shred
> of evidence that it's worthwhile expanding the cache that far.
>
> However, since we've gotten tired of hearing this FUD over and over,
> 8.1 will have the ability to set shared_buffers as high as you want.
> I expect next we'll be hearing from people complaining that they
> set shared_buffers to use all of RAM and performance went into the
> tank ...
>
> regards, tom lane

On Oct 4, 2005, at 11:06 PM, Ron Peacetree wrote:

> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already made their mind up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.
>
> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement its own DB-
> specific memory manager, and then compare the performance
> between the two versions on the same HW, OS, and schema.
>
> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a setup that _does_ use said memory managers
> on the same HW, OS, and schema.
>
> I don't currently have the resources for either experiment.
>
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.
>
> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context-dependent (ie application-
> or domain-specific) caching strategies are, the better they are.
>
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
>
> One problem set at a time.
> Ron
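For context on Tom's 8.1 remark: shared_buffers is set in postgresql.conf, and in the 8.x releases it is counted in 8 kB buffer pages rather than memory units. A sketch with deliberately illustrative values (not recommendations):

```
# postgresql.conf -- illustrative values only
shared_buffers = 50000          # 8 kB pages each, so roughly 390 MB here
effective_cache_size = 125000   # planner hint for OS-level caching, also in pages
```

The point of the 8.1 change is that the first setting is no longer capped at a small value; whether a very large setting helps or hurts is exactly what this thread is arguing about.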
Jim C. Nasby" <jnasby ( at ) pervasive ( dot ) com> wrote:
>
> Stefan Weiss wrote:
> > ... IMO it would be useful to have a way to tell
> > PG that some tables were needed frequently, and should be cached if
> > possible. This would allow application developers to consider joins with
> > these tables as "cheap", even when querying on columns that are
> > not indexed.
>
> Why do you think you'll know better than the database how frequently
> something is used? At best, your guess will be correct and PostgreSQL
> (or the kernel) will keep the table in memory. Or, your guess is wrong
> and you end up wasting memory that could have been used for something
> else.
>
> It would probably be better if you describe why you want to force this
> table (or tables) into memory, so we can point you at more appropriate
> solutions.

Or perhaps we could explain why we NEED to force these tables into memory, so we can point you at a more appropriate implementation. ;-)

Ok, wittiness aside, here's a concrete example. I have an application with one critical index that MUST remain in memory at all times. The index's tablespace is about 2 GB. As long as it's in memory, performance is excellent -- a user's query takes a fraction of a second. But if it gets swapped out, the user's query might take up to five minutes as the index is re-read from disk.

Now here's the rub. The only performance I care about is response to queries from the web application. Everything else is low priority. But there is other activity going on. Suppose, for example, that I'm updating tables, performing queries, doing administration, etc., etc., for a period of an hour, during which no customer visits the site. Then another customer comes along and performs a query.

At this point, no heuristic in the world could have guessed that I DON'T CARE ABOUT PERFORMANCE for anything except my web application. The performance of all the other stuff -- the administration, the updates, etc. -- is utterly irrelevant compared to the performance of the customer's query.

What actually happens is that the other activities have swapped out the critical index, and my customer waits, and waits, and waits... and goes away after a minute or two. To solve this, we've been forced to purchase two computers and mirror the database on both. All administration and modification happens on the "offline" database, and the web application only uses the "online" database. At some point, we swap the two servers, sync the two databases, and carry on. It's a very unsatisfactory solution.

There is ONLY one way to convey this sort of information to Postgres, which is to provide the application developer a mechanism to explicitly name the tables that should be locked in memory. Look at tsearchd that Oleg is working on. It's a direct response to this problem.

It's been recognized for decades that, as kernel developers (whether a Linux kernel or a database kernel), our ability to predict the behavior of an application is woefully inadequate compared with the application developer's knowledge of the application. Computer Science simply isn't a match for the human brain yet, not even close.

To give you perspective: since I posted a question about this problem (regarding tsearch2/GIST indexes), half of the responses I received told me that they encountered this problem, and their solution was to use an external full-text engine. They all confirmed that Postgres can't deal with this problem yet, primarily for the reasons outlined above.

Craig
This is possible with Oracle utilizing the keep pool:

alter table t_name storage ( buffer_pool keep );

If Postgres were to implement its own caching system, this seems like it would be easy to implement (beyond the initial caching effort).

Alex

On 10/24/05, Craig A. James <cjames@modgraph-usa.com> wrote:
> Or perhaps we could explain why we NEED to force these tables into memory, so we can point you at a more appropriate implementation. ;-)
> [rest of message quoted in full above; snipped]
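Expanding Alex's one-liner into a fuller Oracle sketch: the KEEP pool must be sized before objects are bound to it. Object names and the pool size below are hypothetical; db_keep_cache_size exists in Oracle 9i and later:

```sql
-- Reserve a KEEP buffer pool (size is illustrative)
ALTER SYSTEM SET db_keep_cache_size = 200M SCOPE = BOTH;

-- Bind the hot table and its index to the KEEP pool
ALTER TABLE t_name STORAGE (BUFFER_POOL KEEP);
ALTER INDEX t_name_idx STORAGE (BUFFER_POOL KEEP);
```

Note that even the KEEP pool is still an LRU cache: it only approximates pinning, by giving the bound objects a pool that nothing else competes for.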
Alex Turner wrote:
> This is possible with Oracle utilizing the keep pool
>
> alter table t_name storage ( buffer_pool keep);
>
> [rest of the quoted exchange snipped]

We have a similar problem, with vacuum being the equivalent of "continuously flush all system caches for a long time". Our database is about 200 GB in size and vacuums take hours and hours. The performance is still acceptable, but only because we've hidden the latency in our application.

I've occasionally thought it would be good to have the backend doing a vacuum or analyze also call priocntl() prior to doing any real work, to lower its priority.

We'll be switching to the 8.1 release ASAP, just because the direct IO capabilities are appearing to be a win on our development system.

--
Alan
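As an alternative to OS-level priocntl() tricks, PostgreSQL 8.0 added cost-based vacuum delay, which throttles vacuum's I/O directly rather than its CPU priority. A postgresql.conf sketch with illustrative values:

```
# postgresql.conf -- illustrative values
vacuum_cost_delay = 10      # ms to sleep each time the cost limit is hit (0 disables)
vacuum_cost_limit = 200     # accumulated page-I/O cost that triggers the sleep
```

This makes vacuum slower but far gentler on the buffer and OS caches, which addresses the cache-flushing complaint above at the price of even longer vacuum times.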