Thread: caching table/query
Hi list!

I have several pretty little tables (~300-400 records) which are queried very frequently. I would like to make PostgreSQL aware of this and have it keep an in-memory cache for them. (It has to be inside PostgreSQL; I can't use any special tools for it.) What is the way to do it?

Thanks,
Evgeny
On Sun, Jul 17, 2005 at 02:00:28PM +0300, Tsirkin Evgeny wrote:
> I have several pretty little tables (~300-400 records) which are
> queried very frequently. I would like to make Postgres be aware of
> this and do an in-memory cache for it.

PostgreSQL maintains a buffer cache in shared memory; if you enable statistics gathering then you can see how effective it is for the tables in question:

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

If the tables are small and you're querying them frequently, and if shared_buffers is adequately sized, then I'd expect most or all queries to be satisfied from the buffer cache. Those that aren't might still be satisfied from the OS's cache.

Are you experiencing performance problems? Presumably that's the real problem you're trying to solve.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
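For the record, the statistics views can be read directly from psql. A minimal sketch, assuming an 8.0 server with the statistics collector enabled and a hypothetical table named foo:

```sql
-- Buffer-cache hit ratio for one table (hypothetical table "foo").
-- Requires the statistics collector: stats_start_collector = on
-- and stats_block_level = on in postgresql.conf (8.0 setting names).
SELECT heap_blks_read, heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
FROM pg_statio_user_tables
WHERE relname = 'foo';
```

A hit percentage near 100 means the table is already being served from shared_buffers rather than from disk.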
Michael Fuhr wrote:
> On Sun, Jul 17, 2005 at 02:00:28PM +0300, Tsirkin Evgeny wrote:
>> I have several pretty little tables (~300-400 records) which are
>> queried very frequently. I would like to make Postgres be aware of
>> this and do an in-memory cache for it.
>
> PostgreSQL maintains a buffer cache in shared memory; if you enable
> statistics gathering then you can see how effective it is for the
> tables in question:
>
> http://www.postgresql.org/docs/8.0/static/monitoring-stats.html
>
> If the tables are small and you're querying them frequently, and
> if shared_buffers is adequately sized, then I'd expect most or all
> queries to be satisfied from the buffer cache. Those that aren't
> might still be satisfied from the OS's cache.
>
> Are you experiencing performance problems? Presumably that's the
> real problem you're trying to solve.

Well, this is currently only at the development stage, so I can't really know whether there will be any performance problems. But I am looking for a way to be sure I will not have any, and since I already know which queries and tables are hot, I want to cache those. Just saying to my boss "when there are performance problems, PostgreSQL will solve them" would not do. The thing is that when running a test I am getting a penalty for querying the table; of course the test runs the query only once, so I can't see the cache at work. What I really want is to tell Postgres that a particular table should be cached.

Thanks.
Tsirkin Evgeny <tsurkin@mail.jct.ac.il> writes:
> What I really want is to tell Postgres that a particular table
> should be cached.

Adopting the position that you are smarter than an automatic optimization algorithm is generally a good way to achieve less performance, not more. Especially so if you are willing to override the automatic algorithm on the strength of guesses that you make when you haven't even seen the system in operation yet.

The truth of the matter is that you haven't got the foggiest idea yet where the system bottlenecks will be, and so you should certainly not be thinking about how to override the cache manager. If you've not heard the phrase "premature optimization is the root of all evil", I suggest you read up on it. It's generally good advice.

			regards, tom lane
Well, I guess there are cases where that is not so. However, I understand that Postgres does not have any mechanism except the automatic one.

Tom Lane wrote:
> Tsirkin Evgeny <tsurkin@mail.jct.ac.il> writes:
>> What I really want is to tell Postgres that a particular table
>> should be cached.
>
> Adopting the position that you are smarter than an automatic
> optimization algorithm is generally a good way to achieve less
> performance, not more.
Well, here is something that might be interesting: trying to reduce query time, and since this is a web application, I am now using application-level caching with the Perl module MLDBM::Sync::SDBM_File. (Basically that involves querying the whole table, putting it into MLDBM::Sync::SDBM_File, and then querying it back.) What is interesting is that while querying the tables from Postgres takes 0.3 sec., using SDBM it takes 0.1 sec.

Actually, I am no longer sure that Postgres does NOT cache tables in memory; maybe it does and I don't know about it? Maybe the time is spent transmitting the data from Postgres to the application? How can I test this?

Thanks.

Chris Travers wrote:
> Tsirkin Evgeny wrote:
>> Well, I guess there are cases where that is not so. However, I
>> understand that Postgres does not have any mechanism except the
>> automatic one.
>
> Largely true. There may be ways of forcing a from-memory query once
> the system is in place, but I am not sure that this is always what you
> want, it would not be pretty, and might have some real performance
> issues. My suggestion is that you wait until your system is in
> operation and then look first for general tuning help before you even
> consider trying to implement a memory-cached table in your application.
>
> Just for the record, I can think of at least one way of hacking this
> on, but it seems like it should be an absolutely last resort rather
> than a first impulse.
>
> PostgreSQL can do this, but you probably don't want it to.
>
> Best Wishes,
> Chris Travers
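One way to separate server time from transfer time, sketched against a hypothetical table foo: EXPLAIN ANALYZE reports only server-side execution time, so comparing its "Total runtime" with the time measured in the application shows how much is spent in the network, driver, and per-row conversion rather than in the query itself.

```sql
-- Server-side execution time only; client transfer cost is excluded.
EXPLAIN ANALYZE SELECT * FROM foo;
-- If "Total runtime" is a few ms while the application measures
-- 0.3 sec., most of the cost lies outside the server.
```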
On Wed, Jul 20, 2005 at 10:25:59AM +0300, Tsirkin Evgeny wrote:
> trying to reduce query time and since that is a web application i now
> using caching with perl MLDBM::Sync::SDBM_File on application level.
> (Basically that involves querying all the table and putting it into
> MLDBM::Sync::SDBM_File and then querying it back). What is interesting
> is that while querying the tables from postgres takes 0.3 sec.,
> using sdbm it takes 0.1 sec.

PostgreSQL isn't likely to win a speed contest against a trivial storage/retrieval library when the test involves simple queries and little or no concurrency. PostgreSQL is a full-featured database with goals beyond just "make SELECT statements as fast as possible."

> Actually i am not anymore sure that postgres does NOT cache table in
> memory, maybe it does and i don't know about it?

As I mentioned in an earlier message, you could enable statistics gathering and use the statistics views to see whether your queries are being satisfied from the buffer cache or if they require a call to the operating system:

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

The first time you query a table it will probably need to be fetched from disk (or from the operating system's cache); subsequent queries will likely be much faster because PostgreSQL will have stored the fetched pages in its buffer cache.
Example:

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';

 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------------+---------------+---------------+--------------
              0 |             0 |             0 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=73.686..73.698 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 73.934 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';

 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------------+---------------+---------------+--------------
              1 |             0 |             3 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=0.072..0.083 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 0.237 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';

 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------------+---------------+---------------+--------------
              1 |             1 |             3 |            3
(1 row)

Notice how much faster the second query was, even though it was identical to the first. As you can see from the statistics view, the second query's page fetches were all "hits," meaning they were retrieved from the buffer cache.
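To rerun an experiment like the one above from a clean slate, the collected counters can be zeroed first (a sketch; pg_stat_reset() clears the statistics for the current database):

```sql
-- Reset counters so heap_blks_read/heap_blks_hit start from 0 again.
SELECT pg_stat_reset();
```

Note that resetting the counters does not empty the buffer cache itself, so the first query after a reset may still be served as a cache hit.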
If you want to analyze performance, then make sure your tests mimic the activity you expect to see on the production system, such as number of concurrent connections; amount of select, insert, update, and delete activity; similar data sets in terms of size and distribution; representative queries; etc. Use PostgreSQL's and the operating system's instrumentation to identify performance bottlenecks, and then look for remedies. BTW, pgsql-performance might be a more appropriate list for this thread. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Tsirkin Evgeny wrote:
> Well, here is something that might be interesting: trying to reduce
> query time and since that is a web application i now using caching
> with perl MLDBM::Sync::SDBM_File on application level. (Basically
> that involves querying all the table and putting it into
> MLDBM::Sync::SDBM_File and then querying it back.) What is
> interesting is that while querying the tables from postgres takes
> 0.3 sec., using sdbm it takes 0.1 sec.

Note too you have authentication and network latency issues when querying a remote server. Also PostgreSQL does a lot besides just storing your data. You have permissions enforcement, and a wide variety of other things. So it is unreasonable to assume that PostgreSQL will be as fast as any simple storage mechanism for simple data storage.

> Actually i am not anymore sure that postgres does NOT cache table in
> memory, maybe it does and i don't know about it? Maybe the time is
> spent transmitting the data from postgres to the application? How
> can i test this?

Normally, PostgreSQL will try to intelligently cache information in memory. The OS also will generally cache disk reads, so even if it expires from PostgreSQL's cache, it may be in the OS disk cache. So yes, it tries to cache material in memory automatically.

If you want to try to force PostgreSQL to always cache such a table, I guess it would be possible, but any solution you come up with will have three characteristics: 1) it will reduce the general memory available for other database operations (possibly causing things to page to disk, including possibly your cached copy), and this may reduce general database performance; 2) it will be messy and bolted on; and 3) it will likely be brittle -- I don't want to know how easily you can recover things in the event of a power outage.
PLEASE DON'T DO THIS, but I think one could use a tablespace on a RAM disk with some sort of syncing utility (rules/triggers) and intelligent copying of files to and from disk before startup and after shutdown of the database server. However, you are on your own in the event of a power outage....

In general, Tom's advice is very good: build your application first, and expect that PostgreSQL's optimizer will do a reasonable job of optimizing your data access. Then when you have bottlenecks, look for ways around them. Do as *little as possible* to optimize things and look for the simplest steps to remove your bottlenecks. This will ensure that your application and data are maintainable down the road. The optimizer is actually quite good. Use it as your primary tool of performance enhancement.

Remember Occam's Razor: "One should not needlessly multiply entities" (the most common translation I have seen of what William of Occam actually said). This statement is a fabulous engineering principle.

Best Wishes,
Chris Travers
Metatron Technology Consulting
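For completeness, the "please don't do this" idea above would look roughly like the following on 8.0, which introduced tablespaces. Everything here is hypothetical (the tmpfs mount point and the table names), and the data vanishes on power failure:

```sql
-- NOT recommended (see above). Assumes a RAM-backed filesystem
-- mounted at /mnt/ramdisk and writable by the postgres user.
CREATE TABLESPACE ramcache LOCATION '/mnt/ramdisk/pgdata';

-- A throwaway copy of a small, hot table (hypothetical table "foo"):
CREATE TABLE foo_cache (LIKE foo) TABLESPACE ramcache;
INSERT INTO foo_cache SELECT * FROM foo;
```

Keeping foo_cache in sync (rules or triggers) and repopulating it after every restart are exactly the chores that make this a last resort.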
Tsirkin Evgeny wrote:
> Well, I guess there are cases where that is not so. However, I
> understand that Postgres does not have any mechanism except the
> automatic one.

Largely true. There may be ways of forcing a from-memory query once the system is in place, but I am not sure that this is always what you want, it would not be pretty, and might have some real performance issues. My suggestion is that you wait until your system is in operation and then look first for general tuning help before you even consider trying to implement a memory-cached table in your application.

Just for the record, I can think of at least one way of hacking this on, but it seems like it should be an absolutely last resort rather than a first impulse.

PostgreSQL can do this, but you probably don't want it to.

Best Wishes,
Chris Travers