Thread: caching table/query

caching table/query

From
Tsirkin Evgeny
Date:
Hi list!
I have several pretty small tables (~300-400 records) which are
queried very frequently.
I would like to make Postgres aware of this and keep an in-memory cache
for them.
(It has to be inside Postgres - I can't use any special tools for it.)
What is the way to do it?
Thanks
Evgeny

Re: caching table/query

From
Michael Fuhr
Date:
On Sun, Jul 17, 2005 at 02:00:28PM +0300, Tsirkin Evgeny wrote:
>
> I have several pretty small tables (~300-400 records) which are
> queried very frequently.
> I would like to make Postgres aware of this and keep an in-memory cache
> for them.

PostgreSQL maintains a buffer cache in shared memory; if you enable
statistics gathering then you can see how effective it is for the
tables in question:

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

If the tables are small and you're querying them frequently, and
if shared_buffers is adequately sized, then I'd expect most or all
queries to be satisfied from the buffer cache.  Those that aren't
might still be satisfied from the OS's cache.
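
A quick way to get a feel for this from the statistics views is to compute
a per-table hit ratio, something like the following (a sketch only; 'foo'
is a placeholder table name, and on 8.0 you need stats_block_level enabled
for these counters to be populated):

SELECT relname,
       heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0)
         AS heap_hit_ratio
FROM pg_statio_user_tables
WHERE relname = 'foo';

A ratio near 1.0 means almost every heap page fetch was served from the
buffer cache rather than from the kernel's cache or from disk.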

Are you experiencing performance problems?  Presumably that's the
real problem you're trying to solve.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Re: caching table/query

From
Tsirkin Evgeny
Date:
Michael Fuhr wrote:

>On Sun, Jul 17, 2005 at 02:00:28PM +0300, Tsirkin Evgeny wrote:
>
>
>>I have several pretty small tables (~300-400 records) which are
>>queried very frequently.
>>I would like to make Postgres aware of this and keep an in-memory cache
>>for them.
>>
>>
>
>PostgreSQL maintains a buffer cache in shared memory; if you enable
>statistics gathering then you can see how effective it is for the
>tables in question:
>
>http://www.postgresql.org/docs/8.0/static/monitoring-stats.html
>
>If the tables are small and you're querying them frequently, and
>if shared_buffers is adequately sized, then I'd expect most or all
>queries to be satisfied from the buffer cache.  Those that aren't
>might still be satisfied from the OS's cache.
>
>Are you experiencing performance problems?  Presumably that's the
>real problem you're trying to solve.
>
>
Well, this is currently only at the development stage, so I can't really
know whether there will be any performance problems.
But I am looking for a way to be sure I will not have any. And since I
already know which queries and tables are hot, I want to cache those.
Just saying to my boss "when there are performance problems, PostgreSQL
will solve them" would not do it. The problem is that when running a test
I am getting a penalty for requesting the table; of course, the test
queries only once, so I can't see the cache at work. What I really
want is to tell Postgres that a particular table should be cached.
Thanks.



Re: caching table/query

From
Tom Lane
Date:
Tsirkin Evgeny <tsurkin@mail.jct.ac.il> writes:
> What I really want is to tell Postgres that a particular
> table should be cached.

Adopting the position that you are smarter than an automatic
optimization algorithm is generally a good way to achieve less
performance, not more.  Especially so if you are willing to override
the automatic algorithm on the strength of guesses that you make
when you haven't even seen the system in operation yet.  The truth
of the matter is that you haven't got the foggiest idea yet where
the system bottlenecks will be, and so you should certainly not be
thinking about how to override the cache manager.

If you've not heard the phrase "premature optimization is the root of
all evil", I suggest you read up on it.  It's generally good advice.

            regards, tom lane

Re: caching table/query

From
Tsirkin Evgeny
Date:
Well, I guess there are cases where that is not so. However, I
understand that Postgres does not have any mechanism except
the automatic one.
Tom Lane wrote:

>Tsirkin Evgeny <tsurkin@mail.jct.ac.il> writes:
>
>
>>What I really want is to tell Postgres that a particular
>>table should be cached.
>>
>>
>
>Adopting the position that you are smarter than an automatic
>optimization algorithm is generally a good way to achieve less
>performance, not more.
>

Re: caching table/query

From
Tsirkin Evgeny
Date:
Well, here is something that might be interesting:
trying to reduce query time, and since this is a web application, I am
now using caching with Perl's MLDBM::Sync::SDBM_File at the application
level. (Basically that involves querying the whole table, putting it
into MLDBM::Sync::SDBM_File, and then querying it back.) What is
interesting is that while querying the tables from Postgres takes
0.3 sec., using SDBM it takes 0.1 sec.
Actually, I am no longer sure that Postgres does NOT cache tables in
memory; maybe it does and I don't know about it? Maybe the time is
spent transmitting the data from Postgres to the application? How can
I test this?
Thanks.

Chris Travers wrote:

> Tsirkin Evgeny wrote:
>
>>
>> Well, I guess there are cases where that is not so. However, I
>> understand that Postgres does not have any mechanism except
>> the automatic one.
>
>
> Largely true.  There may be ways of forcing a from-memory query once
> the system is in place, but I am not sure that this is always what you
> want, it would not be pretty, and might have some real performance
> issues.  My suggestion is that you wait until your system is in
> operation and then look first for general tuning help before you even
> consider trying to implement a memory-cached table in your application.
>
> Just for the record, I can think of at least one way of hacking this
> on, but it seems like it should be an absolutely last resort rather
> than a first impulse.
>
> PostgreSQL can do this, but you probably don't want it to.
>
> Best Wishes,
> Chris Travers




Re: caching table/query

From
Michael Fuhr
Date:
On Wed, Jul 20, 2005 at 10:25:59AM +0300, Tsirkin Evgeny wrote:
> trying to reduce query time, and since this is a web application, I am
> now using caching with Perl's MLDBM::Sync::SDBM_File at the application
> level. (Basically that involves querying the whole table, putting it
> into MLDBM::Sync::SDBM_File, and then querying it back.) What is
> interesting is that while querying the tables from Postgres takes
> 0.3 sec., using SDBM it takes 0.1 sec.

PostgreSQL isn't likely to win a speed contest against a trivial
storage/retrieval library when the test involves simple queries and
little or no concurrency.  PostgreSQL is a full-featured database
with goals beyond just "make SELECT statements as fast as possible."

> Actually i am not anymore sure that postgres does NOT cache table in memory,
> maybe it does and i don't know about it?

As I mentioned in an earlier message, you could enable statistics
gathering and use the statistics views to see whether your queries
are being satisfied from the buffer cache or if they require a
call to the operating system:

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

The first time you query a table it will probably need to be fetched
from disk (or from the operating system's cache); subsequent queries
will likely be much faster because PostgreSQL will have stored the
fetched pages in its buffer cache.  Example:

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';
 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------------+---------------+---------------+--------------
              0 |             0 |             0 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=73.686..73.698 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 73.934 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';
 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------------+---------------+---------------+--------------
              1 |             0 |             3 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=0.072..0.083 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 0.237 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';
 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------------+---------------+---------------+--------------
              1 |             1 |             3 |            3
(1 row)

Notice how much faster the second query was, even though it was
identical to the first.  As you can see from the statistics view,
the second query's page fetches were all "hits," meaning they were
retrieved from the buffer cache.

If you want to analyze performance, then make sure your tests mimic
the activity you expect to see on the production system, such as
number of concurrent connections; amount of select, insert, update,
and delete activity; similar data sets in terms of size and
distribution; representative queries; etc.  Use PostgreSQL's and
the operating system's instrumentation to identify performance
bottlenecks, and then look for remedies.

BTW, pgsql-performance might be a more appropriate list for this
thread.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: caching table/query

From
Chris Travers
Date:
Tsirkin Evgeny wrote:

>
> Well, here is something that might be interesting:
> trying to reduce query time, and since this is a web application, I am
> now using caching with Perl's MLDBM::Sync::SDBM_File at the application
> level. (Basically that involves querying the whole table, putting it
> into MLDBM::Sync::SDBM_File, and then querying it back.) What is
> interesting is that while querying the tables from Postgres takes
> 0.3 sec., using SDBM it takes 0.1 sec.

Note too you have authentication and network latency issues when
querying a remote server.  Also PostgreSQL does a lot besides just
storing your data.  You have permissions enforcement, and a wide variety
of other things.  So it is unreasonable to assume that PostgreSQL will
be as fast as any simple storage mechanism for simple data storage.

>
> Actually, I am no longer sure that Postgres does NOT cache tables in
> memory; maybe it does and I don't know about it? Maybe the time is
> spent transmitting the data from Postgres to the application? How can
> I test this?

Normally, PostgreSQL will try to intelligently cache information in
memory.  The OS also will generally cache disk reads, so even if it
expires from PostgreSQL's cache, it may be in the OS disk cache.  So
yes, it tries to cache material in memory automatically.

If you want to try to force PostgreSQL to always cache such a table, I
guess it would be possible but any solution you come up with will have
three characteristics:  1)  It will reduce the general memory available
for other database operations (possibly causing things to page to disk
including possibly your cached copy) and this may reduce general
database performance, 2)  It will be messy and bolted on and 3)  It will
likely be brittle-- I don't want to know how easily you can recover
things in the event of a power outage.  PLEASE DON'T DO THIS but I think
one could use a tablespace on a ram disk with some sort of sync'ing
utility (rules/triggers) and intelligent copying of files to and from
disk before startup and after shutdown of the database server.  However,
you are on your own in the event of a power outage....
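
For concreteness, the hack would look roughly like this (a sketch only;
ram_ts, foo, and /mnt/ramdisk are hypothetical names and paths, and
everything placed there evaporates on power loss):

CREATE TABLESPACE ram_ts LOCATION '/mnt/ramdisk/pgdata';
CREATE TABLE foo_cache (LIKE foo) TABLESPACE ram_ts;

Keeping foo_cache in sync with foo (via the rules/triggers mentioned
above) and copying the files to and from real disk around server
restarts is the messy, brittle part, which is exactly why I say don't
do it.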

In general, Tom's advice is very good:  Build your application first,
and expect that PostgreSQL's optimizer will do a reasonable job of
optimizing your data access.  Then when you have bottlenecks, look for
ways around these.  Do as *little as possible* to optimize things and
look for the simplest steps to remove your bottlenecks.  This will
ensure that your application and data are maintainable down the road.
The optimizer is actually quite good.  Use it as your primary tool of
performance enhancement.

Remember Occam's Razor:  "One should not needlessly multiply entities."
(the most common translation I have seen regarding what William of Occam
actually said.)  This statement is a fabulous engineering principle.

Best Wishes,
Chris Travers
Metatron Technology Consulting



Re: caching table/query

From
Chris Travers
Date:
Tsirkin Evgeny wrote:

>
> Well, I guess there are cases where that is not so. However, I
> understand that Postgres does not have any mechanism except
> the automatic one.

Largely true.  There may be ways of forcing a from-memory query once the
system is in place, but I am not sure that this is always what you want,
it would not be pretty, and might have some real performance issues.  My
suggestion is that you wait until your system is in operation and then
look first for general tuning help before you even consider trying to
implement a memory-cached table in your application.

Just for the record, I can think of at least one way of hacking this on,
but it seems like it should be an absolutely last resort rather than a
first impulse.

PostgreSQL can do this, but you probably don't want it to.

Best Wishes,
Chris Travers
