Re: Database Caching - Mailing list pgsql-hackers

From F Harvell
Subject Re: Database Caching
Date
Msg-id 200203051635.g25GZW520476@odin.fts.net
Whole thread Raw
In response to Re: Database Caching  (Hannu Krosing <hannu@krosing.net>)
List pgsql-hackers
On 01 Mar 2002 10:24:39 +0500, Hannu Krosing wrote:
> ...
> 
> > I don't see how result caching can be a win, since it can be done when
> > needed anyway, without adding complexity to the database engine.  Just
> > have the application cache the result set.  Certainly a web server could
> > do this, if needed.
> 
> More advanced application server can do it all right. But you still need
> sound cache invalidation mechanisms.
> 
> > If there were a way to mark a database as read only (and this could not
> > be changed unless the entire database is shut down and restarted in
> > read/write mode) then there might be some utility to result set cache.
> > Otherwise, I think it will be wasted effort.  It might be worthwhile to
> > do the same for individual tables (with the same sort of restrictions).
> > But think of all the effort that would be needed to do this properly,
> > and what sort of payback would be received from it?
> 
> The payback will be blazingly fast slashdot type applications with very
> little effort from end application programmer.
> 
> > Again, the same goals can easily be accomplished without having to
> > perform major surgery on the database system.  I suspect that there is
> > some logical reason that Oracle/Sybase/IBM/Microsoft have not bothered
> > with it.
> 
> I think they were designed/developed when WWW was nonexistent, and
> client-server meant a system where client and server where separated by
> a (slow) network connection that would negate most of the benefit from
> server side cacheing. On todays application server scenario the client
> (AS) and server (DB) are usually very close, if not on the same computer
> and thus effectively managed cache can be kept on DB to avoid all the
> cache invalidation logic going through DB-AS link.
> 
> ...

You have identified one of the most valuable reasons for query/result
caching.  As I read the threads going on about caching, it reminds me
of the arguments within MySQL about the need for referential integrity
and transactions.  Yes, the application can do it, however, by having
the database do it, it frees the application programmer to perform
more important tasks (e.g., more features).

Your insights about the caching, etc. in the older, legacy databases
is very likely the case.  With the development of high speed
networking, etc., it is now very feasible to move the caching closer
to the sources of change/invalidation.

Quite frankly, while certainly there are literally millions of
applications that would find minimal value in query/results caching, I
know that, at least for web applications, that the query/results
caching _would_ be very valuable.

Certainly, as has been pointed out several times, the application can
do the caching, however, utilizing todays "generic" tools such as
Apache and PHP, it is relatively difficult.  By moving caching to the
database server, there is little that needs to be done by the
application to realize the benefits.

For example, with my typical application, as a straight dynamic
application, the process would be approximately:
 1) receive the request "http://www.xyz.com/index.php"
 2) query the current page contents from the database:
select * from table where dateline = '2002-03-05';
 3) begin the HTML output
 4) loop through the select results printing the contents
 5) complete the HTML output

With caching within the database, this would likely achieve
performance good enough to serve millions of requests per day.  (This
is the case for some of our sites using Intersystems Cache which does
do query caching and serves over 2 million page views per day.)

Without the database caching, it is necessary to have the application
perform this function.  Because of the short life of an Apache/PHP
process, caching needs to be performed externally to the application:
 1) receive the request "http://www.xyz.com/index.php"
 2) check the age of the cache file (1min, 5min ???):
 3) if the cache is not fresh:
 3.1) lock the cache file
 3.2) query the database
 3.3) begin HTML output to the cache file
 3.4) loop through select results
 3.5) complete the HTML output to the cache file
 3.6) unlock the cache file
 4) open the cache file (i.e., wait for locked file)
 5) read/passthrough cache contents

(Please note that this is one, simplified way to do the caching.  It
assumes that the data becomes stale over time and needs to be
refreshed.  It also uses a file cache instead of a memory cache.
Certainly things could be made more efficient through the use of
notifications from the database and the use of shared memory.  Another
path would be to have an external program generating the pages and
then placing the "static" pages into service (which requires changes
to apache and/or the OS due to cache file invalidation during the
generation process). Both paths make the application more complex
requiring more programming time.)

It is possible to have an application server do this caching for you.
Of course, that in turn has its own complications.  For applications
that do not need the added "features" of an application server,
database caching can be a big win in both processing/response time as
well as programmer time.

Thanks,
F Harvell




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Next
From: Thomas Lockhart
Date:
Subject: Mandrake RPMs uploaded