Again, sorry, caching. - Mailing list pgsql-hackers

From mlw
Subject Again, sorry, caching.
Date
Msg-id 3C935038.BE59E93B@mohawksoft.com
Whole thread Raw
Responses Re: Again, sorry, caching.  (Greg Copeland <greg@CopelandConsulting.Net>)
Re: Again, sorry, caching.  (Greg Copeland <greg@CopelandConsulting.Net>)
Re: Again, sorry, caching.  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: Again, sorry, caching.  (Karel Zak <zakkr@zf.jcu.cz>)
Re: Again, sorry, caching.  (Neil Conway <nconway@klamath.dyndns.org>)
List pgsql-hackers
I traded a couple emails with a guy using one of my open source projects. To
make a long story short, he is going to the new version of MySQL for his
website because of the new caching feature. He is convinced that it will speed
up his web site, and he is probably right.

On a web site, a few specific queries get executed, unchanged, repeatedly.
Think about an ecommerce site, most of the time it is just a handful of basic
queries. These basic queries are usually against pretty large product tables. A
caching mechanism would make these queries pretty light weight.

The arguments against caching:

"It is an application issue"
This is completely wrong. Caching can not be done against a database without
knowledge of the database, i.e. when the data changes.

"If it is mostly static data, why not just make it a static page?"
Because a static page is a maintenance nightmare. One uses a database in a web
site to allow content to be changed and upgraded dynamically and with a minimum
of work.

"It isn't very useful"
I disagree completely. A cache of most frequently used queries, or specific
ones, could make for REALLY good performance in some very specific, but very
common, applications. Any system that has a hierarchical "drill down" interface
to a data set, ecommerce, libraries, document management systems, etc. will
greatly benefit from a query cache.

I was thinking that it could be implemented as a keyword or comment in a query.
Such as:

select * from table where column = 'foo' cacheable
or
select * from table where column = 'bar' /* cacheable */

Either way, it would speed up a lot of common application types. It would even
be very cool if you could just cache the results of sub queries, such as:

select * from (select * from table where col1 = 'foo' cacheable) as subset
where subset.col2 = 'bar' ;

Which would mean that the subquery gets cached, but the greater select need not
be. The cache could be like a global temp table. Perhaps the user could even
name the cache entry:

select * from table where column = 'foo' cache on foo

Where one could also do:

select * from cache_foo

Using a keyword is probably a better idea, it can be picked up by the parser
and instruct PostgreSQL to use the cache, otherwise there will be no additional
overhead.

Having caching within PostgreSQL will be good for data integrity. Application
caches can't tell when an update/delete/insert happens, they often have to use
a time-out mechanism.

OK, let me have it, tell me how terrible an idea this is. tell me how wrong I
am.


pgsql-hackers by date:

Previous
From: "Vadim Mikheev"
Date:
Subject: Re: [BUGS] Bug #613: Sequence values fall back to previously chec
Next
From: Greg Copeland
Date:
Subject: Re: Again, sorry, caching.