My only post with regard to query caching - Mailing list pgsql-hackers

From Adam Haberlach
Subject My only post with regard to query caching
Date
Msg-id 20020318095807.B20261@newsnipple.com
Whole thread Raw
List pgsql-hackers
After reading way too many threads on this (probably too many on pgsql-*
in general) I'll just go over how I feel about the caching "issue".

It seems that MySQL has implemented a system that allows the database to
cache queries that are frequently used and reduce latency for them.  This,
to me, seems like a very nice low-hanging fruit optimization, especially
for web systems.

===  Examples

Search Pages:I implented a bug database.  The main entry point was a "Define your
search" page which presented quite a few options.  Many of them were
drop-down lists.  This page did five or six queries to do things like find
the list of engineers in the company, categories for bugs, and versions of
software.  The results of those queries probably changed once per month,
but were done several times/day.  While they are simple and may not have
cost much, I can see how a simple cache would make them cost less.

Home Pages:Frequently, in the 'blog case (such as my home page), a lookup is done
every time the page is hit.  I update that table every couple of days, but
it is accessed much more often.  Once again, this is a fairly common
usage pattern in the web environment that /may/ be a good candiate for this
sort of caching.
These are two frequently-used design patterns which I think would
benefit from this optimization.  MySQL, and some of their customers seem
to think so, too.

=== Common Arguments
"This shouldn't be in the database!"
Arguably, yes.  This is something that might be better handled by the
application server.  The app server may or may not have a unified connection
pool to the database and can better organize the queries and caching.On the other hand, for the case of a database that
isnot on the same machine
 
as the webserver, this is a good chance to reduce bandwidth.

"This is going to make things ugly/hard to implement/etc"...Personally, I feel that too many of PostgreSQL's potential
featuresget
 
rejected out-of-hand on the grounds that implementation will be difficult or
that it will "make things gross" (as though parts of PostgreSQL aren't gross
already).  While I've not looked /too/ closely, it seems that if one were
to create a way for the system to maintain the results of a query, keyed by
the text of the query itself, it would be easy for something in the query
sequence to check and see if the query has already been done, and access it.
We already hold resultsets between queries in order to handle cursors,
so most of the framework must already be in there.  Just keep each 'cacheable'
query.NOTE: This probably implies that in the simple case, the cache cannot
be used between different connections.
The other issue is the expiration of the cache entries.  Once again, for
the "Home Pages" case above, I would be perfectly satisified if the cache
was entirely blown away every time any UPDATE query was executed.  This would
handle most cases, except for triggers on non-UPDATE queries.  Otherwise, we
would need to less simple-case the issue by tracking when tables are actually
updated, and for even more bonus points, track which tables affect which
cache entries.

===

Editorial:
PostgreSQL seems to spend a lot of time stressing ACID, and I believe
this is a very good thing.  I simply don't trust MySQL any more then I trust
any other SQL interface to a flat datafile.  Also, PostgreSQL has some very
handy features involving datatypes, triggers, and stored procedures.  But
you all know that.
MySQL is doing some things right.  They are providing useful documentation.
They are adding features that target the web market (they may be adding them
incorrectly, however).  If we expect PostgreSQL to beat MySQL in anything
but "My database is transactionally secure" and "We have a GECO optimizer"
pissing wars, we'll need to start becoming a little more competitive in the
raw speed arena.  I feel that this optimization, while it may not be trivial,
is fairly low-hanging fruit that can help.  I may even try to implement it,
but I make no guarantees.


-- 
Adam Haberlach         | Who buys an eight-processor machine and then
adam@newsnipple.com    | watches 30 movies on it all at the same time?
http://newsnipple.com  | Beats me.  They told us they could sell it, so                      | we made it.       --
GeorgeHoffman, Be Engineer
 


pgsql-hackers by date:

Previous
From: "Luis Alberto Amigo Navarro"
Date:
Subject: Re: postgres is not using tas
Next
From: Doug McNaught
Date:
Subject: Re: My only post with regard to query caching