Re: Again, sorry, caching. - Mailing list pgsql-hackers

From Greg Copeland
Subject Re: Again, sorry, caching.
Date
Msg-id 1016288914.24600.103.camel@mouse.copelandconsulting.net
Whole thread Raw
In response to Again, sorry, caching.  (mlw <markw@mohawksoft.com>)
List pgsql-hackers
I previously replied to you vaguely describing a way you could implement
this by using a combination of client side caching and database tables
and triggers to allow you to determine if your cache is still valid.
Someone came right behind me, Tom maybe??, and indicated that the
proper/ideal way to do this would be to using postgres' asynchronous
database notification mechanisms (listen/notify I believe were the
semantics) to alert your application that your cache has become
invalid.  Basically, a couple of triggers and the use of the list/notify
model, and you should be all set.

Done properly, a client side cache which is asynchronously notified by
the database when it's contents become invalid should be faster than
relying on MySQL's database caching scheme.  Basically, a strong client
side cache is going to prevent your database from even having to return
a cached result set while a database side cache is going to always
return a result set.  Of course, one of the extra cool things you can do
is to cache a gzip'd copy of the data contents which would further act
as an optimization preventing the client or web server (in case they are
different) from having to recompress every result set.

In the long run, again, if properly done, you should be able to beat
MySQL's implementation without too extra much effort.  Why?  Because a
client side cache can be much smarter in the way that it uses it's
cached contents much in the same way an application is able to better
cache it's data then what the file system is able to do.  This is why an
client side cache should be preferred over that of a database result set
cache.

Greg

References:
http://www.postgresql.org/idocs/index.php?sql-notify.html
http://www.postgresql.org/idocs/index.php?sql-listen.html


On Sat, 2002-03-16 at 08:01, mlw wrote:
> I traded a couple emails with a guy using one of my open source projects. To
> make a long story short, he is going to the new version of MySQL for his
> website because of the new caching feature. He is convinced that it will speed
> up his web site, and he is probably right.
>
> On a web site, a few specific queries get executed, unchanged, repeatedly.
> Think about an ecommerce site, most of the time it is just a handful of basic
> queries. These basic queries are usually against pretty large product tables. A
> caching mechanism would make these queries pretty light weight.
>
> The arguments against caching:
>
> "It is an application issue"
> This is completely wrong. Caching can not be done against a database without
> knowledge of the database, i.e. when the data changes.
>
> "If it is mostly static data, why not just make it a static page?"
> Because a static page is a maintenance nightmare. One uses a database in a web
> site to allow content to be changed and upgraded dynamically and with a minimum
> of work.
>
> "It isn't very useful"
> I disagree completely. A cache of most frequently used queries, or specific
> ones, could make for REALLY good performance in some very specific, but very
> common, applications. Any system that has a hierarchical "drill down" interface
> to a data set, ecommerce, libraries, document management systems, etc. will
> greatly benefit from a query cache.
>
> I was thinking that it could be implemented as a keyword or comment in a query.
> Such as:
>
> select * from table where column = 'foo' cacheable
> or
> select * from table where column = 'bar' /* cacheable */
>
> Either way, it would speed up a lot of common application types. It would even
> be very cool if you could just cache the results of sub queries, such as:
>
> select * from (select * from table where col1 = 'foo' cacheable) as subset
> where subset.col2 = 'bar' ;
>
> Which would mean that the subquery gets cached, but the greater select need not
> be. The cache could be like a global temp table. Perhaps the user could even
> name the cache entry:
>
> select * from table where column = 'foo' cache on foo
>
> Where one could also do:
>
> select * from cache_foo
>
> Using a keyword is probably a better idea, it can be picked up by the parser
> and instruct PostgreSQL to use the cache, otherwise there will be no additional
> overhead.
>
> Having caching within PostgreSQL will be good for data integrity. Application
> caches can't tell when an update/delete/insert happens, they often have to use
> a time-out mechanism.
>
> OK, let me have it, tell me how terrible an idea this is. tell me how wrong I
> am.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


pgsql-hackers by date:

Previous
From: mlw
Date:
Subject: Again, sorry, caching.
Next
From: mlw
Date:
Subject: Re: Again, sorry, caching.