Re: Again, sorry, caching. - Mailing list pgsql-hackers

From mlw
Subject Re: Again, sorry, caching.
Date
Msg-id 3C95F53C.9E14AE32@mohawksoft.com
Whole thread Raw
In response to Again, sorry, caching.  (mlw <markw@mohawksoft.com>)
List pgsql-hackers
Jean-Michel POURE wrote:
> 
> Le Lundi 18 Mars 2002 13:23, mlw a écrit :
> > Lets face it, MySQL wins a lot of people because they put in features that
> > people want.
> 
> MySQL is very interested in benchmarks.
> It does not really care for data consistency.

In no way am I suggesting we avoid ACID compliance. In no way am I suggesting
that PostgreSQL change. All I am suggesting is that tables which change
infrequently can and should be cached.

select * from table where foo = 'bar'

Need not be executed twice if the table has not changed. 

select * from table1, (select * from table2 where foo='bar' cacheable) as
subset were subset.col1 = table1.col1;

In the above query, if table two changes 4 times a day, and it queried a couple
times a minute or second, the caching of the subset could save a huge amount of
disk I/O.

This sort of query could improve many catalog based implementations, from
music, to movies, to books. A library could implement a SQL query for book
lookups like this:

select * from authors, (select * from books where genre = 'scifi' cacheable) as
subset where authors.id = subset.auhorid and authors.id in (....)

Yes it is arguable that index scans may work better, and obviously, summary
tables may help, etc. but imagine a more complex join which produces fewer
records, but is executed frequently. Caching could help the performance of
PostgreSQL in some very real applications.

MySQL's quest for benchmarking numbers, I agree, is shameful because they
create numbers which are not really applicable in the real world. This time,
however, I think they may be on to something.

(1) PostgreSQL use a "cacheable" or "iscacheable" keyword.
(2) If the query uses functions which are not marked as "iscacheable," then it
is not cached.
(3) If any table contained within the cacheable portion of the query is
modified, the cache is marked as dirty.
(4) No provisions are made to recreate the cache after an insert/update/delete.
(5) The first query marked as "iscacheable" that encounters a "dirty" flag in a
table, does an exhaustive search on the cache and removes all entries that are
affected.


As far as I can see, if the above parameters are used to define caching, it
could improve performance on sites where a high number of transactions are
made, where there is also a large amount of static data, i.e. a ecommerce site,
library, etc. If the "iscacheable" keyword is not used, PostgreSQL will not
incur any performance degradation. However, if he "iscacheable" keyword is
used, the performance loss could very well be made up by the benefits of
caching.


pgsql-hackers by date:

Previous
From: Greg Copeland
Date:
Subject: Re: Again, sorry, caching.
Next
From: mlw
Date:
Subject: Re: Again, sorry, caching.