Re: Database Caching - Mailing list pgsql-hackers

From Justin Clift
Subject Re: Database Caching
Date
Msg-id 3C7FBEC6.7F540E09@postgresql.org
In response to Re: Database Caching  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Responses Re: Database Caching  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-hackers
Hi guys,

Stephan Szabo wrote:
<snip> 
> The question is, when it's invalidated, how does it become valid again?
> I don't see that there's a way to do it only by query string that doesn't
> result in meaning that the cache cannot cache a query again until any
> transactions that can see the prior state are finished since otherwise
> you'd be providing the incorrect results to that transaction. But I
> haven't spent much time thinking about it either.

It seems like a good idea to me, but only if it's optional.  It could
get in the way for systems that don't need it, but would be really
beneficial for some types of systems which are read-only or mostly
read-only (with consistent queries) in nature.

e.g.  Let's take a web page where clients can look up which of 10,000
records are .biz, .org, .info, or .com.

So, we have a database query of simply:

SELECT name FROM sometable WHERE tld = 'biz';

And let's say 2,000 records come back, which are cached.

Then the next query comes in, which is:

SELECT name FROM sometable WHERE tld = 'info';

And let's say 3,000 records come back, which are also cached.

Now, both of these queries are FULLY cached.  So, if either query
happens again, it's a straight memory read and dump, no disk activity
involved, etc (very fast in comparison).

Now, let's say a transaction which involves a change to "sometable"
COMMITs.  This should invalidate these results in the cache, as the
cached results could now be incorrect (there might now be fewer, more,
or different results for .info or .biz).  The next queries will be
cached too, and will stay cached until the next transaction involving
a change to "sometable" COMMITs.
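
A minimal sketch of the scheme described above, written in Python only
for brevity.  It is not PostgreSQL code, and the names QueryResultCache,
run_query and on_commit are hypothetical.  It assumes the caller knows
which tables each query reads, and that results are keyed by the exact
query string.

```python
from collections import defaultdict


class QueryResultCache:
    """Toy result cache keyed by exact query string (hypothetical, for illustration)."""

    def __init__(self):
        self._results = {}                  # query string -> cached rows
        self._by_table = defaultdict(set)   # table name -> query strings that read it
        self.hits = 0
        self.misses = 0

    def get(self, query, tables, run_query):
        """Return cached rows, or call run_query(query) and cache what it returns."""
        if query in self._results:
            self.hits += 1
            return self._results[query]
        self.misses += 1
        rows = run_query(query)
        self._results[query] = rows
        for table in tables:
            self._by_table[table].add(query)
        return rows

    def on_commit(self, changed_tables):
        """Drop every cached result that reads a table changed by a committed transaction."""
        for table in changed_tables:
            for query in self._by_table.pop(table, set()):
                self._results.pop(query, None)
```

Calling on_commit(["sometable"]) after a writing transaction commits
drops both the 'biz' and 'info' results, and the next lookup of either
query repopulates the cache.  Note that this sketch ignores transactions
still running that can see the prior state, which is the case Stephan
raises above.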

In this type of database access, this looks like a win.

But caching results in this manner could be a memory killer for those
applications which aren't so predictable in their queries, or are not so
read-only.  That's why I feel it should be optional, but I also feel it
should be added due to what look like massive wins without data
integrity or reliability issues.
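
As a rough illustration of "optional" and of keeping memory use in
check, here is a sketch of a cache that can be switched off and that
evicts the least recently used result once it is full.  The names
BoundedResultCache, max_entries and enabled are hypothetical, and
counting entries rather than bytes is a simplification.

```python
from collections import OrderedDict


class BoundedResultCache:
    """Toy LRU result cache with an on/off switch (hypothetical, for illustration)."""

    def __init__(self, max_entries=100, enabled=True):
        self.enabled = enabled
        self.max_entries = max_entries
        self._results = OrderedDict()   # query string -> rows, oldest use first

    def get(self, query, run_query):
        """Return rows for query, using the cache only when it is enabled."""
        if not self.enabled:
            return run_query(query)
        if query in self._results:
            self._results.move_to_end(query)   # mark as most recently used
            return self._results[query]
        rows = run_query(query)
        self._results[query] = rows
        if len(self._results) > self.max_entries:
            self._results.popitem(last=False)  # evict the least recently used entry
        return rows

    def __len__(self):
        return len(self._results)
```

With enabled=False every call goes straight to run_query, so systems
that don't need the cache pay nothing beyond the check.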

Hope this helps.

:-)

Regards and best wishes,

Justin Clift


-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi

