Thread: cache table

cache table

From

Joseph Shraibman

Date:

03 May 2004, 23:46:11

I have a big table with some int fields.  I frequently need to do
queries like:

SELECT if2, count(*) FROM table WHERE if1 = 20 GROUP BY if2;

The problem is that this is slow and frequently requires a seqscan. I'd
like to cache the results in a second table and update the counts with
triggers, but this would a) require another UPDATE for each
INSERT/UPDATE which would slow down adding and updating of data and b)
produce a large amount of dead rows for vacuum to clear out.

It would also be nice if this small table could be locked into the pg
cache somehow.  It doesn't need to store the data on disk because the
counts can be generated from scratch?

So what is the best solution to this problem?  I'm sure it must come up
pretty often.

Re: cache table

From

"scott.marlowe"

Date:

04 May 2004, 10:55:50

On Mon, 3 May 2004, Joseph Shraibman wrote:

> I have a big table with some int fields.  I frequently need to do
> queries like:
>
> SELECT if2, count(*) FROM table WHERE if1 = 20 GROUP BY if2;
>
> The problem is that this is slow and frequently requires a seqscan. I'd
> like to cache the results in a second table and update the counts with
> triggers, but this would a) require another UPDATE for each
> INSERT/UPDATE which would slow down adding and updating of data and b)
> produce a large amount of dead rows for vacuum to clear out.
>
> It would also be nice if this small table could be locked into the pg
> cache somehow.  It doesn't need to store the data on disk because the
> counts can be generated from scratch?

I think you might be interested in materialized views.  You could create
this as a materialized view which should be very fast to just select *
from.

While materialized views aren't a standard part of PostgreSQL just yet,
there is a working implementation available from Jonathan Gardner at:

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

It's all implemented with plpgsql and is quite interesting to read
through.  IT has a nice tutorial methodology to it.

Re: cache table

From

Joseph Shraibman

Date:

04 May 2004, 12:46:42

scott.marlowe wrote:

> I think you might be interested in materialized views.  You could create
> this as a materialized view which should be very fast to just select *
> from.

That seems to be the count table I envisioned.  It just hides the
details for me.  It still has the problems of an extra UPDATE every time
the data table is updated and generating a lot of dead tuples.

Re: cache table

From

Greg Stark

Date:

04 May 2004, 15:15:03

Joseph Shraibman <jks@selectacast.net> writes:

> scott.marlowe wrote:
>
> > I think you might be interested in materialized views.  You could create this
> > as a materialized view which should be very fast to just select * from.
>
> That seems to be the count table I envisioned.  It just hides the details for
> me.  It still has the problems of an extra UPDATE every time the data table is
> updated and generating a lot of dead tuples.

The dead tuples is only going to be a problem if you have lots of updates. If
that's the case then you're also going to have problems with contention. This
trigger will essentially serialize all inserts, deletes, updates at least
within a group. If you use transactions with multiple such updates then you
will also risk creating deadlocks.

But I think these problems are fundamental to the problem you've described.
Keeping denormalized aggregate data like this inherently creates contention on
the data and generates lots of old data. It's only really useful when you have
few updates and many many reads.

If you know more about the behaviour of the updates then there might be other
options. Like, do you need precise data or only approximate data? If
approximate perhaps you could just do a periodic refresh of the denormalized
view and use that.

Are all the updates to the data you'll be querying coming from within the same
application context? In which case you can keep a cache locally in the
application and update it locally. I often do this when I have rarely updated
or insert-only data, I just do a lazy cache using a perl hash or equivalent.

If you're really happy with losing the cache, and you don't need complex
transactions or care about serializing updates then you could use something
like memcached (http://www.danga.com/memcached/). That might be your best fit
for how you describe your requirements.

--
greg