Thread: cache table
I have a big table with some int fields. I frequently need to do queries like: SELECT if2, count(*) FROM table WHERE if1 = 20 GROUP BY if2; The problem is that this is slow and frequently requires a seqscan. I'd like to cache the results in a second table and update the counts with triggers, but this would a) require another UPDATE for each INSERT/UPDATE which would slow down adding and updating of data and b) produce a large amount of dead rows for vacuum to clear out. It would also be nice if this small table could be locked into the pg cache somehow. It doesn't need to store the data on disk because the counts can be generated from scratch? So what is the best solution to this problem? I'm sure it must come up pretty often.
On Mon, 3 May 2004, Joseph Shraibman wrote: > I have a big table with some int fields. I frequently need to do > queries like: > > SELECT if2, count(*) FROM table WHERE if1 = 20 GROUP BY if2; > > The problem is that this is slow and frequently requires a seqscan. I'd > like to cache the results in a second table and update the counts with > triggers, but this would a) require another UPDATE for each > INSERT/UPDATE which would slow down adding and updating of data and b) > produce a large amount of dead rows for vacuum to clear out. > > It would also be nice if this small table could be locked into the pg > cache somehow. It doesn't need to store the data on disk because the > counts can be generated from scratch? I think you might be interested in materialized views. You could create this as a materialized view which should be very fast to just select * from. While materialized views aren't a standard part of PostgreSQL just yet, there is a working implementation available from Jonathan Gardner at: http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html It's all implemented with plpgsql and is quite interesting to read through. IT has a nice tutorial methodology to it.
scott.marlowe wrote: > I think you might be interested in materialized views. You could create > this as a materialized view which should be very fast to just select * > from. That seems to be the count table I envisioned. It just hides the details for me. It still has the problems of an extra UPDATE every time the data table is updated and generating a lot of dead tuples.
Joseph Shraibman <jks@selectacast.net> writes: > scott.marlowe wrote: > > > I think you might be interested in materialized views. You could create this > > as a materialized view which should be very fast to just select * from. > > That seems to be the count table I envisioned. It just hides the details for > me. It still has the problems of an extra UPDATE every time the data table is > updated and generating a lot of dead tuples. The dead tuples is only going to be a problem if you have lots of updates. If that's the case then you're also going to have problems with contention. This trigger will essentially serialize all inserts, deletes, updates at least within a group. If you use transactions with multiple such updates then you will also risk creating deadlocks. But I think these problems are fundamental to the problem you've described. Keeping denormalized aggregate data like this inherently creates contention on the data and generates lots of old data. It's only really useful when you have few updates and many many reads. If you know more about the behaviour of the updates then there might be other options. Like, do you need precise data or only approximate data? If approximate perhaps you could just do a periodic refresh of the denormalized view and use that. Are all the updates to the data you'll be querying coming from within the same application context? In which case you can keep a cache locally in the application and update it locally. I often do this when I have rarely updated or insert-only data, I just do a lazy cache using a perl hash or equivalent. If you're really happy with losing the cache, and you don't need complex transactions or care about serializing updates then you could use something like memcached (http://www.danga.com/memcached/). That might be your best fit for how you describe your requirements. -- greg