On Wed, 2004-06-09 at 16:32, jao@geophile.com wrote:
> I have an application with a table that tracks objects with a "size"
> attribute. What we want to do is to periodically report on the number
> of these objects and the sum of the object sizes. The table will
> typically have 1-5 million rows.
>
> I know that the obvious "select count(*), sum(size) ..." is not a good
> idea, and we have seen the impact such a query has on performance.
>
> Then we tried creating triggers to maintain a table with object_count
> and sum_size columns:
>
> create table summary(object_count bigint, sum_size bigint)
>
> We populate the table with exactly one row, initialized to (0, 0).
> A trigger runs an update like this:
>
> update summary
> set object_count = object_count + 1,
> sum_size = sum_size + new.size
>
> The problem is that our application has periods where objects are
> being created at a rapid rate. Updates to the summary table slow down
> over time, and the slowdown is accompanied by very low CPU idle% as
> reported by vmstat.
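(I take it the trigger is something roughly like this; the function
name, the trigger name, and the "objects" table are just my guesses:

create or replace function bump_summary() returns trigger as '
begin
    update summary
    set object_count = object_count + 1,
        sum_size = sum_size + new.size;
    return new;
end;
' language plpgsql;

create trigger objects_bump_summary
after insert on objects
for each row execute procedure bump_summary();

so every single insert rewrites that one summary row.)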
I'm wondering if these data are time-sensitive, and if so, how
sensitive. If they could be, say, up to a minute old, using a
materialized view of some sort might work well here. Or you could use
a temp table to hold all the incoming data until it needs to be dumped
into the real table.
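As a rough sketch, with made-up table names: inserts go into a little
staging table, and every so often one transaction folds the staging
rows into the summary and clears them out:

-- the trigger (or the app) appends a row here instead of
-- updating the single summary row on every insert
create table summary_staging(size bigint);

-- periodic rollup, run every minute or so
begin;
-- briefly block concurrent inserts so the update and the delete
-- see exactly the same set of staging rows
lock table summary_staging in exclusive mode;
update summary
set object_count = object_count + (select count(*) from summary_staging),
    sum_size = sum_size + (select coalesce(sum(size), 0) from summary_staging);
delete from summary_staging;
commit;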
Just a thought, might not work for you.
Also, do you have a lot of parallel updates to the same rows, or are
they all different rows? If you're updating the same rows over and
over, any database is going to be somewhat slow, whether it's from
vacuum overhead, as in PostgreSQL, or from lock contention, as in
row-level-locking databases.
If they're all different rows, then this should be no worse for
PostgreSQL than for a row-locking database, since each tuple will only
ever have two or three versions, rather than a small fraction of the
rows accumulating hundreds of versions each.
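In your case it's the extreme version of that: one single summary row
being updated constantly, so the table stays tiny but piles up dead
versions very fast. Vacuuming just that table very aggressively should
help, and it's cheap since there's only one live row, something like
this from cron every minute or so:

vacuum analyze summary;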
Just wondering.