Postgresql vs. aggregates - Mailing list pgsql-general
From | jao@geophile.com |
---|---|
Subject | Postgresql vs. aggregates |
Date | |
Msg-id | 1086820378.40c7901a40d04@geophile.com Whole thread Raw |
Responses |
Re: Postgresql vs. aggregates
Re: Postgresql vs. aggregates |
List | pgsql-general |
I have an application with a table that tracks objects with a "size" attribute. What we want to do is to periodically report on the number of these objects and the sum of the object sizes. The table will typically have 1-5 million rows. I know that the obvious "select count(*), sum(size) ..." is not a good idea, and we have seen the impact such a query has on performance. Then we tried creating triggers to maintain a table with object_count and sum_size columns: create table summary(object_count bigint, sum_size bigint) We populate the table with exactly one row, initialized to (0, 0). A trigger runs an update like this: update summary set object_count = object_count + 1, sum_size = sum_size + new.size The problem is that our application has periods where objects are being created at a rapid rate. Updates to the summary table slow down over time, and the slowdown is accompanied by very low CPU idle% as reported by vmstat. I found that if I have a thread which vacuums the summary table every few seconds (e.g. 30), then update times stay reasonable. We're starting to reconsider this approach, wondering if the vacuuming or the buildup of row versions (in those 30 seconds) is behind some variability in performance that we're seeing. To that end, we are measuring the impact of more frequent and less frequent vacuuming. I'll skip the details here since this is not the main point of this email. Another idea we had is to maintain object_count and sum_size in sequences. The trigger would use currval and setval to maintain the values. The problem with this is that when we try to examine the sequence on another connection, we can't use currval before bumping the value with nextval, which would interfere with maintenance of the totals. (The error message for a sequence s is "ERROR: s.currval is not yet defined in this session".) Yet another idea is to write C functions which maintain counts in shared memory, simulating what we can't quite do with sequences. I understand why asking for aggregate values computed over rapidly changing data sets is not a completely meaningful thing to do. Yet we have the requirement. Users of our product need to be able to look at object_count and sum_size to get a sense of how an object load is proceeding. This shouldn't be so difficult. I've seen, on this mailing list, the advice "don't do that, use a trigger" when someone want to do select count(*). But I haven't seen any discussion of the consequences of using a trigger. Can anyone recommend a good way to maintain aggregates using a trigger? Is vacuuming every few seconds really the only way to go? Jack Orenstein ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.
pgsql-general by date: