Re: Postgresql vs. aggregates - Mailing list pgsql-general

From jao@geophile.com
Subject Re: Postgresql vs. aggregates
Date
Msg-id 1086824183.40c79ef71eb86@geophile.com
Whole thread Raw
In response to Re: Postgresql vs. aggregates  ("Scott Marlowe" <smarlowe@qwest.net>)
Responses Re: Postgresql vs. aggregates  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Postgresql vs. aggregates  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Quoting Scott Marlowe <smarlowe@qwest.net>:

> On Wed, 2004-06-09 at 16:32, jao@geophile.com wrote:
> > Then we tried creating triggers to maintain a table with object_count
> > and sum_size columns:
> >
> >     create table summary(object_count bigint, sum_size bigint)
> >
> > We populate the table with exactly one row, initialized to (0, 0).
> > A trigger runs an update like this:
> >
> >     update summary
> >     set object_count = object_count + 1,
> >         sum_size = sum_size + new.size
> >
> > The problem is that our application has periods where objects are
> > being created at a rapid rate. Updates to the summary table slow down
> > over time, and the slowdown is accompanied by very low CPU idle% as
> > reported by vmstat.
>
> I'm wondering if these data are time sensitive, and if so, how
> sensitive.  If they could be up to say a minute old or something, using
> a materialized view of some sort might work well here.  Or a temp table
> to hold all the incoming data until it needs to be dumped into the real
> table.

So once a minute I'd run a stored procedure to compute statistics
and copy rows to the main table? That might be doable, but then
queries would have to be to a view combining the main table and
the temp table.

Can you expand on the use of materialized views to solve this?

> Also, do you have a lot of parallel updates to the same rows, or are
> they all different rows?  If you're doing a lot of updating of the same
> rows over and over, any database is going to be somewhat slow, either in
> vacuuming like for postgresql, or locking as in row level locking
> databases.
>
> If they're all different, then this should be no different for
> postgresql than for a row locking database, since each tuple will only
> have two or three versions, instead of some smaller percentage of rows
> having hundreds of versions.

The summary table has exactly one row, and whenever an object
is added to the main table, this one summary row is updated. An
odd feature of my application is that there is only one object
creation going on at a time. The summary row would not be a concurrency
hotspot in a non-MVCC database system.

But that raises an interesting idea. Suppose that instead of one
summary row, I had, let's say, 1000. When my application creates
an object, I choose one summary row at random (or round-robin) and update
it. So now, instead of one row with many versions, I have 1000 with 1000x
fewer versions each. When I want object counts and sizes, I'd sum up across
the 1000 summary rows. Would that allow me to maintain performance
for summary updates with less frequent vacuuming?

I'd find all this much easier to reason about if I understood how
the versions of a row are organized and accessed. How does postgresql
locate the correct version of a row?

Jack Orenstein



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

pgsql-general by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: Postgresql vs. aggregates
Next
From: Bruce Momjian
Date:
Subject: Re: Solaris ecpg program doesn't work - pulling my hair