Why frequently updated tables are an issue - Mailing list pgsql-hackers

From: pgsql@mohawksoft.com
Subject: Why frequently updated tables are an issue
Msg-id: 15910.64.119.142.34.1086877809.squirrel@mail.mohawksoft.com
In response to: Re: sequences and "addval('myseq', value)" (Christopher Browne <cbbrowne@acm.org>)
List: pgsql-hackers
OK, the problem I am having with this whole discussion, on several fronts, is
the idea of performance. If performance and consistent behavior were not
*important* issues for the project, a summary table would work fine and I
could just vacuum frequently.

Currently a client needs to vacuum two summary tables at least once every
two seconds. The performance of the system slowly declines with each
summary update until the next vacuum. After a vacuum, the transaction
comes in at about 7ms; it then creeps up to about 35ms~50ms, and once we
vacuum we're back to 7ms. When we vacuumed only every 30 seconds, it would
sometimes get up to whole seconds.
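
To make the mechanism concrete, here is a minimal sketch of why the numbers
decay like that; the table name and schema are mine, not the client's:

    -- Hypothetical summary table. Under MVCC every UPDATE writes a new
    -- row version and leaves the old one behind as a dead tuple, so the
    -- table and its indexes bloat until a vacuum reclaims the space.
    CREATE TABLE hit_summary (
        page text PRIMARY KEY,
        hits bigint NOT NULL
    );

    UPDATE hit_summary SET hits = hits + 1 WHERE page = '/index.html';

    -- Hence the two-second cadence: reclaim dead versions constantly.
    VACUUM hit_summary;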

There is an important issue here. Yes, MVCC is good. I agree, and no one
is arguing against it in the general case; however, there are classes of
problems for which MVCC, or at least PostgreSQL's implementation of it, is
not the best solution.

There are two fundamental problems I've had with PostgreSQL over the
years: summary tables and session tables.

A summary table takes the place of a "select sum(col) from table"; the
summary table itself is very small. The amount of vacuuming required, and
the steady degradation of performance before each vacuum, is a problem
that could be addressed by some sort of global variable system.
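
For reference, the usual shape of the pattern (all names here are
illustrative, not from any particular schema): a trigger keeps a running
total so readers never scan the detail table, at the cost of rewriting the
single summary row on every insert.

    CREATE TABLE detail (id serial PRIMARY KEY, col numeric NOT NULL);
    CREATE TABLE detail_sum (total numeric NOT NULL);
    INSERT INTO detail_sum VALUES (0);

    CREATE FUNCTION maintain_sum() RETURNS trigger AS $$
    BEGIN
        -- Each detail insert rewrites the one summary row.
        UPDATE detail_sum SET total = total + NEW.col;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER detail_sum_trig AFTER INSERT ON detail
        FOR EACH ROW EXECUTE PROCEDURE maintain_sum();

    -- Readers get the sum cheaply ...
    SELECT total FROM detail_sum;
    -- ... but the summary row accumulates a dead version per insert.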

The session table is a different issue, but it has the same problems. You
have an active website, hundreds or thousands of hits a second, and you
want to manage sessions for this site. Sessions are created, updated many
times, and deleted. Performance degrades steadily until the next vacuum,
so vacuum has to be run VERY frequently; prior to lazy vacuum, this was
impossible.
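
A sketch of that lifecycle, with a hypothetical schema:

    CREATE TABLE sessions (
        session_id text PRIMARY KEY,
        user_id    integer,
        last_seen  timestamptz NOT NULL,
        data       text
    );

    -- Typical lifecycle: one INSERT, many UPDATEs, one DELETE.
    INSERT INTO sessions VALUES ('abc123', 42, now(), '...');

    -- Every page hit rewrites the row, leaving a dead version behind;
    -- at thousands of hits per second the dead tuples pile up fast.
    UPDATE sessions SET last_seen = now() WHERE session_id = 'abc123';

    DELETE FROM sessions WHERE session_id = 'abc123';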

Session tables and summary tables have another thing in common: they are
not vital data, they hold transient state information. Yeah, sure, data
integrity is important, but if you lose these values, you can either
recreate them or they weren't too important in the first place.

Why put that in a database at all? Because, in the case of sessions
especially, you need to access this information from other operations. In
the case of summary tables, OLAP queries usually need to join against or
include this info.
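
For instance (table names hypothetical), attributing orders to live
sessions is a plain join once the session data lives in the database:

    SELECT o.order_id, s.user_id
    FROM   orders o
    JOIN   sessions s ON s.session_id = o.session_id;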

PostgreSQL's behavior in these cases is poor. I don't think anyone who has
tried to use PG for this sort of thing will disagree, and yes, it is
getting better. Does anyone else consider this to be a problem? If so, I'm
open to suggestions on what can be done. I've suggested a number of
things, and admittedly they have all been pretty weak ideas, but they were
potentially workable.



