Why frequently updated tables are an issue - Mailing list pgsql-hackers
| Field | Value |
|---|---|
| From | pgsql@mohawksoft.com |
| Subject | Why frequently updated tables are an issue |
| Date | |
| Msg-id | 15910.64.119.142.34.1086877809.squirrel@mail.mohawksoft.com |
| In response to | Re: sequences and "addval('myseq', value)" (Christopher Browne <cbbrowne@acm.org>) |
| Responses | Re: Why frequently updated tables are an issue |
| List | pgsql-hackers |
OK, the problem I am having with this whole discussion, on several fronts, is the idea of performance. If performance and consistent behavior were not *important* issues to a project, a summary table would work fine, and I could just vacuum frequently. Currently a client needs to vacuum two summary tables at least once every two seconds. The performance of the system slowly declines with each summary update until the next vacuum. After a vacuum, the transaction comes in at about 7ms; it increases to about 35ms~50ms, then we vacuum and we're back to 7ms. When we vacuumed every 30 seconds, it would sometimes get up to whole seconds.

There is an important issue here. Yes, MVCC is good. I agree, and no one is arguing against it in the general case; however, there are classes of problems for which MVCC, or at least PostgreSQL's implementation of it, is not the best solution. There are two fundamental issues I've had with PostgreSQL over the years: summary tables and session tables.

The summary tables take the place of a "select sum(col) from table" where the table is very small. The amount of vacuuming required, and the steady degradation of performance prior to each vacuum, is a problem that could be addressed by some global variable system.

The session table is a different issue, but has the same problems. You have an active website, hundreds or thousands of hits a second, and you want to manage sessions for this site. Sessions are created, updated many times, and deleted. Performance degrades steadily until a vacuum. Vacuum has to be run VERY frequently; prior to lazy vacuum, this was impossible.

Both session tables and summary tables have another thing in common: they are not vital data, they hold transient state information. Yes, sure, data integrity is important, but if you lose these values, you can either recreate them or they aren't too important. Why put that in a database at all?
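To make the degradation pattern concrete, here is a minimal sketch (plain Python, not PostgreSQL internals; the class and method names are mine, purely illustrative). It models the one property being complained about: under non-overwriting MVCC, every UPDATE to a hot row appends a new tuple version, a scan must walk past the accumulated dead versions, and only VACUUM reclaims them.

```python
# Illustrative model of MVCC tuple-version bloat on a hot row.
# Not PostgreSQL code; names and structure are hypothetical.

class MvccTable:
    def __init__(self):
        self.tuples = []          # append-only list of [value, is_dead]

    def insert(self, value):
        self.tuples.append([value, False])

    def update(self, value):
        # UPDATE does not overwrite: mark the live version dead,
        # append a new version at the end.
        for t in self.tuples:
            if not t[1]:
                t[1] = True
        self.tuples.append([value, False])

    def read(self):
        # A scan visits every version, live or dead, so cost grows
        # with each update until the next vacuum.
        visited = len(self.tuples)
        live = [t[0] for t in self.tuples if not t[1]]
        return live[0], visited

    def vacuum(self):
        # Reclaim dead versions; scan cost drops back down.
        self.tuples = [t for t in self.tuples if not t[1]]

counter = MvccTable()
counter.insert(0)
for i in range(1, 101):          # 100 updates to the same logical row
    counter.update(i)

value, scanned = counter.read()
print(value, scanned)            # 100 101 -- one live row, 101 versions scanned
counter.vacuum()
value, scanned = counter.read()
print(value, scanned)            # 100 1 -- after vacuum, back to one version
```

The 7ms-to-50ms swing described above is this same curve: read cost climbs linearly with dead versions, then snaps back after each vacuum.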
Because, in the case of sessions especially, you need to access this information for other operations. In the case of summary tables, OLAP usually needs to join against or include this info. PostgreSQL's behavior in these cases is poor. I don't think anyone who has tried to use PG for this sort of thing will disagree, and yes, it is getting better. Does anyone else consider this to be a problem? If so, I'm open to suggestions on what can be done. I've suggested a number of things, and admittedly they have all been pretty weak ideas, but they were potentially workable.