Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers
From | Mark Woodward |
---|---|
Subject | Re: vacuum, performance, and MVCC |
Date | |
Msg-id | 3936.216.145.49.15.1150998576.squirrel@mail.mohawksoft.com Whole thread Raw |
In response to | Re: vacuum, performance, and MVCC (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: vacuum, performance, and MVCC
|
List | pgsql-hackers |
> Christopher Browne <cbbrowne@acm.org> writes: >> After a long battle with technology, pgsql@mohawksoft.com ("Mark >> Woodward"), an earthling, wrote: >>> Not true. Oracle does not seem to exhibit this problem. > >> Oracle suffers a problem in this regard that PostgreSQL doesn't; in >> Oracle, rollbacks are quite expensive, as "recovery" requires doing >> extra work that PostgreSQL doesn't do. > > The Oracle design has got other drawbacks: if you need to access a row > version other than than the very latest, you need to go searching in the > rollback segments for it. This is slow (no index help) and creates > significant amounts of contention (since lots of processes are competing > to touch the rollback segments). But, it is all probability, in most cases, the VAST majority, older versions aren't much needed outside the concurency of of active transactions. > Plus there's the old bugaboo that > long-running transactions require indefinite amounts of rollback space, > and Oracle is apparently unable to enlarge that space on-the-fly. > (This last seems like a surmountable problem, but maybe there is some > non-obvious reason why it's hard.) Yea, Oracle has a million way to die. And when you think you know all one million, you find one million and one. > > Basically there's no free lunch: if you want the benefits of MVCC it's > going to cost you somewhere. In the Postgres design you pay by having > to do VACUUM pretty often for heavily-updated tables. I don't think > that decision is fundamentally wrong --- the attractive thing about it > is that the overhead is pushed out of the foreground query-processing > code paths. Under certain circumstances, it is a very poor design. Think of a single row table that keeps a scoreboard or a session table that keeps a limited number of rows that are updated very frequently. > We still have lots of work to do in making autovacuum > smarter, avoiding vacuuming parts of relations that have not changed, > and so on. But I have no desire to go over to an Oracle-style solution > instead. We can't beat them by trying to be like them, and we run no > small risk of falling foul of some of their patents if we do. I proposed having a "key row entry" for each logical row. The key row entry points to the latest version of the row. There, each row entry is a linked list, in descending order, of previous row versions. The vast majority of the time, the latest version will be the first version. It is only when you have a previously started long running or concurrent transaction will you ever look at previous versions. I'm not saying it is an easy slam dunk, as I can think of a few difficulties off the top of my head, but it would solve the steady degradation of performance between vacuums and, to a possibly lesser extent, the cost of updating a row in a heavily indexed table.
pgsql-hackers by date: