Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 3936.216.145.49.15.1150998576.squirrel@mail.mohawksoft.com
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
> Christopher Browne <cbbrowne@acm.org> writes:
>> After a long battle with technology, pgsql@mohawksoft.com ("Mark
>> Woodward"), an earthling, wrote:
>>> Not true. Oracle does not seem to exhibit this problem.
>
>> Oracle suffers a problem in this regard that PostgreSQL doesn't; in
>> Oracle, rollbacks are quite expensive, as "recovery" requires doing
>> extra work that PostgreSQL doesn't do.
>
> The Oracle design has got other drawbacks: if you need to access a row
> version other than than the very latest, you need to go searching in the
> rollback segments for it.  This is slow (no index help) and creates
> significant amounts of contention (since lots of processes are competing
> to touch the rollback segments).

But, it is all probability, in most cases, the VAST majority, older
versions aren't much needed outside the concurency of of active
transactions.


> Plus there's the old bugaboo that
> long-running transactions require indefinite amounts of rollback space,
> and Oracle is apparently unable to enlarge that space on-the-fly.
> (This last seems like a surmountable problem, but maybe there is some
> non-obvious reason why it's hard.)

Yea, Oracle has a million way to die. And when you think you know all one
million, you find one million and one.

>
> Basically there's no free lunch: if you want the benefits of MVCC it's
> going to cost you somewhere.  In the Postgres design you pay by having
> to do VACUUM pretty often for heavily-updated tables.  I don't think
> that decision is fundamentally wrong --- the attractive thing about it
> is that the overhead is pushed out of the foreground query-processing
> code paths.

Under certain circumstances, it is a very poor design. Think of a single
row table that keeps a scoreboard or a session table that keeps a limited
number of rows that are updated very frequently.


> We still have lots of work to do in making autovacuum
> smarter, avoiding vacuuming parts of relations that have not changed,
> and so on.  But I have no desire to go over to an Oracle-style solution
> instead.  We can't beat them by trying to be like them, and we run no
> small risk of falling foul of some of their patents if we do.

I proposed having a "key row entry" for each logical row. The key row
entry points to the latest version of the row. There, each row entry is a
linked list, in descending order, of previous row versions. The vast
majority of the time, the latest version will be the first version. It is
only when you have a previously started long running or concurrent
transaction will you ever look at previous versions.

I'm not saying it is an easy slam dunk, as I can think of a few
difficulties off the top of my head, but it would solve the steady
degradation of performance between vacuums and, to a possibly lesser
extent, the cost of updating a row in a heavily indexed table.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: xlog viewer proposal
Next
From: Rod Taylor
Date:
Subject: Re: vacuum, performance, and MVCC