Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 1151073838.3828.40.camel@localhost.localdomain
In response to Re: vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
List pgsql-hackers
On one fine day, Thu, 2006-06-22 at 13:49, Mark Woodward wrote:
> > Christopher Browne <cbbrowne@acm.org> writes:

> > Basically there's no free lunch: if you want the benefits of MVCC it's
> > going to cost you somewhere.  In the Postgres design you pay by having
> > to do VACUUM pretty often for heavily-updated tables.  I don't think
> > that decision is fundamentally wrong --- the attractive thing about it
> > is that the overhead is pushed out of the foreground query-processing
> > code paths.
> 
> Under certain circumstances, it is a very poor design. Think of a single
> row table that keeps a scoreboard or a session table that keeps a limited
> number of rows that are updated very frequently.

A single-row table that every session updates is a really bad design on
any database, as it is a sure point of lock contention and thus removes
any possibility of concurrency.

But locking problems aside, it will perform really well as long as you
vacuum often enough :)
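
To make that concrete, here is a minimal sketch (table and column names
are made up) of the pattern Mark describes: under MVCC every UPDATE of
the single row leaves the previous version behind as a dead tuple, so a
frequent plain VACUUM is what keeps the table at one live row:

  CREATE TABLE scoreboard (
      score integer NOT NULL
  );
  INSERT INTO scoreboard VALUES (0);

  -- every hit does something like this; each UPDATE leaves the
  -- old row version behind as a dead tuple
  UPDATE scoreboard SET score = score + 1;

  -- run this often (e.g. every minute from cron) so the dead
  -- tuples are reclaimed before the table bloats
  VACUUM scoreboard;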

> > We still have lots of work to do in making autovacuum
> > smarter, avoiding vacuuming parts of relations that have not changed,
> > and so on.  But I have no desire to go over to an Oracle-style solution
> > instead.  We can't beat them by trying to be like them, and we run no
> > small risk of falling foul of some of their patents if we do.
> 
> I proposed having a "key row entry" for each logical row. The key row
> entry points to the latest version of the row. There, each row entry is a
> linked list, in descending order, of previous row versions. 

Do I understand correctly that you are proposing a redesign of how
indexing works, with indexes updated in place?

How would older row versions then be found by transactions that need to
see them?

Do you suggest falling back to a seqscan whenever we see _any_ newer
transactions?

Or, if you want the index to point to the "latest" row for each value of
the indexed field, how would you find the last time a given value was used?

Don't tell me that you plan to trace the full update-chain on each
update.

Or would this new indexing mechanism be used only for key fields that
never change? How would you check for that?
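
For illustration, this is the kind of access pattern I have in mind
(a sketch only, table and column names made up): a transaction that took
its snapshot before the updates must still be able to reach the old
version through the index, not just whatever "latest" version the index
entry points at:

  -- session A: a long-running report, snapshot taken here
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM sessions WHERE session_id = 42;

  -- session B: meanwhile keeps updating the same logical row
  UPDATE sessions SET last_seen = now() WHERE session_id = 42;
  UPDATE sessions SET last_seen = now() WHERE session_id = 42;

  -- session A again: an index scan on session_id must still return
  -- the version that was current when A's snapshot was taken
  SELECT * FROM sessions WHERE session_id = 42;
  COMMIT;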

> The vast majority of the time, the latest version will be the first version. 

Not in a web scenario. In my experience, more complicated web pages tend
to produce lots of concurrent accesses.

> It is
> only when you have a previously started long running or concurrent
> transaction will you ever look at previous versions.
>
> I'm not saying it is an easy slam dunk, as I can think of a few
> difficulties off the top of my head, but it would solve the steady
> degradation of performance between vacuums and, to a possibly lesser
> extent, the cost of updating a row in a heavily indexed table.

VACUUMing often also solves the problem of "steady degradation of
performance between vacuums" :)

No need to be afraid of vacuum. Vacuum is your friend! Just learn to use
it right.
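
For the hot tables discussed above, something along these lines (table
name made up again) is usually enough; VERBOSE shows whether dead row
versions are actually being reclaimed:

  -- vacuum only the small, heavily updated table, and do it often
  -- (cron, or aggressive autovacuum settings for that table)
  VACUUM VERBOSE sessions;

  -- the VERBOSE output reports how many dead row versions were removed
  -- and how many pages the table occupies; if those numbers keep
  -- growing between runs, vacuum more often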

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



