Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 18926.24.91.171.78.1151155408.squirrel@mail.mohawksoft.com
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: vacuum, performance, and MVCC  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Re: vacuum, performance, and MVCC  ("Jochem van Dieten" <jochemd@gmail.com>)
Re: vacuum, performance, and MVCC  (Martijn van Oosterhout <kleptog@svana.org>)
Re: vacuum, performance, and MVCC  (Jan Wieck <JanWieck@Yahoo.com>)
List pgsql-hackers
> On Sat, 24 Jun 2006, Mark Woodward wrote:
>
>> I'm probably mistaken, but aren't there already forward references in
>> tuples to later versions? If so, I'm only sugesting reversing the order
>> and referencing the latest version.
>
> I thought I understood your idea, but now you lost me again. I thought
> what you want is that the older heap tuple has a pointer to the
> newer one. Which it already has, it's called t_ctid.

Perfect!
>
> Can you try to explain more carefully how the whole thing would work?
> What would an index tuple point to? What pointers would a heap tuple
> have? What would an index scan do to find the row version it's interested
> in? What exactly would an update do?


Since we already allocate space for some notion of linked list, then all
I'm suggesting is we reverse the order, sort of. Currently it looks like
this:

ver001->ver002->ver003->...-verN

That's what t_ctid does now, right? Well, that's sort of stupid. Why not
have it do this:

ver001->verN->...->ver003->ver002->|^---------------------------------/

This will speed up almost *all* queries when there are more than two
version of rows.

OK, here is the behavior of an update:
(1) Find the latest version of the row
(2) Duplicate row and modify as per plan
(3) Set the t_ctid of the new row to the last "latest"
(4) Set the t_ctid of the first row to that of the new row
(5) Attempt to index the row
(6) If the first version of the row is in the index already (ver001) Don't
modify the index, otherwise, add the new version (just as before)

When you vacuum, simply make the latest version (verN) the key row (ver001).

There are, no doubt, issues that need to be resolved (I can think of a
coouple off the top of my head), but overall I think it is workable and
don't think this will affect performance in the simple case and improve
performance in the cases where there are more than one or two version of a
row.


pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: "Jonah H. Harris"
Date:
Subject: Re: vacuum, performance, and MVCC