Thread: Mulit-Vesions and Vacuum

Mulit-Vesions and Vacuum

From
Anthony Berglas
Date:
Hello All,

Does anyone really know how this multi version concurrency really works?

The basic idea is simple, never update anything so that you can always get
read-consistent queries by reading behind the updates.  But the details are
missing in the docs.  In particular:-

1. How long do previous version hang arround?  As long as the oldest
transaction, or until a Vacuum?

2. How expensive are they?  Do the multi-version copies clutter the main
database blocks, or are they stored elsewhere?  And if I update one field of
a row, is the entire row copied or just the field?

3. What does Vacuum really do?  Will it interfere with open transactions, or
will it know to leave multi version copies for existing transactions?

PS.  I have done a few experiements on how the locking really works, see
http://www.SimpleORM.org/DBNotes.html.  Note the difference with Oracle,
which I think is better and easy to implement given the basic MV approach.

PPS. Design issues:-

1.  Why not just use the transaction log to implement MV?  Each record only
needs to store a pointer into the logs to the previous update.  And the logs
would log that pointer, forming a backward pointing linked list.  Most of
the time this will not be needed, or will refer to a very recently written
tail of the log which will already be cached in memory.   This approach
would put an end to Oracle's issus with running out of "Rollback Segments".
It would also put an end to Postgres performance problems on inserts and
updates -- why be slower than MySQL?  You need a transaction log anyway, so
no extra overhead should be added for MV unless a query actually needs to
read behind updated data.

2.  If you are going to go to the trouble of keeping multi versions for
locking, why not make them available for the application as well?  One of
the painful features to implement is audit trails of how records get to be
the way they are.  The database could do that automatically.

Re: Mulit-Vesions and Vacuum

From
nconway@klamath.dyndns.org (Neil Conway)
Date:
On Thu, Jul 18, 2002 at 06:43:03PM -0700, Anthony Berglas wrote:
> Does anyone really know how this multi version concurrency really works?

Yes, lots of people -- namely, those that read the documentation.

> 1. How long do previous version hang arround?  As long as the oldest
> transaction, or until a Vacuum?

I may as well quote the docs:

 In normal PostgreSQL operation, an UPDATE or DELETE of a row does not
 immediately remove the old tuple (version of the row). This approach is
 necessary to gain the benefits of multiversion concurrency control (see
 the User's Guide): the tuple must not be deleted while it is still
 potentially visible to other transactions. But eventually, an outdated
 or deleted tuple is no longer of interest to any transaction. The space
 it occupies must be reclaimed for reuse by new tuples, to avoid
 infinite growth of disk space requirements. This is done by running
 VACUUM.

(from http://developer.postgresql.org/docs/postgres/routine-vacuuming.html)

> 2. How expensive are they?  Do the multi-version copies clutter the main
> database blocks, or are they stored elsewhere?  And if I update one field of
> a row, is the entire row copied or just the field?

Sequential scans will need to read through old tuples.

In 7.2 and earlier versions, dead tuples remain in indexes; to determine
whether a tuple found from an index scan is visible, the backend needs
to check the heap tuple (since the index entry doesn't store transaction
visibility info).

In 7.3, Tom Lane wrote the code to allow index scans to ignore dead
tuples dynamically (without the need for a VACUUM) -- the dead index
tuple still exists, but the index scan ignores it. Search for "tuple
killing" in the pgsql-hackers archives for more information.

> 3. What does Vacuum really do?  Will it interfere with open transactions, or
> will it know to leave multi version copies for existing transactions?

This is from the same page I quoted above:

 Beginning in PostgreSQL 7.2, the standard form of VACUUM can run in
 parallel with normal database operations (selects, inserts, updates,
 deletes, but not changes to table schemas).

> PS.  I have done a few experiements on how the locking really works, see
> http://www.SimpleORM.org/DBNotes.html.

On that page, you note "We do not see any of the changes made by Session
One, including the phantoms. This is in contradiction to the PostgreSQL
documentation, but good." Where is the contradiction between the
behavior you observed, and the documentation? For example:

 Read Committed  is the default isolation level in PostgreSQL. When a
 transaction runs on this isolation level, a SELECT query sees only data
 committed before the query began; it never sees either uncommitted data
 or changes committed during query execution by concurrent transactions.
 (However, the SELECT does see the effects of previous updates executed
 within its own transaction, even though they are not yet committed.) In
 effect, a SELECT query sees a snapshot of the database as of the
 instant that that query begins to run. Notice that two successive
 SELECTs can see different data, even though they are within a single
 transaction, if other transactions commit changes during execution of
 the first SELECT.

(from
http://developer.postgresql.org/docs/postgres/xact-read-committed.html)

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: Mulit-Vesions and Vacuum

From
Martijn van Oosterhout
Date:
On Thu, Jul 18, 2002 at 06:43:03PM -0700, Anthony Berglas wrote:
> Hello All,
>
> Does anyone really know how this multi version concurrency really works?

Somewhat.

> The basic idea is simple, never update anything so that you can always get
> read-consistent queries by reading behind the updates.  But the details are
> missing in the docs.  In particular:-
>
> 1. How long do previous version hang arround?  As long as the oldest
> transaction, or until a Vacuum?

Tuples are valid for as long as a transaction can see them. Vacuum cleans
out any tuples that are no longer valid.

> 2. How expensive are they?  Do the multi-version copies clutter the main
> database blocks, or are they stored elsewhere?  And if I update one field of
> a row, is the entire row copied or just the field?

To store them elsewhere would imply copying them to a new place when you
update them, which could cause performance issues. The entire row is copied
and usually appended to the end of the table.

> 3. What does Vacuum really do?  Will it interfere with open transactions, or
> will it know to leave multi version copies for existing transactions?

Old vacuum used to interfere. There is the new non-blocking vacuum which can
cleanup a lot of stuff but not everything. VACUUM FULL is required to free
up disk-space.

> PS.  I have done a few experiements on how the locking really works, see
> http://www.SimpleORM.org/DBNotes.html.  Note the difference with Oracle,
> which I think is better and easy to implement given the basic MV approach.

If you want serialisable, then you should ask for it (SET TRANSACTION LEVEL
SERIALIZABLE or something like that). The default is not-serializable (as
you saw).

> 1.  Why not just use the transaction log to implement MV?  Each record only
> needs to store a pointer into the logs to the previous update.  And the logs
> would log that pointer, forming a backward pointing linked list.  Most of
> the time this will not be needed, or will refer to a very recently written
> tail of the log which will already be cached in memory.   This approach
> would put an end to Oracle's issus with running out of "Rollback Segments".
> It would also put an end to Postgres performance problems on inserts and
> updates -- why be slower than MySQL?  You need a transaction log anyway, so
> no extra overhead should be added for MV unless a query actually needs to
> read behind updated data.

This bit I'm less sure about. Each tuple does contain a pointer to the
previous incarnation, but I'm not exactly sure what it's used for. The
problem is that only some tuples are usually updated, most are left alone
and they are never moved (except maybe by vacuum). As for insert speed, I
can always saturate the disk subsystem doing straight inserts, so making it
faster doesn't seem too useful. The problem with updates is that you have to
read the old tuple first which ruins the disk performance. Though WAL helps
a lot here IIRC.

> 2.  If you are going to go to the trouble of keeping multi versions for
> locking, why not make them available for the application as well?  One of
> the painful features to implement is audit trails of how records get to be
> the way they are.  The database could do that automatically.

Look in the docs for Time Travel. It used to be supported but was tossed a
while ago since it could be more efficiently implemented using triggers and
it had a tendency to use a lot of disk space.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: Mulit-Vesions and Vacuum

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
>> 3. What does Vacuum really do?  Will it interfere with open transactions, or
>> will it know to leave multi version copies for existing transactions?

> Old vacuum used to interfere. There is the new non-blocking vacuum which can
> cleanup a lot of stuff but not everything. VACUUM FULL is required to free
> up disk-space.

VACUUM has never removed any tuples that could be seen by open transactions.

> This bit I'm less sure about. Each tuple does contain a pointer to the
> previous incarnation, but I'm not exactly sure what it's used for.

Actually, the links run from tuple versions to newer tuple versions,
not older ones.  The purpose is to let UPDATE in READ COMMITTED mode
easily find the latest version of a tuple --- you just chain through
the forward links till you get to the end of the chain.  There isn't
any common operation that would want to chain backwards.

> Look in the docs for Time Travel. It used to be supported but was tossed a
> while ago since it could be more efficiently implemented using triggers and
> it had a tendency to use a lot of disk space.

There's been some discussion of re-allowing it as an optional feature...
but I'm not clear on how that would work.

            regards, tom lane