Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Christopher Browne
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 87veqtbcq3.fsf@wolfe.cbbrowne.com
In response to vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
Responses Re: vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
Re: vacuum, performance, and MVCC  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
After a long battle with technology, pgsql@mohawksoft.com ("Mark Woodward"), an earthling, wrote:
>> Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward") mumbled into
>> her beard:
>>> We all know that PostgreSQL suffers performance problems when rows are
>>> updated frequently prior to a vacuum. The most serious example can be
>>> seen by using PostgreSQL as a session handler for a busy web site. You
>>> may have thousands or millions of active sessions, each being updated
>>> per page hit.
>>>
>>> Each time the record is updated, a new version is created, thus
>>> lengthening the "correct" version search each time the row is accessed,
>>> until, of course, the next vacuum comes along and corrects the index to
>>> point to the latest version of the record.
>>>
>>> Is that a fair explanation?
>>
>> No, it's not.
>>
>> 1.  The index points to all the versions, until they get vacuumed out.
>
> It can't point to "all" versions; it points to the last "current" version
> as updated by vacuum, or to the first version of the row.

No, it points to *all* the versions.

Suppose I take a table with two rows:

INFO:  analyzing "public.test"
INFO:  "test": 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM
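
(The table setup isn't shown above; roughly the following would reproduce
it.  The single "id" column is just the minimal schema that matches the
output; the unique constraint is what creates the "test_id_key" index that
appears in the vacuum output further down.)

-- minimal assumed setup; the original table may well have had more columns
create table test (id integer unique);
insert into test (id) values (1);
insert into test (id) values (2);
vacuum verbose analyze test;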

Then, over and over, I remove and insert one entry with the same PK:

sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842550 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842551 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842552 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842553 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842554 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842555 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842556 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842557 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842558 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842559 1
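
(Incidentally, plain UPDATEs, which is what the session-table case actually
does, behave the same way: each one creates a new row version and a new
index entry, so the loop above could just as well have been written as
follows.  Output elided; this is only a sketch of the equivalent workload.)

-- hypothetical UPDATE variant of the same demonstration; each statement
-- leaves another dead version of the row behind until vacuum cleans up
update test set id = 2 where id = 2;
update test set id = 2 where id = 2;
update test set id = 2 where id = 2;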

Now, I vacuum it.

sample=# vacuum verbose analyze test;
INFO:  vacuuming "public.test"
INFO:  index "test_id_key" now contains 2 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": removed 10 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": found 10 removable, 2 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.test"
INFO:  "test": 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Notice that the index contained 10 dead versions of that one row, in
addition to the live entries; vacuum had to remove all 10.

It pointed to *ALL* the versions.
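
(If you'd rather see those entries directly than infer them from the VACUUM
output: in more recent PostgreSQL releases the contrib pageinspect module
can dump the heap and index pages.  A rough sketch, assuming pageinspect is
available; run it before the vacuum and you get one index entry per dead
version.)

create extension pageinspect;  -- contrib module; the CREATE EXTENSION form needs 9.1 or later

-- live and dead heap tuples on the table's first page, with their xmin/xmax
select lp, t_xmin, t_xmax, t_ctid
  from heap_page_items(get_raw_page('test', 0));

-- the b-tree entries themselves, one per row version, each carrying the heap
-- ctid it points at (block 1 is the index's single leaf/root page here)
select itemoffset, ctid
  from bt_page_items('test_id_key', 1);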

>> 2.  There may simultaneously be multiple "correct" versions.  The
>> notion that there is one version that is The Correct One is wrong, and
>> you need to get rid of that thought.
>
> Sorry, this is a misunderstanding. By "correct version search" I meant
> "for this transaction." Later I mention finding the first row version
> with a transaction ID lower than the current one.

Ah.  Then, for each transaction, you would need to spawn an index on each
table that excludes the versions that transaction cannot see.

>>> If my assertion is fundamentally true, then PostgreSQL will always
>>> suffer performance penalties under a heavy modification load. Of
>>> course, tables with many inserts are not an issue, it is mainly
>>> updates. The problem is that there are classes of problems where
>>> updates are the primary operation.
>>
>> The trouble with your assertion is that it is true for *all* database
>> systems except for those whose only transaction mode is READ
>> UNCOMMITTED, where the only row visible is the "Latest" version.
>
> Not true. Oracle does not seem to exhibit this problem.

Oracle suffers a problem in this regard that PostgreSQL doesn't: in Oracle,
an update overwrites the row in place and pushes the old version out into a
rollback segment.  That makes rollbacks quite expensive, as "recovery"
requires copying the old data back, extra work that PostgreSQL, which
simply leaves the superseded version in the table, doesn't have to do.
-- 
output = ("cbbrowne" "@" "gmail.com")
http://cbbrowne.com/info/
Marriage means commitment. Of course, so does insanity. 

