Re: A long-running transaction - Mailing list pgsql-sql

From: Andrew Sullivan
Subject: Re: A long-running transaction
Date: Wed, 11 Apr 2007
Msg-id: 20070411154923.GA22771@phlogiston.dyndns.org
In response to: Re: A long-running transaction  (John Summerfield <postgres@herakles.homelinux.org>)
Responses: Re: A long-running transaction  (John Summerfield <postgres@herakles.homelinux.org>)
List: pgsql-sql
On Wed, Apr 11, 2007 at 10:23:34PM +0800, John Summerfield wrote:
> Each row's identified by a key, and the key doesn't change.

That doesn't matter.

> ADABAS would put the updated record right back where it came from, it

That isn't how PostgreSQL works.

I'm having a hard time laying my hands on the bits of the docs that
explain this (anyone?), but MVCC stands for Multi-Version Concurrency
Control.  When you write a row, it gets a version number.  That
version number tells other transactions which version of the row
they should look at.

Here's a _really oversimplified_ explanation, at a conceptual level.
[Note to experts: please don't correct details of how this is wrong. 
I know that.  But John's problem is that he doesn't have this
conceptually yet.  The following explanation has worked for me in the
past in getting the idea across first, so that the details can later
make sense.]

Suppose you have two transactions, t1 and t2.  They start at
approximately the same time.  I'm assuming they're READ COMMITTED.

Suppose t1 updates row R.  That version of R has a number R(t1).  

Suppose at the same time, t2 is looking at R.  The _version_ of R
that t2 sees is R(tn), where n<1.  This way, t2 does not have to
wait on t1 in order to proceed (unless t2 actually modifies R;
ignore that for the sake of this explanation, please).

The way this works is that, in marking R(t1), the system says "R(t1)
is valid for transactions that start after t1 commits".  Since t2
started before t1 committed, it sees the old version.  So when t1
commits, there are _two_ versions of R -- R(t1) and R(tn), n<1.
Once no running transaction can still see R(tn), n<1, that old
version is marked dead, and its space can be recovered by VACUUM
(this is one of the main things VACUUM does; the idea is to move the
housekeeping of concurrency control out of the main transaction
path.  Oracle does something conceptually similar, except using
rollback segments, which is why long-running transactions can
exhaust the supply of rollback segments on Oracle).
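
You can watch this happen yourself.  Open two psql sessions and
interleave something like the following (the table and values are
made up for illustration):

    -- session 1
    BEGIN;
    UPDATE accounts SET balance = 100 WHERE id = 1;
    -- ...don't commit yet...

    -- session 2 (READ COMMITTED, the default)
    SELECT balance FROM accounts WHERE id = 1;
    -- still returns the old value: this session reads the old
    -- version of the row and does not block, because it only reads

    -- session 1
    COMMIT;

    -- session 2
    SELECT balance FROM accounts WHERE id = 1;
    -- now returns 100: the new version became visible at commit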

Now, you're thinking, "But this is all in one transaction!"  Yes,
but the implementation never rewrites the same physical row in
place, even within one transaction, so your rewriting of the same
row again and again actually is creating huge numbers of dead row
versions.  The unfortunate thing is that every subsequent access has
to follow the whole dead-row chain to find the currently live
version of your row.

The upshot of this is that updating the same row over and over in one
transaction will make your transaction go slower on each round, until
you are nearly stopped.  That's why a lot of updates of the sort you
describe are in fact the worst case under Postgres.  EnterpriseDB
seems to have a proposal (which I believe is going to make 8.3) that
will mitigate but not completely solve some of this.
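
To make the dead-row point concrete, here's a toy demonstration (the
table is invented): every UPDATE below leaves another dead version
of the row behind, even though it's all one transaction, and VACUUM
afterwards will report them:

    CREATE TABLE counter_test (id int PRIMARY KEY, n int);
    INSERT INTO counter_test VALUES (1, 0);

    BEGIN;
    UPDATE counter_test SET n = n + 1 WHERE id = 1;  -- dead version 1
    UPDATE counter_test SET n = n + 1 WHERE id = 1;  -- dead version 2
    UPDATE counter_test SET n = n + 1 WHERE id = 1;  -- dead version 3
    -- ...imagine thousands more of these...
    COMMIT;

    VACUUM VERBOSE counter_test;  -- reports the dead row versions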

> user  :   1d  3:17:04.03  16.0%  page in :  131097310  disk 1: 3079516r20087664w
> nice  :       0:05:39.64   0.1%  page out:  197016649
> system:   2d 20:38:37.13  40.1%  page act:   87906251
> 
> That line above. The way I figure it the kernel's doing an enormous
> amount of work handling its buffers. 

No, I think it's doing an enormous amount of I/O, because it has to
keep looking for these new rows (and scanning over the old areas
while it's at it).  This is system time, if I recall correctly
through the haze that is now my understanding of Linux I/O, because
it's down at the I/O subsystem and can't actually be done by the user
program.  (Feel free to correct me on this.  But I still bet it's
dead rows.)

> Since all the work is within one transaction, there's no real need for
> postgresql to write data at all, unless it's running out of memory.

No, I'm sure this has to do with the multi-version row writing.  Your
alternatives are ok, but I'd be more tempted to commit a range of
data in a timestamped series, and then join on a subselect picking
the latest insert (and add another daemon to knock old values off the
bottom of the list).
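
A sketch of what I mean (the schema here is invented): insert each
new value as its own timestamped row, pick the latest one per key
when you read, and purge the rest periodically:

    -- keep every update as a fresh row instead of an UPDATE
    INSERT INTO readings (key, value, recorded_at)
    VALUES (42, 'new value', now());

    -- join on a subselect to pick the latest insert per key
    SELECT r.key, r.value
    FROM readings r
    WHERE r.recorded_at = (SELECT max(recorded_at)
                             FROM readings
                            WHERE key = r.key);

    -- the daemon's job: knock old values off the bottom of the list
    DELETE FROM readings
    WHERE recorded_at < (SELECT max(recorded_at)
                           FROM readings r2
                          WHERE r2.key = readings.key);

Committing those inserts in batches avoids the ever-growing dead-row
chains you get from repeatedly updating one row in one transaction.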

> In both cases, I am using default values. Unless someone thinks there's
> clearly something to be gained by changing them, I won't.

The PostgreSQL install defaults are there to get things started, not
to perform well.  There is a long-standing debate in the community
about whether that's a good idea (and the defaults are way more
sensible than they used to be), but in general you won't get really
good performance from the default configuration of a PostgreSQL
installation.
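
As a very rough starting point (these numbers are illustrative
guesses for a box with a few GB of RAM, not recommendations for
yours, and the unit syntax assumes 8.2 or later), the settings
people usually touch first look something like:

    # postgresql.conf -- illustrative values only
    shared_buffers = 400MB            # default is far too small
    work_mem = 16MB                   # per-sort/per-hash memory
    maintenance_work_mem = 128MB      # helps VACUUM run faster
    checkpoint_segments = 16          # spread out checkpoint I/O
    effective_cache_size = 1GB        # hint about the OS page cache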

Here is a possibly useful bit of reading to start with on tuning,
although it's getting long in the tooth:

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

Also

http://varlena.com/varlena/GeneralBits/Tidbits/perf.html

And the -performance list will probably help you more than this one.

Hope this helps.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."    --Damien Katz

