Re: A long-running transaction - Mailing list pgsql-sql

From Andrew Sullivan
Subject Re: A long-running transaction
Date
Msg-id 20070413114919.GA31083@phlogiston.dyndns.org
Whole thread Raw
In response to Re: A long-running transaction  (John Summerfield <postgres@herakles.homelinux.org>)
Responses Re: A long-running transaction  (Andrew Sullivan <ajs@crankycanuck.ca>)
Re: A long-running transaction  (Joe <dev@freedomcircle.net>)
List pgsql-sql
On Fri, Apr 13, 2007 at 09:02:58AM +0800, John Summerfield wrote:
> 
> Why is this implementation preferable to not doing that?

Because this way, readers never wait for writers.  On most database
systems, that pending UPDATE would block anyone reading the row too,
even if they weren't going to write on it.  It's a trade-off.  It
happens to mean that the trade is bad for the thing you happen to be
doing, but since most database operations aren't bulk ones in the way
you're trying to achieve this, that's quite possibly a reasonable
trade off.

> Where is enterprisedb? enterprisedb.com seems to be held by a squatter
> in Kingston. About where they're playing an important cricket
> competition about now.

Google took me here in one:

http://www.enterprisedb.com

But anyway, their stuff is built on top of Pg, so going to their site
won't help you.

> I can see we need before and after, but why all those intermediate rows?

Because in the general case, you need the intermediate rows.  The
problem is that you'd have to write a special piece of code to catch
the case where nobody else can see the row that you're about to
expire, and that test isn't free.  Therefore, you do it the same way
any other row gets expired.

> Also, I don't see why (apparently) a sequential search is used; surely
> if all of these rows might be required, still a pointer to the last
> would be right? Or last two, if you need the ability to check the order.

No, it's not a seqscan.  It's following the chain of versions of the
row.  You don't know until you look at the actual row whether the
version of it you are looking at is valid for you.  There isn't any
other place to store that information.  (If you want someone who
really understands how all this works to explain it to you in more
accurate terms, you should ask the question on -hackers.  They'll
point you to the developers' docs that I can't seem to find right
now.)

> Is there a development version I can try, with this improvement in it?

Well, you could try using the current CVS HEAD, which is close to
feature freeze.  It'd sure be a service to the community, at least,
because we'd learn whether the proposed change fixes this sort of
case.  (But I wouldn't trust my data to the HEAD for real.  You said
you're not actually in production yet, though.)

> 1. For the first day or so, my observation was that the disk was not
> particularly busy.

That's completely consistent with the theory I have.  As the number
of dead tuples goes up, your disk activity will slowly get worse.

> At present I'm trying to clarify in my mind the nature of the problem.
> What I'm trying to do seems to me reasonable. I have some data, and I
> want it all in or none of it, so it fits the idea of a single transaction.
> 
> It might be that my demands exceed Postgresql's current capabilities,
> but by itself it doesn't make what I'm trying to do unreasonable.

No, it's not unreasonable, but it happens to be a pessimal case under
Postgres.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.    --H.W. Fowler


pgsql-sql by date:

Previous
From: "A. Kretschmer"
Date:
Subject: Re: How can I know if a row is Locked?
Next
From: Marcin Stępnicki
Date:
Subject: Re: How can I know if a row is Locked?