Re: Followup: vacuum'ing toast - Mailing list pgsql-performance

From Craig Ringer
Subject Re: Followup: vacuum'ing toast
Date
Msg-id 4AF3DCC8.5040302@postnewspapers.com.au
Whole thread Raw
In response to Re: Followup: vacuum'ing toast  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
On 5/11/2009 10:28 PM, Greg Smith wrote:
> Craig Ringer wrote:
>> Hibernate is pretty well behaved with transaction management. In fact,
>> it's downright nuts about keeping transactions open for as short a
>> period of time as possible. It even implements its own row-versioning
>> based optimistic locking scheme (oplock) rather than relying on holding
>> a transaction open with row locks in the database.
>>
> It's probably more nuts than it needs to be with PostgreSQL as the
> backing store, since MVCC prevents some of the common sources of row
> locks from being needed.

I'm not sure about that personally. Much of the work it does is to avoid
holding an update lock on a row during "user think time". Instead of
stopping another transaction from jumping in between reading a record
and writing an updated copy, it detects when another transaction has got
in the way and aborts the loser of the race, which will usually retry in
some way. This issue applies just as much to PostgreSQL as any other
database, and is very hard to avoid if your problem forces you to write
code that reads a record, updates it in memory, then writes it back to
the DB instead of doing an in-place read-and-update.

That means that, as in SERIALIZABLE transactions, UPDATEs with hibernate
can fail and may need to be retried. On the other hand, it means that
transactions aren't blocked by a lock held by another transaction during
long periods of user inactivity.

It's the difference between:

BEGIN;
SELECT val1, val2 FROM blah WHERE id = 1 FOR UPDATE;
-- User ponders for half an hour before applying a change
-- Meanwhile, another transaction that has to update the same record
-- is blocked, and can't continue on to do other work. As it also holds
-- update locks on other records, if you're unlucky or the app's data
-- is highly interdependent then half the app lands up waiting for the
-- user to get back from lunch.
UPDATE blah SET val1 = something, val2 = somethingelse WHERE id = 1;
COMMIT;

and:

BEGIN;
SELECT val1, val2, version FROM blah WHERE id = 1;
COMMIT;
-- User ponders for half an hour before applying a change. Meanwhile,
-- someone else who hasn't gone for lunch updates the record,
-- incrementing the `version' field as well as tweaking the data fields.
BEGIN;
UPDATE blah SET val1 = something, val2 = somethingelse
WHERE id = 1, version = oldversion;
-- As rows matched, Hibernate knows the record has been deleted
-- or someone else updated it in the mean time. It aborts the
-- change by the until recently out-to-lunch user and the app informs
-- the user that
-- someone else has altered the record, so they'll have to check
-- if they still need to make their changes and possibly re-apply them.
-- (Or, if appropriate, the app it merges the two change sets and
-- auto-retries).
ROLLBACK;


Getting these two strategies to play well together in a DB used by
"optimistic locking" row-versioned users like Hibernate as well as apps
using conventional SQL DB locking isn't hard, by the way. I wrote
something up on it recently:

http://wiki.postgresql.org/wiki/Hibernate_oplocks

> Downthread it suggests there's still some confusion here, but everyone
> should be clear about one thing:  turning autocommit on is the first
> step down a road that usually leads to bad batch performance.

Normally I'd be in complete agreement with you. Batching things into
transactions not only improves performance, but it's necessary for
correctness unless much of what you're doing is pretty trivial.

The distinction here is that the ORM framework expects to manage
autocommit settings on the JDBC connection its self. In the case of use
of Hibernate via JPA, Hibernate will almost always have autocommit
disabled when doing work. It's just that the JPA implementation appears
to expect to receive connections with autocommit initially enabled, and
gets somewhat confused if that's not the case.

--
Craig Ringer

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Running some query in low priority
Next
From: Massan
Date:
Subject: Problem with database performance, Debian 4gb ram ?