Thread: Followup: vacuum'ing toast

Followup: vacuum'ing toast

From
Dave Crooke
Date:
Thanks folks for the quick replies.

1. There is one transaction, connected from the JVM, that is showing
"IDLE in transaction" .... this appears to be a leftover from
Hibernate looking at the schema metadata. It's Apache Jackrabbit, not
our own code:

hyper9test_1_6=# select c.relname, l.* from pg_class c, pg_locks l
where c.relfilenode=l.relation and l.pid in (select procpid from
pg_stat_activity where current_query='<IDLE> in transaction');
          relname           | locktype | database | relation | page |
tuple | virtualxid | transactionid | classid | objid | objsubid |
virtualtransaction | pid  |      mode       | granted

----------------------------+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+-----------------+---------
 pg_class_oid_index         | relation |   280066 |     2662 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_class_relname_nsp_index | relation |   280066 |     2663 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_description_o_c_o_index | relation |   280066 |     2675 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_namespace_nspname_index | relation |   280066 |     2684 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_namespace_oid_index     | relation |   280066 |     2685 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_class                   | relation |   280066 |     1259 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_description             | relation |   280066 |     2609 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 pg_namespace               | relation |   280066 |     2615 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 version_node               | relation |   280066 |   493309 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
 version_node_idx           | relation |   280066 |   493315 |      |
     |            |               |         |       |          | 3/18
             | 8069 | AccessShareLock | t
(10 rows)

Since the Jackrabbit tables are in the same namespace / user / schema
as ours, am I right in thinking that this is effectively blocking the
entire auto-vaccum system from doing anything at all?

Cheers
Dave

Re: Followup: vacuum'ing toast

From
Greg Smith
Date:
Dave Crooke wrote:
> Since the Jackrabbit tables are in the same namespace / user / schema
> as ours, am I right in thinking that this is effectively blocking the
> entire auto-vaccum system from doing anything at all?
>
Yes, but the problem is actually broader than that:  it wouldn't matter
if it was a different user or namespace, the impact would still be the
same.  PostgreSQL gets rid of needing to hold a bunch of table/row locks
by using an approach called MVCC:
http://www.postgresql.org/docs/8.4/static/mvcc-intro.html

The biggest downside of that approach is that if you have an old client
lingering around, things that happened in the database after it started
can't be cleaned up.  That client might still be referring to the old
copy of that data, so that anything it looks at will be a consistent
snapshot that includes the earlier version of the rows, the database is
paranoid about letting VACUUM clean the things you've deleted up.

In 8.4 this situation is improved for some common use cases.  In the 8.3
you're using, an old transaction will block any VACUUM attempt from
moving past that point in time forever.  You have to figure out how to
get Hibernate to close the transaction it's leaving open for VACUUM to work.

--
Greg Smith    greg@2ndquadrant.com    Baltimore, MD


Re: Followup: vacuum'ing toast

From
Craig Ringer
Date:
Greg Smith wrote:

> The biggest downside of [MVCC] is that if you have an old client
> lingering around, things that happened in the database after it started
> can't be cleaned up.

Just to clarify for readers: Idle clients aren't generally an issue.
It's only clients that are idle with an open transaction that tend to
cause issues.

> In 8.4 this situation is improved for some common use cases.  In the 8.3
> you're using, an old transaction will block any VACUUM attempt from
> moving past that point in time forever.  You have to figure out how to
> get Hibernate to close the transaction it's leaving open for VACUUM to
> work.

Hibernate is pretty well behaved with transaction management. In fact,
it's downright nuts about keeping transactions open for as short a
period of time as possible. It even implements its own row-versioning
based optimistic locking scheme (oplock) rather than relying on holding
a transaction open with row locks in the database.

If you have connections left idle in transaction by a Hibernate-based
Java app, the problem is probably:

1) Unclosed sessions / EntityManagers or explicit transactions in your
own app code. Check particularly for places where the app may open a
transaction without a finally clause on a try block to ensure the
transaction (and the Session / EntityManager) are closed when the block
is exited.

2) Connections being returned to the connection pool with open
transactions ( probably due to #1 ). The connection pool should take
care of that, but reports suggest that some don't.

3) Autocommit being disabled. At least when using Hibernate via JPA,
that'll cause a major mess and would easily explain the issues you're
seeing. Hibernate manages transactions explicitly when required, and
expects autocommit to be off.

3) Your connection pool software doing something crazy like
intentionally keeping idle connections with transactions open. The
connection pool (c3p0 or whatever) that you use is separate from
Hibernate. I'd be surprised to see this except if autocommit was
disabled and the pooling software expected/assumed it'd be enabled.

--
Craig Ringe

Re: Followup: vacuum'ing toast

From
Eduardo Morras
Date:
>
>3) Autocommit being disabled. At least when using Hibernate via JPA,
>that'll cause a major mess and would easily explain the issues you're
>seeing. Hibernate manages transactions explicitly when required, and
>expects autocommit to be off.

Excuse me but i don't understand this point. You say that the problem happens if Autocommit is disabled but that
hibernatesexpects that Autocommit is disabled for a correct work. What's better then? Autocommit off for Hibernate or
AutocommitOn for the original poster problem. Can you explain it more? 

Thanks

>--
>Craig Ringe


--------------------------------
Eduardo Morrás González
Dept. I+D+i e-Crime Vigilancia Digital
S21sec Labs
Tlf: +34 902 222 521
Móvil: +34 555 555 555
www.s21sec.com, blog.s21sec.com


Salvo que se indique lo contrario, esta información es CONFIDENCIAL y
contiene datos de carácter personal que han de ser tratados conforme a la
legislación vigente en materia de protección de datos. Si usted no es
destinatario original de este mensaje, le comunicamos que no está autorizado
a revisar, reenviar, distribuir, copiar o imprimir la información en él
contenida y le rogamos que proceda a borrarlo de sus sistemas.

Kontrakoa adierazi ezean, posta elektroniko honen barruan doana ISILPEKO
informazioa da eta izaera pertsonaleko datuak dituenez, indarrean dagoen
datu pertsonalak babesteko legediaren arabera tratatu beharrekoa. Posta
honen hartzaile ez zaren kasuan, jakinarazten dizugu baimenik ez duzula
bertan dagoen informazioa aztertu, igorri, banatu, kopiatu edo inprimatzeko.
Hortaz, erregutzen dizugu posta hau zure sistemetatik berehala ezabatzea.

Antes de imprimir este mensaje valora si verdaderamente es necesario. De
esta forma contribuimos a la preservación del Medio Ambiente.


Re: Followup: vacuum'ing toast

From
Craig Ringer
Date:
Eduardo Morras wrote:
>> 3) Autocommit being disabled. At least when using Hibernate via JPA,
>> that'll cause a major mess and would easily explain the issues you're
>> seeing. Hibernate manages transactions explicitly when required, and
>> expects autocommit to be off.
>
> Excuse me but i don't understand this point. You say that the problem happens if Autocommit is disabled but that
hibernatesexpects that Autocommit is disabled for a correct work. What's better then? Autocommit off for Hibernate or
AutocommitOn for the original poster problem. Can you explain it more? 

Argh!

I'm really sorry. I meant that Hibernate expects autocommit to be
_enabled_. However, when I want back to the documentation to
double-check my understanding:

http://docs.jboss.org/hibernate/core/3.3/reference/en/html/session-configuration.html

it reads:

"hibernate.connection.autocommit: Enables autocommit for JDBC pooled
connections (it is not recommended).
e.g. true | false"

... so now I'm confused too. I *know* I had issues initially when I
disabled autocommit explicitly (particularly with explicit transaction
management), and that everything works well in my app with autocommit
off via the hibernate.connection.autocommit param, but that appears to
conflict with the documentation.  Maybe it's different when hibernate is
used via the JPA APIs as I'm using it?

Thanks for checking that. I guess now I'm confused too, but that's
better than "knowing" something wrong.

--
Craig Ringer

Re: Followup: vacuum'ing toast

From
Craig Ringer
Date:
Craig Ringer wrote:
> Eduardo Morras wrote:
>>> 3) Autocommit being disabled. At least when using Hibernate via JPA,
>>> that'll cause a major mess and would easily explain the issues you're
>>> seeing. Hibernate manages transactions explicitly when required, and
>>> expects autocommit to be off.
>> Excuse me but i don't understand this point. You say that the problem happens if Autocommit is disabled but that
hibernatesexpects that Autocommit is disabled for a correct work. What's better then? Autocommit off for Hibernate or
AutocommitOn for the original poster problem. Can you explain it more? 
>
> Argh!
>
> I'm really sorry. I meant that Hibernate expects autocommit to be
> _enabled_.

Some searching suggests that, indeed, the issue is that when using
Hibernate via JPA (Hibernate EntityManager) autocommit needs to be left
enabled.

For some reason there doesn't seem to be any explicit reference to
autocommit in the Hibernate EntityManager docs or the EJB3 spec. I can't
find anything but (numerous) forum posts and the like on this, so don't
take it as definitive.

--
Craig Ringer

Re: Followup: vacuum'ing toast

From
Greg Smith
Date:
Craig Ringer wrote:
> Hibernate is pretty well behaved with transaction management. In fact,
> it's downright nuts about keeping transactions open for as short a
> period of time as possible. It even implements its own row-versioning
> based optimistic locking scheme (oplock) rather than relying on holding
> a transaction open with row locks in the database.
>
It's probably more nuts than it needs to be with PostgreSQL as the
backing store, since MVCC prevents some of the common sources of row
locks from being needed.  But since Hibernate is database-agnostic and
it worried about locally cached copies of things too, it ends up needing
to do this extra work regardless.

> 3) Autocommit being disabled. At least when using Hibernate via JPA,
> that'll cause a major mess and would easily explain the issues you're
> seeing. Hibernate manages transactions explicitly when required, and
> expects autocommit to be off.
>
Downthread it suggests there's still some confusion here, but everyone
should be clear about one thing:  turning autocommit on is the first
step down a road that usually leads to bad batch performance.  If your
problems go away by enabling it, which they sometimes do, that is a sign
there's a problem to be investigated, not a true solution.  One day
you're going to find yourself wanting transactions to be explicitly
committed only when required, both for atomicity and performance
reasons, and you won't be able to rely on autocommit as a crutch at that
point.  Better to never get used to be there in the first place.

--
Greg Smith    greg@2ndquadrant.com    Baltimore, MD

Re: Followup: vacuum'ing toast

From
Craig Ringer
Date:
On 5/11/2009 10:28 PM, Greg Smith wrote:
> Craig Ringer wrote:
>> Hibernate is pretty well behaved with transaction management. In fact,
>> it's downright nuts about keeping transactions open for as short a
>> period of time as possible. It even implements its own row-versioning
>> based optimistic locking scheme (oplock) rather than relying on holding
>> a transaction open with row locks in the database.
>>
> It's probably more nuts than it needs to be with PostgreSQL as the
> backing store, since MVCC prevents some of the common sources of row
> locks from being needed.

I'm not sure about that personally. Much of the work it does is to avoid
holding an update lock on a row during "user think time". Instead of
stopping another transaction from jumping in between reading a record
and writing an updated copy, it detects when another transaction has got
in the way and aborts the loser of the race, which will usually retry in
some way. This issue applies just as much to PostgreSQL as any other
database, and is very hard to avoid if your problem forces you to write
code that reads a record, updates it in memory, then writes it back to
the DB instead of doing an in-place read-and-update.

That means that, as in SERIALIZABLE transactions, UPDATEs with hibernate
can fail and may need to be retried. On the other hand, it means that
transactions aren't blocked by a lock held by another transaction during
long periods of user inactivity.

It's the difference between:

BEGIN;
SELECT val1, val2 FROM blah WHERE id = 1 FOR UPDATE;
-- User ponders for half an hour before applying a change
-- Meanwhile, another transaction that has to update the same record
-- is blocked, and can't continue on to do other work. As it also holds
-- update locks on other records, if you're unlucky or the app's data
-- is highly interdependent then half the app lands up waiting for the
-- user to get back from lunch.
UPDATE blah SET val1 = something, val2 = somethingelse WHERE id = 1;
COMMIT;

and:

BEGIN;
SELECT val1, val2, version FROM blah WHERE id = 1;
COMMIT;
-- User ponders for half an hour before applying a change. Meanwhile,
-- someone else who hasn't gone for lunch updates the record,
-- incrementing the `version' field as well as tweaking the data fields.
BEGIN;
UPDATE blah SET val1 = something, val2 = somethingelse
WHERE id = 1, version = oldversion;
-- As rows matched, Hibernate knows the record has been deleted
-- or someone else updated it in the mean time. It aborts the
-- change by the until recently out-to-lunch user and the app informs
-- the user that
-- someone else has altered the record, so they'll have to check
-- if they still need to make their changes and possibly re-apply them.
-- (Or, if appropriate, the app it merges the two change sets and
-- auto-retries).
ROLLBACK;


Getting these two strategies to play well together in a DB used by
"optimistic locking" row-versioned users like Hibernate as well as apps
using conventional SQL DB locking isn't hard, by the way. I wrote
something up on it recently:

http://wiki.postgresql.org/wiki/Hibernate_oplocks

> Downthread it suggests there's still some confusion here, but everyone
> should be clear about one thing:  turning autocommit on is the first
> step down a road that usually leads to bad batch performance.

Normally I'd be in complete agreement with you. Batching things into
transactions not only improves performance, but it's necessary for
correctness unless much of what you're doing is pretty trivial.

The distinction here is that the ORM framework expects to manage
autocommit settings on the JDBC connection its self. In the case of use
of Hibernate via JPA, Hibernate will almost always have autocommit
disabled when doing work. It's just that the JPA implementation appears
to expect to receive connections with autocommit initially enabled, and
gets somewhat confused if that's not the case.

--
Craig Ringer