Thread: long running transactions

long running transactions

From
Tobias Brox
Date:
While doing a verbose vacuum, I'm constantly hitting things like:

DETAIL:  3606 dead row versions cannot be removed yet.

I believe this is a problem, because I still do have some empty tables
requireing up to 3-400 ms just to check if the table is empty (see
thread "slow queue-like empty table").

If pg_stat_activity.query_start actually is the start time of the
transaction, then we've gotten rid of all the real long-running
transactions.  Then again, if pg_stat_activity.query_start actually was
the start time of the transaction, the attribute would have been called
pg_stat_activity.transaction_start, right?

Is there any way to find the longest running transaction?


Re: long running transactions

From
Tom Lane
Date:
Tobias Brox <tobias@nordicbet.com> writes:
> If pg_stat_activity.query_start actually is the start time of the
> transaction,

... but it isn't.

> Is there any way to find the longest running transaction?

Look in pg_locks to see the lowest-numbered transaction ID --- each
transaction will be holding exclusive lock on its own XID.  You can
correlate that back to pg_stat_activity via the PID.

            regards, tom lane

Re: long running transactions

From
Tobias Brox
Date:
[Tom Lane - Tue at 12:23:40PM -0400]
> Look in pg_locks to see the lowest-numbered transaction ID --- each
> transaction will be holding exclusive lock on its own XID.  You can
> correlate that back to pg_stat_activity via the PID.

Thanks a lot for the quick reply - I've already identified one
long-running transaction.

(I'm not allowed to order by xid, and not allowed to cast it to
anything, how come?)


Re: long running transactions

From
Tobias Brox
Date:
[Tobias Brox - Tue at 06:39:13PM +0200]
> Thanks a lot for the quick reply - I've already identified one
> long-running transaction.

belonging to autovacuum ... how come?


Re: long running transactions

From
Tom Lane
Date:
Tobias Brox <tobias@nordicbet.com> writes:
> [Tobias Brox - Tue at 06:39:13PM +0200]
>> Thanks a lot for the quick reply - I've already identified one
>> long-running transaction.

> belonging to autovacuum ... how come?

Blocked on someone else's lock, maybe?

            regards, tom lane

Re: long running transactions

From
Tobias Brox
Date:
[Tom Lane - Tue at 12:42:52PM -0400]
> > belonging to autovacuum ... how come?
>
> Blocked on someone else's lock, maybe?

hardly, the autovacuum is the only one having such a low transaction id,
and also the only one hanging around when waiting a bit and rechecking
the pg_locks table.

Re: long running transactions

From
Tom Lane
Date:
Tobias Brox <tobias@nordicbet.com> writes:
>> Blocked on someone else's lock, maybe?

> hardly, the autovacuum is the only one having such a low transaction id,
> and also the only one hanging around when waiting a bit and rechecking
> the pg_locks table.

Hmph.  Is the autovac process actually doing anything (strace would be
revealing)?  If not, can you attach to the autovac process with gdb and
get a stack trace to see where it's blocked?

            regards, tom lane

Re: long running transactions

From
Tobias Brox
Date:
[Tom Lane - Tue at 01:09:52PM -0400]
> Hmph.  Is the autovac process actually doing anything (strace would be
> revealing)?  If not, can you attach to the autovac process with gdb and
> get a stack trace to see where it's blocked?

Sorry ... I SIGINT'ed it, and now it's gone :-(  I thought reloading the
config would restart autovacuum.  Well, whatever, we still have the
nightly vacuum crontab.

Re: long running transactions

From
Tom Lane
Date:
Tobias Brox <tobias@nordicbet.com> writes:
> [Tom Lane - Tue at 01:09:52PM -0400]
>> Hmph.  Is the autovac process actually doing anything (strace would be
>> revealing)?  If not, can you attach to the autovac process with gdb and
>> get a stack trace to see where it's blocked?

> Sorry ... I SIGINT'ed it, and now it's gone :-(  I thought reloading the
> config would restart autovacuum.

It'll come back after the autovacuum naptime.  If it gets stuck again,
please investigate.

            regards, tom lane

Re: long running transactions

From
Tobias Brox
Date:
[Tom Lane - Tue at 01:18:27PM -0400]
> >> Hmph.  Is the autovac process actually doing anything (strace would be
> >> revealing)?

It's definitively doing something; mostly reading, but also some few
writes, semops and opens.

> If not, can you attach to the autovac process with gdb and
> >> get a stack trace to see where it's blocked?

(gdb) bt
#0  0xb7c599f8 in select () from /lib/tls/libc.so.6
#1  0x08253c53 in pg_usleep ()
#2  0x0812ee93 in vacuum_delay_point ()
#3  0x0812f2a5 in lazy_vacuum_rel ()
#4  0x0812ef7b in lazy_vacuum_rel ()
#5  0x0812b4b6 in vac_update_relstats ()
#6  0x0812a995 in vacuum ()
#7  0x0818d2ca in autovac_stopped ()
#8  0x0818ceae in autovac_stopped ()
#9  0x0818c848 in autovac_stopped ()
#10 0x0818c4e2 in autovac_start ()
#11 0x08192c11 in PostmasterMain ()
#12 0x08191dcf in PostmasterMain ()
#13 0x081541b1 in main ()

> It'll come back after the autovacuum naptime.  If it gets stuck again,
> please investigate.

It seems stuck, has had the same transid for a long while, and the
number of undeletable dead rows in our tables are increasing.


Re: long running transactions

From
Tom Lane
Date:
Tobias Brox <tobias@nordicbet.com> writes:
> (gdb) bt
> #0  0xb7c599f8 in select () from /lib/tls/libc.so.6
> #1  0x08253c53 in pg_usleep ()
> #2  0x0812ee93 in vacuum_delay_point ()
> #3  0x0812f2a5 in lazy_vacuum_rel ()
> #4  0x0812ef7b in lazy_vacuum_rel ()
> #5  0x0812b4b6 in vac_update_relstats ()

That doesn't look particularly blocked, and if you are seeing
reads/writes too, then it's doing something.

> It seems stuck, has had the same transid for a long while, and the
> number of undeletable dead rows in our tables are increasing.

Perhaps you have overly aggressive vacuum cost delay settings?

            regards, tom lane

Re: long running transactions

From
Tobias Brox
Date:
[Tom Lane - Tue at 02:04:55PM -0400]
> > It seems stuck, has had the same transid for a long while, and the
> > number of undeletable dead rows in our tables are increasing.
>
> Perhaps you have overly aggressive vacuum cost delay settings?

Perhaps, though I wouldn't expect it to sleep in the middle of a
transaction - and also, it really did seem to me that it's doing work
rather than only sleeping.

The transaction id for the vacuum process is the same now as when I
wrote the previous email, and the number of dead unremovable rows have
increased steadily.

The settings in effect are:

autovacuum_vacuum_cost_delay = 500
autovacuum_vacuum_cost_limit = 200


Re: long running transactions

From
Tom Lane
Date:
Tobias Brox <tobias@nordicbet.com> writes:
>> Perhaps you have overly aggressive vacuum cost delay settings?

> autovacuum_vacuum_cost_delay = 500
> autovacuum_vacuum_cost_limit = 200

Well, that's going to cause it to sleep half a second after every dozen
or so page I/Os.  I think you'd be well advised to reduce the delay.

            regards, tom lane

Re: long running transactions

From
Tobias Brox
Date:
[Tom Lane - Tue at 02:26:53PM -0400]
> > autovacuum_vacuum_cost_delay = 500
> > autovacuum_vacuum_cost_limit = 200
>
> Well, that's going to cause it to sleep half a second after every dozen
> or so page I/Os.  I think you'd be well advised to reduce the delay.

Modified it to 20/250, and it definitively helped.  Sorry for the
list verbosity; I should have been able to resolve this myself already
some 2-3 emails ago :-)  I wanted a "soft" introduction of autovac in
production, and assumed that it was better to begin with too much sleep
than too little!  Well, well.