Thread: long running transactions
While doing a verbose vacuum, I'm constantly hitting things like: DETAIL: 3606 dead row versions cannot be removed yet. I believe this is a problem, because I still do have some empty tables requireing up to 3-400 ms just to check if the table is empty (see thread "slow queue-like empty table"). If pg_stat_activity.query_start actually is the start time of the transaction, then we've gotten rid of all the real long-running transactions. Then again, if pg_stat_activity.query_start actually was the start time of the transaction, the attribute would have been called pg_stat_activity.transaction_start, right? Is there any way to find the longest running transaction?
Tobias Brox <tobias@nordicbet.com> writes: > If pg_stat_activity.query_start actually is the start time of the > transaction, ... but it isn't. > Is there any way to find the longest running transaction? Look in pg_locks to see the lowest-numbered transaction ID --- each transaction will be holding exclusive lock on its own XID. You can correlate that back to pg_stat_activity via the PID. regards, tom lane
[Tom Lane - Tue at 12:23:40PM -0400] > Look in pg_locks to see the lowest-numbered transaction ID --- each > transaction will be holding exclusive lock on its own XID. You can > correlate that back to pg_stat_activity via the PID. Thanks a lot for the quick reply - I've already identified one long-running transaction. (I'm not allowed to order by xid, and not allowed to cast it to anything, how come?)
[Tobias Brox - Tue at 06:39:13PM +0200] > Thanks a lot for the quick reply - I've already identified one > long-running transaction. belonging to autovacuum ... how come?
Tobias Brox <tobias@nordicbet.com> writes: > [Tobias Brox - Tue at 06:39:13PM +0200] >> Thanks a lot for the quick reply - I've already identified one >> long-running transaction. > belonging to autovacuum ... how come? Blocked on someone else's lock, maybe? regards, tom lane
[Tom Lane - Tue at 12:42:52PM -0400] > > belonging to autovacuum ... how come? > > Blocked on someone else's lock, maybe? hardly, the autovacuum is the only one having such a low transaction id, and also the only one hanging around when waiting a bit and rechecking the pg_locks table.
Tobias Brox <tobias@nordicbet.com> writes: >> Blocked on someone else's lock, maybe? > hardly, the autovacuum is the only one having such a low transaction id, > and also the only one hanging around when waiting a bit and rechecking > the pg_locks table. Hmph. Is the autovac process actually doing anything (strace would be revealing)? If not, can you attach to the autovac process with gdb and get a stack trace to see where it's blocked? regards, tom lane
[Tom Lane - Tue at 01:09:52PM -0400] > Hmph. Is the autovac process actually doing anything (strace would be > revealing)? If not, can you attach to the autovac process with gdb and > get a stack trace to see where it's blocked? Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the config would restart autovacuum. Well, whatever, we still have the nightly vacuum crontab.
Tobias Brox <tobias@nordicbet.com> writes: > [Tom Lane - Tue at 01:09:52PM -0400] >> Hmph. Is the autovac process actually doing anything (strace would be >> revealing)? If not, can you attach to the autovac process with gdb and >> get a stack trace to see where it's blocked? > Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the > config would restart autovacuum. It'll come back after the autovacuum naptime. If it gets stuck again, please investigate. regards, tom lane
[Tom Lane - Tue at 01:18:27PM -0400] > >> Hmph. Is the autovac process actually doing anything (strace would be > >> revealing)? It's definitively doing something; mostly reading, but also some few writes, semops and opens. > If not, can you attach to the autovac process with gdb and > >> get a stack trace to see where it's blocked? (gdb) bt #0 0xb7c599f8 in select () from /lib/tls/libc.so.6 #1 0x08253c53 in pg_usleep () #2 0x0812ee93 in vacuum_delay_point () #3 0x0812f2a5 in lazy_vacuum_rel () #4 0x0812ef7b in lazy_vacuum_rel () #5 0x0812b4b6 in vac_update_relstats () #6 0x0812a995 in vacuum () #7 0x0818d2ca in autovac_stopped () #8 0x0818ceae in autovac_stopped () #9 0x0818c848 in autovac_stopped () #10 0x0818c4e2 in autovac_start () #11 0x08192c11 in PostmasterMain () #12 0x08191dcf in PostmasterMain () #13 0x081541b1 in main () > It'll come back after the autovacuum naptime. If it gets stuck again, > please investigate. It seems stuck, has had the same transid for a long while, and the number of undeletable dead rows in our tables are increasing.
Tobias Brox <tobias@nordicbet.com> writes: > (gdb) bt > #0 0xb7c599f8 in select () from /lib/tls/libc.so.6 > #1 0x08253c53 in pg_usleep () > #2 0x0812ee93 in vacuum_delay_point () > #3 0x0812f2a5 in lazy_vacuum_rel () > #4 0x0812ef7b in lazy_vacuum_rel () > #5 0x0812b4b6 in vac_update_relstats () That doesn't look particularly blocked, and if you are seeing reads/writes too, then it's doing something. > It seems stuck, has had the same transid for a long while, and the > number of undeletable dead rows in our tables are increasing. Perhaps you have overly aggressive vacuum cost delay settings? regards, tom lane
[Tom Lane - Tue at 02:04:55PM -0400] > > It seems stuck, has had the same transid for a long while, and the > > number of undeletable dead rows in our tables are increasing. > > Perhaps you have overly aggressive vacuum cost delay settings? Perhaps, though I wouldn't expect it to sleep in the middle of a transaction - and also, it really did seem to me that it's doing work rather than only sleeping. The transaction id for the vacuum process is the same now as when I wrote the previous email, and the number of dead unremovable rows have increased steadily. The settings in effect are: autovacuum_vacuum_cost_delay = 500 autovacuum_vacuum_cost_limit = 200
Tobias Brox <tobias@nordicbet.com> writes: >> Perhaps you have overly aggressive vacuum cost delay settings? > autovacuum_vacuum_cost_delay = 500 > autovacuum_vacuum_cost_limit = 200 Well, that's going to cause it to sleep half a second after every dozen or so page I/Os. I think you'd be well advised to reduce the delay. regards, tom lane
[Tom Lane - Tue at 02:26:53PM -0400] > > autovacuum_vacuum_cost_delay = 500 > > autovacuum_vacuum_cost_limit = 200 > > Well, that's going to cause it to sleep half a second after every dozen > or so page I/Os. I think you'd be well advised to reduce the delay. Modified it to 20/250, and it definitively helped. Sorry for the list verbosity; I should have been able to resolve this myself already some 2-3 emails ago :-) I wanted a "soft" introduction of autovac in production, and assumed that it was better to begin with too much sleep than too little! Well, well.