Thread: Vacuum Blocking A Deleteion - Why?

Vacuum Blocking A Deleteion - Why?

From
Brad Nicholson
Date:
We have been having a problem with a long running vacuum on a table 
blocking deletes.  We have a log table on which the only activity is an 
on going series of deletions.  The problem is experienced on a slony 
replicated node, but I don't  think that slony is at fault here.

My question - why is a vacuum blocking a deletion?

Here's the information that I grabbed:

tldlogs=# SELECT * from pg_stat_activity order by query_start limit 1;
-[ RECORD 1 
]-+------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
-------------------------------------
datid         | 17142
datname       | tldlogs
procpid       | 1744920
usesysid      | 104
usename       | vacuser
current_query | VACUUM VERBOSE ANALYZE public.log_2;

query_start   | 2006-01-04 17:43:57.350527+00


tldlogs=# SELECT relation, pid, mode, granted from pg_locks where 
pid=1744920;relation |   pid   |           mode           | granted
----------+---------+--------------------------+---------   17578 | 1744920 | ShareUpdateExclusiveLock | t   17578 |
1744920| ShareUpdateExclusiveLock | t   17578 | 1744920 | AccessExclusiveLock      | t         | 1744920 |
ExclusiveLock           | t
 
(4 rows)

tldlogs=# SELECT relation, pid, mode, granted from pg_locks where 
relation=17578;relation |   pid   |           mode           | granted
----------+---------+--------------------------+---------   17578 | 1425438 | RowExclusiveLock         | f   17578 |
1744920| ShareUpdateExclusiveLock | t   17578 | 1744920 | ShareUpdateExclusiveLock | t   17578 | 1744920 |
AccessExclusiveLock     | t
 
(4 rows)

orglogs=# SELECT relname from pg_class where oid=17578; relname
-----------log_2
(1 row)

current_query is truncated  before the relevant delete from log_2

tldlogs=# SELECT * from pg_stat_activity where procpid=1425438;
-[ RECORD 1 

]-+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
datid         | 17142
datname       | tldlogs
procpid       | 1425438
usesysid      | 103
usename       | slony
current_query | insert into "public"."log_1" 
(trans_on,trans_by,client_trid,server_trid,object,command,xml,rc,server_id,_rserv_ts) 
values ('2006-01-04 
19:18:56+00','1234','abc123','987654321','1','4','<epp 
xmlns="urn:ietf:params:xml:ns:epp-1.0"    xml
query_start   | 2006-01-04 19:19:05.837785+00

tldlogs=# SELECT * from pgstattuple('log_2');
-[ RECORD 1 ]------+------------
table_len          | 33264033792
tuple_count        | 9464035
tuple_len          | 6896823100
tuple_percent      | 20.73
dead_tuple_count   | 33165
dead_tuple_len     | 20542608
dead_tuple_percent | 0.06
free_space         | 26050593840
free_percent       | 78.31

gdb backtrace of the vacuum process.

prompt>$ gdb attach 1744920
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-ibm-aix5.1.0.0"...attach: A file or 
directory in the path name does not exist..

Attaching to process 1744920
[Switching to Thread 1]
0xd01f1a74 in read () from /usr/lib/libc.a(shr.o)
(gdb) bt
#0  0xd01f1a74 in read () from /usr/lib/libc.a(shr.o)
#1  0x10039098 in FileRead (file=992, buffer=0x303c8480 "", 
amount=537051144) at fd.c:963
#2  0x100abc40 in mdread (reln=0x203931e0, blocknum=3583554, 
buffer=0x303c8480 "") at md.c:433
#3  0x100aaa64 in smgrread (which=-1, reln=0x203931e0, blocknum=3583554, 
buffer=0xffffffff <Address 0xffffffff out of bounds>) at smgr.c:316
#4  0x1002bac0 in ReadBufferInternal (reln=0x203931e0, blockNum=3583554, 
bufferLockHeld=0 '\0') at bufmgr.c:230
#5  0x1019c550 in count_nondeletable_pages (onerel=0x203931e0, 
vacrelstats=0x20325bc8) at vacuumlazy.c:901
#6  0x1019c2b0 in lazy_truncate_heap (onerel=0x203931e0, 
vacrelstats=0x20325bc8) at vacuumlazy.c:795
#7  0x1019b120 in lazy_vacuum_rel (onerel=0x203931e0, 
vacstmt=0xffffffff) at vacuumlazy.c:181
#8  0x10194694 in vacuum_rel (relid=540619232, vacstmt=0x20323a78, 
expected_relkind=-83 '\uffff') at vacuum.c:855
#9  0x10193af4 in vacuum (vacstmt=0x20323a78) at vacuum.c:290
#10 0x1017af84 in ProcessUtility (parsetree=0x20323a78, dest=0x203238f0, 
completionTag=0x2ff1fa30 "") at utility.c:831
#11 0x101758fc in PortalRunUtility (portal=0x20327b10, query=0x20323ab8, 
dest=0x203238f0, completionTag=0x2ff1fa30 "") at pquery.c:772
#12 0x10175c54 in PortalRunMulti (portal=0x20327b10, dest=0x203238f0, 
altdest=0x203238f0, completionTag=0x2ff1fa30 "") at pquery.c:836
#13 0x101751f4 in PortalRun (portal=0x20327b10, count=2147483647, 
dest=0x203238f0, altdest=0x203238f0, completionTag=0x2ff1fa30 "") at 
pquery.c:494
#14 0x10001ebc in exec_simple_query (query_string=0x203236e0 "VACUUM 
VERBOSE ANALYZE public.log_2;\n") at postgres.c:873
#15 0x10004a30 in PostgresMain (argc=4, argv=0x20270650, 
username=0x20270620 "molly") at postgres.c:2877
#16 0x10036f70 in BackendFork (port=0x202a5c28) at postmaster.c:2564
#17 0x10036908 in BackendStartup (port=0x202a5c28) at postmaster.c:2207
#18 0x10034bb0 in ServerLoop () at postmaster.c:1119
#19 0x100344b8 in PostmasterMain (argc=3, argv=0x2026e998) at 
postmaster.c:897
#20 0x10000688 in main (argc=3, argv=0x2ff22b18) at main.c:214
#21 0x10000204 in __start ()
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 1744920

prompt> ps 1425438    PID    TTY STAT  TIME COMMAND1425438      - A    13:38 postgres: slony tldlogs 32.85.68.213
DELETE
 
waiting

prompt> gdb attach 1425438
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-ibm-aix5.1.0.0"...attach: A file or 
directory in the path name does not exist..

Attaching to process 1425438
[Switching to Thread 1]
0xd02f7188 in semop () from /usr/lib/libc.a(shr.o)
(gdb) bt
#0  0xd02f7188 in semop () from /usr/lib/libc.a(shr.o)
#1  0x1004da54 in PGSemaphoreLock (sema=0x31024868, interruptOK=1 
'\001') at pg_sema.c:419
#2  0x1004c68c in ProcSleep (lockMethodTable=0xffffffff, lockmode=3, 
lock=0x3116ea68, proclock=0x3116fa28) at proc.c:681
#3  0x10050244 in WaitOnLock (lockmethod=-1, lockmode=3, 
lock=0x3116ea68, proclock=0x3116fa28) at lock.c:896
#4  0x1004fc40 in LockAcquire (lockmethod=1, locktag=0x3116ea68, 
xid=539538752, lockmode=3, dontWait=0 '\0') at lock.c:685
#5  0x10063ddc in LockRelation (relation=0x203a6a58, lockmode=3) at 
lmgr.c:135
#6  0x10027050 in relation_open (relationId=17578, lockmode=3) at 
heapam.c:462
#7  0x10027414 in heap_openrv (relation=0xffffffff, lockmode=-1) at 
heapam.c:610
#8  0x10142d8c in setTargetTable (pstate=0x204dc4c0, 
relation=0x202f98d0, inh=0 '\0', alsoSource=1 '\001') at parse_clause.c:139
#9  0x1009e00c in transformDeleteStmt (pstate=0x204dc4c0, 
stmt=0x203e81f0) at analyze.c:471
#10 0x1009ddc0 in transformStmt (pstate=0x204dc4c0, 
parseTree=0x2ff1f8f0, extras_before=0x2ff1f940, extras_after=0x20000ae4) 
at analyze.c:406
#11 0x1009d9ec in do_parse_analyze (parseTree=0xffffffff, 
pstate=0x204dc4c0) at analyze.c:239
#12 0x1009d7dc in parse_analyze (parseTree=0x203e81f0, paramTypes=0x0, 
numParams=0) at analyze.c:163
#13 0x1000188c in pg_analyze_and_rewrite (parsetree=0x203e81f0, 
paramTypes=0x0, numParams=0) at postgres.c:497
#14 0x10001df8 in exec_simple_query (   query_string=0x203b3030 "insert into \"public\".\"log_1\" 
(trans_on,trans_by,client_trid,server_trid,object,command,xml,rc,server_id,_rserv_ts) 
values ('2006-01-04 
19:18:56+00','1391','EpCbaefaEv1AUrC59v5G','124707143','1','4"...) at 
postgres.c:812
#15 0x10004a30 in PostgresMain (argc=4, argv=0x202706b0, 
username=0x20270620 "slony") at postgres.c:2877
#16 0x10036f70 in BackendFork (port=0x202a5c28) at postmaster.c:2564
#17 0x10036908 in BackendStartup (port=0x202a5c28) at postmaster.c:2207
#18 0x10034bb0 in ServerLoop () at postmaster.c:1119
#19 0x100344b8 in PostmasterMain (argc=3, argv=0x2026e998) at 
postmaster.c:897
#20 0x10000688 in main (argc=3, argv=0x2ff22b18) at main.c:214
#21 0x10000204 in __start ()
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 1425438

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




Re: Vacuum Blocking A Deleteion - Why?

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> We have been having a problem with a long running vacuum on a table 
> blocking deletes.  We have a log table on which the only activity is an 
> on going series of deletions.  The problem is experienced on a slony 
> replicated node, but I don't  think that slony is at fault here.

> My question - why is a vacuum blocking a deletion?

The problem is evidently that it's got an exclusive lock on the table:

>     17578 | 1744920 | AccessExclusiveLock      | t

which is entirely unexpected behavior for plain VACUUM in any release
later than 7.1.  I have to think that slony does have something to do
with it ... would suggest asking the slony folk about this.
        regards, tom lane