Re: [HACKERS] Re: indexing words slow - Mailing list pgsql-hackers
From | dg@illustra.com (David Gould) |
---|---|
Subject | Re: [HACKERS] Re: indexing words slow |
Date | |
Msg-id | 9803140648.AA05842@hawk.illustra.com Whole thread Raw |
In response to | Re: indexing words slow (Bruce Momjian <maillist@candle.pha.pa.us>) |
List | pgsql-hackers |
Bruce Momjian writes: > Just started running CLUSTER, and it appears to be taking forever. Does > not seem to be using the disk, but a lot of CPU. It appears to be > caught in an endless loop. ... > Attaching to program `/u/pg/bin/postgres', process 29755 > 0x94999 in WaitIO (buf=0x4c0ccc8, spinlock=0) at bufmgr.c:1107 > 1107 S_LOCK(&(buf->io_in_progress_lock)); > > --------------------------------------------------------------------------- > > Stepping to the next statement shows it is just hung here. > The backtrace shows: > > --------------------------------------------------------------------------- > > #0 0x9499b in WaitIO (buf=0x4c0ccc8, spinlock=0) at bufmgr.c:1107 > #1 0x94109 in BufferAlloc (reln=0x19df90, blockNum=11324, > foundPtr=0xefbfb903 "\001\030����=\t", bufferLockHeld=0) at bufmgr.c:400 > #2 0x93e91 in ReadBufferWithBufferLock (reln=0x19df90, blockNum=11324, > bufferLockHeld=0) at bufmgr.c:255 > #3 0x93dee in ReadBuffer (reln=0x19df90, blockNum=11324) at bufmgr.c:174 > #4 0xb28f in heap_fetch (relation=0x19df90, seeself=0 '\000', tid=0x1b2256, > b=0xefbfb974) at heapam.c:1050 > #5 0x303d5 in rebuildheap (OIDNewHeap=11466400, OIDOldHeap=6424406, > OIDOldIndex=11466368) at cluster.c:357 > #6 0x30111 in cluster (oldrelname=0x112790 "artist_fti", > oldindexname=0x10cdd0 "artist_fti_idx") at cluster.c:160 > #7 0xa29c1 in ProcessUtility (parsetree=0x1127b0, dest=Remote) > at utility.c:626 This is all just speculation from memory as I do not have the code in front of me, but here is what I think might be happening: heap_fetch -> ReadBuffer -> ReadBufferWithBufferLock -- trying to fetch the next row, needed to read a new page -> BufferAlloc -- to read a new page, we need a buffer, so grab one -- or possibly we found the buffer we were looking for -> WaitIO -- but the target buffer was already being read or written (by another process???) -> S_LOCK(&(buf->io_in_progress_lock)) -- so spin on the buffer in_progress lock waiting for the I/O to complete and clear the lock Apparently, the I/O completion is never seen so the lock is never cleared so this will wait forever. Since the system is stuck at this point, somebody dropped the ball on clearing the io_in_progress_lock spinlock. I can't tell without looking at the code (and probably not even then) if this is I/O initiated by the stuck process and just forgotten about, or if some other process was involved. Were there any other active backends while this was running? If so, did any of them exit abnormally (ie without cleaning up)? Is the stuck process holding any other spinlocks (like the bufmgr spinlock for instance)? (There is an array of held spinlocks in the Proc structure that gets set by SpinAcquire to record spinlocks owned by the current process). If so, is it possible that another process is somehow deadlocked with this one? eg: A has bufmgr spinlock, wants io_in_progress_lock spinlock B has io_in_progress_lock spinlock, wants bufmgr spinlock I hope this is useful... -dg ps: what is CLUSTER anyhow? I have a guess, but... David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - I realize now that irony has no place in business communications.
pgsql-hackers by date: