On Wed, Mar 5, 2008 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> In particular, if that's the problem, why has this not been seen before?
> The fact that it's going through heap_page_prune doesn't seem very
> relevant --- VACUUM FULL has certainly always had to invoke
> CacheInvalidateHeapTuple someplace or other.  So I still want to see
> the deadlock report ... we at least need to know which tables are
> involved in the deadlock.
Actually, maybe it *has* been seen before. Gavin, are you in the habit of running concurrent VACUUM FULLs on system catalogs, and if so have you noted that they occasionally get deadlock failures?
Generally no, I've never noticed deadlocks before, but I'll go back and look at some of the other machines.
> A separate line of thought is whether it's a good idea that
> heap_page_prune calls the inval code inside a critical section.
> That's what's turning an ordinary deadlock failure into a PANIC.
> Even without the possibility of having to do cache initialization,
> that's a lot of code to be invoking, and it has obvious failure
> modes (eg, out of memory for the new inval list item).
The more I think about this the more I don't like it. The critical section in heap_page_prune is *way* too big. Aside from the inval call, there are the HeapTupleSatisfiesVacuum() calls, which could fail during attempted clog references.
The reason the critical section is so large is that we're manipulating the contents of a shared buffer, and we don't want a failure to leave a partially-modified page in the buffer. We could fix that if we were to memcpy the page into local storage and do all the pruning work there. Then the critical section would only surround copying the page back to the buffer and writing the WAL record. Copying the page is a tad annoying but heap_page_prune is an expensive operation anyway, and I think we really are at too much risk of PANIC the way it's being done now. Has anyone got a better idea?
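To make the idea concrete, here is a minimal stand-alone sketch of the copy-then-publish pattern being proposed. It is not PostgreSQL code: the page size, `shared_page`, `prune_copy`, and `prune_page` are all hypothetical stand-ins, and the comments only mark where START_CRIT_SECTION/XLogInsert would go in the real thing. The point is just that all fallible work happens on a private copy, so a failure leaves the shared page untouched and never needs a PANIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 64            /* toy page size, not BLCKSZ */

/* Hypothetical stand-in for a page in a shared buffer. */
static unsigned char shared_page[PAGE_SIZE];

/*
 * All the fallible pruning work operates on the private copy only.
 * In the real code this is where HeapTupleSatisfiesVacuum() and the
 * inval calls could error out (clog lookup failure, out of memory, ...)
 * without leaving a half-modified shared page behind.
 */
static bool
prune_copy(unsigned char *copy)
{
    /* toy "pruning": clear every odd byte of the copy */
    for (int i = 1; i < PAGE_SIZE; i += 2)
        copy[i] = 0;
    return true;                /* a real failure would just return false */
}

static bool
prune_page(void)
{
    unsigned char copy[PAGE_SIZE];

    memcpy(copy, shared_page, PAGE_SIZE);   /* snapshot the shared page */

    if (!prune_copy(copy))
        return false;           /* failed: shared page never touched */

    /* START_CRIT_SECTION() would begin here: only the publish step */
    memcpy(shared_page, copy, PAGE_SIZE);   /* copy finished page back */
    /* ... emit the WAL record (XLogInsert), then END_CRIT_SECTION() */
    return true;
}
```

Under this sketch the critical section shrinks to one memcpy plus the WAL insert, which is exactly the part that must not fail halfway.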