On Wed, Mar 5, 2008 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> In particular, if that's the problem, why has this not been seen before?
> The fact that it's going through heap_page_prune doesn't seem very
> relevant --- VACUUM FULL has certainly always had to invoke
> CacheInvalidateHeapTuple someplace or other.  So I still want to see
> the deadlock report ... we at least need to know which tables are
> involved in the deadlock.
Actually, maybe it *has* been seen before. Gavin, are you in the habit of running concurrent VACUUM FULLs on system catalogs, and if so have you noted that they occasionally get deadlock failures?
Generally no, I've never noticed deadlocks before, but I'll go back and look at some of the other machines.
> A separate line of thought is whether it's a good idea that
> heap_page_prune calls the inval code inside a critical section.
> That's what's turning an ordinary deadlock failure into a PANIC.
> Even without the possibility of having to do cache initialization,
> that's a lot of code to be invoking, and it has obvious failure
> modes (eg, out of memory for the new inval list item).
The more I think about this the more I don't like it. The critical section in heap_page_prune is *way* too big. Aside from the inval call, there are the HeapTupleSatisfiesVacuum() calls, which could fail during attempted clog references.
The reason the critical section is so large is that we're manipulating the contents of a shared buffer, and we don't want a failure to leave a partially-modified page in the buffer. We could fix that if we were to memcpy the page into local storage and do all the pruning work there. Then the critical section would only surround copying the page back to the buffer and writing the WAL record. Copying the page is a tad annoying but heap_page_prune is an expensive operation anyway, and I think we really are at too much risk of PANIC the way it's being done now. Has anyone got a better idea?
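To make the idea concrete, here is a minimal stand-alone sketch of the copy-then-publish pattern being proposed. It is not PostgreSQL code: the page size, `shared_page`, `prune_copy`, and `prune_page` are all hypothetical stand-ins, and the comments only mark where START_CRIT_SECTION/XLogInsert would go in the real thing. The point is just that all fallible work happens on a private copy, so a failure leaves the shared page untouched and never needs a PANIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 64            /* toy page size, not BLCKSZ */

/* Hypothetical stand-in for a page in a shared buffer. */
static unsigned char shared_page[PAGE_SIZE];

/*
 * All the fallible pruning work operates on the private copy only.
 * In the real code this is where HeapTupleSatisfiesVacuum() and the
 * inval calls could error out (clog lookup failure, out of memory, ...)
 * without leaving a half-modified shared page behind.
 */
static bool
prune_copy(unsigned char *copy)
{
    /* toy "pruning": clear every odd byte of the copy */
    for (int i = 1; i < PAGE_SIZE; i += 2)
        copy[i] = 0;
    return true;                /* a real failure would just return false */
}

static bool
prune_page(void)
{
    unsigned char copy[PAGE_SIZE];

    memcpy(copy, shared_page, PAGE_SIZE);   /* snapshot the shared page */

    if (!prune_copy(copy))
        return false;           /* failed: shared page never touched */

    /* START_CRIT_SECTION() would begin here: only the publish step */
    memcpy(shared_page, copy, PAGE_SIZE);   /* copy finished page back */
    /* ... emit the WAL record (XLogInsert), then END_CRIT_SECTION() */
    return true;
}
```

Under this sketch the critical section shrinks to one memcpy plus the WAL insert, which is exactly the part that must not fail halfway.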