Thread: 8.3.0 Core with concurrent vacuum fulls
This morning I had a postgres 8.3 install core while multiple vacuum fulls were taking place. I saved the core file; would anyone be interested in dissecting it? I've otherwise had no issues with this machine or pgsql install.
Gavin
Gavin M. Roy wrote:

> This morning I had a postgres 8.3 install core while multiple
> vacuum fulls were taking place. I saved the core file; would anyone be
> interested in dissecting it? I've otherwise had no issues with this machine
> or pgsql install.

Of course. Please post the backtrace.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[gmr@db04 backup]$ cat /etc/redhat-release
CentOS release 4.4 (Final)
BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR =
MANDIR = /usr/local/pgsql/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-ldap' '--with-perl' '--enable-integer-datetimes'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql/lib'
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -ltermcap -lcrypt -ldl -lm
VERSION = PostgreSQL 8.3.0
(gdb) where
#0 0x0000003fe362e21d in raise () from /lib64/tls/libc.so.6
#1 0x0000003fe362fa1e in abort () from /lib64/tls/libc.so.6
#2 0x000000000063a2e3 in errfinish ()
#3 0x00000000005974c4 in DeadLockReport ()
#4 0x000000000059381f in LockAcquire ()
#5 0x0000000000592357 in LockRelationOid ()
#6 0x0000000000457255 in relation_open ()
#7 0x00000000004574c3 in heap_open ()
#8 0x000000000062cf41 in CatalogCacheInitializeCache ()
#9 0x000000000062dfad in PrepareToInvalidateCacheTuple ()
#10 0x000000000062e8c5 in CacheInvalidateHeapTuple ()
#11 0x000000000045c803 in heap_page_prune ()
#12 0x00000000005086cd in vacuum_rel ()
#13 0x00000000005096bb in vacuum ()
#14 0x00000000005a163b in PortalRunUtility ()
#15 0x00000000005a1714 in PortalRunMulti ()
#16 0x00000000005a1d30 in PortalRun ()
#17 0x000000000059f4b6 in PostgresMain ()
#18 0x00000000005760c0 in ServerLoop ()
#19 0x0000000000577770 in PostmasterMain ()
#20 0x000000000052fd3e in main ()
On Tue, Mar 4, 2008 at 11:35 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Gavin M. Roy wrote:
> > This morning I had a postgres 8.3 install core while multiple
> > vacuum fulls were taking place. I saved the core file; would anyone be
> > interested in dissecting it? I've otherwise had no issues with this machine
> > or pgsql install.
>
> Of course. Please post the backtrace.
>
> --
> Alvaro Herrera                 http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
"Gavin M. Roy" <gmr@myyearbook.com> writes: > (gdb) where > #0 0x0000003fe362e21d in raise () from /lib64/tls/libc.so.6 > #1 0x0000003fe362fa1e in abort () from /lib64/tls/libc.so.6 > #2 0x000000000063a2e3 in errfinish () > #3 0x00000000005974c4 in DeadLockReport () > #4 0x000000000059381f in LockAcquire () > #5 0x0000000000592357 in LockRelationOid () > #6 0x0000000000457255 in relation_open () > #7 0x00000000004574c3 in heap_open () > #8 0x000000000062cf41 in CatalogCacheInitializeCache () > #9 0x000000000062dfad in PrepareToInvalidateCacheTuple () > #10 0x000000000062e8c5 in CacheInvalidateHeapTuple () > #11 0x000000000045c803 in heap_page_prune () > #12 0x00000000005086cd in vacuum_rel () > #13 0x00000000005096bb in vacuum () > #14 0x00000000005a163b in PortalRunUtility () > #15 0x00000000005a1714 in PortalRunMulti () > #16 0x00000000005a1d30 in PortalRun () > #17 0x000000000059f4b6 in PostgresMain () > #18 0x00000000005760c0 in ServerLoop () > #19 0x0000000000577770 in PostmasterMain () > #20 0x000000000052fd3e in main () So what did DeadLockReport put in the server log before panic'ing? I'm wondering exactly why CatalogCacheInitializeCache is being called here --- seems like that should have been done long before we got to VACUUM. But it would be useful to know just what deadlock it saw. regards, tom lane
On Wed, Mar 5, 2008 at 8:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> So what did DeadLockReport put in the server log before panic'ing?
>
> I'm wondering exactly why CatalogCacheInitializeCache is being called
> here --- seems like that should have been done long before we got to
> VACUUM.  But it would be useful to know just what deadlock it saw.
>

Seems like it's possible that the second phase of catalog cache initialization
is not done when VACUUM FULL is called, especially if VACUUM FULL is the first
command/query in the backend. InitCatalogCachePhase2() is called only if a new
relcache cache file needs to be written:

    /*
     * Lastly, write out a new relcache cache file if one is needed.
     */
    if (needNewCacheFile)
    {
        /*
         * Force all the catcaches to finish initializing and thereby open the
         * catalogs and indexes they use.  This will preload the relcache with
         * entries for all the most important system catalogs and indexes, so
         * that the init file will be most useful for future backends.
         */
        InitCatalogCachePhase2();

        /* now write the file */
        write_relcache_init_file();
    }

We haven't yet seen the deadlock message, but here is my theory of a possible
deadlock scenario:

Two backends try to vacuum full two different catalog tables. Each acquires an
exclusive lock on the respective catalog relation. Then each tries to
initialize its own catalog cache. But to do that they need AccessShareLock on
each other's table, leading to a deadlock.

Why not just unconditionally finish the phase 2 as part of InitPostgres?
I understand that we may end up initializing caches that we don't need in that
session, but there might be other places where this deadlock is waiting to
happen.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
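To make the proposal above concrete, a minimal sketch of the change Pavan is suggesting, relative to the snippet he quotes; this is an illustration of the idea only, not a proposed or committed patch, and the surrounding startup function is omitted.

/*
 * Sketch only: finish catcache initialization during backend startup
 * regardless of whether a new relcache init file has to be written, so a
 * later VACUUM FULL of one system catalog never has to open (and lock)
 * other catalogs while it already holds an exclusive lock.
 */
/* Always force all catcaches through their second initialization phase. */
InitCatalogCachePhase2();

/*
 * Lastly, write out a new relcache cache file if one is needed.
 */
if (needNewCacheFile)
    write_relcache_init_file();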
On Wed, Mar 5, 2008 at 3:41 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> Two backends try to vacuum full two different catalog tables. Each acquires
> an exclusive lock on the respective catalog relation. Then each tries to
> initialize its own catalog cache. But to do that they need AccessShareLock
> on each other's table, leading to a deadlock.
>

Well, I could reproduce the above-mentioned deadlock scenario with two system
relations. So that validates the theory.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > Why not just unconditionally finish the phase 2 as part of InitPostgres ? You're jumping to a patch before we even understand what's happening. In particular, if that's the problem, why has this not been seen before? The fact that it's going through heap_page_prune doesn't seem very relevant --- VACUUM FULL has certainly always had to invoke CacheInvalidateHeapTuple someplace or other. So I still want to see the deadlock report ... we at least need to know which tables are involved in the deadlock. A separate line of thought is whether it's a good idea that heap_page_prune calls the inval code inside a critical section. That's what's turning an ordinary deadlock failure into a PANIC. Even without the possibility of having to do cache initialization, that's a lot of code to be invoking, and it has obvious failure modes (eg, out of memory for the new inval list item). regards, tom lane
I wrote:
> In particular, if that's the problem, why has this not been seen before?
> The fact that it's going through heap_page_prune doesn't seem very
> relevant --- VACUUM FULL has certainly always had to invoke
> CacheInvalidateHeapTuple someplace or other.  So I still want to see
> the deadlock report ... we at least need to know which tables are
> involved in the deadlock.

Actually, maybe it *has* been seen before.  Gavin, are you in the habit
of running concurrent VACUUM FULLs on system catalogs, and if so have
you noted that they occasionally get deadlock failures?

> A separate line of thought is whether it's a good idea that
> heap_page_prune calls the inval code inside a critical section.
> That's what's turning an ordinary deadlock failure into a PANIC.
> Even without the possibility of having to do cache initialization,
> that's a lot of code to be invoking, and it has obvious failure
> modes (eg, out of memory for the new inval list item).

The more I think about this the more I don't like it.  The critical
section in heap_page_prune is *way* too big.  Aside from the inval
call, there are HeapTupleSatisfiesVacuum() calls, which could have
failures during attempted clog references.

The reason the critical section is so large is that we're manipulating
the contents of a shared buffer, and we don't want a failure to leave a
partially-modified page in the buffer.  We could fix that if we were to
memcpy the page into local storage and do all the pruning work there.
Then the critical section would only surround copying the page back to
the buffer and writing the WAL record.  Copying the page is a tad
annoying but heap_page_prune is an expensive operation anyway, and
I think we really are at too much risk of PANIC the way it's being done
now.  Has anyone got a better idea?

			regards, tom lane
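For readers following the design discussion: a rough sketch of the copy-the-page approach floated above, assuming a hypothetical prune_page_copy() helper that stands in for the existing pruning logic. It only illustrates how working on a private copy shrinks the critical section; it is not the approach that was ultimately pursued (see the two-pass discussion later in the thread).

/*
 * Sketch only, not backend source: do all pruning work on a local copy of
 * the page, so that only the final copy-back and WAL write need to be
 * inside the critical section.
 */
#include "postgres.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/* Hypothetical helper, not a real PostgreSQL function. */
extern void prune_page_copy(Relation rel, Buffer buffer, Page temppage);

static void
prune_page_using_copy(Relation rel, Buffer buffer)
{
    Page    page = BufferGetPage(buffer);
    char    workspace[BLCKSZ];
    Page    temppage = (Page) workspace;

    /* Private copy: any error below leaves the shared buffer untouched. */
    memcpy(temppage, page, BLCKSZ);

    /*
     * The failure-prone parts -- HeapTupleSatisfiesVacuum() calls, cache
     * invalidation, memory allocation -- run here, outside any critical
     * section, against the copy.
     */
    prune_page_copy(rel, buffer, temppage);

    /* Only the unfailable steps go inside the critical section. */
    START_CRIT_SECTION();

    memcpy(page, temppage, BLCKSZ);
    MarkBufferDirty(buffer);
    /* ... emit the heap-clean WAL record and set the page LSN here ... */

    END_CRIT_SECTION();
}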
2008-03-04 05:45:47 EST [6698]: [1-1] LOG: process 6698 still waiting for AccessShareLock on relation 1247 of database 16385 after 1001.519 ms
2008-03-04 05:45:47 EST [6698]: [2-1] STATEMENT: VACUUM FULL autograph.autograph_creators
2008-03-04 05:46:28 EST [6730]: [1-1] LOG: process 6730 still waiting for AccessShareLock on relation 1247 of database 16385 after 1000.887 ms
2008-03-04 05:46:28 EST [6730]: [2-1] STATEMENT: VACUUM FULL lunchmoney.totals
2008-03-04 05:47:55 EST [3809]: [18-1] LOG: server process (PID 6742) was terminated by signal 6: Aborted
2008-03-04 05:47:55 EST [3809]: [19-1] LOG: terminating any other active server processes
2008-03-04 05:47:55 EST [6741]: [12-1] WARNING: terminating connection because of crash of another server process
On Tue, Mar 4, 2008 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Gavin M. Roy" <gmr@myyearbook.com> writes:So what did DeadLockReport put in the server log before panic'ing?
> (gdb) where
> #0 0x0000003fe362e21d in raise () from /lib64/tls/libc.so.6
> #1 0x0000003fe362fa1e in abort () from /lib64/tls/libc.so.6
> #2 0x000000000063a2e3 in errfinish ()
> #3 0x00000000005974c4 in DeadLockReport ()
> #4 0x000000000059381f in LockAcquire ()
> #5 0x0000000000592357 in LockRelationOid ()
> #6 0x0000000000457255 in relation_open ()
> #7 0x00000000004574c3 in heap_open ()
> #8 0x000000000062cf41 in CatalogCacheInitializeCache ()
> #9 0x000000000062dfad in PrepareToInvalidateCacheTuple ()
> #10 0x000000000062e8c5 in CacheInvalidateHeapTuple ()
> #11 0x000000000045c803 in heap_page_prune ()
> #12 0x00000000005086cd in vacuum_rel ()
> #13 0x00000000005096bb in vacuum ()
> #14 0x00000000005a163b in PortalRunUtility ()
> #15 0x00000000005a1714 in PortalRunMulti ()
> #16 0x00000000005a1d30 in PortalRun ()
> #17 0x000000000059f4b6 in PostgresMain ()
> #18 0x00000000005760c0 in ServerLoop ()
> #19 0x0000000000577770 in PostmasterMain ()
> #20 0x000000000052fd3e in main ()
I'm wondering exactly why CatalogCacheInitializeCache is being called
here --- seems like that should have been done long before we got to
VACUUM. But it would be useful to know just what deadlock it saw.
regards, tom lane
"Gavin M. Roy" <gmr@myyearbook.com> writes: > 2008-03-04 05:45:47 EST [6698]: [1-1] LOG: process 6698 still waiting for > AccessShareLock on relation 1247 of database 16385 after 1001.519 ms > 2008-03-04 05:45:47 EST [6698]: [2-1] STATEMENT: VACUUM FULL > autograph.autograph_creators > 2008-03-04 05:46:28 EST [6730]: [1-1] LOG: process 6730 still waiting for > AccessShareLock on relation 1247 of database 16385 after 1000.887 ms > 2008-03-04 05:46:28 EST [6730]: [2-1] STATEMENT: VACUUM FULL > lunchmoney.totals > 2008-03-04 05:47:55 EST [3809]: [18-1] LOG: server process (PID 6742) was > terminated by signal 6: Aborted > 2008-03-04 05:47:55 EST [3809]: [19-1] LOG: terminating any other active > server processes > 2008-03-04 05:47:55 EST [6741]: [12-1] WARNING: terminating connection > because of crash of another server process How annoying ... the PANIC message doesn't seem to have reached the log. elog.c is careful to fflush(stderr) before abort(), so that isn't supposed to happen. But it looks like you are using syslog for logging (correct?) so maybe this is a problem with the syslog implementation you're using. What's the platform exactly? I wonder if it'd be reasonable to put a closelog() call just before the abort() ... regards, tom lane
On Wed, Mar 5, 2008 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually, maybe it *has* been seen before.  Gavin, are you in the habit
> of running concurrent VACUUM FULLs on system catalogs, and if so have
> you noted that they occasionally get deadlock failures?

Generally no, I've never noticed deadlocks before, but I'll go back and
look at some of the other machines.
On Wed, Mar 5, 2008 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Gavin M. Roy" <gmr@myyearbook.com> writes:> 2008-03-04 05:45:47 EST [6698]: [1-1] LOG: process 6698 still waiting forHow annoying ... the PANIC message doesn't seem to have reached the log.
> AccessShareLock on relation 1247 of database 16385 after 1001.519 ms
> 2008-03-04 05:45:47 EST [6698]: [2-1] STATEMENT: VACUUM FULL
> autograph.autograph_creators
> 2008-03-04 05:46:28 EST [6730]: [1-1] LOG: process 6730 still waiting for
> AccessShareLock on relation 1247 of database 16385 after 1000.887 ms
> 2008-03-04 05:46:28 EST [6730]: [2-1] STATEMENT: VACUUM FULL
> lunchmoney.totals
> 2008-03-04 05:47:55 EST [3809]: [18-1] LOG: server process (PID 6742) was
> terminated by signal 6: Aborted
> 2008-03-04 05:47:55 EST [3809]: [19-1] LOG: terminating any other active
> server processes
> 2008-03-04 05:47:55 EST [6741]: [12-1] WARNING: terminating connection
> because of crash of another server process
elog.c is careful to fflush(stderr) before abort(), so that isn't
supposed to happen. But it looks like you are using syslog for logging
(correct?) so maybe this is a problem with the syslog implementation
you're using. What's the platform exactly?
I wonder if it'd be reasonable to put a closelog() call just before
the abort() ...
regards, tom lane
I'm using stderr, emulated to look like syslog for pgfouine.
log_destination = 'stderr'
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = '/var/log/postgres/' # Directory where log files are written
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
On Wed, Mar 5, 2008 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Gavin M. Roy" <gmr@myyearbook.com> writes:> 2008-03-04 05:45:47 EST [6698]: [1-1] LOG: process 6698 still waiting forHow annoying ... the PANIC message doesn't seem to have reached the log.
> AccessShareLock on relation 1247 of database 16385 after 1001.519 ms
> 2008-03-04 05:45:47 EST [6698]: [2-1] STATEMENT: VACUUM FULL
> autograph.autograph_creators
> 2008-03-04 05:46:28 EST [6730]: [1-1] LOG: process 6730 still waiting for
> AccessShareLock on relation 1247 of database 16385 after 1000.887 ms
> 2008-03-04 05:46:28 EST [6730]: [2-1] STATEMENT: VACUUM FULL
> lunchmoney.totals
> 2008-03-04 05:47:55 EST [3809]: [18-1] LOG: server process (PID 6742) was
> terminated by signal 6: Aborted
> 2008-03-04 05:47:55 EST [3809]: [19-1] LOG: terminating any other active
> server processes
> 2008-03-04 05:47:55 EST [6741]: [12-1] WARNING: terminating connection
> because of crash of another server process
elog.c is careful to fflush(stderr) before abort(), so that isn't
supposed to happen. But it looks like you are using syslog for logging
(correct?) so maybe this is a problem with the syslog implementation
you're using. What's the platform exactly?
I wonder if it'd be reasonable to put a closelog() call just before
the abort() ...
regards, tom lane
The panic may have made it if this is what you were looking for:
2008-03-04 05:45:20 EST [6742]: [7-1] PANIC: deadlock detected
2008-03-04 05:58:33 EST [8751]: [3-1] PANIC: deadlock detected
(it cored twice before I lowered the concurrency of vacuums)
Tom Lane wrote:
> The reason the critical section is so large is that we're manipulating
> the contents of a shared buffer, and we don't want a failure to leave a
> partially-modified page in the buffer.  We could fix that if we were to
> memcpy the page into local storage and do all the pruning work there.
> Then the critical section would only surround copying the page back to
> the buffer and writing the WAL record.  Copying the page is a tad
> annoying but heap_page_prune is an expensive operation anyway, and
> I think we really are at too much risk of PANIC the way it's being done
> now.  Has anyone got a better idea?

We could do the pruning in two phases: first figure out what to do
without modifying anything, outside the critical section, and then
actually do it, inside the critical section.

Looking at heap_page_prune, we already collect information about what we
did in the redirected/nowdead/nowunused arrays for WAL logging purposes.
We could use that, but we would also have to teach heap_prune_chain not
to step into tuples that we've already decided to remove.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
"Gavin M. Roy" <gmr@myyearbook.com> writes: > The panic may have made it if this is what you were looking for: > 2008-03-04 05:45:20 EST [6742]: [7-1] PANIC: deadlock detected > 2008-03-04 05:58:33 EST [8751]: [3-1] PANIC: deadlock detected That's what I expected to find, but where's the DETAIL for these? regards, tom lane
2008-03-04 05:45:20 EST [6742]: [7-1] PANIC: deadlock detected
2008-03-04 05:45:20 EST [6742]: [8-1] DETAIL: Process 6742 waits for AccessShareLock on relation 2619 of database 16385; blocked by process 6740.
Process 6740 waits for AccessShareLock on relation 1247 of database 16385; blocked by process 6742.
2008-03-04 05:45:20 EST [6742]: [9-1] STATEMENT: VACUUM FULL pg_catalog.pg_type
Sorry, been juggling too many things this morning!
On Wed, Mar 5, 2008 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Gavin M. Roy" <gmr@myyearbook.com> writes:> The panic may have made it if this is what you were looking for:That's what I expected to find, but where's the DETAIL for these?
> 2008-03-04 05:45:20 EST [6742]: [7-1] PANIC: deadlock detected
> 2008-03-04 05:58:33 EST [8751]: [3-1] PANIC: deadlock detected
regards, tom lane
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Tom Lane wrote: >> I think we really are at too much risk of PANIC the way it's being done >> now. Has anyone got a better idea? > We could do the pruning in two phases: first figure out what to do > without modifyng anything, outside critical-section, and then actually > do it, inside critical section. > Looking at heap_page_prune, we already collect information of what we > did in the redirected/nowdead/nowunused arrays for WAL logging purposes. That's a thought, but ... > We could use that, but we would also have to teach heap_prune_chain to > not step into tuples that we've already decided to remove. ... seems like this would require searching the aforementioned arrays for each tuple examined, which could turn into an O(N^2) problem. If there are many removable tuples it could easily end up slower than copying. [ thinks some more... ] I guess we could use a flag array dimensioned MaxHeapTuplesPerPage to mark already-processed tuples, so that you wouldn't need to search the existing arrays but just index into the flag array with the tuple's offsetnumber. I wonder if the logic could be restructured to avoid this by taking advantage of it being a two-pass process, instead of fighting it? But that'd probably be a bigger change than we'd want to risk back-patching. Since I'm the one complaining about the PANIC risk, I guess I should do the legwork here. regards, tom lane
"Gavin M. Roy" <gmr@myyearbook.com> writes: > 2008-03-04 05:45:20 EST [6742]: [7-1] PANIC: deadlock detected > 2008-03-04 05:45:20 EST [6742]: [8-1] DETAIL: Process 6742 waits for > AccessShareLock on relation 2619 of database 16385; blocked by process 6740. > Process 6740 waits for AccessShareLock on relation 1247 of database 16385; > blocked by process 6742. > 2008-03-04 05:45:20 EST [6742]: [9-1] STATEMENT: VACUUM FULL > pg_catalog.pg_type Hmm ... regression=# select 2619::regclass, 1247::regclass; regclass | regclass --------------+----------pg_statistic | pg_type (1 row) regression=# So presumably 6740 is doing a vac full on pg_statistic. There isn't really any need for these to conflict in cache initialization. It strikes me that we could fix most of the deadlock risk if we just switched the order of the two tests in the loop in PrepareToInvalidateCacheTuple; that is, don't force initialization of a catcache unless it's one we need to access right now. Then a VAC FULL is only going to be interested in initializing caches for the catalog it's vacuuming, which should be safe enough. We still need to deal with the excessive PANIC risk, but I think we have a plan for that now. regards, tom lane
"Gavin M. Roy" <gmr@myyearbook.com> writes: > On Wed, Mar 5, 2008 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Actually, maybe it *has* been seen before. Gavin, are you in the habit >> of running concurrent VACUUM FULLs on system catalogs, and if so have >> you noted that they occasionally get deadlock failures? > Generally no, I've never noticed deadlocks before, but I'll go back and > look at some of the other the machines. After digging a bit deeper, it seems that pre-8.2 releases wouldn't have been at risk for a deadlock here anyway, because CatalogCacheInitializeCache didn't lock the system catalog it was initializing a cache for. (That had *other* risks, but not this one.) So possibly the lack of prior reports is just because not too many people are in the habit of using concurrent VACUUM FULLs with late-model Postgres. I can reproduce the deadlock (though not the ensuing PANIC) in 8.2, so it's definitely not heap_page_prune's fault. regards, tom lane
On Wed, Mar 5, 2008 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ thinks some more... ]  I guess we could use a flag array dimensioned
> MaxHeapTuplesPerPage to mark already-processed tuples, so that you
> wouldn't need to search the existing arrays but just index into the flag
> array with the tuple's offsetnumber.
>

We can actually combine this and the page copying ideas. Instead of copying
the entire page, we can just copy the line pointers array and work on the
copy. ISTM that the only place where we change the tuple contents is when we
collapse the redirected line pointers, and that we can do at the end, on the
original page. The last step, which we run inside a critical section, would
then be just like invoking heap_xlog_clean with the information collected in
the first pass.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > On Wed, Mar 5, 2008 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> [ thinks some more... ] I guess we could use a flag array dimensioned >> MaxHeapTuplesPerPage to mark already-processed tuples, so that you >> wouldn't need to search the existing arrays but just index into the flag >> array with the tuple's offsetnumber. > We can actually combine this and the page copying ideas. Instead of copying > the entire page, we can just copy the line pointers array and work on the copy. I think that just makes things more complex and fragile. I like Heikki's idea, in part because it makes the normal path and the WAL recovery path guaranteed to work alike. I'll attach my work-in-progress patch for this --- it doesn't do anything about the invalidation semantics problem but it does fix the critical-section-too-big problem. regards, tom lane
Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: >> On Wed, Mar 5, 2008 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> [ thinks some more... ] I guess we could use a flag array dimensioned >>> MaxHeapTuplesPerPage to mark already-processed tuples, so that you >>> wouldn't need to search the existing arrays but just index into the flag >>> array with the tuple's offsetnumber. > >> We can actually combine this and the page copying ideas. Instead of copying >> the entire page, we can just copy the line pointers array and work on the copy. > > I think that just makes things more complex and fragile. I like > Heikki's idea, in part because it makes the normal path and the WAL > recovery path guaranteed to work alike. I'll attach my work-in-progress > patch for this --- it doesn't do anything about the invalidation > semantics problem but it does fix the critical-section-too-big problem. FWIW, the patch looks fine to me. By inspection; I didn't test it. I'm glad we got away with a single "marked" array. I was afraid we would need to consult the unused/redirected/dead arrays separately. Do you have a plan for the invalidation problem? I think we could just not remove the redirection line pointers in catalog tables. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > I'm glad we got away with a single "marked" array. I was afraid we would > need to consult the unused/redirected/dead arrays separately. Yeah, I was worried about that too. The fundamental reason why it's okay seems to be that redirects can only be the heads of HOT chains. Therefore, a newly marked tuple cannot be a member of any chain we'll need to scan later, no matter whether it is marked as newly redirected, dead, or unused. And so we can just ignore marked tuples in all cases. I set up the code to mark the redirection target too, but that is just a minor optimization AFAICS --- if we reach the redirection target later in the outer scan loop, we'd decide not to process it anyway. > Do you have a plan for the invalidation problem? I think we could just > not remove the redirection line pointers in catalog tables. Still thinking about it, but I agree that I'd rather make a small modification to VACUUM FULL than try to redesign cache invalidation. The trick with not removing redirect pointers would be to ensure we don't remove the redirection target. While the redirection target was presumably not DEAD when heap_page_prune looked at it, it seems possible that it is DEAD by the time vacuum.c looks. Another risk factor is trying to move the redirection target down to a lower page. regards, tom lane
On Thu, Mar 6, 2008 at 11:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I think that just makes things more complex and fragile.  I like
> Heikki's idea, in part because it makes the normal path and the WAL
> recovery path guaranteed to work alike.  I'll attach my work-in-progress
> patch for this --- it doesn't do anything about the invalidation
> semantics problem but it does fix the critical-section-too-big problem.
>

The WIP patch looks good to me. I haven't yet tested it (will wait for the
final version). The following pointer arithmetic caught my eye, though:

!   nunused = (end - nowunused);

Shouldn't we typecast them to (char *) first?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > The WIP patch looks good to me. I haven't yet tested it (will wait for the > final version). The following pointer arithmetic caught my eye though. > ! nunused = (end - nowunused); > Shouldn't we typecast them to (char *) first ? No ... we want the number of OffsetNumbers, not the number of bytes. regards, tom lane