Nasty bug in heap_page_prune - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Nasty bug in heap_page_prune |
Date | |
Msg-id | 9319.1204766626@sss.pgh.pa.us |
List | pgsql-hackers |
While working on the previously discussed refactoring of heap_page_prune, I came to the realization that its use of CacheInvalidateHeapTuple is not just a PANIC risk but simply wrong :-(  The semantics aren't right: inval.c assumes that it's dealing with transactional invalidations, but what we are dealing with in a redirect collapse is non-transactional.  Once heap_page_prune completes, the CTID update is a done deal even if the calling transaction rolls back.  However, inval.c will think it doesn't need to broadcast any invalidations after a failed transaction.

This is fairly difficult to show a self-contained example of, because it can only occur if VACUUM FULL errors out after doing a page prune, and there's no very easy way to guarantee that will happen.  I resorted to putting an elog(ERROR) call into vacuum.c right after the scan_heap() call.  With that, it was possible to demonstrate the problem:

    regression=# create table foo(f1 int);
    CREATE TABLE
    regression=# select ctid,relname from pg_class where relname = 'foo';
      ctid  | relname
    --------+---------
     (9,32) | foo
    (1 row)

    -- need a HOT-candidate update to the pg_class row, eg
    regression=# alter table foo owner to joe;
    ALTER TABLE
    -- check that update is on same page, else it's not HOT
    regression=# select ctid,relname from pg_class where relname = 'foo';
      ctid  | relname
    --------+---------
     (9,33) | foo
    (1 row)

    -- make sure the updated tuple is in local syscache
    regression=# select 'foo'::regclass;
     regclass
    ----------
     foo
    (1 row)

    -- now, in another backend, execute intentionally broken VACUUM FULL pg_class
    -- and try to alter the updated tuple again using a syscache-based operation
    regression=# alter table foo owner to postgres;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
The crash is here:

    TRAP: FailedAssertion("!(((lp)->lp_flags == 1))", File: "heapam.c", Line: 2330)
    LOG:  server process (PID 8967) was terminated by signal 6
    LOG:  terminating any other active server processes

because it's trying to find the tuple at a CTID that's no longer valid.

Not sure about a clean solution to this.  I don't really want to bastardize inval.c by making it deal with nontransactional semantics, but there may be no other way.

Or we could forget about letting VACUUM FULL collapse out LP_REDIRECT pointers, at least in system catalogs.  That might be the best answer for 8.3 in any case; I am not seeing a real fix that I'd care to risk back-patching.  (Note that this is not exactly trivial in itself, since then vacuum.c would need at least some minimal ability to deal with LP_REDIRECT entries.)

			regards, tom lane
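
The mismatch described above, a non-transactional page change paired with a transactional invalidation queue, can be illustrated with a toy model. This is only a sketch and not PostgreSQL code: the Page, Backend, and Transaction classes, the collapse_redirect and run_scenario names, and the two-slot page layout are all assumptions made up for the illustration. Offsets 32 and 33 mirror the CTIDs in the transcript.

```python
# Toy model only; none of these names exist in PostgreSQL.
LP_UNUSED, LP_NORMAL, LP_REDIRECT = "unused", "normal", "redirect"


class StaleCtid(Exception):
    """Raised when a cached offset no longer points at a normal item."""


class Page:
    """A heap page reduced to a map of offset -> (flag, payload)."""

    def __init__(self, items):
        self.items = dict(items)

    def collapse_redirect(self, redirect_off):
        # Move the live tuple into the redirect's slot and free its old slot.
        # This takes effect immediately, whatever the transaction does later.
        flag, target = self.items[redirect_off]
        assert flag == LP_REDIRECT
        self.items[redirect_off] = self.items[target]
        self.items[target] = (LP_UNUSED, None)


class Backend:
    """A session with a private cache of name -> offset (a stand-in syscache)."""

    def __init__(self, page):
        self.page = page
        self.cache = {}

    def lookup(self, name):
        if name not in self.cache:
            for off, (flag, payload) in self.page.items.items():
                if flag == LP_NORMAL and payload == name:
                    self.cache[name] = off
        return self.cache[name]

    def update_via_cache(self, name):
        # Trust the cached offset, as a syscache-based update would.
        off = self.lookup(name)
        flag, _ = self.page.items[off]
        if flag != LP_NORMAL:
            raise StaleCtid(off)
        return off


class Transaction:
    """Invalidations are queued and only broadcast if the transaction commits."""

    def __init__(self, backends):
        self.backends = backends
        self.pending = []

    def queue_invalidation(self, name):
        self.pending.append(name)

    def commit(self):
        for backend in self.backends:
            for name in self.pending:
                backend.cache.pop(name, None)
        self.pending.clear()

    def abort(self):
        # Transactional assumption: nothing changed, so nothing to broadcast.
        self.pending.clear()


def run_scenario(vacuum_errors_out):
    page = Page({32: (LP_REDIRECT, 33), 33: (LP_NORMAL, "foo")})
    other = Backend(page)
    other.lookup("foo")  # the other session now caches offset 33

    xact = Transaction([other])
    page.collapse_redirect(32)  # non-transactional: the tuple now lives at 32
    xact.queue_invalidation("foo")
    if vacuum_errors_out:
        xact.abort()
    else:
        xact.commit()

    try:
        return other.update_via_cache("foo")
    except StaleCtid:
        return "stale"
```

In this model run_scenario(False) returns 32, because the committed invalidation forces the other session to look the row up again, while run_scenario(True) returns "stale", because the abort discards the queued invalidation even though the page change has already happened.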