Re: Hmmm ... isn't count_nondeletable_pages all wet? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hmmm ... isn't count_nondeletable_pages all wet?
Date
Msg-id 29644.1189904237@sss.pgh.pa.us
Whole thread Raw
In response to Hmmm ... isn't count_nondeletable_pages all wet?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hmmm ... isn't count_nondeletable_pages all wet?
Re: Hmmm ... isn't count_nondeletable_pages all wet?
List pgsql-hackers
I wrote:
> Is this analysis accurate, or am I missing something?  If it is
> accurate, do we need to postpone the upcoming releases to fix it?
> I am thinking that some previously unexplained reports of index
> corruption might now be explained ...

Yeah, it's broken.  Reproducing the race condition is a bit tricky,
but if you have a debug-enabled build at hand it's simple.
Here's a test case (turn off autovacuum if it's on, to prevent it
from getting in there ahead of the manual vacuum):

In session 1:

regression=# create table foo (f1 int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
CREATE TABLE
regression=# insert into foo select g from generate_series(1,10) g;
INSERT 0 10
regression=# delete from foo;
DELETE 10

Now start session 2, and attach to it with gdb, and set a breakpoint
at lazy_truncate_heap().  Continue, and in session 2 do

regression=# vacuum verbose foo;
INFO:  vacuuming "public.foo"
INFO:  scanned index "foo_pkey" to remove 10 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": removed 10 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "foo_pkey" now contains 0 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": found 10 removable, 0 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

[ hangs here at the breakpoint ]

In session 1 do:

regression=# begin;
BEGIN
regression=# insert into foo select g+100 from generate_series(1,10) g;
INSERT 0 10
regression=# abort;
ROLLBACK

Now let gdb continue from the breakpoint, and notice session 2 thinks
it can truncate away the whole table:

INFO:  "foo": truncated 1 to 0 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM
regression=# 

Back to session 1, insert some conflicting data:

regression=# insert into foo select g+1000 from generate_series(1,10) g;
INSERT 0 10

and now we have a corrupt index with multiple pointers to these rows:

regression=# select * from foo where f1 = 105; f1  
------1005
(1 row)

regression=# select * from foo where f1 = 1005; f1  
------1005
(1 row)

I am fairly sure that this bug explains problems previously reported
by Merlin Moncure:
http://archives.postgresql.org/pgsql-general/2006-10/msg01312.php
and Florian Weimer:
http://archives.postgresql.org/pgsql-general/2006-11/msg00305.php
In both those cases, off-list investigation showed that the symptoms
were caused by multiple index entries pointing to the same heap tuples,
where one index entry matched the actual contents of the row and
the other did not.  In both cases this occurred for a fairly small
number of rows that were clumped together into small ranges of blocks.
It looks to me like this is perfectly explained by the theory that
that range of blocks had been truncated away by a VACUUM at some point
in the table's history, and that the non-matching index entries stemmed
from an insert or update that occurred and then aborted after VACUUM had
examined the blocks the first time but before it could return to check
whether the blocks were still empty.

It's a corner case, but I say it's a must-fix.  Those bug reports have
been bothering me for most of a year ...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Hmmm ... isn't count_nondeletable_pages all wet?
Next
From: Gregory Stark
Date:
Subject: Re: Hmmm ... isn't count_nondeletable_pages all wet?