Re: 8.4b2 tsearch2 strange error - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: 8.4b2 tsearch2 strange error
Date	June 5, 2009 19:49:13
Msg-id	13882.1244242148@sss.pgh.pa.us
In response to	Re: 8.4b2 tsearch2 strange error (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: 8.4b2 tsearch2 strange error
List	pgsql-hackers

I wrote:
>> I poked around in the dump for awhile.  I still can't reproduce the
>> failure from a standing start.

I've had a bit of success with reproducing the failure from scratch.
After loading the SQL dump that Tatsuo previously supplied into a UTF8,
C-locale database, generate a data-only dump of it, along the lines of

pg_dump -a ... >ishii-data.sql

Then run this psql script in the database (the function is designed to
exercise every entry in the GIN index):

create table msgstats as select * from ts_stat('select body_index from msginfo');

create language plpgsql;
create function testit () returns int as $$
declare r record;
        ifail int := 0;
begin
  for r in select * from msgstats loop
    begin
      perform count(*) from msginfo where plainto_tsquery(r.word) @@ body_index;
    exception when others then
      raise notice 'failed for "%": %', r.word, sqlerrm;
      ifail := ifail + 1;
    end;
  end loop;
  return ifail;
end $$ language plpgsql;

delete from msginfo;
truncate msg_folderinfo;
\i ishii-data.sql
reset client_min_messages;

select testit();

The results are not very stable, but I fairly often get "tuple offset out
of range" errors, and my last try resulted in an Assert failure in
autovacuum.  What I think is happening is that there's some inadequately
locked page-changing logic in there, and if you can get autovacuum to
happen on msginfo concurrently with the data reload step, it'll mess up.
Given that we've not seen reports of this on 8.3, I'll bet lunch that
the bug is somewhere in the "fast insert" code.  I haven't found it
yet, though.

BTW, I also did more analysis of Tatsuo's binary dump using this
function.  Of some 38000 distinct lexemes in the body_index column,
about 1500 trigger an error in testit(); the errors are either
"tuple offset out of range" or something like "could not open segment 1
of relation base/16384/16497 (target block 955254869): No such file or
directory".  There are about 440 distinct messages that show up,
suggesting that approximately that many pages of the index contain
corrupted TIDs.  Manual examination of the index is consistent with
this, and also indicates that all of the corrupt pages are in the second
half of the table.  That's not too hard to square with the
concurrent-update theory --- autovac might have already done about half
the index before a large data load started.  However, it means that
corruption is *very* probable whenever data insertion happens
concurrently with a vacuum; otherwise so many pages would not have
gotten messed up.
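
The counts of failing lexemes and distinct error messages quoted above could be gathered with a variant of testit() that stores each failure instead of only raising a notice. This is a minimal sketch, not the method actually used for the figures in this message: the table testit_errors and the function testit_log are hypothetical names introduced here, and only msgstats, msginfo and body_index come from the script above.

```sql
-- Hypothetical helper table (not part of the original script):
-- one row per lexeme whose index probe raised an error.
create table testit_errors (word text, msg text);

create function testit_log () returns int as $$
declare r record;
        ifail int := 0;
begin
  for r in select * from msgstats loop
    begin
      perform count(*) from msginfo
        where plainto_tsquery(r.word) @@ body_index;
    exception when others then
      -- Only the inner block is rolled back when the error is trapped;
      -- the handler itself runs afterwards, so this insert is kept.
      insert into testit_errors values (r.word, sqlerrm);
      ifail := ifail + 1;
    end;
  end loop;
  return ifail;
end $$ language plpgsql;

select testit_log();

-- Failing lexemes versus distinct error texts.
select count(*) as failing_lexemes,
       count(distinct msg) as distinct_messages
from testit_errors;
```

Grouping by msg instead of counting would show which error texts recur across many lexemes, which is the basis for reading the number of distinct messages as a rough proxy for the number of damaged index pages.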
        regards, tom lane
