VACUUM/t_ctid bug (was Re: GiST concurrency commited) - Mailing list pgsql-hackers

From Tom Lane
Subject VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date
Msg-id 20570.1124518809@sss.pgh.pa.us
In response to GiST concurrency commited  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)  (Gavin Sherry <swm@linuxworld.com.au>)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)  (Teodor Sigaev <teodor@sigaev.ru>)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
A while back, Teodor Sigaev <teodor@sigaev.ru> wrote:
> And there is one more problem: approximately once per 2-4 million
> statements, I got traps:
> TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File: 
> "vacuum.c", Line: 2766)
> LOG:  server process (PID 15847) was terminated by signal 6
> Sorry, but I couldn't debug this trap, and my knowledge of this piece of code
> is very limited. Postgres didn't create a core file. I don't believe this
> problem is related to my GiST framework, because it is about heap pages. I
> suspect the trap occurs during a concurrent vacuum, but I am not sure.

> PS
> My concurrency testing scripts:
> http://www.sigaev.ru/gist/
> concur.pl - generator of SQL statements
> concur.sh - simple wrapper around concur.pl which reinitializes the db, then creates the db and table.

I have committed changes that I believe fix this problem:
http://archives.postgresql.org/pgsql-committers/2005-08/msg00213.php
But it needs more testing.  Would you update to CVS tip and see if you
still see the failure?

Also, if anyone else has some vacuum + concurrent update test cases,
any testing you can do in CVS tip would be useful.  This patch is big
and ugly enough that back-patching it into all the supported back
branches is a pretty scary prospect.  I don't think we have a lot of
choice --- it is a data-loss risk --- but we need to beat the heck
out of the CVS-tip version before we start pushing it into the release
branches.

My current intention is to leave it just in CVS tip for the next few
days, and not to start developing back-branch versions until after
we've made the first 8.1 beta release.  The back-ports are going to
be painful (the code involved has changed often enough that I fear
each branch will need a custom tailored patch) ... so I really don't
want to start without some confidence that the CVS-tip patch is right.

In other words ... if you can test this ... HELP!!!
        regards, tom lane

