On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote:
> Mark Wong <markw@osdl.org> writes:
> > On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:
> >> Mark Wong <markw@osdl.org> writes:
> >>> [2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
> >>
> >> Yikes. Is this reproducible?
>
> > Yes, and I think there is one for each of the rollbacks that are
> > occurring in the workload. Except for the 1% that's supposed to happen
> > for the new-order transaction.
>
> Well, we need to find out what's causing that. There are two possible
> sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c,
> and one in src/backend/access/nbtree/nbtpage.c) and neither of them
> should ever fire.
>
> If you want to track it yourself, please change those elog(ERROR)s to
> elog(PANIC) so that they'll generate core dumps, then build with
> --enable-debug if you didn't already (--enable-cassert would be good too)
> and get a debugger stack trace from the core dump.
>
> Otherwise, can you extract a test case that causes this without needing
> vast resources to run?
>
> regards, tom lane

I was going to try Matthew's suggestion of turning up the debug level on
pg_autovacuum, unless you don't think that'll help find the cause. I'm not
sure I can find an easier way to reproduce the problem, but I can try.

I'll go ahead and make the elog() changes you recommended and do a run
overnight either way.

Mark