Re: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741 - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
Msg-id: CA+TgmobXSwaEe8qVxa+50=Fk4iJMkJUmjSqJ4bHX8bMM-b10dg@mail.gmail.com
In response to: Re: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741 (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Thu, May 31, 2012 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The one thing that still seems a little odd to me is that this caused
>> a pin count to get orphaned.  It seems reasonable that ignoring the
>> AccessExclusiveLock could result in not-found errors trying to open a
>> missing relation, and even fsync requests on a missing relation.  But
>> I don't see why that would cause the backend-local pin counts to get
>> messed up, which makes me wonder if there really is another bug here
>> somewhere.
>
> According to Heikki's log, the Assert was in the startup process itself,
> and it happened after an error:
>
>> 2012-05-26 10:44:28.587 CEST 10270 FATAL:  could not open file "base/21268/32994": No such file or directory
>> 2012-05-26 10:44:28.588 CEST 10270 CONTEXT:  writing block 2508 of relation base/21268/32994
>>          xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
>> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741)
>> 2012-05-26 10:44:31.131 CEST 10269 LOG:  startup process (PID 10270) was terminated by signal 6: Aborted
>
> I don't think that code is meant to recover from errors anyway, so
> the fact that it fails with a pin count held isn't exactly surprising.
> But it might be worth looking at exactly which on_proc_exit callbacks
> are installed in the startup process and what assumptions they make.

Which code isn't meant to recover from errors?

> As for where the error came from in the first place, it's easy to
> imagine somebody who's not got the word about the AccessExclusiveLock
> reading pages of the table into buffers that have already been scanned
> by the DROP.  So you'd end up with orphaned buffers belonging to a
> vanished table.  If somebody managed to dirty them by setting hint bits
> (we do allow that in HS mode no?) then later you'd have various processes
> trying to write the buffer before recycling it, which seems to fit the
> reported error.

Right, I understand the other errors.  It's just the pin count that I
am a bit confused about.
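
To make the pin-count question concrete, here is a toy sketch of one way an
error raised partway through a buffer write could leave a backend-local pin
count nonzero, so that an exit-time check like the one in the TRAP line above
would fire.  This is not PostgreSQL source: every name in it (toy_pin,
toy_write_buffer, toy_count_leaked_pins, and so on) is hypothetical, and
setjmp/longjmp merely stands in for an error exit that skips the normal
cleanup path.

```c
/* Toy model only: NOT PostgreSQL code. All identifiers are hypothetical. */
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define TOY_NBUFFERS 4

/* Per-process pin counts, one slot per (toy) shared buffer. */
static int toy_private_refcount[TOY_NBUFFERS];

/* Where a simulated error jumps to, skipping any remaining cleanup. */
static jmp_buf toy_error_context;

static void
toy_pin(int buf)
{
    toy_private_refcount[buf]++;
}

static void
toy_unpin(int buf)
{
    assert(toy_private_refcount[buf] > 0);
    toy_private_refcount[buf]--;
}

/*
 * Pin the buffer, "write" it, unpin it.  If the underlying file is gone,
 * bail out through longjmp before the unpin runs, leaving the pin behind.
 */
static void
toy_write_buffer(int buf, bool file_exists)
{
    toy_pin(buf);
    if (!file_exists)
        longjmp(toy_error_context, 1);  /* error exit: unpin is skipped */
    toy_unpin(buf);
}

/* Returns true if the write completed, false if it errored out. */
static bool
toy_try_write(int buf, bool file_exists)
{
    if (setjmp(toy_error_context) != 0)
        return false;
    toy_write_buffer(buf, file_exists);
    return true;
}

/* What an exit-time sanity check would look at: buffers still pinned. */
static int
toy_count_leaked_pins(void)
{
    int leaked = 0;

    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        if (toy_private_refcount[i] != 0)
            leaked++;
    }
    return leaked;
}
```

In this model, toy_try_write(0, true) leaves no pins behind, while
toy_try_write(1, false) returns false and leaves one buffer pinned, which is
the state an exit-time "refcount == 0" assertion would complain about.
Whether the startup process actually gets into that state this way is exactly
the open question.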

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

