On Thu, May 31, 2012 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The one thing that still seems a little odd to me is that this caused
>> a pin count to get orphaned. It seems reasonable that ignoring the
>> AccessExclusiveLock could result in not-found errors trying to open a
>> missing relation, and even fsync requests on a missing relation. But
>> I don't see why that would cause the backend-local pin counts to get
>> messed up, which makes me wonder if there really is another bug here
>> somewhere.
>
> According to Heikki's log, the Assert was in the startup process itself,
> and it happened after an error:
>
>> 2012-05-26 10:44:28.587 CEST 10270 FATAL: could not open file "base/21268/32994": No such file or directory
>> 2012-05-26 10:44:28.588 CEST 10270 CONTEXT: writing block 2508 of relation base/21268/32994
>> xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
>> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741)
>> 2012-05-26 10:44:31.131 CEST 10269 LOG: startup process (PID 10270) was terminated by signal 6: Aborted
>
> I don't think that code is meant to recover from errors anyway, so
> the fact that it fails with a pin count held isn't exactly surprising.
> But it might be worth looking at exactly which on_proc_exit callbacks
> are installed in the startup process and what assumptions they make.
Which code isn't meant to recover from errors?
> As for where the error came from in the first place, it's easy to
> imagine somebody who's not got the word about the AccessExclusiveLock
> reading pages of the table into buffers that have already been scanned
> by the DROP. So you'd end up with orphaned buffers belonging to a
> vanished table. If somebody managed to dirty them by setting hint bits
> (we do allow that in HS mode no?) then later you'd have various processes
> trying to write the buffer before recycling it, which seems to fit the
> reported error.
Right, I understand the other errors. It's just the pin count that I
am a bit confused about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company