
Re: could not read block 77 of relation 1663/16385/388818775 - Mailing list pgsql-bugs

From	Marc Schablewski
Subject	Re: could not read block 77 of relation 1663/16385/388818775
Date	November 27, 2008 05:50:12
Msg-id	492E6D45.5090703@clickware.de
In response to	Re: could not read block 77 of relation 1663/16385/388818775 (John R Pierce <pierce@hogranch.com>)
Responses	Re: could not read block 77 of relation 1663/16385/388818775
List	pgsql-bugs


I think both approaches (checksums and write protection) might help in
finding this bug. If pages with bogus data but a correct checksum are
ever found on disk, I think this would prove that the problem is not a
hardware / file system / OS issue.
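A minimal sketch of the checksum idea in C, assuming a hypothetical page layout: 8 KB pages with a 32-bit checksum stored in the first four bytes. This is not PostgreSQL's actual page header, and FNV-1a is only a stand-in hash; the functions `page_checksum`, `page_set_checksum` and `page_verify_checksum` are hypothetical hooks that would be called right before a page is written and right after it is read back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 8 KB pages, 32-bit checksum in bytes 0..3. */
#define SKETCH_PAGE_SIZE 8192
#define SKETCH_CSUM_BYTES sizeof(uint32_t)

/* FNV-1a over the page contents, skipping the stored checksum field. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;                 /* FNV offset basis */
    for (size_t i = SKETCH_CSUM_BYTES; i < SKETCH_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;                       /* FNV prime */
    }
    return h;
}

/* Hypothetical hook: stamp the checksum just before the page is written. */
static void page_set_checksum(unsigned char *page)
{
    assert(page != NULL);
    uint32_t c = page_checksum(page);
    memcpy(page, &c, sizeof c);
}

/* Hypothetical hook: verify right after the page is read back.
   Returns 1 if the stored checksum matches the contents, 0 otherwise. */
static int page_verify_checksum(const unsigned char *page)
{
    assert(page != NULL);
    uint32_t stored;
    memcpy(&stored, page, sizeof stored);
    return stored == page_checksum(page);
}
```

If verification fails after a read, the page was damaged after it was written (disk, controller, file system or OS). If verification succeeds but the page contents are still garbage, the page was already bad in memory when the checksum was computed, which points back at the database server itself.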

If a write to a protected page triggered an access violation, would it
be possible to log a stack backtrace?
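A minimal sketch of the write-protection idea, assuming Linux with glibc (`mprotect`, `sigaction` with `SA_SIGINFO`, and glibc's `backtrace`/`backtrace_symbols_fd` from `<execinfo.h>`). One anonymous page stands in for a shared buffer and is kept read-only; the sanctioned write path unprotects it, writes, and re-protects it, while any other store raises SIGSEGV, whose handler logs a stack backtrace to stderr. The names `guard_setup`, `guarded_write`, `segv_handler` and `wild_writes` are hypothetical and not part of PostgreSQL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>   /* glibc: backtrace(), backtrace_symbols_fd() */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                /* stands in for one shared buffer page */
static size_t guarded_len;                /* one VM page */
static volatile sig_atomic_t wild_writes; /* stores caught by the handler */

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t a = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;
    if (a < base || a >= base + guarded_len) {
        /* Not our page: restore the default action; the faulting
           instruction re-executes and the process dies as usual. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    /* Log where the stray write came from. */
    static const char msg[] = "write to protected buffer page:\n";
    void *frames[32];
    int n = backtrace(frames, 32);
    ssize_t w = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)w;
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    wild_writes++;
    /* Demo only: unprotect so the store is retried and execution
       continues; a debug build would more likely abort() here. */
    mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE);
}

/* Map one page, install the handler, and make the page read-only. */
static int guard_setup(void)
{
    guarded_len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, guarded_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    guarded_page = p;

    /* Pre-load libgcc so the handler's backtrace() need not malloc. */
    void *warm[1];
    backtrace(warm, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return mprotect(guarded_page, guarded_len, PROT_READ);
}

/* The sanctioned write path: unprotect, modify, re-protect. */
static void guarded_write(size_t off, char value)
{
    assert(off < guarded_len);
    mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE);
    guarded_page[off] = value;
    mprotect(guarded_page, guarded_len, PROT_READ);
}
```

Calling `mprotect` from a signal handler is not guaranteed async-signal-safe by POSIX, though it works on Linux in practice. Doing two `mprotect` calls per modification is also where the performance cost discussed in this thread comes from.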

Especially on our test systems, we can easily afford any performance
degradation resulting from this.

Question: Who is responsible for maintaining this part of the PostgreSQL
code (buffer cache maintenance, writer, etc.)?
Could you provide the necessary patches?

Thanks in advance

Thomas Goerner
Marc Schablewski


John R Pierce wrote:
> Gregory Stark wrote:
>> John R Pierce <pierce@hogranch.com> writes:
>>
>>
>>> Oracle has had an option for some time that uses read-only page
>>> protection for each page of the shared buffer area... When Oracle
>>> knows it wants to modify a page, it unprotects it via a system call.
>>> This catches any wild writes into the shared buffer area as a memory
>>> protection fault.
>>>
>>
>> The problem with both of these approaches is that most bugs occur
>> when the code *thinks* it's doing the right thing. They would only
>> catch a bug in the buffer management code which returns the wrong
>> buffer, or a real wild pointer dereference. I don't remember ever
>> having either of those.
>>
>> That said, the second option seems pretty trivial to implement. I
>> think the performance would be awful for a live database, but for a
>> read-only database it might make more sense.
>>
>
>
> FWIW, it has modest overhead for Oracle on Solaris on SPARC... EXCEPT
> on the "Niagara" aka "CoolThreads" CPUs (the T1 processor), where it
> was horribly slow on our write-intensive transactional system. Our
> environment runs on very large-scale servers where the shared buffers
> are often 32 or 64 GB; I suspect this increases our exposure to
> bizarro-world writes.
>
> Believe me, especially in earlier Oracle releases (6, 7, 8), this
> caught/prevented many problems which otherwise would have ended in an
> Oracle fatal block corruption error, requiring many hours of DBA
> hackery before the database could be restarted.
