Thread: could not read block 77 of relation 1663/16385/388818775

could not read block 77 of relation 1663/16385/388818775

From
Alexandra Nitzschke
Date:
Hi,

we encountered the following error while inserting a record into a table:

org.postgresql.util.PSQLException: ERROR: could not read block 77 of relation 1663/16385/388818775: read only 0 of 8192

bytes

Using postgres 8.3.5

The reported object is an index.
The size of its data file is 630784 bytes.

The output of pg_filedump of block 76 is ok.
The output of pg_filedump of block 77 is the following:

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.3.0
*
* File: /tmp/388818775
* Options used: -i -f -R 77
*
* Dump created on: Thu Nov 20 11:18:26 2008
*******************************************************************
Error: Premature end of file encountered.


If you like we could send the data file to you.


Regards,

   A. Nitzschke

Re: could not read block 77 of relation 1663/16385/388818775

From
Craig Ringer
Date:
Alexandra Nitzschke wrote:
> Hi,
>
> we encountered the following error while inserting a record into a table:
>
> org.postgresql.util.PSQLException: ERROR: could not read block 77 of
> relation 1663/16385/388818775: read only 0 of 8192 bytes

This is probably a problem with your disk or filesystem. Have you
checked your disks and file system, checked your system logs for disk
errors, made sure your RAID array is in good condition, etc?

You should be able to fix it by REINDEXing the problem index. You can
find out which index it is from pg_catalog, though if you just REINDEX
the table(s) being operated on by the query that should work too.

Consider making a copy of your database and your log files before you
REINDEX in case one of the developers thinks it might actually be caused
by a PostgreSQL bug and wants to have a look.

--
Craig Ringer

Re: could not read block 77 of relation 1663/16385/388818775

From
Rafael Martinez
Date:
Alexandra Nitzschke wrote:
> Hi,
>
> we encountered the following error while inserting a record into a table:
>
> org.postgresql.util.PSQLException: ERROR: could not read block 77 of
> relation 1663/16385/388818775: read only 0 of 8192 bytes
>
> Using postgres 8.3.5
>
> The reported object is an index.
> The size of its data file is 630784 bytes.
>

Hello

You have probably a harware issue and data in your disk is corrupted.

Probably a reindex of this index will fix this problem. But I would take
a backup of this system right away just in case your disk is dying.

regards
--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/

Re: could not read block 77 of relation 1663/16385/388818775

From
Alexandra Nitzschke
Date:
Hi,

we have had similar postgres problems in the past.
Please have a look at Bug 3484.

We didn't resolve the problems metioned in bug 3484. The other postgres developers also thought, that there are
hardware
problems.
So our customer bought a new server with diffrent hardware configuration ( ... and NEW hardware drives ... ).
The error today encountered on the new machine. Just running under heavy load since two days.

We ran two file checks. One normal with no error result.
One deep scan (e2fsck -fcn) searching for corrupted blocks. No error was reported.

It really looks unlikely having a hardware problem now.

Here are some details about the servers:

*NEW* server:
postgres 8.3.5
SUSE 10.3
Kernel 2.6.22.19-0.1-default
Supermicro-Mainboard + 2x AMD Opteron Dual Core 2218 2,60 GHz
4x 1024MB ECC Registered DDR2 Ram ATP
3Ware 9650 4x SATA

*OLD* server:
postgres 8.3.5
SUSE 10.0
Kernel 2.6.22.6-smp
Intel-Mainboard + 2x Intel XEON 2,80 GHz 2MB FSB800
4x 1024MB ECC Registered DDR2 RAM
3Ware Raid Controller 9500S-4


Regards,

   A. Nitzschke



Craig Ringer schrieb:
> Alexandra Nitzschke wrote:
>> Hi,
>>
>> we encountered the following error while inserting a record into a table:
>>
>> org.postgresql.util.PSQLException: ERROR: could not read block 77 of
>> relation 1663/16385/388818775: read only 0 of 8192 bytes
>
> This is probably a problem with your disk or filesystem. Have you
> checked your disks and file system, checked your system logs for disk
> errors, made sure your RAID array is in good condition, etc?
>
> You should be able to fix it by REINDEXing the problem index. You can
> find out which index it is from pg_catalog, though if you just REINDEX
> the table(s) being operated on by the query that should work too.
>
> Consider making a copy of your database and your log files before you
> REINDEX in case one of the developers thinks it might actually be caused
> by a PostgreSQL bug and wants to have a look.
>
> --
> Craig Ringer
>
>

Re: could not read block 77 of relation 1663/16385/388818775

From
Craig Ringer
Date:
Alexandra Nitzschke wrote:
> Hi,
>
> we have had similar postgres problems in the past.
> Please have a look at Bug 3484.
>
> We didn't resolve the problems metioned in bug 3484. The other postgres
> developers also thought, that there are hardware
> problems.
> So our customer bought a new server with diffrent hardware configuration
> ( ... and NEW hardware drives ... ).
> The error today encountered on the new machine. Just running under heavy
> load since two days.

Yes, that does seem somewhat unlikely, especially if in both cases
you've only seen issues with PostgreSQL. However, I'm a bit confused
about the fact that you're seeing apparent corruption all over the place
- your earlier report mentions damaged blocks across a number of
relations, and this one is a bad index. You'd expect this sort of thing
to come up a lot on the list, so it must be assumed that there's
something a bit unusual or different about your configuration that's
either triggering a hard-to-hit bug in PostgreSQL, or that's damaging
PostgreSQL's data somehow.

Is there any chance you have EVER hard-killed the postmaster manually
(eg with "kill -9" or "kill -KILL")? If you do that and don't also kill
the backends, it's my understanding that BAD things may happen
especially if you then attempt to relaunch the postmaster.

Do you use _any_ 3rd party C extensions? Contrib modules? It doesn't
have to be in the same database, another database on the same machine
could be bad too.

Do you have any unusual workload? What is your workload like?

What procedural languages, if any, do you use? Pl/PgSQL? Pl/Perl?
Pl/Java? Pl/Python? etc. Again, in any database, not just your problem
one. If you use any other than Pl/PgSQL please also note the version of
the language interpreter/tools and in the case of Java the JVM vendor &
install method.

Does your site possibly have dodgy power? Are the servers on a UPS?

Have the servers had any crashes, kernel panics, unexpected reboots, or
hard poweroffs?

(Not that it should matter, but): Have you hard killed any backends
(kill -9 / SIGKILL)?

If you run a RAID verify using tw_cli or through the 3dm web interface,
does it report any block mismatches in the array?

--
Craig Ringer

Re: could not read block 77 of relation 1663/16385/388818775

From
Heikki Linnakangas
Date:
Craig Ringer wrote:
> Is there any chance you have EVER hard-killed the postmaster manually
> (eg with "kill -9" or "kill -KILL")? If you do that and don't also kill
> the backends, it's my understanding that BAD things may happen
> especially if you then attempt to relaunch the postmaster.

There is safeguards against that. If postmaster dies, the backends
should die quickly and gracefully too. And postmaster refuses to restart
  until all the backends have died and detached from the shared memory
segment.

In addition to Craig's question: have you ever experienced sudden power
loss, or operating system crash on these machines? Have you done "kill
-9 postmaster", "pg_ctl stop -m immediate", or similar? PostgreSQL
should recover with no data corruption, of course, but if there's a bug
somewhere, it would help to know where to look.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: could not read block 77 of relation 1663/16385/388818775

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Craig Ringer wrote:
>> Is there any chance you have EVER hard-killed the postmaster manually
>> (eg with "kill -9" or "kill -KILL")?

> There is safeguards against that. If postmaster dies, the backends
> should die quickly and gracefully too. And postmaster refuses to restart
>   until all the backends have died and detached from the shared memory
> segment.

A sufficiently bull-headed DBA can defeat those safeguards though ---
from memory, manually removing postmaster.pid and then starting a new
postmaster while old backends remain alive is the ticket to data
corruption.

            regards, tom lane

Re: could not read block 77 of relation 1663/16385/388818775

From
Alexandra Nitzschke
Date:
Hi,

here is some information about the server

- no other database system runs on the server
- suse 10.3, standard installation
- jvm 1.5.0_16
- as interface to the database we use jdbc version 8.1-407
- as procdural language we only use pl/pgsql
- we have no specials in the database like custom dataypes an so on

We have had a look at the /var/log files, no system crash, kernel panic or messages like this has happened.

Running a RAID verify reported no error.

We have setup a new database on 2008/11/17 after updating to 8.3.5.
Since then we didn't kill the postmaster manually or any backend process using kill -9.
Just normal stop/start/restart, no manually deletion of postmaster-pid or something like that.


Let me tell the circumstances:

To handle any failures of the system, we build a "pair" of servers.
One server is primary server and one is stand by server.
In case of any system failure we are able to switch the servers and can use the "old" stand by as "new" primary.

For database replication we use the "Warm Standby Using Point-In-Time Recovery" method.

Last Friday I updated postgres to 8.3.5 on the primary server and setup a new database and insert a dump from wednesday

using the pg_restore utility.

This monday I updated postgres to 8.3.5 on the standby server.
After that I intialized the database
( copy once the database from the primary system: removing data/* on stand-by, setting the database on primary in
backup-modus and then copy the database files )
and startet the recovery-mode on standby and the WAL replication from primary.
On Tuesday we have done a switch test and started up the database on the standby and use it as primary server.
Everything works fine until yesterday morning.

Our workload:
Over night we retrieve a lot of data and insert it into the database ( ~300000 - ~700000 recordsets into diffrent
tables 
but three main tables ).
In the morning we do some processing, but mostly selects.
During the day people work on the webapp on the system, but do mostly selects.
Every evening runs "vacuum analyze".

Regards,

   A. Nitzschke



Craig Ringer schrieb:
> Alexandra Nitzschke wrote:
>> Hi,
>>
>> we have had similar postgres problems in the past.
>> Please have a look at Bug 3484.
>>
>> We didn't resolve the problems metioned in bug 3484. The other postgres
>> developers also thought, that there are hardware
>> problems.
>> So our customer bought a new server with diffrent hardware configuration
>> ( ... and NEW hardware drives ... ).
>> The error today encountered on the new machine. Just running under heavy
>> load since two days.
>
> Yes, that does seem somewhat unlikely, especially if in both cases
> you've only seen issues with PostgreSQL. However, I'm a bit confused
> about the fact that you're seeing apparent corruption all over the place
> - your earlier report mentions damaged blocks across a number of
> relations, and this one is a bad index. You'd expect this sort of thing
> to come up a lot on the list, so it must be assumed that there's
> something a bit unusual or different about your configuration that's
> either triggering a hard-to-hit bug in PostgreSQL, or that's damaging
> PostgreSQL's data somehow.
>
> Is there any chance you have EVER hard-killed the postmaster manually
> (eg with "kill -9" or "kill -KILL")? If you do that and don't also kill
> the backends, it's my understanding that BAD things may happen
> especially if you then attempt to relaunch the postmaster.
>
> Do you use _any_ 3rd party C extensions? Contrib modules? It doesn't
> have to be in the same database, another database on the same machine
> could be bad too.
>
> Do you have any unusual workload? What is your workload like?
>
> What procedural languages, if any, do you use? Pl/PgSQL? Pl/Perl?
> Pl/Java? Pl/Python? etc. Again, in any database, not just your problem
> one. If you use any other than Pl/PgSQL please also note the version of
> the language interpreter/tools and in the case of Java the JVM vendor &
> install method.
>
> Does your site possibly have dodgy power? Are the servers on a UPS?
>
> Have the servers had any crashes, kernel panics, unexpected reboots, or
> hard poweroffs?
>
> (Not that it should matter, but): Have you hard killed any backends
> (kill -9 / SIGKILL)?
>
> If you run a RAID verify using tw_cli or through the 3dm web interface,
> does it report any block mismatches in the array?
>
> --
> Craig Ringer
>
>
>

Re: could not read block 77 of relation 1663/16385/388818775

From
Tom Lane
Date:
Alexandra Nitzschke <an@clickware.de> writes:
> We have had a look at the /var/log files, no system crash, kernel panic or messages like this has happened.

What this smells like is a failed page split --- somewhere in the index
there is a down-link pointing at page 77, but page 77 didn't actually
get added to the file.

If there was no system crash or hardware misfeasance then that'd
represent a Postgres bug, but it's difficult to do much about it
unless you can come up with a reproducible sequence to trigger the bug.

(Also, since you didn't mention otherwise, I assume this is a btree
index?)

            regards, tom lane

Re: could not read block 77 of relation 1663/16385/388818775

From
Alexandra Nitzschke
Date:
Yes, its a btree.


Tom Lane schrieb:
> Alexandra Nitzschke <an@clickware.de> writes:
>> We have had a look at the /var/log files, no system crash, kernel panic or messages like this has happened.
>
> What this smells like is a failed page split --- somewhere in the index
> there is a down-link pointing at page 77, but page 77 didn't actually
> get added to the file.
>
> If there was no system crash or hardware misfeasance then that'd
> represent a Postgres bug, but it's difficult to do much about it
> unless you can come up with a reproducible sequence to trigger the bug.
>
> (Also, since you didn't mention otherwise, I assume this is a btree
> index?)
>
>             regards, tom lane
>
>

Re: could not read block 77 of relation 1663/16385/388818775

From
Tom Lane
Date:
Alexandra Nitzschke <an@clickware.de> writes:
> Yes, its a btree.

Well, the btree code is sufficiently well tested/debugged that I think
there's zero chance of finding such a bug in it just on the suspicion
that there might be one there.  We really need a test case.

            regards, tom lane

Re: could not read block 77 of relation 1663/16385/388818775

From
Gregory Stark
Date:
Alexandra Nitzschke <an@clickware.de> writes:

> This monday I updated postgres to 8.3.5 on the standby server.
> After that I intialized the database
> ( copy once the database from the primary system: removing data/* on stand-by,
> setting the database on primary in backup-modus and then copy the database
> files )

Uhm, just to be sure. You did pg_start_backup() on the primary *before* you
started copying the data files across, right?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: could not read block 77 of relation 1663/16385/388818775

From
Alexandra Nitzschke
Date:
Yes, of course.

After we started up the copied database, the system runs withour error for two days.
That means some recordsets has been inserted sucessfully already.


Gregory Stark schrieb:
> Alexandra Nitzschke <an@clickware.de> writes:
>
>> This monday I updated postgres to 8.3.5 on the standby server.
>> After that I intialized the database
>> ( copy once the database from the primary system: removing data/* on stand-by,
>> setting the database on primary in backup-modus and then copy the database
>> files )
>
> Uhm, just to be sure. You did pg_start_backup() on the primary *before* you
> started copying the data files across, right?
>

Re: could not read block 77 of relation 1663/16385/388818775

From
Alexandra Nitzschke
Date:
Hi,

first of all I would like to thank you for all your efforts.

 > We really need a test case.

unfortunately this kind of bugs tend to be non-reproducable. I assume that there is a race condition which is only hit
in rare cases, under heavy load and when mars and venus are exactly aligned... ;-)

I do not think it is possible to construct a test case that reliably reproduces the bug.

However, we would be glad to incorporate any patches, additional logging code etc. in our installation of postgres that

might help you to track the bug. Since we always build postgres on the production machine this would not be any
problem.

Unfortunately we handle very sensitive data in our databases, so we cannot give you direct access to our machines. As a

  last resort I would propose the following (provided that our customer agrees):

We set up another machine and feed it with obfuscated data, putting it under the same load as our production machine.
We 
could then give you root access to that machine and you could do whatever patching, monitoring, testing etc. might be
helpful to track the problem. Do you think this might help?

BTW... how about a block checksum that is checked just before writing a block and just after reading it? I know this
would degrade performance, but I think we can afford that. Would it be possible to incorporate such code without having

to do too much patching?

Thanks in advance

Alexandra Nitzschke
Thomas Goerner


Tom Lane schrieb:
> Alexandra Nitzschke <an@clickware.de> writes:
>> Yes, its a btree.
>
> Well, the btree code is sufficiently well tested/debugged that I think
> there's zero chance of finding such a bug in it just on the suspicion
> that there might be one there.  We really need a test case.
>
>             regards, tom lane
>
>

Re: could not read block 77 of relation 1663/16385/388818775

From
John R Pierce
Date:
Alexandra Nitzschke wrote:
> BTW... how about a block checksum that is checked just before writing
> a block and just after reading it? I know this would degrade
> performance, but I think we can afford that. Would it be possible to
> incorporate such code without having to do too much patching?


oracle has had an option for some time that uses read/only page
protection for each page of the shared buffer area...   when oracle
knows it wants to modify a page, it un-protects it via a system
call.     this catches any wild writes into the shared buffer area as a
memory protection fault.

Re: could not read block 77 of relation 1663/16385/388818775

From
Gregory Stark
Date:
John R Pierce <pierce@hogranch.com> writes:

> Alexandra Nitzschke wrote:
>> BTW... how about a block checksum that is checked just before writing a block
>> and just after reading it? I know this would degrade performance, but I think
>> we can afford that. Would it be possible to incorporate such code without
>> having to do too much patching?
>
> oracle has had an option for some time that uses read/only page protection for
> each page of the shared buffer area...   when oracle knows it wants to modify a
> page, it un-protects it via a system call.     this catches any wild writes
> into the shared buffer area as a memory protection fault.

The problem with both of these approaches is that most bugs occur when the
code *thinks* it's doing the right thing. A bug in the buffer management code
which returns the wrong buffer or a real wild pointer dereference. I don't
remember ever having either of those.

That said, the second option seems pretty trivial to implement. I think the
performance would be awful for a live database but for a read-only database it
might make more sense.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: could not read block 77 of relation 1663/16385/388818775

From
John R Pierce
Date:
Gregory Stark wrote:
> John R Pierce <pierce@hogranch.com> writes:
>
>
>> oracle has had an option for some time that uses read/only page protection for
>> each page of the shared buffer area...   when oracle knows it wants to modify a
>> page, it un-protects it via a system call.     this catches any wild writes
>> into the shared buffer area as a memory protection fault.
>>
>
> The problem with both of these approaches is that most bugs occur when the
> code *thinks* it's doing the right thing. A bug in the buffer management code
> which returns the wrong buffer or a real wild pointer dereference. I don't
> remember ever having either of those.
>
> That said, the second option seems pretty trivial to implement. I think the
> performance would be awful for a live database but for a read-only database it
> might make more sense.
>


FWIW, it has modest overhead on Oracle on Solaris on Sparc...  EXCEPT on
the "Niagra" aka 'Coolthreads'  CPUs (the T1 processor), on that it was
horribly slow on our write intensive transactional system.    Our
environment is on very large scale servers where the shared buffers are
often 32 or 64GB, I suspect this increases our exposure to bizarro-world
writes.

believe me, especially in earlier Oracle releases (6, 7, 8), this
caught/prevented many problems which otherwise would have ended in a
Oracle fatal Block Corruption error, which would require many hours of
DBA hackery before the database could be restarted.

Re: could not read block 77 of relation 1663/16385/388818775

From
Marc Schablewski
Date:
I think both approaches (checksum and write protection) might contribute
to finding this bug. If pages with bogus data but correct checksum are
ever found on disk, I think this would prove that there is no hardware /
file system / os issue.

If an access violation resulting from writes to locked pages were hit,
would it be possible to log a stack backtrace?

 Especially on our test systems we can easily afford any performance
degradations resulting from this.

Question: Who is responsible for maintaining this part (buffer cache
maintenance, writer etc) of postgres code?
Could you provide the necessary patches?

Thanks in advance

Thomas Goerner
Marc Schablewski


John R Pierce wrote:
> Gregory Stark wrote:
>> John R Pierce <pierce@hogranch.com> writes:
>>
>>
>>> oracle has had an option for some time that uses read/only page
>>> protection for
>>> each page of the shared buffer area...   when oracle knows it wants
>>> to modify a
>>> page, it un-protects it via a system call.     this catches any wild
>>> writes
>>> into the shared buffer area as a memory protection fault.
>>>
>>
>> The problem with both of these approaches is that most bugs occur
>> when the
>> code *thinks* it's doing the right thing. A bug in the buffer
>> management code
>> which returns the wrong buffer or a real wild pointer dereference. I
>> don't
>> remember ever having either of those.
>>
>> That said, the second option seems pretty trivial to implement. I
>> think the
>> performance would be awful for a live database but for a read-only
>> database it
>> might make more sense.
>>
>
>
> FWIW, it has modest overhead on Oracle on Solaris on Sparc...  EXCEPT
> on the "Niagra" aka 'Coolthreads'  CPUs (the T1 processor), on that it
> was horribly slow on our write intensive transactional system.    Our
> environment is on very large scale servers where the shared buffers
> are often 32 or 64GB, I suspect this increases our exposure to
> bizarro-world writes.
>
> believe me, especially in earlier Oracle releases (6, 7, 8), this
> caught/prevented many problems which otherwise would have ended in a
> Oracle fatal Block Corruption error, which would require many hours of
> DBA hackery before the database could be restarted.
>
>
>

Re: could not read block 77 of relation 1663/16385/388818775

From
Heikki Linnakangas
Date:
Marc Schablewski wrote:
> If pages with bogus data but correct checksum are
> ever found on disk, I think this would prove that there is no hardware /
> file system / os issue.

No, it would only suggest that the issue is not in the filesystem or I/O
subsystem. Even then, it wouldn't catch bugs where the contents of one
block are copied over or swapped with another block. The checksum would
be calculated when a page is written to disk, so the corruption could
still be caused by faulty memory, memory bus, CPU or OS, while the page
sits in the buffer cache.

> If an access violation resulting from writes to locked pages were hit,
> would it be possible to log a stack backtrace?

I think you'd get a segmentation fault. With a core dump if the system
is configured so.

> Question: Who is responsible for maintaining this part (buffer cache
> maintenance, writer etc) of postgres code?

There's no named individuals, just the community in general.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: could not read block 77 of relation 1663/16385/388818775

From
Darren Reed
Date:
As another data point, I upgraded from 8.2.6 to 8.3.6
and I saw those corruption messages after restoring
pg_dump files back into the database with psql and
inserting a bunch of data (some duplicate records
got rejected.) The insertion of new data is via
perl's DBI interface.

This was after the upgrade, so there had been no
dirty stops, etc.

Thinking that it might be an index problem, I've tried
reindexing the entire database - no errors but neither
does the problem go away. Nor does it stop the database
from starting up.

Where it pops up seems to be random (going from the errors
on postgres's log from earlier starts.) At the moment,
it's stopping me from creating a new table or dropping an
existing table (where I first ran into the error.) But
accessing all of the tables is fine...

I suppose the logical thing for me to do is go back to 8.2.6.

Darren

ERROR:  could not read block 1 of relation 1664/0/1233: read only 0 of
8192 bytes

Re: could not read block 77 of relation 1663/16385/388818775

From
Gregory Stark
Date:
Darren Reed <darrenr+postgres@fastmail.net> writes:

> ERROR:  could not read block 1 of relation 1664/0/1233: read only 0 of
> 8192 bytes

FWIW this is pg_shdepend_reference_index which is actually a bit special. It's
a "shared" relation which means it spans all your databases. Your reindex
didn't rebuild to. To reindex it you would have to shut down postgres and run
REINDEX in postgres in "standalone" mode.

> I suppose the logical thing for me to do is go back to 8.2.6.

I think it would be more interesting to know how you got into this situation.
When you ran initdb did anything unusual happen? Is it possible anything later
truncated these files?

There's no reason I can imagine 8.3 would be any more susceptible to this than
8.2. And certainly no reason you would want to use a year-old release of 8.2
missing a year's worth of bug fixes and security fixes. The current release of
8.2 is 8.2.12.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

Re: could not read block 77 of relation 1663/16385/388818775

From
Tom Lane
Date:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> For me it has been very reproducible:

If you can put together a self-contained test case that would let
someone else reproduce it, we'd be very interested to take a look.

            regards, tom lane

Re: could not read block 77 of relation 1663/16385/388818775

From
Darren Reed
Date:
Gregory Stark wrote:
> Darren Reed <darrenr+postgres@fastmail.net> writes:
>
>
>> ERROR:  could not read block 1 of relation 1664/0/1233: read only 0 of
>> 8192 bytes
>>
>
> FWIW this is pg_shdepend_reference_index which is actually a bit special. It's
> a "shared" relation which means it spans all your databases. Your reindex
> didn't rebuild to. To reindex it you would have to shut down postgres and run
> REINDEX in postgres in "standalone" mode.
>
>
>> I suppose the logical thing for me to do is go back to 8.2.6.
>>
>
> I think it would be more interesting to know how you got into this situation.
> When you ran initdb did anything unusual happen? Is it possible anything later
> truncated these files?
>

Nope.
For me it has been very reproducible:
- init the database
- restore tables from dump files
- create indexes for those tables
- insert some records using perl DBI
- receive above error message

My current logifile goes like this:
- database startup
- create tables creating implicit keys
- lots of checkpoints happening too quick (restores)
- 3 insert errors due to syntax
- 2 EOFs from clients
- ERROR could not read block 1 of relation...

I haven't even run any queries.

The single-user mode REINDEX did fix it, thanks.

Darren