Thread: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
Deniz Atak
Date:
Hi,

I am using postgresql on Glassfish server and I have EJB 3.0 for ORM. I am trying to run a query in PSQL but receiving following error:

Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.0.0.v20091031-r5713): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: could not read block 4707 of relation 1663/16384/16564: Success
Error Code: 0
Call: select  src_ip  from table where date > '2011.07.29' AND date < '2011.07.30'  AND (  (src_ip = 'anIP' )  OR  ( src_ip = 'anotherIP' )  ) group by src_ip;  
    bind => [2011-07-29 00:00:00.0, 2011-07-30 00:00:00.0, 195.122.20.236, 195.122.20.88, 500, 0]
Query: ResultSetMappingQuery(sql="select  src_ip  from table where date > '2011.07.29' AND date < '2011.07.30'  AND (  (src_ip = 'anIP' )  OR  ( src_ip = 'anotherIP' )  ) group by src_ip; ")

It is kind of weird to have an error that ends with success :)

Do you have any opinion about this problem? Thanks in advance.

Deniz



My guess is some one moved the data folder or the directory got deleted (/var/lib/pgsql/9.0/data/.../...1663/16384/16564). Without server restart. I am sure some experts gonna answer this very well.

Thanks
Deepak
On Jul 30, 2011, at 2:01 AM, Deniz Atak <denizatak@gmail.com> wrote:

Hi,

I am using postgresql on Glassfish server and I have EJB 3.0 for ORM. I am trying to run a query in PSQL but receiving following error:

Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.0.0.v20091031-r5713): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: could not read block 4707 of relation 1663/16384/16564: Success
Error Code: 0
Call: select  src_ip  from table where date > '2011.07.29' AND date < '2011.07.30'  AND (  (src_ip = 'anIP' )  OR  ( src_ip = 'anotherIP' )  ) group by src_ip;  
    bind => [2011-07-29 00:00:00.0, 2011-07-30 00:00:00.0, 195.122.20.236, 195.122.20.88, 500, 0]
Query: ResultSetMappingQuery(sql="select  src_ip  from table where date > '2011.07.29' AND date < '2011.07.30'  AND (  (src_ip = 'anIP' )  OR  ( src_ip = 'anotherIP' )  ) group by src_ip; ")

It is kind of weird to have an error that ends with success :)

Do you have any opinion about this problem? Thanks in advance.

Deniz



Deniz Atak <denizatak@gmail.com> writes:
> I am using postgresql on Glassfish server and I have EJB 3.0 for ORM. I am
> trying to run a query in PSQL but receiving following error:

> Local Exception Stack:
> Exception [EclipseLink-4002] (Eclipse Persistence Services -
> 2.0.0.v20091031-r5713): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: org.postgresql.util.PSQLException: ERROR: could not read
> block 4707 of relation 1663/16384/16564: Success

What Postgres server version is that?

If it's 8.2 or older, this probably indicates a partial block at the end
of the file.  Newer versions produce a more sensible error message for
the case, but that's just cosmetic --- the real problem is a messed-up
table.  Have you had a filesystem corruption or an out-of-disk-space
condition on this machine?

            regards, tom lane

Re: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
Deniz Atak
Date:
Deepak, Tom thanks for answering.

Tom, we have psql 8.1.18. So you are right, this weird message is because of the old version. I will check with my colleague about the possible reasons. What can I do if there is a messed up table?

Regards,
Deniz

On Sat, Jul 30, 2011 at 11:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Deniz Atak <denizatak@gmail.com> writes:
> I am using postgresql on Glassfish server and I have EJB 3.0 for ORM. I am
> trying to run a query in PSQL but receiving following error:

> Local Exception Stack:
> Exception [EclipseLink-4002] (Eclipse Persistence Services -
> 2.0.0.v20091031-r5713): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: org.postgresql.util.PSQLException: ERROR: could not read
> block 4707 of relation 1663/16384/16564: Success

What Postgres server version is that?

If it's 8.2 or older, this probably indicates a partial block at the end
of the file.  Newer versions produce a more sensible error message for
the case, but that's just cosmetic --- the real problem is a messed-up
table.  Have you had a filesystem corruption or an out-of-disk-space
condition on this machine?

                       regards, tom lane

I am not sure how big your table is one way we implemented here was we selected the clean rows and outputted it to a csv file. And the rows affected we had to load from the backup, luckily we had the clean backup.

Ex: assume you have 1,2,3,4,5....100 rows and the corrupted is between 60-70. I outputted clean rows from 1-59 and 71-100 to a csv file and loaded in a new table. The corrupted was loaded back from a table. This just One of doing it. There might be more the experts here can answer very well. I am interested to see others answers as well. 

My way is time consuming and if you have a very large table or tables affected it's a nightmare to fix them. 

Good luck with your recovery.
Thanks
Deepak

On Jul 31, 2011, at 11:27 PM, Deniz Atak <denizatak@gmail.com> wrote:

Deepak, Tom thanks for answering.

Tom, we have psql 8.1.18. So you are right, this weird message is because of the old version. I will check with my colleague about the possible reasons. What can I do if there is a messed up table?

Regards,
Deniz

On Sat, Jul 30, 2011 at 11:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Deniz Atak <denizatak@gmail.com> writes:
> I am using postgresql on Glassfish server and I have EJB 3.0 for ORM. I am
> trying to run a query in PSQL but receiving following error:

> Local Exception Stack:
> Exception [EclipseLink-4002] (Eclipse Persistence Services -
> 2.0.0.v20091031-r5713): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: org.postgresql.util.PSQLException: ERROR: could not read
> block 4707 of relation 1663/16384/16564: Success

What Postgres server version is that?

If it's 8.2 or older, this probably indicates a partial block at the end
of the file.  Newer versions produce a more sensible error message for
the case, but that's just cosmetic --- the real problem is a messed-up
table.  Have you had a filesystem corruption or an out-of-disk-space
condition on this machine?

                       regards, tom lane

Re: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
"Tomas Vondra"
Date:
On 1 Srpen 2011, 8:27, Deniz Atak wrote:
> Deepak, Tom thanks for answering.
>
> Tom, we have psql 8.1.18. So you are right, this weird message is because
> of
> the old version. I will check with my colleague about the possible
> reasons.
> What can I do if there is a messed up table?

First of all, you should find out what caused the mess. This could be
really difficult as it might be a rare hw or sw glitch. Anyway consider
upgrading to 8.1.23 if possible.

If you have a fresh backup (i.e. one with all the data in the table), just
restore it and use it. You may even use just this particular table (just
move it using COPY).

If you need to recover the data, you'll have to play a bit with it as you
need to 'skip' all the corrupted blocks. The  query reports block 4707 is
corrupted - how many blocks does the relation have?

You can skip the blocks using 'ctid' column, which is basically "(block
id, item id)" so to skip block 4707 you can do this

SELECT * FROM table WHERE (ctid < '(4707,0)'::ctid OR ctid >=
'(4708,0)'::ctid)

and if fails with another "could not read block" error, put there another
such condition.

Tomas




Re: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
Deniz Atak
Date:
Hi Thomas,

thanks for your answer. We decided not to go further with this error, because soon we will have another product that replaces this one. Because I want to learn more about this topic, I did the following:

Before I write your code, I tried:
select oid,ctid,relname from pg_class where ctid='(4707,0)';

but it resulted in:
oid | ctid | relname
-----+------+---------
(0 rows)

Isn't there suppose to be a ctid of '(4707,0)'? Or did I misunderstand what you have written?

Regards,
Deniz

On Mon, Aug 1, 2011 at 1:21 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 1 Srpen 2011, 8:27, Deniz Atak wrote:
> Deepak, Tom thanks for answering.
>
> Tom, we have psql 8.1.18. So you are right, this weird message is because
> of
> the old version. I will check with my colleague about the possible
> reasons.
> What can I do if there is a messed up table?

First of all, you should find out what caused the mess. This could be
really difficult as it might be a rare hw or sw glitch. Anyway consider
upgrading to 8.1.23 if possible.

If you have a fresh backup (i.e. one with all the data in the table), just
restore it and use it. You may even use just this particular table (just
move it using COPY).

If you need to recover the data, you'll have to play a bit with it as you
need to 'skip' all the corrupted blocks. The  query reports block 4707 is
corrupted - how many blocks does the relation have?

You can skip the blocks using 'ctid' column, which is basically "(block
id, item id)" so to skip block 4707 you can do this

SELECT * FROM table WHERE (ctid < '(4707,0)'::ctid OR ctid >=
'(4708,0)'::ctid)

and if fails with another "could not read block" error, put there another
such condition.

Tomas




Re: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
Deniz Atak
Date:
Hi Deepak,

thanks for your answer. Do you have any opinion about how can I find the corrupted rows? Do you know how to read:
"could not read block 4707 of relation 1663/16384/16564"

?
Also, there is one interesting thing: a very similar query like this:

select src_ip,round(sum(size)/175) from table where date>'2011.07.29' and l_date<'2011.07.30' and src_ip='255.255.255.255' group by src_ip;

works fine. But this one doesn't:
select src_ip,round((select sum(t1.size) from table t1)) from table  where date>'2011.07.29' and date<'2011.07.30' and src_ip='255.255.255.255' 
group by src_ip;

Regards,
Deniz

On Mon, Aug 1, 2011 at 10:08 AM, D M <dm.aeqa@gmail.com> wrote:
I am not sure how big your table is one way we implemented here was we selected the clean rows and outputted it to a csv file. And the rows affected we had to load from the backup, luckily we had the clean backup.

Ex: assume you have 1,2,3,4,5....100 rows and the corrupted is between 60-70. I outputted clean rows from 1-59 and 71-100 to a csv file and loaded in a new table. The corrupted was loaded back from a table. This just One of doing it. There might be more the experts here can answer very well. I am interested to see others answers as well. 

My way is time consuming and if you have a very large table or tables affected it's a nightmare to fix them. 

Good luck with your recovery.
Thanks
Deepak

On Jul 31, 2011, at 11:27 PM, Deniz Atak <denizatak@gmail.com> wrote:

Deepak, Tom thanks for answering.

Tom, we have psql 8.1.18. So you are right, this weird message is because of the old version. I will check with my colleague about the possible reasons. What can I do if there is a messed up table?

Regards,
Deniz

On Sat, Jul 30, 2011 at 11:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Deniz Atak <denizatak@gmail.com> writes:
> I am using postgresql on Glassfish server and I have EJB 3.0 for ORM. I am
> trying to run a query in PSQL but receiving following error:

> Local Exception Stack:
> Exception [EclipseLink-4002] (Eclipse Persistence Services -
> 2.0.0.v20091031-r5713): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: org.postgresql.util.PSQLException: ERROR: could not read
> block 4707 of relation 1663/16384/16564: Success

What Postgres server version is that?

If it's 8.2 or older, this probably indicates a partial block at the end
of the file.  Newer versions produce a more sensible error message for
the case, but that's just cosmetic --- the real problem is a messed-up
table.  Have you had a filesystem corruption or an out-of-disk-space
condition on this machine?

                       regards, tom lane


Re: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
"Tomas Vondra"
Date:
On 1 Srpen 2011, 13:55, Deniz Atak wrote:
> Hi Thomas,
>
> thanks for your answer. We decided not to go further with this error,
> because soon we will have another product that replaces this one. Because
> I
> want to learn more about this topic, I did the following:
>
> Before I write your code, I tried:
> select oid,ctid,relname from pg_class where ctid='(4707,0)';

You need to select from the relation with relfileid 16564, not from
pg_class. Use this

SELECT relname FROM pg_class WHERE relfilenode = 16564;

and then select from this relation.

And if you want to see the items on block 4707, you can do something like
this:

SELECT * FROM relation WHERE ctid >= '(4707,0)' AND ctid < '(4708,0)';

although it'll probably fail. Maybe "pageinspect" contrib module will tell
you more about the page
(http://www.postgresql.org/docs/current/static/pageinspect.html).

Tomas


Re: ERROR: could not read block 4707 of relation 1663/16384/16564: Success

From
"Tomas Vondra"
Date:
On 1 Srpen 2011, 10:25, Deniz Atak wrote:
> Hi Deepak,
>
> thanks for your answer. Do you have any opinion about how can I find the
> corrupted rows? Do you know how to read:
>
> "could not read block 4707 of relation 1663/16384/16564"
>
> ?
> Also, there is one interesting thing: a very similar query like this:
>
> select src_ip,round(sum(size)/175) from table where date>'2011.07.29'
> and l_date<'2011.07.30' and src_ip='255.255.255.255' group by src_ip;
>
> works fine. But this one doesn't:
>
> select src_ip,round((select sum(t1.size) from table t1)) from table
> where date>'2011.07.29' and date<'2011.07.30' and
> src_ip='255.255.255.255'
>
> group by src_ip;

The first query probably does not access the corrupted block, while the
other one does (and fails). The second query does a full table scan on t1
(in the subselect), so the changes are this is the relation with corrupted
blocks. Or maybe it's the outer table and the queries use a different
execution plans.

We need to know which relation is 16564. Try this

SELECT relname FROM pg_class WHERE relfilenode = 16564 OR oid = 16564;

and it should give you the name of the relation.

Tomas


Deniz Atak <denizatak@gmail.com> writes:
> thanks for your answer. Do you have any opinion about how can I find the
> corrupted rows? Do you know how to read:

> "could not read block 4707 of relation 1663/16384/16564"

You should read the chapter about Database Physical Storage in the
manual to find out how to interpret that as a reference to a specific
block of an operating system file.

In this case, since we believe the problem is a partial block at end of
file, the file size presumably is more than 4707*8K and less than
4708*8K.  If I had to recover from this I would physically truncate the
file to exactly 4707*8K bytes, after saving a copy of the remaining
bytes to see if there's anything useful in there.  (Most likely there's
not --- in particular, if this is the after-effects of an
out-of-disk-space condition that prevented Postgres from filling up a
whole new block, then I'd expect the partial page to be filled with
zeroes.)

On Unix machines you could use dd for that, though I'd strongly
recommend practicing on a scratch file as it's not exactly user
friendly.  Dunno what to use on Windows.

            regards, tom lane