Thread: [GENERAL] error updating a tuple after promoting a standby

[GENERAL] error updating a tuple after promoting a standby

From
Tom DalPozzo
Date:
Hi,
I was doing some tests with backup, replication, standby. After promoting a standby server, I found my db in a condition that raises me an error while trying to update a particular tuple.
Below here you can se my UPDATE statment and the error raised.
The select * from stato where id=409; executed immediately after worked well however.
I checked the file and it's readable.
Before my standby promotion test I performed millions of this UPDATE statments without problem on my db.
I can not reproduce the issue.

Perhaps I did something wrong during my test but I don't know what. I didn't touch any file in base directory however.
Anyway I'd like to know if in your opinion it's possible that this error was caused by something wrong done by me or if it should never happen as the file is perfectly readable.

Regards
Pupillo




psql (9.5.4)
Type "help" for help.

ginopino=# UPDATE stato SET dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353' WHERE id=409;
ERROR:  could not read block 12281 in file "base/16384/29153": read only 0 of 8192 bytes
ginopino=# select * from stato where id=409;  <<< IT WORKS FINE

Re: [GENERAL] error updating a tuple after promoting a standby

From
Adrian Klaver
Date:
On 12/21/2016 01:51 AM, Tom DalPozzo wrote:
> Hi,
> I was doing some tests with backup, replication, standby. After
> promoting a standby server, I found my db in a condition that raises me
> an error while trying to update a particular tuple.
> Below here you can se my UPDATE statment and the error raised.
> The select * from stato where id=409; executed immediately after worked
> well however.
> I checked the file and it's readable.
> Before my standby promotion test I performed millions of this UPDATE
> statments without problem on my db.
> I can not reproduce the issue.
>
> Perhaps I did something wrong during my test but I don't know what. I
> didn't touch any file in base directory however.
> Anyway I'd like to know if in your opinion it's possible that this error
> was caused by something wrong done by me or if it should never happen as
> the file is perfectly readable.
>
> Regards
> Pupillo
>
>
>
>
> psql (9.5.4)
> Type "help" for help.
>
> ginopino=# UPDATE stato SET
>
dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353'
> WHERE id=409;
> ERROR:  could not read block 12281 in file "base/16384/29153": read only

First I would find what base/16384/29153 actually is. So in the database 
where stato is:

select relname from pg_class where relfilenode = 29153;

> 0 of 8192 bytes
> ginopino=# select * from stato where id=409;  <<< IT WORKS FINE

But does it have the updated info?

>


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: [GENERAL] error updating a tuple after promoting a standby

From
Tom DalPozzo
Date:
Hi, 
 
First I would find what base/16384/29153 actually is. So in the database where stato is:

select relname from pg_class where relfilenode = 29153;

below here the query you suggested, showing that file belongs to stato table as expected.
ginopino=# select relname from pg_class where relfilenode = 29153;
 relname 
---------
 stato
(1 row)
 


0 of 8192 bytes
ginopino=# select * from stato where id=409;  <<< IT WORKS FINE

But does it have the updated info?
Yes,  it  correctly shows the expected data before issuing the update query which generates the problem:
 ginopino=# select * from stato where id=409;
  id  |  .......  dati         .......
-----+--------------  .......
 409 | \x4a735300db8f4b31ab8660f85192bc.................... 

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Thanks
Pupillo

Re: [GENERAL] error updating a tuple after promoting a standby

From
Adrian Klaver
Date:
On 12/21/2016 08:17 AM, Tom DalPozzo wrote:
> Hi,
>
>
>     First I would find what base/16384/29153 actually is. So in the
>     database where stato is:
>
>     select relname from pg_class where relfilenode = 29153;
>
>
> below here the query you suggested, showing that file belongs to stato
> table as expected.
> ginopino=# select relname from pg_class where relfilenode = 29153;
>  relname
> ---------
>  stato
> (1 row)

Is there an index on this table?

Have you tried a REINDEX on it?


In your original post you mention this error occurred while testing
backup/replication/standby promotion.

What was the procedure you followed in doing the testing?

>
>
>
>         0 of 8192 bytes
>         ginopino=# select * from stato where id=409;  <<< IT WORKS FINE
>
>
>     But does it have the updated info?
>
> Yes,  it  correctly shows the expected data before issuing the update
> query which generates the problem:
>  ginopino=# select * from stato where id=409;
>   id  |  .......  dati         .......
> -----+--------------  .......
>  409 | \x4a735300db8f4b31ab8660f85192bc....................
>
>
>         --
>
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
> Thanks
> Pupillo


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] error updating a tuple after promoting a standby

From
Tom DalPozzo
Date:
Is there an index on this table?

Have you tried a REINDEX on it?

yes there is an index on id field. I tried REINDEX. Nothing changes but I notice now (but perhaps it was like that even before reindexing) that every time I issue that UPDATE query, the number of the block it can't read increases by one.  Now, after some attempts: ERROR:  could not read block 12289 in file "base/16384/29153": read only 0 of 8192 bytes.
 
In your original post you mention this error occurred while testing backup/replication/standby promotion.

What was the procedure you followed in doing the testing?
Unfortunately I don't remember every step as I was focused on completely other things... Anyway, in synthesis:
1 pg_basebackup on primary and added, to the just created backup pg_xlog dir, the needed WAL files  according to the .label file (I'm trying without archiving) .
2 copied the backup dir to my two standby PCs (one is for sync streaming replication, the other async).
3 configured recovery.conf ecc... on the standby PCs.
4 started the two standby servers . The first was in sync replication, the second in async. Messages were OK.
5 Updated some thousands of rows in the primary just to check that it worked fine.
6 stopped the primary
7 promoted the 1st standby (new primary).
8 stopped/reconfigured/restarted the 2nd standby (async replication) to point to the 1st standby.
9 checked that all messages were ok in both active PCs.
10 tried to update on the new primary getting the error (perhaps after some successful updates but I'm not sure).

A new thing: 
I noticed that, always restarting from the corrupted cluster (without reindex I mean),  if I update the row id=409 with few data (3 bytes), then it works and after that, even updating with that long data works.

Regards
Pupillo





Re: [GENERAL] error updating a tuple after promoting a standby

From
Adrian Klaver
Date:
On 12/21/2016 10:06 AM, Tom DalPozzo wrote:
>         Is there an index on this table?
>
>
>     Have you tried a REINDEX on it?
>
> yes there is an index on id field. I tried REINDEX. Nothing changes but
> I notice now (but perhaps it was like that even before reindexing) that
> every time I issue that UPDATE query, the number of the block it can't
> read increases by one.  Now, after some attempts: ERROR:  could not read
> block 12289 in file "base/16384/29153": read only 0 of 8192 bytes.

> Unfortunately I don't remember every step as I was focused on completely
> other things... Anyway, in synthesis:
> 1 pg_basebackup on primary and added, to the just created backup pg_xlog
> dir, the needed WAL files  according to the .label file (I'm trying
> without archiving) .

If it where me I would use one of the -X methods:

https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html


> A new thing:
> I noticed that, always restarting from the corrupted cluster (without
> reindex I mean),  if I update the row id=409 with few data (3 bytes),
> then it works and after that, even updating with that long data works.

To me that looks like an issue with the associated TOAST table. I do not
have a suggestion at this time. Maybe this rings a bell with someone else.

>
> Regards
> Pupillo
>
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] error updating a tuple after promoting a standby

From
Tom DalPozzo
Date:


If it where me I would use one of the -X methods:

https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html



To me that looks like an issue with the associated TOAST table. I do not have a suggestion at this time. Maybe this rings a bell with someone else.

--
Adrian Klaver
adrian.klaver@aklaver.com

Good tip, x method, I will try it, sure. 
But I'm also afraid that there is a still hidden issue.
I think that I'm using postgres in a context slightly different from the usual ones (many many updates of a table rows with big field, replication,...) so I'm afraid that issues not already seen can appear.
Regards
Pupillo

Re: [GENERAL] error updating a tuple after promoting a standby

From
Brian Sutherland
Date:
Perhaps try 9.5.5 which has a fix for a problem with the same symptoms:

    https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
    https://www.postgresql.org/docs/9.5/static/release-9-5-5.html#AEN126074

On Wed, Dec 21, 2016 at 10:51:47AM +0100, Tom DalPozzo wrote:
> Hi,
> I was doing some tests with backup, replication, standby. After promoting a
> standby server, I found my db in a condition that raises me an error while
> trying to update a particular tuple.
> Below here you can se my UPDATE statment and the error raised.
> The select * from stato where id=409; executed immediately after worked
> well however.
> I checked the file and it's readable.
> Before my standby promotion test I performed millions of this UPDATE
> statments without problem on my db.
> I can not reproduce the issue.
>
> Perhaps I did something wrong during my test but I don't know what. I
> didn't touch any file in base directory however.
> Anyway I'd like to know if in your opinion it's possible that this error
> was caused by something wrong done by me or if it should never happen as
> the file is perfectly readable.
>
> Regards
> Pupillo
>
>
>
>
> psql (9.5.4)
> Type "help" for help.
>
> ginopino=# UPDATE stato SET
>
dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353'
> WHERE id=409;
> ERROR:  could not read block 12281 in file "base/16384/29153": read only 0
> of 8192 bytes
> ginopino=# select * from stato where id=409;  <<< IT WORKS FINE

--
Brian Sutherland


Re: [GENERAL] error updating a tuple after promoting a standby

From
Tom DalPozzo
Date:


2016-12-22 10:23 GMT+01:00 Brian Sutherland <brian@vanguardistas.net>:
Perhaps try 9.5.5 which has a fix for a problem with the same symptoms:

    https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
    https://www.postgresql.org/docs/9.5/static/release-9-5-5.html#AEN126074

Yes it was that! 
I tried the procedure in wiki and it worked.
Thank you very much!
Pupillo