Re: Upgrading from Postgresql 9.3.8 to 9.6.10 - Mailing list pgsql-admin

From Yuri Niyazov
Subject Re: Upgrading from Postgresql 9.3.8 to 9.6.10
Date
Msg-id CACuBw0hxrOejqDNqhOhyNO9ps0_ROKKJYhG1w6weDNn++PkzBQ@mail.gmail.com
Whole thread Raw
In response to Re: Upgrading from Postgresql 9.3.8 to 9.6.10  (Keith <keith@keithf4.com>)
Responses Re: Upgrading from Postgresql 9.3.8 to 9.6.10  (Keith <keith@keithf4.com>)
List pgsql-admin
On Mon, Mar 25, 2019 at 11:44 PM Keith <keith@keithf4.com> wrote:
On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <yuri@academia.edu> wrote:
We have an old cluster, running 9.3.8, that we are trying to upgrade to 9.6.10.

After running pg_upgrade and starting the server, and testing some common queries against it, we experienced the following error:

PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero page at block 17021871 HINT: Please REINDEX it.

So, if I am reading this correctly, there is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks that index is corrupt. Since this happened on one index, it seems reasonable to assume that this could happen on any index in that database, and we need to reindex the entire database. 

Now, this is a live application, and we would like to minimize continuous downtime (multiple short downtimes are fine), so what we are considering doing is a manual reindex: for each index, create a new one, and then drop the old one, and after all that, upgrade.

However, we are leery of doing this reindexing using 9.3.8, since it's already demonstrated itself to be unreliable. 

Which version should we use to reindex? Just the latest 9.3.X? Was there a known bug with older versions missing corrupted pages in indices? 

PS:  Just in case I am glaringly doing something wrong:

Here's our mechanism of doing the upgrade: we create a streaming replica from a basebackup, and then at some point turn off writes to the primary, convert the replica to another primary, and then run pg_upgrade on this new primary. 


Have you done this more than once? If so, is it the same indexes every time you're having issues with?

Answering these questions will help you narrow down if the corruption is in your original database or something to do with the upgrade.

However, I would also highly recommend getting to the latest version of 9.3 and rebuilding your replicas afterwards. The latest patch release was 9.3.25, so your current version is quite far behind. There were quite a number of data corruption issues related to replication in the 9.3 series. More-so in the earlier patch versions, but I highly recommend also trying this before your major upgrade.

Keith

We tried it a second time, and were unable to reproduce the issue, so it seems like the issue is non-deterministic, which is a little scary. In any case, we then proceeded to upgrade to 9.3.25 as you recommended, and then upgraded to the 9.6 series from it, and the issue didn't reproduce; we've been running on the new upgraded version for thel last two weeks and it looks ok so far. 

Thanks for your help!

pgsql-admin by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Too many serialization errors in production
Next
From: Keith
Date:
Subject: Re: Upgrading from Postgresql 9.3.8 to 9.6.10