Thread: Handling glibc v2.28 breaking changes

Handling glibc v2.28 breaking changes

From
Pradeep Chhetri
Date:
Hello everyone,

I am sure this has been discussed multiple times in the past, but I would like to raise it again. I have a 3-node cluster of Postgres v9.6. All nodes currently run on Debian 9 (with glibc v2.24), and I need to upgrade them to Debian 10 (with glibc v2.28) without downtime. To work around the glibc issue, I am evaluating whether I can compile glibc v2.24 on Debian 10, pin Postgres to this manually compiled glibc, and upgrade the Linux distribution in a rolling fashion. I would like to know how others have achieved such distro upgrades without downtime. I am new to Postgres, so please pardon my ignorance.

Thank you for your help.
Best regards,
Pradeep

Re: Handling glibc v2.28 breaking changes

From
Adrian Klaver
Date:
On 4/24/22 08:31, Pradeep Chhetri wrote:
> Hello everyone,
> 
> I am sure this has been discussed multiple times in the past but I would 
> like to initiate this discussion again. I have 3 nodes cluster of 
> Postgres v9.6. They all are currently running on Debian 9 (with glibc 
> v2.24) and need to upgrade them to Debian 10 (with glibc v2.28) without 
> downtime. In order to bypass the glibc issue, I am trying to evaluate 
> whether I can compile glibc v2.24 on Debian 10, pin postgres to use this 
> manually compiled glibc and upgrade the linux distribution in rolling 
> fashion. I would like to know how others have achieved such distro 
> upgrades without downtime. I am new to Postgres so please pardon my 
> ignorance.

You are going to have to be more specific, as upgrading a distro involves
downtime. I'm guessing you mean downtime for Postgres, but still at least
one of the instances is going to be down while its OS is being
upgraded. So:

1) Define how the 3 node cluster works?

2) What is the locale for the Postgres instances?

3) What is acceptable downtime in the process?

4) Are you using ICU collation?

Also you might want to look at:

https://wiki.postgresql.org/wiki/Locale_data_changes

> 
> Thank you for your help.
> Best regards,
> Pradeep


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Handling glibc v2.28 breaking changes

From
Pradeep Chhetri
Date:
Hi Adrian,

Thank you for your quick response.

By zero downtime, I meant at least one of the three nodes is up at any time to handle the writes and reads.

> Define how the 3 node cluster works?
These 3 nodes are configured as 1 primary, 1 sync replica and 1 async replica. These are managed via stolon.

> What is the locale for the Postgres instances?
We are using en_US.UTF-8 collation.

> What is acceptable downtime in the process?
We want to keep downtime as low as possible, since these will be customer-facing clusters.

> Are you using ICU collation?
As far as I know, ICU collation is supported from Postgres v10 but we are still running v9.6 so I guess that is not an option unless we upgrade our cluster first.

I am open to any approach, including changing the architecture, upgrading the cluster first, or evaluating logical replication, but our primary goal is to achieve this with minimal downtime.

Thank you for your help.
Best regards,
Pradeep



Re: Handling glibc v2.28 breaking changes

From
Laurenz Albe
Date:
On Sun, 2022-04-24 at 23:31 +0800, Pradeep Chhetri wrote:
> I am sure this has been discussed multiple times in the past but I would like to initiate
> this discussion again. I have 3 nodes cluster of Postgres v9.6. They all are currently
> running on Debian 9 (with glibc v2.24) and need to upgrade them to Debian 10 (with glibc v2.28)
> without downtime. In order to bypass the glibc issue, I am trying to evaluate whether I can
> compile glibc v2.24 on Debian 10, pin postgres to use this manually compiled glibc and
> upgrade the linux distribution in rolling fashion.

Don't use an old glibc.

You will want to move to a different machine or upgrade the operating system, so you will
have some down time anyway.

You could consider upgrading in several steps:

- pg_upgrade to v14 on the current operating system
- use replication, then switch over to move to a current operating system on a different
  machine
- REINDEX CONCURRENTLY all indexes on string expressions

You could get data corruption and bad query results between the second and the third steps,
so keep that interval short.
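
To illustrate why the window between the switchover and the REINDEX is risky, here is a minimal Python sketch. It is not how PostgreSQL or glibc work internally: `old_key` and `new_key` are made-up toy collations, and the sorted list merely stands in for a B-tree index. It only shows that a structure sorted under one ordering can miss existing entries when searched under another.

```python
import bisect
import re


def old_key(s):
    # Toy "old" collation (an assumption, not real glibc rules):
    # compare with punctuation stripped first, then break ties on the raw string.
    return (re.sub(r"[^0-9a-z]", "", s.lower()), s)


def new_key(s):
    # Toy "new" collation (also an assumption): plain code-point order.
    return s


def index_lookup(index, value, key):
    """Binary search that trusts `index` to be sorted according to `key`."""
    keys = [key(v) for v in index]
    pos = bisect.bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value


# The "index" is built while the old ordering is in effect.
index = sorted(["2", "12", "1-2", "11", "1-1"], key=old_key)

print(index_lookup(index, "1-2", old_key))  # True: ordering matches the index
print(index_lookup(index, "1-2", new_key))  # False: the entry exists but is missed
```

Rebuilding the list with `sorted(..., key=new_key)` plays the role of the REINDEX step: after that, lookups under the new ordering find every entry again.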

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Handling glibc v2.28 breaking changes

From
Nick Cleaton
Date:
On Mon, 25 Apr 2022 at 12:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> You could consider upgrading in several steps:
>
> - pg_upgrade to v14 on the current operating system
> - use replication, then switch over to move to a current operating system on a different
>   machine
> - REINDEX CONCURRENTLY all indexes on string expressions
>
> You could get data corruption and bad query results between the second and the third steps,
> so keep that interval short.

We did something like this, with the addition of a step where we used a new-OS replica to run amcheck's bt_index_check() over all of the btree indexes to find those actually corrupted by the libc upgrade in practice with our data. It was a small fraction of them, and we were able to fit an offline reindex of those btrees and all texty non-btree indexes into an acceptable downtime window, with REINDEX CONCURRENTLY of everything else as a lower priority after the upgrade.
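
A rough sketch of how that triage could be scripted, in Python. The `Index` record, the helper names and the sample index names are hypothetical. The only PostgreSQL pieces assumed are amcheck's `bt_index_check(regclass)` and the `REINDEX INDEX [CONCURRENTLY]` statement (the CONCURRENTLY option needs v12 or later). The sketch also assumes the list contains only indexes over collatable text columns, and that `failed_check` was filled in by running the generated checks on a new-OS replica.

```python
from typing import NamedTuple


class Index(NamedTuple):
    schema: str
    name: str
    access_method: str          # e.g. "btree", "gin"
    failed_check: bool = False  # True if bt_index_check() reported an error


def quote_ident(name):
    # Double-quote an SQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def qualified(schema, name):
    return f"{quote_ident(schema)}.{quote_ident(name)}"


def check_statement(schema, name):
    # One amcheck call per btree index, to be run on the new-OS replica.
    literal = qualified(schema, name).replace("'", "''")
    return f"SELECT bt_index_check('{literal}'::regclass);"


def reindex_plan(indexes):
    """Split text indexes into an offline batch and a deferred batch."""
    offline, deferred = [], []
    for ix in indexes:
        target = qualified(ix.schema, ix.name)
        if ix.access_method != "btree" or ix.failed_check:
            # Corrupted btrees and all non-btree text indexes: downtime window.
            offline.append(f"REINDEX INDEX {target};")
        else:
            # Everything else: lower priority, after the upgrade.
            deferred.append(f"REINDEX INDEX CONCURRENTLY {target};")
    return offline, deferred


indexes = [
    Index("public", "users_name_idx", "btree", failed_check=True),
    Index("public", "orders_ref_idx", "btree"),
    Index("public", "docs_body_gin", "gin"),
]
offline, deferred = reindex_plan(indexes)
print(check_statement("public", "orders_ref_idx"))
print(offline)
print(deferred)
```

The non-btree indexes go straight into the offline batch because `bt_index_check()` only covers btree indexes, so there is no equivalent check to clear them.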

Re: Handling glibc v2.28 breaking changes

From
Pradeep Chhetri
Date:
Thank you Laurenz and Nick. That sounds like a good plan to me.

Best Regards,
Pradeep
