Thread: pg_checksums?
Hi, I have a few questions about pg_checksums. Long story short, checksums were disabled in our environment but are getting enabled now (again!) because of block corruptionswhich destroyed a few databases in a database cluster. And before you say "told you so", the decision to disablechecksums was not mine. Some people just have to learn the hard way. Anyway, re-enabling data checksums creates a few questions: a) why isn't it possible to enable checksumming while a database cluster is up? b) why isn't it possible to check whether checksums are enabled or not? c) in a Patroni cluster consisting of a primary and a sync standby, is it safe to enable checksumming in the replica, thenswitch over and enable it in the ex-primary, i.e. now new replica, without any penalty? Or do I have to perform a reinitto really get them in sync again, though paronictl happily reports them to be in sync? d) how long does it take to enable checksums in general? Minimizing down-time is crucial. Does it depend on the databasecluster size, or the number of files it uses, or what can be taken as a criterion to estimate then necessary down-time. Thanks in advance for your insights. Cheers Paul
On 10/27/23 13:34, Paul Förster wrote: > Hi, > > I have a few questions about pg_checksums. > > Long story short, checksums were disabled in our environment but are getting enabled now (again!) because of block corruptionswhich destroyed a few databases in a database cluster. And before you say "told you so", the decision to disablechecksums was not mine. Some people just have to learn the hard way. > > Anyway, re-enabling data checksums creates a few questions: > > a) why isn't it possible to enable checksumming while a database cluster is up? Data might be changing. > b) why isn't it possible to check whether checksums are enabled or not? (This is my tiny test instance.) $ pg_controldata | grep checksum Data page checksum version: 0 postgres=# show data_checksums; data_checksums ---------------- off (1 row) > c) in a Patroni cluster consisting of a primary and a sync standby, is it safe to enable checksumming in the replica, thenswitch over and enable it in the ex-primary, i.e. now new replica, without any penalty? Or do I have to perform a reinitto really get them in sync again, though paronictl happily reports them to be in sync? > > d) how long does it take to enable checksums in general? Minimizing down-time is crucial. Does it depend on the databasecluster size, or the number of files it uses, or what can be taken as a criterion to estimate then necessary down-time. > > Thanks in advance for your insights. > > Cheers > Paul > -- Born in Arizona, moved to Babylonia.
Hi Ron, > On Oct 27, 2023, at 21:02, Ron <ronljohnsonjr@gmail.com> wrote: >> b) why isn't it possible to check whether checksums are enabled or not? > > (This is my tiny test instance.) > > $ pg_controldata | grep checksum > Data page checksum version: 0 > > postgres=# show data_checksums; > data_checksums > ---------------- > off > (1 row) this helps a lot. Thanks very much. Now, there are only two other questions. Cheers Paul
> On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote: > a) why isn't it possible to enable checksumming while a database cluster is up? It is surprisingly complicated to enable checksums on a live cluster, a patch was submitted a while back but ultimately never made it into postgres. The below threads may shine some light on the problem: https://www.postgresql.org/message-id/flat/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com https://www.postgresql.org/message-id/flat/560A2239-5DE2-4B9C-92BC-878C6822F47C%40yesql.se#10b665b2193445a3d7f24be7917a952c -- Daniel Gustafsson
On Fri, Oct 27, 2023 at 10:45:16PM +0200, Daniel Gustafsson wrote: > > On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote: > > > a) why isn't it possible to enable checksumming while a database cluster is up? > > It is surprisingly complicated to enable checksums on a live cluster, a patch > was submitted a while back but ultimately never made it into postgres. The > below threads may shine some light on the problem: > > https://www.postgresql.org/message-id/flat/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com > https://www.postgresql.org/message-id/flat/560A2239-5DE2-4B9C-92BC-878C6822F47C%40yesql.se#10b665b2193445a3d7f24be7917a952c Yeah, it was a big surprise that this feature was so hard to implement because we have _no_ infrastructure for having multiple data layouts active in a live system. The discussion eventually made that clear. If we have more features that need this kind of dynamic ability, we might revisit this feature too. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Hi Bruce, hi Daniel, > On Oct 27, 2023, at 23:21, Bruce Momjian <bruce@momjian.us> wrote: > > On Fri, Oct 27, 2023 at 10:45:16PM +0200, Daniel Gustafsson wrote: >>> On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote: >> >>> a) why isn't it possible to enable checksumming while a database cluster is up? >> >> It is surprisingly complicated to enable checksums on a live cluster, a patch >> was submitted a while back but ultimately never made it into postgres. The >> below threads may shine some light on the problem: >> >> https://www.postgresql.org/message-id/flat/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com >> https://www.postgresql.org/message-id/flat/560A2239-5DE2-4B9C-92BC-878C6822F47C%40yesql.se#10b665b2193445a3d7f24be7917a952c > > Yeah, it was a big surprise that this feature was so hard to implement > because we have _no_ infrastructure for having multiple data layouts > active in a live system. The discussion eventually made that clear. > > If we have more features that need this kind of dynamic ability, we > might revisit this feature too. Ok, I see. But unfortunately still, my questions c) and d) are unanswered. I'd especially be interested in an answer to c), i.e. isit *safe* to "pg_checksum -e" the replica instance in a patroni cluster, switch over, and then do the other one? Cheers Paul
On 2023-10-27 23:37:24 +0200, Paul Förster wrote: > But unfortunately still, my questions c) and d) are unanswered. I'd > especially be interested in an answer to c), i.e. is it *safe* to > "pg_checksum -e" the replica instance in a patroni cluster, switch > over, and then do the other one? I don't think so. AFAIK Replication keeps the data files in sync on a bit-for-bit level and turning on checksums changes the data layout. Running a cluster where one node has checksums and the other doesn't would result in a complete mess. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Hi Peter > On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > I don't think so. AFAIK Replication keeps the data files in sync on a > bit-for-bit level and turning on checksums changes the data layout. > Running a cluster where one node has checksums and the other doesn't > would result in a complete mess. I agree with the last sentence. This is why I asked if it is safe to enable checksums on a replica, switch over and thendo it again on the ex primary, i.e. now new replica without doing a reinit. Cheers Paul
On 2023-10-29 10:11:07 +0100, Paul Förster wrote: > On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > I don't think so. AFAIK Replication keeps the data files in sync on a > > bit-for-bit level and turning on checksums changes the data layout. > > Running a cluster where one node has checksums and the other doesn't > > would result in a complete mess. > > I agree with the last sentence. This is why I asked if it is safe to > enable checksums on a replica, switch over and then do it again on the > ex primary, i.e. now new replica without doing a reinit. It *might* work if there are zero writes on the primary during the downtime of the replica (because those writes couldn't be replicated), but that seems hard to ensure. Even if you could get away with making the primary read-only (is this even possible?) I wouldn't have much confidence in the result and reinit the (new) replica anyway. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Hi Peter, > On Oct 29, 2023, at 11:49, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > It *might* work if there are zero writes on the primary during the > downtime of the replica (because those writes couldn't be replicated), > but that seems hard to ensure. Even if you could get away with making > the primary read-only (is this even possible?) I wouldn't have much > confidence in the result and reinit the (new) replica anyway. As soon as I stop the replica to enable checksums, even writes can't get replicated anymore. So during enabling checksums,a replica is definitely protected against modifications by its primary, simply because it's down. The modificationsof the primary are applied to the replica when it comes back online. So, I don't see a problem at this particularstage. My fear is merely that enabling checksums does something to the physical state of the data files which are not compatiblewith the other side. Like for example manipulate the file headers in some way. Maybe this question is better suited for the admin list than this general list? Cheers Paul
On 10/29/23 04:11, Paul Förster wrote: > Hi Peter > >> On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: >> I don't think so. AFAIK Replication keeps the data files in sync on a >> bit-for-bit level and turning on checksums changes the data layout. >> Running a cluster where one node has checksums and the other doesn't >> would result in a complete mess. > I agree with the last sentence. This is why I asked if it is safe to enable checksums on a replica, switch over and thendo it again on the ex primary, i.e. now new replica without doing a reinit. For that to work, the secondary files would have to remain identical to the primary files. Theoretically that _should_ happen, but it might not, or whatever command that enables checksums after the fact might have a sanity check. As for safety, what do you mean by "safe"? -- Born in Arizona, moved to Babylonia.
On 2023-10-29 16:15:37 +0100, Paul Förster wrote: > On Oct 29, 2023, at 11:49, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > It *might* work if there are zero writes on the primary during the > > downtime of the replica (because those writes couldn't be replicated), > > but that seems hard to ensure. Even if you could get away with making > > the primary read-only (is this even possible?) I wouldn't have much > > confidence in the result and reinit the (new) replica anyway. > > As soon as I stop the replica to enable checksums, even writes can't > get replicated anymore. So during enabling checksums, a replica is > definitely protected against modifications by its primary, simply > because it's down. The modifications of the primary are applied to the > replica when it comes back online. And this is where it would break down. The modifications can't be applied to the replica any more because the replica now contains checksums and the modifications don't. In the best case the replica would catch the discrepancy and refuse to apply the modifications which would lead to the loss of these modifications. In the worst case it would apply them anyway causing severe data corruption. > So, I don't see a problem at this particular stage. > > My fear is merely that enabling checksums does something to the > physical state of the data files which are not compatible with the > other side. Exactly. Those checksums have to be stored somewhere. > Like for example manipulate the file headers in some way. Not just the file headers. Every single data block. (Ok, it looks like the space for the checksum is reserved even if checksums aren't enabled[1]. So at least pg_checksums doesn't have to move data around to enable them. But overwriting a page with a checksum with one without one would still be bad.) hp [1] https://www.postgresql.org/docs/current/storage-page-layout.html#PAGEHEADERDATA-TABLE -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Hi Ron, > On Oct 29, 2023, at 16:37, Ron <ronljohnsonjr@gmail.com> wrote: > > As for safety, what do you mean by "safe"? Safe in the sense that, if I enable checksums on a replica, switch over and the enable checksums on the other side, if thisis ok, or whether future mutations on the primary will corrupt the replica. That's why I asked if I need to perform a patronictl reinit. Cheers Paul
Hi Ron, > On Oct 29, 2023, at 16:38, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > And this is where it would break down. The modifications can't be > applied to the replica any more because the replica now contains > checksums and the modifications don't. In the best case the replica > would catch the discrepancy and refuse to apply the modifications which > would lead to the loss of these modifications. In the worst case it > would apply them anyway causing severe data corruption. ... > Not just the file headers. Every single data block. > > (Ok, it looks like the space for the checksum is reserved even if > checksums aren't enabled[1]. So at least pg_checksums doesn't have to > move data around to enable them. But overwriting a page with a checksum > with one without one would still be bad.) Those are the kind of answers and insights I was looking for. Thank you very much. Ok, I will do a reinit then. Cheers Paul
On 10/29/23 12:57, Paul Förster wrote: > Hi Ron, > >> On Oct 29, 2023, at 16:37, Ron <ronljohnsonjr@gmail.com> wrote: >> >> As for safety, what do you mean by "safe"? > Safe in the sense that, if I enable checksums on a replica, switch over and the enable checksums on the other side, ifthis is ok, or whether future mutations on the primary will corrupt the replica. Trying it would tell you something. > That's why I asked if I need to perform a patronictl reinit. Best to ask Percona. -- Born in Arizona, moved to Babylonia.
On Sun, Oct 29, 2023 at 11:49:11AM +0100, Peter J. Holzer wrote: > On 2023-10-29 10:11:07 +0100, Paul Förster wrote: >> On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: >>> I don't think so. AFAIK Replication keeps the data files in sync on a >>> bit-for-bit level and turning on checksums changes the data layout. >>> Running a cluster where one node has checksums and the other doesn't >>> would result in a complete mess. >> >> I agree with the last sentence. This is why I asked if it is safe to >> enable checksums on a replica, switch over and then do it again on the >> ex primary, i.e. now new replica without doing a reinit. > > It *might* work if there are zero writes on the primary during the > downtime of the replica (because those writes couldn't be replicated), > but that seems hard to ensure. Even if you could get away with making > the primary read-only (is this even possible?) I wouldn't have much > confidence in the result and reinit the (new) replica anyway. Hm? Page checksums are written when a page is flushed to disk, we don't set them for dirty buffers or full-page writes included in WAL, so it should be OK to do something like the following: - Stop cleanly a standby. - Run pg_checksums on the standby to enable them. - Restart the standby. - Catchup with the latest changes - Stop cleanly the primary, letting the shutdown checkpoint be replicated to the standby. - Promote the standby. - Enable checksums on the previous primary. - Start the previous primary to be a standby of the node you failed over to. -- Michael
Attachment
On 2023-10-30 09:56:31 +0900, Michael Paquier wrote: > Hm? Page checksums are written when a page is flushed to disk, we > don't set them for dirty buffers or full-page writes included in WAL, > so it should be OK to do something like the following: > - Stop cleanly a standby. > - Run pg_checksums on the standby to enable them. > - Restart the standby. > - Catchup with the latest changes > - Stop cleanly the primary, letting the shutdown checkpoint be > replicated to the standby. > - Promote the standby. > - Enable checksums on the previous primary. > - Start the previous primary to be a standby of the node you failed > over to. I stand corrected. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2023-10-29 13:26:27 -0500, Ron wrote: > On 10/29/23 12:57, Paul Förster wrote: > > Safe in the sense that, if I enable checksums on a replica, switch > > over and the enable checksums on the other side, if this is ok, or > > whether future mutations on the primary will corrupt the replica. > > Trying it would tell you something. > > > That's why I asked if I need to perform a patronictl reinit. > > Best to ask Percona. Why Percona? hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Hi Michael, > On Oct 30, 2023, at 01:56, Michael Paquier <michael@paquier.xyz> wrote: > > Hm? Page checksums are written when a page is flushed to disk, we > don't set them for dirty buffers or full-page writes included in WAL, > so it should be OK to do something like the following: > - Stop cleanly a standby. > - Run pg_checksums on the standby to enable them. > - Restart the standby. > - Catchup with the latest changes > - Stop cleanly the primary, letting the shutdown checkpoint be > replicated to the standby. > - Promote the standby. > - Enable checksums on the previous primary. > - Start the previous primary to be a standby of the node you failed > over to. That's exactly the reasoning behind my initial idea and question. Patroni does the switchover job for me including catchingup on the latest changes, etc. Seems that opinions vary. Are there any hard facts? It turns out that enabling checksums can take quite some time to complete, i.e. downtime for the application which is hardto do in a 24x7 environment. Cheers Paul
Hi Peter, > On Oct 30, 2023, at 11:03, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > On 2023-10-29 13:26:27 -0500, Ron wrote: >> Best to ask Percona. > > Why Percona? Probably a typo. Patroni is used. Cheers Paul
On 10/30/23 08:18, Paul Förster wrote: > Hi Peter, > >> On Oct 30, 2023, at 11:03, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: >> On 2023-10-29 13:26:27 -0500, Ron wrote: >>> Best to ask Percona. >> Why Percona? > Probably a typo. Patroni is used. Erroneously thinking that Percona develops Patroni. :D -- Born in Arizona, moved to Babylonia.
On Mon, 30 Oct 2023 at 14:46, Ron <ronljohnsonjr@gmail.com> wrote:
Erroneously thinking that Percona develops Patroni. :D
IIRC, they may have made one or two contributions, but very minor.
But anyway, Patroni is orthogonal to pg_checksums.
As Michael already said, the following workflow works just fine (I did it dozens of times):
1. enable checksums on the standby node
2. start the standby and let it catch up with the primary
3. switchover to a standby node
4. enable checksums on the former primary (now replica).
Regards,
--
Alexander Kukushkin
Hi Alexander, > On Oct 30, 2023, at 14:56, Alexander Kukushkin <cyberdemn@gmail.com> wrote: ... > But anyway, Patroni is orthogonal to pg_checksums. ... Just to be sure I understand you correctly: This does not work with Patroni? Cheers Paul
On Mon, 30 Oct 2023, 19:34 Paul Förster, <paul.foerster@gmail.com> wrote:
Just to be sure I understand you correctly: This does not work with Patroni?
That's not what I said.
Patroni only manages Postgres. It is exactly the same Postgres as you would run it without Patroni.
Everything will work.
Regards,
--
Alexander Kukushkin
Hi Michael,
> On Oct 30, 2023, at 01:56, Michael Paquier <michael@paquier.xyz> wrote:
>
> > - Enable checksums on the previous primary.
> - Start the previous primary to be a standby of the node you failed
> over to.
That's exactly the reasoning behind my initial idea and question. Patroni does the switchover job for me including catching up on the latest changes, etc.
Seems that opinions vary. Are there any hard facts?The best hard facts are those generated in your environment.
It turns out that enabling checksums can take quite some time to complete, i.e. downtime for the application which is hard to do in a 24x7 environment.
Yes. Try it first with a smaller sample.
Cheers
Paul
Hi Alexander, > On Oct 30, 2023, at 19:49, Alexander Kukushkin <cyberdemn@gmail.com> wrote: > That's not what I said. That's why I asked. Because you used the word orthogonal. 🤣 > Patroni only manages Postgres. It is exactly the same Postgres as you would run it without Patroni. > Everything will work. Now that is, what I suspected because of what I have learned how Patroni (and PostgreSQL replication) works so far. Thanks very much. Cheers Paul
On Mon, Oct 30, 2023 at 6:57 AM Alexander Kukushkin <cyberdemn@gmail.com> wrote: ... > As Michael already said, the following workflow works just fine (I did it dozens of times): > 1. enable checksums on the standby node > 2. start the standby and let it catch up with the primary > 3. switchover to a standby node > 4. enable checksums on the former primary (now replica). There is also a good trick described in https://www.crunchydata.com/blog/fun-with-pg_checksums to avoid accidental start of Postgres: after pg_ctl stop and before pg_checksums --enable, do: mv data/pg_twophase data/pg_twophase.DO_NOT_START_THIS_DATABASE and once pg_checksums --enable is done, move it back. Additionally, I compiled some thoughts about running pg_checksums without downtime (Patroni-friendly, of course) here: https://twitter.com/samokhvalov/status/1719961485160689993.
Hi Nikolay, > On Nov 2, 2023, at 07:36, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > There is also a good trick described in > https://www.crunchydata.com/blog/fun-with-pg_checksums to avoid > accidental start of Postgres: [...] > Additionally, I compiled some thoughts about running pg_checksums > without downtime (Patroni-friendly, of course) here: > https://twitter.com/samokhvalov/status/1719961485160689993. These two links are very interesting. Thanks very much. Cheers Paul