Re: Online checksums patch - once again - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Online checksums patch - once again |
Date | |
Msg-id | CA+TgmoaryNACeLnLpfhx==gYy1+59VhMtrWspcZGay9eDvYHMw@mail.gmail.com |
In response to | Re: Online checksums patch - once again (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: Online checksums patch - once again
Re: Online checksums patch - once again |
List | pgsql-hackers |
On Thu, Jan 23, 2020 at 6:19 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> A bigger question is how to handle the offline capabilities. pg_checksums can
> enable or disable checksums in an offline cluster, which will put the cluster
> in a state where the pg_control file and the catalog don't match at startup.
> One strategy could be to always trust the pg_control file and alter the catalog
> accordingly, but that still leaves a window of inconsistent cluster state.

I suggest that we define things so that the catalog state is only meaningful during a state transition. That is, suppose the cluster state is either "on", "enabling", or "off". When it's "on", checksums are written and verified. When it is "off", checksums are not written and not verified. When it's "enabling", checksums are written but not verified. Also, when and only when the state is "enabling", the background workers that try to rewrite relations to add checksums run, and those workers look at the catalog state to figure out what to do. Once the state changes to "on", those workers don't run any more, and so the catalog state does not make any difference.

A tricky problem is handling the case where the state is switched from "enabling" to "on" and then back to "off" and then to "enabling" again. You don't want to confuse the state from the previous round of enabling with the state for the current round of enabling. Suppose in addition to storing the cluster-wide state of on/off/enabling, we also store an "enable counter" which is incremented every time the state goes from "off" to "enabling". Then, for each database and relation, we store a counter that indicates the value of the enable counter at the time we last scanned/rewrote that relation to set checksums. Now, you're covered. And, to save space, it can probably be a 32-bit counter, since 4 billion disable/reenable cycles ought to be enough for anybody.

It would not be strictly necessary to store this in pg_class.
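
To make the counter idea concrete, here is a rough sketch of the scheme described above, written as a toy Python model rather than anything resembling the actual patch. The names Cluster, relation_counters, needs_rewrite and so on are invented for illustration; the only things taken from the proposal are the three states, the cluster-wide enable counter bumped on "off" -> "enabling", and the per-relation copy of that counter.

```python
from dataclasses import dataclass, field
from enum import Enum

COUNTER_MODULUS = 2**32  # 32-bit enable counter, as suggested in the message


class ChecksumState(Enum):
    OFF = "off"
    ENABLING = "enabling"
    ON = "on"


@dataclass
class Cluster:
    """Toy model of the cluster-wide state (hypothetical, not PostgreSQL code)."""
    state: ChecksumState = ChecksumState.OFF
    enable_counter: int = 0
    # Hypothetical bookkeeping: relation name -> value of enable_counter at
    # the time the relation was last scanned/rewritten to set checksums.
    relation_counters: dict = field(default_factory=dict)

    def writes_checksums(self):
        # Checksums are written in both "enabling" and "on".
        return self.state in (ChecksumState.ENABLING, ChecksumState.ON)

    def verifies_checksums(self):
        # Checksums are verified only in "on".
        return self.state is ChecksumState.ON

    def add_relation(self, name):
        self.relation_counters[name] = self.enable_counter

    def start_enabling(self):
        if self.state is not ChecksumState.OFF:
            raise ValueError("can only start enabling from 'off'")
        # Bumping the counter invalidates per-relation progress left over
        # from any earlier round of enabling.
        self.enable_counter = (self.enable_counter + 1) % COUNTER_MODULUS
        self.state = ChecksumState.ENABLING

    def needs_rewrite(self, name):
        # What a background worker would check while the state is "enabling".
        return self.relation_counters[name] != self.enable_counter

    def mark_rewritten(self, name):
        self.relation_counters[name] = self.enable_counter

    def finish_enabling(self):
        if self.state is not ChecksumState.ENABLING:
            raise ValueError("not enabling")
        if any(self.needs_rewrite(n) for n in self.relation_counters):
            raise ValueError("some relations have not been processed yet")
        self.state = ChecksumState.ON

    def disable(self):
        self.state = ChecksumState.OFF
```

In this model a relation processed during one round of enabling shows up as unprocessed again after an off/enabling cycle, because its stored counter no longer matches the cluster-wide one. No per-relation cleanup is needed when checksums are turned off.
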
Another thing that could be done is to store it in a separate system table that could even be truncated when enabling is not in progress - though it would be unwise to assume that it's always truncated at the beginning of an enabling cycle, since it would be hard to guarantee that the previous enabling cycle didn't fail when trying to truncate. So you'd probably still end up with something like the counter approach. I am inclined to think that inventing a whole new catalog for this is over-engineering, but someone might think differently.

Note that creating a table while enabling is in progress needs to set the enabling counter for the new table to the new value of the enabling counter, not the old one, because the new table starts empty and won't end up with any pages that don't have valid checksums. Similarly, TRUNCATE, CLUSTER, VACUUM FULL, and rewriting variants of ALTER TABLE can set the new value for the enabling counter as a side effect. That's probably easier and more efficient if it's just a value in pg_class than if they have to go poking around in another catalog. So I am tentatively inclined to think that just putting it in pg_class makes more sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
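
As an illustration of the rule in the message for new and rewritten tables, the following is a small, self-contained Python sketch. It is not the patch's code: Catalog, rel_counter, rewrite_table and pending are hypothetical names, and the dictionary merely stands in for a per-relation counter column such as one in pg_class.

```python
class Catalog:
    """Toy stand-in for a per-relation enable counter (hypothetical)."""

    def __init__(self):
        self.enable_counter = 0   # cluster-wide counter
        self.rel_counter = {}     # relation name -> stored counter value

    def begin_enabling(self):
        # "off" -> "enabling": every existing relation becomes pending.
        self.enable_counter += 1

    def create_table(self, name):
        # A new table starts empty, so it never holds pages written before
        # the current round; stamp it with the current (new) counter value.
        self.rel_counter[name] = self.enable_counter

    def rewrite_table(self, name):
        # Stand-in for TRUNCATE, CLUSTER, VACUUM FULL, or a rewriting
        # ALTER TABLE: all pages are replaced, so the stamp can be
        # refreshed as a side effect.
        if name not in self.rel_counter:
            raise KeyError(name)
        self.rel_counter[name] = self.enable_counter

    def pending(self):
        # Relations a background worker would still have to process.
        return sorted(
            name
            for name, counter in self.rel_counter.items()
            if counter != self.enable_counter
        )


catalog = Catalog()
catalog.create_table("old_a")
catalog.create_table("old_b")
catalog.begin_enabling()
catalog.create_table("new_t")     # created during enabling: not pending
catalog.rewrite_table("old_a")    # rewritten during enabling: no longer pending
remaining = catalog.pending()     # only "old_b" is left for the workers
```

With the stamp stored next to the relation's own catalog entry, the commands that rewrite a table only need to update one more field, which is the efficiency argument made in the message.
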