Thread: Always bump PG_CONTROL_VERSION?
Hackers, I would like to propose bumping PG_CONTROL_VERSION with each release even if there are no changes to the ControlFileData struct. Historically PG_CONTROL_VERSION has only been bumped when there were changes to ControlFileData. pgBackRest uses PG_CONTROL_VERSION to identify the version of PostgreSQL when it is not running. If PG_CONTROL_VERSION does not change from a prior version then we also use CATALOG_VERSION_NO to uniquely identify the version. This works fine, but is pretty fragile during the alpha/beta releases when CATALOG_VERSION_NO is likely to change with each release. Of course, PG_CONTROL_VERSION might change as well but this seems to be extremely rare for an alpha/beta release. There are a few commits like eeca4cd3 and 99dd8b05a that would seem to argue that bumping PG_CONTROL_VERSION at least once for each release is a good idea in general. It doesn't seem too useful to be able to run pg_resetwal or pg_controldata against another version, in the few cases that it would actually work, e.g. 9.6/9.5. Thoughts? -- -David david@pgmasters.net
On 2021-May-12, David Steele wrote: > pgBackRest uses PG_CONTROL_VERSION to identify the version of PostgreSQL > when it is not running. If PG_CONTROL_VERSION does not change from a prior > version then we also use CATALOG_VERSION_NO to uniquely identify the > version. Why don't you use the PG_VERSION file in the datadir? -- Álvaro Herrera 39°49'30"S 73°17'W "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman)
Hi, On 2021-05-12 14:58:11 -0400, David Steele wrote: > pgBackRest uses PG_CONTROL_VERSION to identify the version of PostgreSQL > when it is not running. If PG_CONTROL_VERSION does not change from a prior > version then we also use CATALOG_VERSION_NO to uniquely identify the > version. Why aren't you using PG_VERSION? Greetings, Andres Freund
On 5/12/21 3:21 PM, Alvaro Herrera wrote: > On 2021-May-12, David Steele wrote: > >> pgBackRest uses PG_CONTROL_VERSION to identify the version of PostgreSQL >> when it is not running. If PG_CONTROL_VERSION does not change from a prior >> version then we also use CATALOG_VERSION_NO to uniquely identify the >> version. > > Why don't you use the PG_VERSION file in the datadir? Mostly because there is other data we need in pg_control and it is simpler to read one file than two. Regards, -- -David david@pgmasters.net
David Steele <david@pgmasters.net> writes: > On 5/12/21 3:21 PM, Alvaro Herrera wrote: >> Why don't you use the PG_VERSION file in the datadir? > Mostly because there is other data we need in pg_control and it is > simpler to read one file than two. I'm disinclined to change the longstanding rule in this area for a reason as weak as that. Even if we did change the rule going forward, you'd still need to do it properly for existing releases, so I don't see that you're going to save anything. regards, tom lane
Hi, On 2021-05-12 16:18:16 -0400, Tom Lane wrote: > Even if we did change the rule going forward, you'd still need to > do it properly for existing releases, so I don't see that you're > going to save anything. It turns out that the last time a major version didn't have a unique control file version was 9.5, I assume that's where David is coming from. That said, I don't think it's a good practice to use the control file version as an identifier for the major version. Who knows, it might be necessary to add an optional new format in a minor version at some point or such crazyness. And then there's the beta stuff you'd mentioned, etc. Greetings, Andres Freund
On 5/12/21 4:18 PM, Tom Lane wrote: > David Steele <david@pgmasters.net> writes: >> On 5/12/21 3:21 PM, Alvaro Herrera wrote: >>> Why don't you use the PG_VERSION file in the datadir? > >> Mostly because there is other data we need in pg_control and it is >> simpler to read one file than two. > > I'm disinclined to change the longstanding rule in this area for > a reason as weak as that. It's a bit more than that -- for instance we have pg_control in every backup but in order to get PG_VERSION we may need to reference a prior backup since it never changes. pg_control is also checked on every archive_command/restore_command so reading an extra file adds up when performance is paramount. There are also many unit tests that need to write this data, etc. In short, it would be very nice to have one place to get info about a cluster. > Even if we did change the rule going forward, you'd still need to > do it properly for existing releases, so I don't see that you're > going to save anything. It's not really a burden for existing releases. The issue is during the alpha/beta phase when the CATALOG_VERSION_NO can change several times in a few months. Perhaps it was unwise to frame this in the requirements for an external tool, but I still think eeca4cd3 and 99dd8b05a argue for it being a good idea. Or perhaps we could just add the version number to pg_control? At least then pg_controldata could display the version. Regards, -- -David david@pgmasters.net
On Wed, May 12, 2021 at 01:30:27PM -0700, Andres Freund wrote: > That said, I don't think it's a good practice to use the control file > version as an identifier for the major version. Who knows, it might be > necessary to add an optional new format in a minor version at some point > or such crazyness. And then there's the beta stuff you'd mentioned, etc. Yes, PG_VERSION, as you wrote upthread already, is already fine for the job, and FWIW, I have yet to see a case where being able to easily detect the minor version in a data folder matters. And, I am of the opinion to not change the control file version if there is no need to do so. -- Michael
Attachment
On Thu, May 13, 2021 at 11:04:54AM +0900, Michael Paquier wrote: > On Wed, May 12, 2021 at 01:30:27PM -0700, Andres Freund wrote: > > That said, I don't think it's a good practice to use the control file > > version as an identifier for the major version. Who knows, it might be > > necessary to add an optional new format in a minor version at some point > > or such crazyness. And then there's the beta stuff you'd mentioned, etc. > > Yes, PG_VERSION, as you wrote upthread already, is already fine for > the job, and FWIW, I have yet to see a case where being able to easily > detect the minor version in a data folder matters. Would it even make any sense? It could be "has version X.Y ever started the cluster", or "was it the last version that started the cluster", or something else? > And, I am of the opinion to not change the control file version if > there is no need to do so. +1
On 5/12/21 10:04 PM, Michael Paquier wrote: > On Wed, May 12, 2021 at 01:30:27PM -0700, Andres Freund wrote: >> That said, I don't think it's a good practice to use the control file >> version as an identifier for the major version. Who knows, it might be >> necessary to add an optional new format in a minor version at some point >> or such crazyness. And then there's the beta stuff you'd mentioned, etc. > > Yes, PG_VERSION, as you wrote upthread already, is already fine for > the job, and FWIW, I have yet to see a case where being able to easily > detect the minor version in a data folder matters. Regardless of PG_VERSION doing the job or not, shouldn't there be a bump in PG_CONTROL_VERSION whenever there is a structural or semantic change in the control file data? And wouldn't the easiest way to ensure that be to bump the version with every release? Also, can someone give me a good reason NOT to bump the version? Thanks, Jan -- Jan Wieck Postgres User since 1994
Jan Wieck <jan@wi3ck.info> writes: > Regardless of PG_VERSION doing the job or not, shouldn't there be a bump > in PG_CONTROL_VERSION whenever there is a structural or semantic change > in the control file data? And wouldn't the easiest way to ensure that be > to bump the version with every release? No, the way to do that is to change the version number in the commit that changes the file's contents. > Also, can someone give me a good reason NOT to bump the version? It creates unnecessary churn, not to mention a false sense of complacency. Bumping the version in the commit that changes things is not optional, because if you don't do that then you'll probably burn some other developer also working on HEAD. So I don't want people thinking they can skip this because it was done at the beginning of the development cycle. We've learned these things the hard way for CATVERSION. I think the only reason that PG_CONTROL version or WAL version might seem different is that we haven't changed them often enough for people to have fresh memories of problems. regards, tom lane
On 5/13/21 6:45 PM, Tom Lane wrote: > Bumping the version in the commit that changes > things is not optional, because if you don't do that then you'll > probably burn some other developer also working on HEAD. So > I don't want people thinking they can skip this because it was > done at the beginning of the development cycle. And we make sure this is done how? Regards, Jan -- Jan Wieck Postgres User since 1994
Jan Wieck <jan@wi3ck.info> writes: > On 5/13/21 6:45 PM, Tom Lane wrote: >> Bumping the version in the commit that changes >> things is not optional, because if you don't do that then you'll >> probably burn some other developer also working on HEAD. So >> I don't want people thinking they can skip this because it was >> done at the beginning of the development cycle. > And we make sure this is done how? Peer pressure? It's not that different from N other ways to do a patch incorrectly, of course. regards, tom lane
Hi, On 2021-05-13 17:42:52 -0400, Jan Wieck wrote: > Also, can someone give me a good reason NOT to bump the version? There's several types of tools (particularly around backup) that need to parse control files. Unnecessarily increasing the numbers of versions that need to be dealt with makes that a bit harder. Greetings, Andres Freund
On Mon, May 17, 2021 at 03:47:01PM -0700, Andres Freund wrote: > There's several types of tools (particularly around backup) that need to > parse control files. Unnecessarily increasing the numbers of versions > that need to be dealt with makes that a bit harder. I am digressing here, sorry for that.. But it is worth noting that it is fun to debug issues where a user changes the system locale and breaks some logic that parses the output of pg_controldata because of the translations of the text fields. I've really wondered over the years whether there should be more switches to pick up only the field values, for the popular ones like TLIs for example. pg_config does that. -- Michael