Thread: Tracking cluster upgrade and configuration history
Hello Hackers, While supporting customers, it would frequently be useful to have more information about the history of a cluster. For example,which prior versions were ever installed and running on the cluster? Has the cluster ever been run with fsync=off? When did the server last enter recovery, if ever? Was a backup_label file present at that time? Some of this type of information could strictly be fixed size, such as a fixed set of timestamps for the time at which afixed set of things last occurred, or a fixed set of bits indicating whether a fixed set of things ever happened. Some other types would be variable size, but hopefully short in practice, like a list of all postgres versions that haveever been run on the cluster. Logging the information via the usual log mechanism seems insufficient, as log files may get rotated and this informationlost. Would it be acceptable to store some fixed set of flag bits and timestamps in pg_control? Space there is at a premium. Would it make sense to alternately, or additionally, store some of this information in a flat text file in pg_data, say anew file named "cluster_history" or such? I'm happy to put together a more concrete proposal, but solicit your opinions first on the merits of the idea generally,and if you think the idea good, on the specifics you'd like to see included. Thanks! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 12, 2020 at 4:01 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > While supporting customers, it would frequently be useful to have more information about the history of a cluster. Forexample, which prior versions were ever installed and running on the cluster? Has the cluster ever been run with fsync=off? When did the server last enter recovery, if ever? Was a backup_label file present at that time? > +1 for the idea. The information will be useful at times for debugging purposes. > > Some of this type of information could strictly be fixed size, such as a fixed set of timestamps for the time at whicha fixed set of things last occurred, or a fixed set of bits indicating whether a fixed set of things ever happened. > > Some other types would be variable size, but hopefully short in practice, like a list of all postgres versions that haveever been run on the cluster. > > Logging the information via the usual log mechanism seems insufficient, as log files may get rotated and this informationlost. > True. Just a thought, can we use existing logging mechanism and APIs to write to a new file that never gets rotated by the syslogger(Of course, we need to think of the maximum file size that's allowed)? The idea is like this: we use elog/ereport and so on with a new debug level, when specified, instead of logging into the standard log files, we log it to the new file. > > Would it be acceptable to store some fixed set of flag bits and timestamps in pg_control? Space there is at a premium. > Since we allocate ControlFileData in shared memory and also we may have some data with timestamps, variable texts and so on, having this included in pg_control data structure would not seem a good idea to me. > > Would it make sense to alternately, or additionally, store some of this information in a flat text file in pg_data, saya new file named "cluster_history" or such? > IMHO, this is also a good idea. We need to think of the APIs to open/read/write/close that history file? How often and which processes and what type of data they write? Is it that the postmaster alone will write into that file? If multiple processes are allowed to write, how to deal with concurrent writers? Will users have to open manually and read that file? or Will we have some program similar to pg_controldata? Will we have some maximum limit to the size of this file? > > I'm happy to put together a more concrete proposal, but solicit your opinions first on the merits of the idea generally,and if you think the idea good, on the specifics you'd like to see included. > Welcome to know more about this idea. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
2020年11月16日(月) 15:48 Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>: > > On Thu, Nov 12, 2020 at 4:01 AM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: > > > > While supporting customers, it would frequently be useful to have more information about the history of a cluster. Forexample, which prior versions were ever installed and running on the cluster? Has the cluster ever been run with fsync=off? When did the server last enter recovery, if ever? Was a backup_label file present at that time? > > > > +1 for the idea. The information will be useful at times for debugging purposes. It's certainly something which would be nice to have. > > Would it make sense to alternately, or additionally, store some of this information in a flat text file in pg_data, saya new file named "cluster_history" or such? > > > > IMHO, this is also a good idea. We need to think of the APIs to > open/read/write/close that history file? How often and which processes > and what type of data they write? Is it that the postmaster alone will > write into that file? If multiple processes are allowed to write, how > to deal with concurrent writers? Will users have to open manually and > read that file? or Will we have some program similar to > pg_controldata? Will we have some maximum limit to the size of this > file? pg_stat_statements might be worth looking at as one way of handling that kind of file. However the problem with keeping a separate file which is not WAL-logged would mean it doesn't get propagated to standbys, and there's also the question of how it could be maintained across upgrades via pg_upgrade. FWIW I did once create a background worker extension [1] which logs configuration changes to a table, though it's not particularly maintained or recommended for production use. [1] https://github.com/ibarwick/config_log Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com
> On Nov 15, 2020, at 10:47 PM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Nov 12, 2020 at 4:01 AM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: >> >> While supporting customers, it would frequently be useful to have more information about the history of a cluster. Forexample, which prior versions were ever installed and running on the cluster? Has the cluster ever been run with fsync=off? When did the server last enter recovery, if ever? Was a backup_label file present at that time? >> > > +1 for the idea. The information will be useful at times for debugging purposes. Thanks for the feedback. > >> >> Some of this type of information could strictly be fixed size, such as a fixed set of timestamps for the time at whicha fixed set of things last occurred, or a fixed set of bits indicating whether a fixed set of things ever happened. >> >> Some other types would be variable size, but hopefully short in practice, like a list of all postgres versions that haveever been run on the cluster. >> >> Logging the information via the usual log mechanism seems insufficient, as log files may get rotated and this informationlost. >> > > True. Just a thought, can we use existing logging mechanism and APIs > to write to a new file that never gets rotated by the syslogger(Of > course, we need to think of the maximum file size that's allowed)? The > idea is like this: we use elog/ereport and so on with a new debug > level, when specified, instead of logging into the standard log files, > we log it to the new file. That's going in a very different direction from what I had in mind. I was imagining something like a single binary or textfile of either fixed size or something very short but not fixed. The "very short but not fixed" part of that seems abit too hand-waving on reflection. Any variable length list, such as the list of postgres versions started on the cluster,could be made fixed length by only tracking the most recent N of them, perhaps with a flag bit to indicate if thelist has overflowed. Using elog/ereport with a new log level that gets directed into a different log file is an interesting idea, but it is notclear how to use elog/ereport in a principled way to write files that need never get too large. >> Would it be acceptable to store some fixed set of flag bits and timestamps in pg_control? Space there is at a premium. >> > > Since we allocate ControlFileData in shared memory and also we may > have some data with timestamps, variable texts and so on, having this > included in pg_control data structure would not seem a good idea to > me. Variable length texts seem completely out of scope for this. I would expect the data to be a collection of integer typesand flag bits. Fixed length text might also be possible, but I don't have any examples in mind of text that we'd wantto track. >> Would it make sense to alternately, or additionally, store some of this information in a flat text file in pg_data, saya new file named "cluster_history" or such? >> > > IMHO, this is also a good idea. We need to think of the APIs to > open/read/write/close that history file? How often and which processes > and what type of data they write? Is it that the postmaster alone will > write into that file? If multiple processes are allowed to write, how > to deal with concurrent writers? Will users have to open manually and > read that file? or Will we have some program similar to > pg_controldata? Will we have some maximum limit to the size of this > file? This depends in part on feedback about which information others on this list would like to see included, but I was imaginingsomething similar to how pg_control works, or using pg_control itself. The maximum size for pg_control is 512 bytes,and on my system sizeof(ControlFileData) = 296, which leaves 216 bytes free. I didn't check how much that might changeon systems with different alignments. We could either use some of the ~200 bytes currently available in pg_control,or use another file, "pg_history" or such, following the design pattern already used for pg_control. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Nov 15, 2020, at 11:23 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote: > > 2020年11月16日(月) 15:48 Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>: >> >> On Thu, Nov 12, 2020 at 4:01 AM Mark Dilger >> <mark.dilger@enterprisedb.com> wrote: >>> >>> While supporting customers, it would frequently be useful to have more information about the history of a cluster. Forexample, which prior versions were ever installed and running on the cluster? Has the cluster ever been run with fsync=off? When did the server last enter recovery, if ever? Was a backup_label file present at that time? >>> >> >> +1 for the idea. The information will be useful at times for debugging purposes. > > It's certainly something which would be nice to have. Thanks for the feedback. >>> Would it make sense to alternately, or additionally, store some of this information in a flat text file in pg_data, saya new file named "cluster_history" or such? >>> >> >> IMHO, this is also a good idea. We need to think of the APIs to >> open/read/write/close that history file? How often and which processes >> and what type of data they write? Is it that the postmaster alone will >> write into that file? If multiple processes are allowed to write, how >> to deal with concurrent writers? Will users have to open manually and >> read that file? or Will we have some program similar to >> pg_controldata? Will we have some maximum limit to the size of this >> file? > > pg_stat_statements might be worth looking at as one way of handling that kind > of file. > > However the problem with keeping a separate file which is not WAL-logged would > mean it doesn't get propagated to standbys, and there's also the question > of how it could be maintained across upgrades via pg_upgrade. Hmmm. I was not expecting the file to be propagated to standbys. The information could legitimately be different for aprimary and a standby. As a very simple example, there may be a flag bit for whether the cluster has operated as a standby. That does raise questions about what sort of information about a primary that a standby should track, in case theyget promoted to primary and information about the old primary would be useful for troubleshooting. Ideas welcome.... > > FWIW I did once create a background worker extension [1] which logs > configuration changes to a table, though it's not particularly maintained or > recommended for production use. I'm happy to change course if the consensus on the list favors using something larger, like log files or logging to a table,but for now I'm still thinking about this in terms of something smaller than that. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company