Thread: try to find out the checkpoint record?
Currently we need to read pg_control to know the location (LSN) of the checkpoint record. This means that if pg_control is lost or corrupted, we have to give up on database recovery. I think we could start from the first WAL segment and read through the entire WAL to find the latest valid checkpoint record. This may take a considerable amount of time, but it is still better than giving up on recovery, IMO. Is there any reason we cannot do this?

--
Tatsuo Ishii
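A minimal sketch of the scan proposed above, assuming a toy record layout invented for illustration (length, type, CRC-32, payload). It is not the real WAL format; the names `HEADER`, `CHECKPOINT`, `make_record`, and `find_latest_checkpoint` are hypothetical. The idea shown is only this: walk the log from the start, stop at the first record that fails validation, and remember the last checkpoint seen.

```python
import struct
import zlib

# Hypothetical layout for this sketch only (NOT the real WAL format):
# 4-byte payload length, 1-byte record type, 4-byte CRC-32, then the payload.
HEADER = struct.Struct("<IBI")
OTHER = 0       # assumed type code for any non-checkpoint record
CHECKPOINT = 1  # assumed type code for a checkpoint record


def make_record(rec_type, payload):
    """Build one toy record; the CRC covers the type byte and the payload."""
    crc = zlib.crc32(bytes([rec_type]) + payload)
    return HEADER.pack(len(payload), rec_type, crc) + payload


def find_latest_checkpoint(wal):
    """Return the offset of the last valid checkpoint record, or None."""
    offset = 0
    latest = None
    while offset + HEADER.size <= len(wal):
        length, rec_type, crc = HEADER.unpack_from(wal, offset)
        start = offset + HEADER.size
        payload = wal[start:start + length]
        # A truncated payload or a CRC mismatch marks the end of the valid log.
        if len(payload) != length:
            break
        if zlib.crc32(bytes([rec_type]) + payload) != crc:
            break
        if rec_type == CHECKPOINT:
            latest = offset
        offset = start + length
    return latest
```

The cost is linear in the size of the log, which matches the "considerable amount of time" caveat: every record before the last checkpoint has to be read and validated.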
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Currently we need to read pg_control to know the location (LSN) of the
> checkpoint record. This means that if pg_control is lost or corrupted,
> we have to give up on database recovery. I think we could start from
> the first WAL segment and read through the entire WAL to find the
> latest valid checkpoint record. This may take a considerable amount of
> time, but it is still better than giving up on recovery, IMO. Is there
> any reason we cannot do this?

Is it worth worrying about? I don't recall that we've ever heard of a loss-of-pg_control failure in the field. Certainly it *could* happen, but I can gin up plenty of implausible scenarios where scanning pg_xlog for a checkpoint would give the wrong answer as well. (Our habit of recycling xlog segments by renaming them makes us vulnerable to confusion over filenames, for example.) Since pg_control is deliberately kept to less than one disk block and is written only once per checkpoint, you'd have to be really unlucky to lose it anyway.

Also, you can rebuild pg_control from scratch using pg_resetxlog, so loss of pg_control is not in itself worse than loss of the pg_xlog directory.

My feeling is that pg_clog is by far the most fragile part of the logging mechanism at the moment: two very critical bits per transaction and essentially no error checking. If you want to improve reliability, think about how to make clog more robust.

regards, tom lane
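To illustrate why "two very critical bits per transaction" with no error checking is fragile, here is a minimal sketch of a two-bit-per-transaction status array. The two-bit width comes from the message above; the specific status values and the helper names `set_status` and `get_status` are assumptions made for this example, not a description of the actual clog code.

```python
# Status codes are assumed for illustration; only the 2-bit width is from the thread.
IN_PROGRESS, COMMITTED, ABORTED = 0b00, 0b01, 0b10
XACTS_PER_BYTE = 4  # 8 bits per byte / 2 bits per transaction


def set_status(clog, xid, status):
    """Store a 2-bit status for transaction `xid` in the bytearray `clog`."""
    byte_no, slot = divmod(xid, XACTS_PER_BYTE)
    shift = slot * 2
    cleared = clog[byte_no] & ~(0b11 << shift) & 0xFF
    clog[byte_no] = cleared | (status << shift)


def get_status(clog, xid):
    """Read back the 2-bit status for transaction `xid`."""
    byte_no, slot = divmod(xid, XACTS_PER_BYTE)
    return (clog[byte_no] >> (slot * 2)) & 0b11


if __name__ == "__main__":
    clog = bytearray(2)
    set_status(clog, 5, COMMITTED)
    # Simulate corruption of the two bits belonging to transaction 5.
    clog[1] ^= 0b11 << 2
    # With no checksum, the damaged value is read back as a valid status.
    print(get_status(clog, 5) == ABORTED)
```

In this layout every bit pattern is a legal value, so a damaged pair of bits cannot be told apart from a legitimately different status without some extra redundancy.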
> Is it worth worrying about? I don't recall that we've ever heard of a
> loss-of-pg_control failure in the field. Certainly it *could* happen,
> but I can gin up plenty of implausible scenarios where scanning pg_xlog
> for a checkpoint would give the wrong answer as well. (Our habit of
> recycling xlog segments by renaming them makes us vulnerable to
> confusion over filenames, for example.) Since pg_control is
> deliberately kept to less than one disk block and is written only once
> per checkpoint, you'd have to be really unlucky to lose it anyway.

If my memory is correct, some of my customers or members of the local mailing list have reported that they could not start the postmaster because it failed in the middle of the startup process. Two or three said they had lost pg_control for an unknown reason. I'm not sure whether the trouble was only with pg_control, though.

Regarding the file names of xlog segments, I think we could read the XLogPageHeader and easily determine which segment is the oldest and which one has been recycled.

> Also, you can rebuild pg_control from scratch using pg_resetxlog,
> so loss of pg_control is not in itself worse than loss of the pg_xlog
> directory.

One annoying thing about pg_resetxlog is that we need to use the -f option if it cannot read pg_control, and we have to give it a haphazard OID. Moreover, after using pg_resetxlog, we have to take a full dump and reconstruct the database. This will lead to very long downtime if the data is huge. So I don't think pg_resetxlog is the best solution in this case.

> My feeling is that pg_clog is by far the most fragile part of the
> logging mechanism at the moment: two very critical bits per transaction
> and essentially no error checking. If you want to improve reliability,
> think about how to make clog more robust.

What's wrong with improving one of the fragile parts of the system?

--
Tatsuo Ishii
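The page-header idea above can be sketched as follows, under a loudly stated assumption: that the first page header of each segment records which log position it was written for. The `Segment` type, its `header_segno` field, and `classify` are hypothetical stand-ins and do not reflect the actual XLogPageHeader layout. A segment that was recycled by renaming would then carry a file name for a future position but a header from an older one.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name_segno: int    # segment number implied by the file name
    header_segno: int  # segment number stored in the first page header (assumed field)


def classify(segments):
    """Split segments into live ones (name and header agree) and recycled ones."""
    live = sorted(
        (s for s in segments if s.name_segno == s.header_segno),
        key=lambda s: s.name_segno,
    )
    recycled = [s for s in segments if s.name_segno != s.header_segno]
    return live, recycled


if __name__ == "__main__":
    segs = [Segment(7, 7), Segment(9, 3), Segment(6, 6), Segment(8, 8)]
    live, recycled = classify(segs)
    # The oldest live segment is where a full scan would begin.
    print(live[0], recycled)
```

Whether the real header carries enough information for this comparison is exactly the point under discussion in the thread, so the sketch shows the shape of the check rather than a working recovery tool.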
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> What's wrong with improving one of the fragile parts of the system?

My opinion is that pg_control is the *least* fragile part of the logging data structures. If we had infinite manpower, I'd say sure, go implement a fallback mechanism for pg_control. But we don't, and I honestly think your time would be better spent elsewhere.

BTW, you'd not be able to automatically reconstruct pg_control from the contents of xlog anyway (no way to find out the locale settings, for instance). So it will still require some manual intervention. If you're really set on working on this, it might pay to think of it as an additional behavioral mode for pg_resetxlog, rather than something that's going to happen deep inside the backend.

regards, tom lane