PITR Phase 2 - Design Planning - Mailing list pgsql-hackers
From | Simon Riggs
---|---
Subject | PITR Phase 2 - Design Planning
Date |
Msg-id | 1082993847.3999.93.camel@stromboli
Responses | Re: PITR Phase 2 - Design Planning, Re: PITR Phase 2 - Design Planning, Re: PITR Phase 2 - Design Planning
List | pgsql-hackers
Since Phase 1 is functioning and should hopefully soon be complete, we can now start thinking about Phase 2: full recovery to a point in time.

Previous thinking was that a command-line switch would be used to specify recovery to a given point in time, rather than the default, which is to recover all the way to the end of the (available) xlogs.

Recovering to a specific point in time forces us to consider what the granularity of time is. We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction

Transaction log files currently have timestamps, so (1) is straightforward, but probably not the best we can do. We would roll forward until the xlog file time > desired point in time.

To make (2) work we would need a timestamp associated with each transaction. This could be in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record with a timestamp > desired point-in-time.

Currently, neither of these places has a timestamp. Hmmmm. We can't use pg_control because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in fewer than 2 bytes. We don't need a timezone. The timestamp could be the number of seconds since the last checkpoint, since checkpoint frequency is already limited by a GUC that forces checkpoints every so often. Although the code avoids a checkpoint if no updates have taken place, we wouldn't be too remiss to force a checkpoint every 32,000 seconds (about 9 hours). That assumes the accuracy of the point-in-time only needs to be of the order of seconds. If we went to 0.1-second accuracy, we could force a checkpoint every 40 minutes or so. All of that seems too restrictive. If we went to milliseconds, then we could use a 4-byte value and force a checkpoint every 284 hours, or about 1.5 weeks. Thoughts?

The clog uses 2 bits per transaction, so even 2 extra bytes per transaction would make the clog 9 times larger than originally intended. This could well cause it to segment more quickly, and I'm sure no one would be happy with that. So, let's not add anything to the clog.

The alternative is to make the last part of the XLogHeader record a timestamp value, increasing the size of each xlog write. It might be possible to make this part of the header optional depending upon whether or not PITR is required, but my preference is against such dynamic coding.

So, I propose:
- appending 8 bytes of date/time data to the xlog file header record
- appending 4 bytes of time offset onto each xlog record
- altering the recovery logic to compare the calculated time of each xlog record (file header + offset) against the desired point-in-time, delivered to it by a GUC (a rough sketch of this comparison is given at the end of this mail)

Input is sought from anybody with detailed NTP knowledge, since NTP drift correction may have some subtle interplay with this proposal.

Also, while that code is being altered, some additional log output needs to be added when recovery of each new xlog file starts, with timing information, so that DBAs watching a recovery can estimate its expected completion time, which is essential in long recovery situations.

I am also considering any changes that may be required to prepare the way for a future implementation of parallel redo recovery.

Best regards,

Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com
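P.S. For illustration, here is a minimal, self-contained C sketch of the stopping rule proposed above, assuming millisecond units. Every name in it (XLogFileHeaderSketch, XLogRecordSketch, time_offset_ms, past_recovery_target) is invented for the sketch and is not an existing PostgreSQL definition; the real change would of course be wired into the existing xlog record machinery.

```c
/*
 * Sketch only: hypothetical layouts and names, not PostgreSQL headers.
 * Shows the proposed comparison: an 8-byte base time stored once per
 * xlog file, a 4-byte offset on each record, and redo stopping at the
 * first record whose reconstructed time exceeds the recovery target.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: base timestamp written in the xlog file header record */
typedef struct XLogFileHeaderSketch
{
	uint64_t	base_time_ms;	/* 8-byte date/time at file creation */
} XLogFileHeaderSketch;

/* Hypothetical: per-record time offset appended to each xlog record */
typedef struct XLogRecordSketch
{
	uint32_t	time_offset_ms;	/* 4-byte offset from the file's base time */
	/* a real record would also carry rmgr id, length, payload, etc. */
} XLogRecordSketch;

/*
 * Return true if redo should stop before applying this record,
 * i.e. its reconstructed time lies beyond the requested target.
 */
static bool
past_recovery_target(const XLogFileHeaderSketch *hdr,
					 const XLogRecordSketch *rec,
					 uint64_t target_time_ms)	/* would come from a GUC */
{
	uint64_t	rec_time_ms = hdr->base_time_ms + rec->time_offset_ms;

	return rec_time_ms > target_time_ms;
}

int
main(void)
{
	XLogFileHeaderSketch hdr = { .base_time_ms = 1000000000ULL };
	XLogRecordSketch recs[] = {
		{ .time_offset_ms = 100 },
		{ .time_offset_ms = 2500 },
		{ .time_offset_ms = 6000 },
	};
	uint64_t	target = hdr.base_time_ms + 3000;	/* recover to +3 s */

	for (size_t i = 0; i < sizeof(recs) / sizeof(recs[0]); i++)
	{
		if (past_recovery_target(&hdr, &recs[i], target))
		{
			printf("stopping redo before record %zu\n", i);
			break;
		}
		printf("replaying record %zu\n", i);
	}
	return 0;
}
```

The reason for the 4-byte per-record offset is that the full 8-byte timestamp is written only once per xlog file; each record then pays only for the smaller relative offset.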