Re: Why we really need timelines *now* in PITR - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Why we really need timelines *now* in PITR |
Date | |
Msg-id | 24170.1090262006@sss.pgh.pa.us |
In response to | Re: Why we really need timelines *now* in PITR (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Why we really need timelines *now* in PITR
Re: Why we really need timelines *now* in PITR |
List | pgsql-hackers |
I wrote:
> I think there's really no way around the issue: somehow we've got to
> keep some meta-history outside the $PGDATA area, if we want to do this
> in a clean fashion.

After further thought I think we can fix this stuff by creating a
"history file" for each timeline.  This will make recovery slightly more
complicated but I don't think it would be any material performance
problem.  Here's how it goes:

* Timeline IDs are 32-bit ints with no particular semantic significance
  (that is, we do not assume timeline 3 is a child of 2, or anything
  like that).  The actual parentage of a timeline has to be found by
  inspecting its history file.

* History files will be named by their timeline ID, say
  "00000042.history".  They will be created in /pg_xlog whenever a new
  timeline is created by the act of doing a recovery to a point in time
  earlier than the end of existing WAL.  When doing WAL archiving a
  history file can be copied off to the archive area by the existing
  archiver mechanism (ie, we'll make a .ready file for it as soon as
  it's written).

* History files will be plain text (for human consumption) and will
  essentially consist of a list of parent timeline IDs in sequence.
  I envision adding the timeline split timestamp and starting WAL
  segment number too, but these are for documentation purposes --- the
  system doesn't need them.  We may as well allow comments in there as
  well, so that the DBA can annotate the reasons for a PITR split to
  have been done.  So the contents might look like

    # Recover from unintentional TRUNCATE
    00000001 0000000A00142568 2005-05-16 12:34:15 EDT
    # Ex-assistant DBA dropped wrong table
    00000007 0000002200005434 2005-11-17 18:44:44 EST

  When we split off a new timeline, we just have to copy the parent's
  history file (which we can do verbatim including comments) and then
  add a new line at the end showing the immediate parent's timeline ID
  and the other details of the split.

  Initdb can create 00000001.history with empty contents (since that
  timeline has no parents).

* When we need to do recovery, we first identify the source timeline
  (either by reading the current timeline ID from pg_control, or the DBA
  can tell us with a parameter in recovery.conf).  We then read the
  history file for that timeline, and remember its sequence of parent
  timeline IDs.  We can crosscheck that pg_control's timeline ID is one
  of this set of timeline IDs, too --- if it's not then the wrong backup
  file was restored.

* During recovery, whenever we need to open a WAL segment file, we first
  try to open it with the source timeline ID; if that doesn't exist, try
  the immediate parent timeline ID; then the grandparent, etc.  Whenever
  we find a WAL file with a particular timeline ID, we forget about all
  parents further up in the history, and won't try to open their
  segments anymore (this is the generalization of my previous rule that
  you never drop down in timeline number as you scan forward).

* If we end recovery because we have rolled forward off the end of WAL,
  we can just continue using the source timeline ID --- we are extending
  that timeline.  (Thus, an ordinary crash and restart doesn't require
  generating a new timeline ID; nor do we generate a new timeline during
  normal postmaster stop/start.)  But if we stop recovery at a requested
  point-in-time earlier than end of WAL, we have to branch off a new
  timeline.  We do this by:

    * Selecting a previously unused timeline ID (see below).
    * Writing a history file for this ID, by copying the parent
      timeline's history file and adding a new line at the end.
    * Copying the last-used WAL segment of the parent timeline, giving
      it the same segment number but the new timeline's ID.  This
      becomes the active WAL segment when we start operating.
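
To make the lookup rule concrete, here is a rough Python sketch of reading a history file and choosing which timeline's copy of a WAL segment to open. It is only an illustration, not proposed code: the names parse_history and SegmentFinder, and the exists callback, are made up for the example, and it assumes the first field on each non-comment line is the parent timeline ID written in hex.

```python
def parse_history(text):
    """Return the parent timeline IDs from a history file, oldest first.

    Assumption: '#' starts a comment line, and the first whitespace-separated
    field of every other non-blank line is a hex timeline ID.
    """
    parents = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parents.append(int(line.split()[0], 16))
    return parents


class SegmentFinder:
    """Chooses the timeline whose copy of a WAL segment should be opened."""

    def __init__(self, source_tli, parents, exists):
        # Search order: source timeline, immediate parent, grandparent, ...
        self.candidates = [source_tli] + list(reversed(parents))
        # Hypothetical callback (tli, segno) -> bool that reports whether
        # a segment file for that timeline is available.
        self.exists = exists

    def find(self, segno):
        for i, tli in enumerate(self.candidates):
            if self.exists(tli, segno):
                # Forget all parents further up in the history.
                del self.candidates[i + 1:]
                return tli
        return None
```

Since the history file lists parents in sequence with the immediate parent last, the search order is the source timeline followed by the parent list reversed. Truncating the candidate list on every hit is what keeps the scan from dropping back to an older timeline once it has moved onto a newer one.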

* We can identify the highest timeline ID ever used by simply starting
  with the source timeline ID and probing pg_xlog and the archive area
  for history files N+1.history, N+2.history, etc until we find an ID
  for which there is no history file.  Under reasonable scenarios this
  will not take very many probes, so it doesn't seem that we need any
  addition to the archiver API to make it more efficient.

* Since history files will be small and made infrequently (one hopes you
  do not need to do a PITR recovery very often...) I see no particular
  reason not to leave them in /pg_xlog indefinitely.  The DBA can clean
  out old ones if she is a neatnik, but I don't think the system needs
  to or should delete them.  Similarly the archive area could be
  expected to retain history files indefinitely.

* However, you *can* throw away a history file once you are no longer
  interested in rolling back to times predating the splitoff point of
  the timeline.  If we don't find a history file we can just act as
  though the timeline has no parents (extends indefinitely far in the
  past).  (Hm, so we don't actually have to bother creating
  00000001.history...)

* I'm intending to replace the current concept of StartUpID (SUI) by
  timeline IDs --- we'll record timeline IDs not SUIs in data page
  headers and WAL page headers.  SUI isn't doing anything of value for
  us; I think it was probably intended to do what timelines will do, but
  it's not defined quite right for the purpose.  One good thing about
  timeline IDs for WAL page headers is that we know exactly which IDs
  should be expected in a WAL file (either the current timeline or one
  of its parents); this allows a much tighter check than is possible
  with SUIs.

Anybody see any holes in this design?

            regards, tom lane
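
For illustration, the probing rule for picking a previously unused timeline ID might be sketched in Python as follows. This is a sketch under assumptions rather than a design: history_name and next_unused_timeline are hypothetical names, the file name is assumed to be the timeline ID as eight hex digits, and both pg_xlog and the archive area are modeled as local directories, although a real archive would be reached through the archiver mechanism.

```python
import os


def history_name(tli):
    """History file name for a timeline ID (assumed: eight hex digits)."""
    return "%08X.history" % tli


def next_unused_timeline(source_tli, search_dirs):
    """Probe N+1, N+2, ... until no history file exists for the ID.

    search_dirs stands in for pg_xlog plus the archive area; an ID counts
    as used if its history file is found in any of them.
    """
    tli = source_tli + 1
    while any(
        os.path.exists(os.path.join(d, history_name(tli)))
        for d in search_dirs
    ):
        tli += 1
    return tli
```

A call such as next_unused_timeline(7, ["pg_xlog", "/mnt/archive"]) (both paths hypothetical) returns the first ID above 7 that has no history file in either place.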