Thread: Point in time recovery: recreating relation files
The current WAL recovery implementation does not recover newly created objects such as tables. My suggested patch is:

When XLogOpenRelation fails to open the relation file, if errno is ENOENT (no such file or directory) we should attempt to recreate the file using smgrcreate.

This seems to work fine for tables, indexes and sequences, but can anyone see any potential problems? I have not tried this with TOAST tables; are these handled any differently?

Is it reasonable to assume that recreating the file in this way is safe? It seems OK to me as we only recreate the file if it does not already exist, so we are not in danger of making a bad situation worse.

If no-one tells me this is a bad idea, I will submit a patch.

--
Marc marc@bloodnok.com
Marc Munro <marc@bloodnok.com> writes:
> The current WAL recovery implementation does not recover newly created
> objects such as tables. My suggested patch is:
> When XLogOpenRelation fails to open the relation file, if errno is
> ENOENT (no file or directory) we should attempt to recreate the file
> using smgrcreate.

No, that's wrong. The missing ingredient is that the WAL log should explicitly log table creations. (And also table drops.) If you look you will find some comments showing the places where code is missing.

If you try to do it as you suggest above, then you will erroneously recreate files that have been dropped.

regards, tom lane
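The difference between the two approaches can be sketched with a toy replay loop. This is only an illustration, not PostgreSQL code: the "disk" is a Python set of relfilenode numbers, the record names CREATE, INSERT and DROP are hypothetical, and the crash scenario (a table created, filled and dropped after the last checkpoint) is an assumption chosen to show the problem.

```python
# Toy model of WAL replay. The "disk" is a set of relfilenode numbers.
# Record names and layout are hypothetical, chosen only for illustration.

def replay(wal, recreate_on_missing):
    """Replay (op, relfilenode) records and return the files left on disk."""
    disk = set()
    for op, relfilenode in wal:
        if op == "CREATE":
            disk.add(relfilenode)
        elif op == "DROP":
            disk.discard(relfilenode)
        elif op == "INSERT" and relfilenode not in disk:
            if recreate_on_missing:
                # The "ENOENT, so call smgrcreate" heuristic.
                disk.add(relfilenode)
            # Otherwise the record is skipped in this toy model.
    return disk

# Table 16384 was created, filled and dropped before the crash.
# Without logged creations/drops, only the INSERT reaches the WAL.
unlogged_ddl_wal = [("INSERT", 16384)]
# With creations and drops logged explicitly.
logged_ddl_wal = [("CREATE", 16384), ("INSERT", 16384), ("DROP", 16384)]

resurrected = replay(unlogged_ddl_wal, recreate_on_missing=True)
clean = replay(logged_ddl_wal, recreate_on_missing=False)
```

In this sketch `resurrected` still contains the dropped table's file, while `clean` does not, which is the erroneous recreation described above.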
On Wed, 2002-02-27 at 19:44, Tom Lane wrote:
> No, that's wrong. The missing ingredient is that the WAL log should
> explicitly log table creations. (And also table drops.) If you look
> you will find some comments showing the places where code is missing.
>
> If you try to do it as you suggest above, then you will erroneously
> recreate files that have been dropped.

OK, that makes sense. I will take another look.

Thanks.

--
Marc marc@bloodnok.com
> No, that's wrong. The missing ingredient is that the WAL log should
> explicitly log table creations. (And also table drops.) If you look
> you will find some comments showing the places where code is missing.

I'm wondering where we could record the LSN when creating or dropping tables.

> If you try to do it as you suggest above, then you will erroneously
> recreate files that have been dropped.

Yes, but I think we need to compare the log's LSN and the table's LSN.

--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> No, that's wrong. The missing ingredient is that the WAL log should
>> explicitly log table creations. (And also table drops.) If you look
>> you will find some comments showing the places where code is missing.

> I'm wondering where we could record the LSN when creating or dropping
> tables.

Um, why would that matter?

regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> No, that's wrong. The missing ingredient is that the WAL log should
> >> explicitly log table creations. (And also table drops.) If you look
> >> you will find some comments showing the places where code is missing.
>
> > I'm wondering where we could record the LSN when creating or dropping
> > tables.
>
> Um, why would that matter?

In my understanding, to prevent redoing a record two or more times during the recovery process, we need to compare the LSN in the object against the LSN in the WAL log.

--
Tatsuo Ishii
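The LSN comparison described here is how redo is usually made safe to repeat for page contents. A minimal sketch follows, assuming a hypothetical `Page` class that stores the LSN of the last record applied to it; the names `Page` and `redo_insert` are invented for illustration and are not PostgreSQL functions.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """Hypothetical data page that remembers the LSN of its last change."""
    lsn: int = 0
    rows: list = field(default_factory=list)

def redo_insert(page, record_lsn, row):
    """Apply a logged insert unless the page already reflects it."""
    if page.lsn >= record_lsn:
        return False              # already applied; redoing would duplicate the row
    page.rows.append(row)
    page.lsn = record_lsn         # stamp the page with the record's LSN
    return True

page = Page()
first = redo_insert(page, 100, "row-a")
second = redo_insert(page, 100, "row-a")   # same record replayed again
```

The second call is a no-op because the page LSN has already reached the record's LSN. Whether a comparable check is needed for file creation and deletion is the open question in this exchange.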
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> I'm wondering where we could record the LSN when creating or dropping
>>> tables.
>>
>> Um, why would that matter?

> In my understanding to prevent redo-ing two or more times while in the
> recovery process, we need to compare LSN in the object against the LSN
> in the WAL log.

But undo/redo checking on file creation or deletion is trivial: either the kernel has the file or it doesn't. We do not need any other check AFAICS.

regards, tom lane
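The point that the filesystem itself answers the "already done?" question can be sketched as follows. This is an illustrative sketch only: `redo_create` and `redo_drop` are hypothetical helper names, and a temporary directory stands in for the database directory.

```python
import os
import tempfile

def redo_create(path):
    """Create the file if it is missing; leave an existing file untouched."""
    with open(path, "ab"):    # append mode creates but never truncates
        pass

def redo_drop(path):
    """Remove the file if present; a missing file means the drop is already done."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass

with tempfile.TemporaryDirectory() as datadir:
    relfile = os.path.join(datadir, "16384")   # file named by a made-up relfilenode
    redo_create(relfile)
    redo_create(relfile)                       # replaying the creation is harmless
    exists_after_create = os.path.exists(relfile)
    redo_drop(relfile)
    redo_drop(relfile)                         # replaying the drop is harmless too
    exists_after_drop = os.path.exists(relfile)
```

Both operations can be replayed any number of times with the same result, so no stored LSN is consulted in this sketch.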
> > In my understanding to prevent redo-ing two or more times while in the
> > recovery process, we need to compare LSN in the object against the LSN
> > in the WAL log.
>
> But undo/redo checking on file creation or deletion is trivial: either
> the kernel has the file or it doesn't. We do not need any other check
> AFAICS.

Are you saying that the table creation log record would contain a relfilenode? I'm not sure the relfilenode is the same before and after recovery if we consider point-in-time recovery.

--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> But undo/redo checking on file creation or deletion is trivial: either
>> the kernel has the file or it doesn't. We do not need any other check
>> AFAICS.

> Are you saying that the table creation log record would contain a
> relfilenode?

Sure. What else would it contain?

> I'm not sure the relfilenode is same before and after the
> recovery if we consider the point time recovery.

Considering that all the WAL entries concerning updates to the table will name it by relfilenode, we'd better be prepared to ensure that the relfilenode doesn't change over recovery.

regards, tom lane
Could someone explain to this poor newbie (who is hoping to implement this) exactly what the issue is here? Like Tom, I could originally see no reason to worry about the LSN for file creation but I am very concerned that I have failed to grasp Tatsuo's concerns.

Is there some reason why the relfilenode might change either during or as a result of recovery? Unless I have missed the point again, during recovery we must recreate files with exactly the same path, name and relfilenode as they would have originally been created, and in the same order relative to the creation of the relation. I see no scope for anything to be different.

On Wed, 2002-03-06 at 21:29, Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> But undo/redo checking on file creation or deletion is trivial: either
> >> the kernel has the file or it doesn't. We do not need any other check
> >> AFAICS.
>
> > Are you saying that the table creation log record would contain a
> > relfilenode?
>
> Sure. What else would it contain?
>
> > I'm not sure the relfilenode is same before and after the
> > recovery if we consider the point time recovery.
>
> Considering that all the WAL entries concerning updates to the table
> will name it by relfilenode, we'd better be prepared to ensure that
> the relfilenode doesn't change over recovery.
>
> regards, tom lane

--
Marc marc@bloodnok.com
> Could someone explain to this poor newbie (who is hoping to implement
> this) exactly what the issue is here? Like Tom, I could originally see
> no reason to worry about the LSN for file creation but I am very
> concerned that I have failed to grasp Tatsuo's concerns.
>
> Is there some reason why the relfilenode might change either during or
> as a result of recovery? Unless I have missed the point again, during
> recovery we must recreate files with exactly the same path, name and
> relfilenode as they would have originally been created, and in the same
> order relative to the creation of the relation. I see no scope for
> anything to be different.

Sorry for the confusion. I'm not very familiar with other DBMSs, and I just don't know what kind of point-in-time recovery features they provide. One scenario I could imagine is recovering a single table under a different name. I'm not sure this is implemented by other DBMSs, though.

BTW, the next issue would be TRUNCATE and CREATE/DROP DATABASE. As far as I can tell, these are not currently supported by WAL.

--
Tatsuo Ishii