Thread: Point in time recovery: recreating relation files

Point in time recovery: recreating relation files

From
Marc Munro
Date:
The current WAL recovery implementation does not recover newly created
objects such as tables.  My suggested patch is:

When XLogOpenRelation fails to open the relation file, if errno is
ENOENT (no such file or directory), we should attempt to recreate the file
using smgrcreate.

This seems to work fine for tables, indexes and sequences, but can anyone
see any potential problems?  I have not tried this with TOAST tables;
are these handled any differently?

Is it reasonable to assume that recreating the file in this way is
safe?  It seems OK to me as we only recreate the file if it does not
already exist, so we are not in danger of making a bad situation worse.

If no-one tells me this is a bad idea, I will submit a patch.

-- 
Marc        marc@bloodnok.com
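A minimal sketch of the fallback proposed above, assuming plain POSIX calls. `open_or_recreate` is a hypothetical helper, and the `O_CREAT | O_EXCL` open merely stands in for `smgrcreate`, whose real signature is not shown here. It only illustrates the control flow; the replies below explain why the approach is unsafe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch of the proposal: open an existing relation file and,
 * if it is missing (ENOENT), create an empty one in its place.
 * Returns a file descriptor, or -1 on any other error. */
static int open_or_recreate(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR);
    if (fd >= 0 || errno != ENOENT)
        return fd;  /* opened, or failed for some other reason */

    /* File is missing: create it empty (standing in for smgrcreate).
     * O_EXCL ensures an existing file is never clobbered. */
    return open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
}
```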


Re: Point in time recovery: recreating relation files

From
Tom Lane
Date:
Marc Munro <marc@bloodnok.com> writes:
> The current WAL recovery implementation does not recover newly created
> objects such as tables.  My suggested patch is:

> When XLogOpenRelation fails to open the relation file, if errno is
> ENOENT (no such file or directory), we should attempt to recreate the file
> using smgrcreate.

No, that's wrong.  The missing ingredient is that the WAL log should
explicitly log table creations.  (And also table drops.)  If you look
you will find some comments showing the places where code is missing.

If you try to do it as you suggest above, then you will erroneously
recreate files that have been dropped.
        regards, tom lane
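A toy in-memory model of the objection above, assuming a log with hypothetical CREATE, WRITE and DROP record types (not PostgreSQL's actual WAL format). Replaying "create, write, drop" for one relfilenode with the ENOENT fallback resurrects the dropped file, while replaying explicit create and drop records leaves it absent.

```c
#include <assert.h>
#include <stdbool.h>

#define NRELS 8  /* toy "filesystem": relfilenodes 0..NRELS-1 */

typedef enum { REC_CREATE, REC_WRITE, REC_DROP } RecType;

typedef struct {
    RecType type;
    int     relfilenode;
} WalRecord;

/* Proposed scheme: creations and drops are not logged, so CREATE and DROP
 * records are ignored here; a WRITE to a missing file recreates it. */
static void replay_naive(const WalRecord *log, int n, bool exists[NRELS])
{
    for (int i = 0; i < n; i++) {
        int rel = log[i].relfilenode;
        assert(rel >= 0 && rel < NRELS);
        if (log[i].type == REC_WRITE && !exists[rel])
            exists[rel] = true;  /* ENOENT fallback: recreate the file */
    }
}

/* Explicit scheme: creations and drops are logged and replayed. */
static void replay_logged(const WalRecord *log, int n, bool exists[NRELS])
{
    for (int i = 0; i < n; i++) {
        int rel = log[i].relfilenode;
        assert(rel >= 0 && rel < NRELS);
        switch (log[i].type) {
        case REC_CREATE: exists[rel] = true;  break;
        case REC_DROP:   exists[rel] = false; break;
        case REC_WRITE:  break;  /* page changes would be applied here */
        }
    }
}
```

With a log of `{REC_CREATE, 3}, {REC_WRITE, 3}, {REC_DROP, 3}` replayed from a base backup that predates the table, `replay_naive` ends with relfilenode 3 present and `replay_logged` ends with it absent.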


Re: Point in time recovery: recreating relation files

From
Marc Munro
Date:
On Wed, 2002-02-27 at 19:44, Tom Lane wrote:
> No, that's wrong.  The missing ingredient is that the WAL log should
> explicitly log table creations.  (And also table drops.)  If you look
> you will find some comments showing the places where code is missing.
> 
> If you try to do it as you suggest above, then you will erroneously
> recreate files that have been dropped.

OK, that makes sense.  I will take another look.  Thanks.

-- 
Marc        marc@bloodnok.com


Re: Point in time recovery: recreating relation files

From
Tatsuo Ishii
Date:
> No, that's wrong.  The missing ingredient is that the WAL log should
> explicitly log table creations.  (And also table drops.)  If you look
> you will find some comments showing the places where code is missing.

I'm wondering where we could record the LSN when creating or dropping
tables.

> If you try to do it as you suggest above, then you will erroneously
> recreate files that have been dropped.

Yes, but I think we need to compare the log record's LSN with the table's LSN.
--
Tatsuo Ishii


Re: Point in time recovery: recreating relation files

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> No, that's wrong.  The missing ingredient is that the WAL log should
>> explicitly log table creations.  (And also table drops.)  If you look
>> you will find some comments showing the places where code is missing.

> I'm wondering where we could record the LSN when creating or dropping
> tables.

Um, why would that matter?
        regards, tom lane


Re: Point in time recovery: recreating relation files

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> No, that's wrong.  The missing ingredient is that the WAL log should
> >> explicitly log table creations.  (And also table drops.)  If you look
> >> you will find some comments showing the places where code is missing.
> 
> > I'm wondering where we could record the LSN when creating or dropping
> > tables.
> 
> Um, why would that matter?

As I understand it, to prevent redoing a change two or more times during
recovery, we need to compare the LSN in the object against the LSN
in the WAL log.
--
Tatsuo Ishii


Re: Point in time recovery: recreating relation files

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> I'm wondering where we could record the LSN when creating or dropping
>>> tables.
>> 
>> Um, why would that matter?

> As I understand it, to prevent redoing a change two or more times during
> recovery, we need to compare the LSN in the object against the LSN
> in the WAL log.

But undo/redo checking on file creation or deletion is trivial: either
the kernel has the file or it doesn't.  We do not need any other check
AFAICS.
        regards, tom lane
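The point that file creation and deletion need no LSN check can be sketched with POSIX calls, assuming hypothetical `redo_create` and `redo_drop` handlers. `open` with `O_CREAT` but without `O_EXCL` or `O_TRUNC` succeeds whether or not the file exists and leaves existing contents alone, and `unlink` failing with `ENOENT` simply means the drop was already done, so both can be replayed any number of times.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Redo a hypothetical "create relation file" record: succeeds whether or
 * not the file already exists, and never truncates an existing file. */
static bool redo_create(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return false;
    close(fd);
    return true;
}

/* Redo a hypothetical "drop relation file" record: a file that is
 * already gone counts as success. */
static bool redo_drop(const char *path)
{
    assert(path != NULL);
    return unlink(path) == 0 || errno == ENOENT;
}
```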


Re: Point in time recovery: recreating relation files

From
Tatsuo Ishii
Date:
> > As I understand it, to prevent redoing a change two or more times during
> > recovery, we need to compare the LSN in the object against the LSN
> > in the WAL log.
> 
> But undo/redo checking on file creation or deletion is trivial: either
> the kernel has the file or it doesn't.  We do not need any other check
> AFAICS.

Are you saying that the table creation log record would contain a
relfilenode? I'm not sure the relfilenode is the same before and after
recovery if we consider point-in-time recovery.
--
Tatsuo Ishii


Re: Point in time recovery: recreating relation files

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> But undo/redo checking on file creation or deletion is trivial: either
>> the kernel has the file or it doesn't.  We do not need any other check
>> AFAICS.

> Are you saying that the table creation log record would contain a
> relfilenode?

Sure.  What else would it contain?

> I'm not sure the relfilenode is the same before and after
> recovery if we consider point-in-time recovery.

Considering that all the WAL entries concerning updates to the table
will name it by relfilenode, we'd better be prepared to ensure that
the relfilenode doesn't change over recovery.
        regards, tom lane
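Why the relfilenode must survive recovery unchanged can be sketched as follows, assuming the on-disk layout `base/<database OID>/<relfilenode>` and hypothetical helpers `relfile_path` and `record_hits_file`. A later update record that names the relation by relfilenode lands on the recreated file only if that file was recreated under the same relfilenode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;

/* Build a relation file path; layout assumed: base/<dbnode>/<relfilenode>. */
static void relfile_path(char *buf, size_t len, Oid dbnode, Oid relfilenode)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "base/%u/%u", dbnode, relfilenode);
}

/* Does an update record naming (dbnode, relfilenode) land on the file at
 * recreated_path?  Only if the file kept the same relfilenode. */
static bool record_hits_file(const char *recreated_path, Oid dbnode, Oid relfilenode)
{
    char buf[64];  /* large enough for two 10-digit OIDs plus "base//" */
    relfile_path(buf, sizeof buf, dbnode, relfilenode);
    return strcmp(buf, recreated_path) == 0;
}
```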


Re: Point in time recovery: recreating relation files

From
Marc Munro
Date:
Could someone explain to this poor newbie (who is hoping to implement
this) exactly what the issue is here?  Like Tom, I could originally see
no reason to worry about the LSN for file creation but I am very
concerned that I have failed to grasp Tatsuo's concerns.

Is there some reason why the relfilenode might change either during or
as a result of recovery?  Unless I have missed the point again, during
recovery we must recreate files with exactly the same path, name and
relfilenode as they would have originally been created, and in the same
order relative to the creation of the relation.  I see no scope for
anything to be different.


On Wed, 2002-03-06 at 21:29, Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> But undo/redo checking on file creation or deletion is trivial: either
> >> the kernel has the file or it doesn't.  We do not need any other check
> >> AFAICS.
> 
> > Are you saying that the table creation log record would contain a
> > relfilenode?
> 
> Sure.  What else would it contain?
> 
> > I'm not sure the relfilenode is the same before and after
> > recovery if we consider point-in-time recovery.
> 
> Considering that all the WAL entries concerning updates to the table
> will name it by relfilenode, we'd better be prepared to ensure that
> the relfilenode doesn't change over recovery.
> 
>             regards, tom lane
-- 
Marc        marc@bloodnok.com


Re: Point in time recovery: recreating relation files

From
Tatsuo Ishii
Date:
> Could someone explain to this poor newbie (who is hoping to implement
> this) exactly what the issue is here?  Like Tom, I could originally see
> no reason to worry about the LSN for file creation but I am very
> concerned that I have failed to grasp Tatsuo's concerns.
> 
> Is there some reason why the relfilenode might change either during or
> as a result of recovery?  Unless I have missed the point again, during
> recovery we must recreate files with exactly the same path, name and
> relfilenode as when they were originally created, and in the same
> order relative to the creation of the relation.  I see no scope for
> anything to be different.

Sorry for the confusion. I'm not very familiar with other DBMSs, and I
just don't know what point-in-time recovery features they provide. One
scenario I could imagine is recovering a single table under a different
name. I'm not sure whether other DBMSs implement this, though.

BTW, the next issues would be TRUNCATE and CREATE/DROP DATABASE.
I believe these are not currently supported by WAL.
--
Tatsuo Ishii