PITR, checkpoint, and local relations - Mailing list pgsql-hackers

From J. R. Nield
Subject PITR, checkpoint, and local relations
Msg-id 1027493028.1227.47.camel@localhost.localdomain
List pgsql-hackers
As per earlier discussion, I'm working on the hot backup issues as part
of the PITR support. While I was looking at the buffer manager and the
relcache/MyDb issues to figure out the best way to work this, it
occurred to me that PITR will introduce a big problem with the way we
handle local relations.

The basic problem is that local relations (rd_myxactonly == true) are
not part of a checkpoint, so there is no way to get a lower bound on the
starting LSN needed to recover a local relation. In the past this did
not matter, because either the local file would be (effectively)
discarded during recovery because it had not yet become visible, or the
file would be flushed before the transaction creating it made it
visible. With PITR this becomes a problem, because recovery must replay
the log forward from a known starting LSN.
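
As a rough illustration of the missing lower bound (a Python sketch, not
PostgreSQL code; `Buffer`, `first_dirty_lsn`, `checkpoint` and
`true_redo_start` are hypothetical names, and LSNs are plain integers),
assume each dirty buffer remembers the LSN of the oldest logged change
that has not yet reached disk:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Buffer:
    relation: str
    first_dirty_lsn: Optional[int]  # None means the buffer is clean


def checkpoint(shared: List[Buffer], current_lsn: int) -> int:
    """Flush only the shared buffers and return the LSN the checkpoint records."""
    for buf in shared:
        buf.first_dirty_lsn = None  # written out, so no older log is needed
    return current_lsn


def true_redo_start(checkpoint_lsn: int, local: List[Buffer]) -> int:
    """Oldest LSN replay would really need once local buffers are counted."""
    pending = [b.first_dirty_lsn for b in local if b.first_dirty_lsn is not None]
    return min([checkpoint_lsn] + pending)


shared_buffers = [Buffer("pg_class", 120), Buffer("accounts", 135)]
local_buffers = [Buffer("LR1", 80)]  # invisible to the checkpoint

recorded_lsn = checkpoint(shared_buffers, current_lsn=150)
needed_lsn = true_redo_start(recorded_lsn, local_buffers)

print(recorded_lsn, needed_lsn)
```

In this sketch the checkpoint records a starting point that is later than
the oldest change still pending in a local buffer, which is the gap
described above.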

So I need a decision from the core team on what to do about the local
buffer manager. My preference would be to forget about the local buffer
manager entirely, or if not that then to allow it only for _true_
temporary data. The only alternative I can devise is to create some way
for all other backends to participate in a checkpoint, perhaps using a
signal. I'm not sure this can be done safely. 

Anyway, I'm glad the tuplesort stuff doesn't try to use relation files
:-)

Can the core team let me know if this is acceptable, and whether I
should move ahead with changes to the buffer manager (and some other
stuff) needed to avoid special treatment of rd_myxactonly relations?

Also to Richard: have you guys at Multera dealt with this issue already?
Is there some way around this that I'm missing?


Regards,
 John Nield




Just as an example of this problem, imagine the following sequence:

1) Transaction TX1 creates a local relation LR1 which will eventually
become a globally visible table. Tuples are inserted into the local
relation, and logged to the WAL file. Some tuples remain in the local
buffer cache and are not yet written out, although they are logged. TX1
is still in progress.

2) Backup starts, and checkpoint is called to get a minimum starting LSN
(MINLSN) for the backed-up files. Only the global buffers are flushed.

3) Backup process copies LR1 into the backup directory. (This postulates
some way of coordinating with the local buffer manager, a problem I have
not solved.)

4) TX1 commits and flushes its local buffers. A dirty buffer exists
whose LSN is before MINLSN. LR1 becomes globally visible.

5) Backup finishes copying all the files, including the local relations,
and then flushes the log. The log files between MINLSN and the current
LSN are copied to the backup directory, and backup is done.

6) Sometime later, a system administrator restores the backup and plays
the logs forward starting at MINLSN. LR1 will be corrupt, because some
of the log entries required for its restoration will be before MINLSN.
This corruption will not be detected until something goes wrong.
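
The sequence above can be sketched as a toy simulation. This is Python,
not PostgreSQL code: the log is a list of (lsn, page, value) records, the
relation is a dict of pages, and `log_insert` and `replay` are
hypothetical helpers. The extra insert after the backup copy is not part
of the sequence above; it is added only to show that changes logged
after MINLSN are replayed correctly.

```python
wal = []  # the log: (lsn, page, value) records in LSN order


def log_insert(cache, page, value):
    """Log a change and leave the new page dirty in the given buffer cache."""
    lsn = len(wal) + 1
    wal.append((lsn, page, value))
    cache[page] = value
    return lsn


def replay(image, records, start_lsn):
    """Apply every log record at or after start_lsn to a copy of the image."""
    restored = dict(image)
    for lsn, page, value in records:
        if lsn >= start_lsn:
            restored[page] = value
    return restored


disk = {}         # on-disk contents of LR1
local_cache = {}  # TX1's local buffers

# 1) TX1 inserts tuples; page 0 is written out, page 1 stays dirty.
log_insert(local_cache, 0, "a")
log_insert(local_cache, 1, "b")
disk[0] = local_cache.pop(0)

# 2) Checkpoint flushes only global buffers and fixes MINLSN.
min_lsn = len(wal) + 1

# 3) Backup copies LR1 as it currently exists on disk.
backup = dict(disk)

# Extra step for illustration: TX1 logs one more change after MINLSN.
log_insert(local_cache, 2, "c")

# 4) TX1 commits and flushes its local buffers.
disk.update(local_cache)
local_cache.clear()

# 5) Backup copies the log from MINLSN onward.
copied_log = [rec for rec in wal if rec[0] >= min_lsn]

# 6) Restore: start from the backup image and replay the copied log.
restored = replay(backup, copied_log, min_lsn)
missing = sorted(set(disk) - set(restored))

print(restored, missing)
```

Page 1 exists in the live relation but not in the restored one, because
its only log record is before MINLSN and it was not yet on disk when the
file was copied.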

BTW: The problem doesn't only happen with backup! It occurs at every
checkpoint as well; I just missed it until I started working on the hot
backup issue.

-- 
J. R. Nield
jrnield@usol.com




