PITR, checkpoint, and local relations - Mailing list pgsql-hackers
From: J. R. Nield
Subject: PITR, checkpoint, and local relations
Date:
Msg-id: 1027493028.1227.47.camel@localhost.localdomain
Responses: Re: PITR, checkpoint, and local relations
List: pgsql-hackers
As per earlier discussion, I'm working on the hot-backup issues as part of the PITR support. While I was looking at the buffer manager and the relcache/MyDb issues to figure out the best way to approach this, it occurred to me that PITR will introduce a big problem with the way we handle local relations.

The basic problem is that local relations (rd_myxactonly == true) are not part of a checkpoint, so there is no way to get a lower bound on the starting LSN needed to recover a local relation. In the past this did not matter, because either the local file would be (effectively) discarded during recovery, since it had not yet become visible, or the file would be flushed before the transaction creating it made it visible. Now it is a problem.

So I need a decision from the core team on what to do about the local buffer manager. My preference would be to forget about the local buffer manager entirely, or failing that, to allow it only for _true_ temporary data. The only alternative I can devise is to create some way for all other backends to participate in a checkpoint, perhaps using a signal, and I'm not sure that can be done safely. Anyway, I'm glad the tuplesort stuff doesn't try to use relation files :-)

Can the core team let me know if this is acceptable, and whether I should move ahead with the changes to the buffer manager (and some other stuff) needed to avoid special treatment of rd_myxactonly relations?

Also to Richard: have you guys at multera dealt with this issue already? Is there some way around this that I'm missing?

Regards,
John Nield

Just as an example of the problem, imagine the following sequence (a small compilable sketch of it appears at the end of this message):

1) Transaction TX1 creates a local relation LR1, which will eventually become a globally visible table. Tuples are inserted into the local relation and logged to the WAL file. Some tuples remain in the local buffer cache and are not yet written out, although they are logged. TX1 is still in progress.

2) Backup starts, and a checkpoint is taken to get a minimum starting LSN (MINLSN) for the backed-up files. Only the global buffers are flushed.

3) The backup process copies LR1 into the backup directory. (Postulate some way of coordinating with the local buffer manager, a problem I have not solved.)

4) TX1 commits and flushes its local buffers. Among them is a dirty buffer whose LSN is before MINLSN. LR1 becomes globally visible.

5) Backup finishes copying all the files, including the local relations, and then flushes the log. The log files between MINLSN and the current LSN are copied to the backup directory, and backup is done.

6) Sometime later, a system administrator restores the backup and plays the logs forward starting at MINLSN. LR1 will be corrupt, because some of the log entries required for its restoration fall before MINLSN. The corruption will not be detected until something goes wrong.

BTW: The problem doesn't only happen with backup! It occurs at every checkpoint as well; I just missed it until I started working on the hot-backup issue.

--
J. R. Nield
jrnield@usol.com
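Since the failure is purely an ordering property of LSNs, it can be condensed into a few lines. Below is a minimal, self-contained C sketch of the six-step sequence above. It is NOT PostgreSQL code: the Page struct, wal_log(), and the flush check are all invented for illustration. Only the invariant it tests comes from the message: after a checkpoint, no page that is still dirty anywhere may depend on WAL records older than the checkpoint's redo pointer (MINLSN).

    /*
     * Illustrative sketch only -- not PostgreSQL code.  All names here
     * (Page, wal_log, next_lsn) are invented for this example.
     */
    #include <stdio.h>
    #include <stdbool.h>

    typedef unsigned long LSN;

    typedef struct Page
    {
        bool is_local;  /* in a backend-local buffer (rd_myxactonly) */
        bool dirty;     /* modified but not yet written to disk */
        LSN  lsn;       /* LSN of the last WAL record touching the page */
    } Page;

    static LSN next_lsn = 1;

    /* Simulate logging a change to a page: the change is durable in
     * WAL, but the page itself stays dirty in some buffer cache. */
    static LSN
    wal_log(Page *p)
    {
        p->dirty = true;
        p->lsn = next_lsn++;
        return p->lsn;
    }

    int
    main(void)
    {
        Page lr1 = { .is_local = true };

        /* 1) TX1 inserts into LR1: logged, but the buffer stays dirty. */
        wal_log(&lr1);

        /* 2) Backup starts; the checkpoint flushes only *shared*
         *    buffers.  LR1's local buffer is invisible to the
         *    checkpointing backend and is skipped. */
        if (!lr1.is_local && lr1.dirty)
            lr1.dirty = false;      /* would be a real flush to disk */
        LSN minlsn = next_lsn;      /* redo pointer handed to the backup */

        /* 3) Backup copies LR1's file, still missing the dirty buffer;
         * 4) TX1 commits and flushes locally -- too late for the copy;
         * 5) backup copies the WAL from MINLSN onward. */

        /* 6) Restore replays WAL from MINLSN.  Any page whose changes
         *    were logged before MINLSN but were not on disk at step 3
         *    cannot be reconstructed. */
        if (lr1.dirty && lr1.lsn < minlsn)
            printf("LR1 corrupt after restore: needs WAL at LSN %lu, "
                   "but replay starts at MINLSN %lu\n", lr1.lsn, minlsn);

        return 0;
    }

Running it prints the complaint from step 6; flipping is_local to false makes it fall silent, which is exactly why the problem never arises for relations kept in the shared buffer cache.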