Thread: PITR Redo Create Database fails
I've discovered that CREATE DATABASE doesn't redo correctly in an archive recovery test. This isn't a bug --in the current code--, because when crash recovery occurs, the database directories are already there, so this only doesn't work when using the PITR patches. During archive recovery, nothing is there, so needs to be created. It looks like CREATE DATABASE doesn't produce redo, nor is there a replay command created for it. That's not a big deal - maybe - because creating a directory for that database fixes it, with no further problems. Could somebody that knows about tablespaces etc, look into this and suggest a solution/patch? I need time to complete the PITR features... The failing code is copied below, from src/backend/storagesmgr/md.c ---------------------------------------------- static MdfdVec * mdopen(SMgrRelation reln, bool allowNotFound) {MdfdVec *mdfd;char *path;File fd; /* No work if already open */if (reln->md_fd) return reln->md_fd; path = relpath(reln->smgr_rnode); fd = FileNameOpenFile(path, O_RDWR | PG_BINARY, 0600); if (fd < 0){ /* ----------------------------------------------- The FileNameOpenFile fails when the first relation in the database is created. The code assumes that any failure of the FileNameOpenFile is because the file is already there, then tries to open it which also fails. The failure is caused by the fact that there is no directory (as well as no file), but that isn't tested for. It would be easy to create the directory at this point IF that didn't conflict with tablespaces etc.. Please look into this if possible, Thanks, Best Regards, Simon Riggs
On Thu, Jul 08, 2004 at 02:58:01PM +0100, Simon Riggs wrote: > I've discovered that CREATE DATABASE doesn't redo correctly in an > archive recovery test. > > This isn't a bug --in the current code--, because when crash recovery > occurs, the database directories are already there, so this only doesn't > work when using the PITR patches. During archive recovery, nothing is > there, so needs to be created. > > It looks like CREATE DATABASE doesn't produce redo, nor is there a > replay command created for it. [...] > The FileNameOpenFile fails when the first relation in the database is > created. The code assumes that any failure of the FileNameOpenFile is > because the file is already there, then tries to open it which also > fails. The failure is caused by the fact that there is no directory (as > well as no file), but that isn't tested for. I don't think it's a good idea to just create a directory if it's not already there. It would mean creating a spurious directory with an empty file if the data is corrupted and a wrong RelFileNode is in memory for whatever reason. The correct solution would be to emit a XLog record for CREATE DATABASE ... -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "El destino baraja y nosotros jugamos" (A. Schopenhauer)
On Thu, 2004-07-08 at 16:19, Alvaro Herrera wrote: > On Thu, Jul 08, 2004 at 02:58:01PM +0100, Simon Riggs wrote: > > > I've discovered that CREATE DATABASE doesn't redo correctly in an > > archive recovery test. > > > > This isn't a bug --in the current code--, because when crash recovery > > occurs, the database directories are already there, so this only doesn't > > work when using the PITR patches. During archive recovery, nothing is > > there, so needs to be created. > > > > It looks like CREATE DATABASE doesn't produce redo, nor is there a > > replay command created for it. > > [...] > > > The FileNameOpenFile fails when the first relation in the database is > > created. The code assumes that any failure of the FileNameOpenFile is > > because the file is already there, then tries to open it which also > > fails. The failure is caused by the fact that there is no directory (as > > well as no file), but that isn't tested for. > > I don't think it's a good idea to just create a directory if it's not > already there. It would mean creating a spurious directory with an > empty file if the data is corrupted and a wrong RelFileNode is in memory > for whatever reason. > > The correct solution would be to emit a XLog record for CREATE > DATABASE ... I'd prefer a formal approach, hence why I raised this. Interesting what you say about about the other stuff - a simple create dir worked without any problems - I wonder if a whole bunch of stuff is missing there? It does seem likely that other rm_redo functions for other resource managers contain similar bugs when used with Archive Recovery. Hopefully, these will emerge during beta... ...its too late in the day for me to fix some of these - I need to complete the PITR functionality of the archive recovery. Best Regards, Simon Riggs
On Thu, Jul 08, 2004 at 06:33:56PM +0100, Simon Riggs wrote: > On Thu, 2004-07-08 at 16:19, Alvaro Herrera wrote: > > On Thu, Jul 08, 2004 at 02:58:01PM +0100, Simon Riggs wrote: > > > > The correct solution would be to emit a XLog record for CREATE > > DATABASE ... > > I'd prefer a formal approach, hence why I raised this. What do you mean by formal? What's so informal about my approach? > Interesting what you say about about the other stuff - a simple create > dir worked without any problems - I wonder if a whole bunch of stuff is > missing there? Probably not, but just creating a directory if it isn't found doesn't strike me as the best solution. Consider the case where somebody gets a corrupted catalog, saying that some table has a file on the directory 11298172987. Would you just create the directory, create an empty file with the name you are looking for, and then report that the blocknumber that was asked wasn't found? Sounds like a recipe for confusion at the least, random files and directories created everywhere on worse situations. I don't understand why you don't want a CREATE DATABASE XLog record. It certainly seems like the correct situation. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Jason Tesser: You might not have understood me or I am not understanding you. Paul Thomas: It feels like we're 2 people divided by a common language...
On Thu, 2004-07-08 at 20:14, Alvaro Herrera wrote: > On Thu, Jul 08, 2004 at 06:33:56PM +0100, Simon Riggs wrote: > > On Thu, 2004-07-08 at 16:19, Alvaro Herrera wrote: > > > On Thu, Jul 08, 2004 at 02:58:01PM +0100, Simon Riggs wrote: > > > > > > The correct solution would be to emit a XLog record for CREATE > > > DATABASE ... > > > > I'd prefer a formal approach, hence why I raised this. > > What do you mean by formal? What's so informal about my approach? We have miscommunicated...let me re-word this for clarity: >>I agree with your suggestion that we should adopt a formal approach to solving the problem, rather than just a quick-fix, as I suggested might be an option. I've raised this to highlight the issue and encourage other hackers to help solve this issue.<< Best regards, Simon Riggs