Thread: FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
From
Gianni Ciolli
Date:
Hello,

we found a bug while testing the latest version of Hot Standby. Then we could reproduce it on the unpatched HEAD, so we are going to ignore it in the next few days.

During a Warm Standby session using current HEAD I obtained the following error on the standby node:

---8<------8<------8<------8<------8<------8<------8<------8<------8<---
2009-01-16 16:24:01 GMT[30678]LOG: restored log file "0000000100000001000000C2" from archive
2009-01-16 16:24:01 GMT[30678]FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
2009-01-16 16:24:01 GMT[30678]CONTEXT: writing block 1 of relation pg_tblspc/491086/467369/491103
	xlog redo checkpoint: redo 1/C2001AB8; tli 1; xid 0/89982; oid 491520; multi 1; offset 0; online
2009-01-16 16:24:01 GMT[30665]LOG: startup process (PID 30678) exited with exit code 1
2009-01-16 16:24:01 GMT[30665]LOG: aborting startup due to startup process failure
2009-01-16 16:24:01 GMT[30677]DEBUG: logger shutting down
---8<------8<------8<------8<------8<------8<------8<------8<------8<---

After setting up the session, I started an endless loop of "make installcheck" on the primary node; the error happened after 40 to 50 minutes.

At present I can't say exactly which test was responsible for that, but this information should be obtainable by raising the debug level on the primary and comparing WAL segment numbers while looking at both logfiles.

Anyway, since the error was raised by bgwriter when running with the Hot Standby patch applied, it is likely to have something to do with the guts of checkpointing.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it
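For readers who want to try the comparison of WAL segment numbers suggested above, here is a minimal sketch of how a segment file name relates to a WAL location such as the redo pointer 1/C2001AB8 in the log. It assumes the default 16 MB segment size and the 24-hex-digit naming scheme (timeline, log id, segment id) used by releases of that era; the function names are hypothetical helpers, not part of PostgreSQL.

```python
# Assumption: default WAL segment size of 16 MB (a compile-time setting).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, log id, segment id)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_id = int(name[16:24], 16)
    return timeline, log_id, seg_id


def segment_for_location(timeline, location):
    """Return the WAL file name containing a location written as 'log/offset'."""
    log_hex, offset_hex = location.split("/")
    log_id = int(log_hex, 16)
    # The segment id is the byte offset within the log file divided by the segment size.
    seg_id = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_id, seg_id)


# Values taken from the standby log above.
restored = "0000000100000001000000C2"
redo_segment = segment_for_location(1, "1/C2001AB8")
```

With the values from the log, the redo pointer of the checkpoint record falls inside the segment that had just been restored from the archive, so the checkpoint being replayed is the one written into that same segment.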
Re: FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
From
Tom Lane
Date:
Gianni Ciolli <gianni.ciolli@2ndquadrant.it> writes:
> we found a bug while testing the latest version of Hot Standby. Then
> we could reproduce it on the unpatched HEAD, so we are going to ignore
> it in the next few days.

You didn't actually say how to repeat it on unpatched HEAD.

regards, tom lane
Re: FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
From
Gianni Ciolli
Date:
On Fri, Jan 16, 2009 at 06:39:11PM +0100, Gianni Ciolli wrote:
(...)
> During a Warm Standby session using current HEAD I obtained the
> following error on the standby node:

On Fri, Jan 16, 2009 at 12:56:59PM -0500, Tom Lane wrote:
> Gianni Ciolli <gianni.ciolli@2ndquadrant.it> writes:
> > we found a bug while testing the latest version of Hot Standby. Then
> > we could reproduce it on the unpatched HEAD, so we are going to ignore
> > it in the next few days.
>
> You didn't actually say how to repeat it on unpatched HEAD.
>
> regards, tom lane

Sorry for the misunderstanding; I used "current HEAD" and "unpatched HEAD" as synonyms.

The whole procedure that I described in that mail was carried out with unpatched HEAD; the only mentions of Hot Standby are outside that procedure.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it
Re: FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
From
Simon Riggs
Date:
On Fri, 2009-01-16 at 19:12 +0100, Gianni Ciolli wrote:
> On Fri, Jan 16, 2009 at 06:39:11PM +0100, Gianni Ciolli wrote:
> (...)
> > During a Warm Standby session using current HEAD I obtained the
> > following error on the standby node:

I think I understand the cause of these bugs in CVS HEAD now.

In various places in current HEAD we throw a checkpoint when we want to be certain that all buffers have been flushed.

In recovery, a checkpoint isn't always a restartpoint for two reasons: timing and rmgr state. This gives both a cause for the error and an explanation of why it does not occur consistently. ISTM this could likely affect previous releases as well.

We need to put some marker into WAL to allow the same actions to be repeated in recovery. We can't just force these "correctness checkpoints" to be restartpoints because they might be invalid, but we can force CheckPointGuts() (or something less) without updating the control file.

With regard to various changes I have in motion, the CheckPointGuts() would need to be executed in full before further WAL replay occurs, so it would need to be executed by the Startup process and not by the bgwriter, to ensure we performed the correct sequence of actions.

CHECKPOINT_FORCE might be the right indicator of when to take special action in recovery, not sure.

Will look at this again later.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
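The hypothesis in this message can be illustrated with a small toy model. It is only a sketch of the reasoning, not PostgreSQL code and not a confirmed diagnosis: the class, its methods, and the `min_interval` parameter are all hypothetical. The model assumes that the primary flushes every buffer at a checkpoint and then performs a file-level removal, while the standby turns a checkpoint record into a restartpoint (and flushes) only when enough time has passed and the rmgr state allows it.

```python
class ToyStandby:
    """Toy model of a standby replaying WAL; every name here is hypothetical."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # assumed minimum seconds between restartpoints
        self.last_restartpoint = 0.0
        self.files = set()   # relation files present on disk
        self.dirty = set()   # relations with unflushed buffers

    def replay_write(self, rel):
        """Replay a change: the file exists and now has a dirty buffer."""
        self.files.add(rel)
        self.dirty.add(rel)

    def replay_file_level_remove(self, rel):
        """Remove the file only, trusting that an earlier checkpoint flushed it."""
        self.files.discard(rel)

    def flush_all(self):
        """Write out every dirty buffer; fail if a target file is gone."""
        for rel in sorted(self.dirty):
            if rel not in self.files:
                raise FileNotFoundError("could not open relation " + rel)
        self.dirty.clear()

    def replay_checkpoint(self, now, rmgr_state_clean):
        """Return True only if the checkpoint record became a restartpoint."""
        if now - self.last_restartpoint < self.min_interval or not rmgr_state_clean:
            return False  # skipped: buffers stay dirty
        self.flush_all()
        self.last_restartpoint = now
        return True


rel = "pg_tblspc/491086/467369/491103"
standby = ToyStandby(min_interval=150.0)
standby.replay_write(rel)
# The "correctness checkpoint" arrives too soon, so it is skipped on the standby.
skipped = not standby.replay_checkpoint(now=10.0, rmgr_state_clean=True)
standby.replay_file_level_remove(rel)
try:
    standby.replay_checkpoint(now=200.0, rmgr_state_clean=True)
    failure = None
except FileNotFoundError as err:
    failure = str(err)
```

In this model the failure appears only when the earlier checkpoint happens to be skipped, which mirrors the observation that the error does not occur consistently.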
Re: FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> In various places in current HEAD we throw a checkpoint when we want to
> be certain that all buffers have been flushed.
>
> In recovery, a checkpoint isn't always a restartpoint for two reasons:
> timing and rmgr state. This gives both a cause for the error and an
> explanation of why it does not occur consistently. ISTM this could
> likely affect previous releases as well.

Were you able to narrow this down? Do you know exactly what command caused it?

At least replay of CREATE DATABASE already calls FlushDatabaseBuffers(), but are we missing that from some other place?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: FATAL: could not open relation pg_tblspc/491086/467369/491103: No such file or directory
From
Simon Riggs
Date:
On Mon, 2009-01-26 at 09:48 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > In various places in current HEAD we throw a checkpoint when we want to
> > be certain that all buffers have been flushed.
> >
> > In recovery, a checkpoint isn't always a restartpoint for two reasons:
> > timing and rmgr state. This gives both a cause for the error and an
> > explanation of why it does not occur consistently. ISTM this could
> > likely affect previous releases as well.
>
> Were you able to narrow this down? Do you know exactly what command
> caused it?

We know it wasn't any specific command because it caused the bgwriter to crash when the HS patch was applied.

But no, I'm not looking at it yet, until we're done with HS.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support