Thread: check for missing tablespaces?
I am in the middle of helping a customer recover from a situation where a tablespace was missing when a machine was rebooted and postgres restarted, and I'm wondering if we should not have some sort of check for this on startup. Maybe we could check for the existence of the PG_VERSION file or something like that? Of course, that wouldn't help if the tablespace were subsequently lost, but it doesn't seem like a terribly expensive or unreasonable sanity check for startup. cheers andrew
* Andrew Dunstan (andrew@dunslane.net) wrote: > I am in the middle of helping a customer recover from a situation where > a tablespace was missing when a machine was rebooted and postgres > restarted, and I'm wondering if we should not have some sort of check > for this on startup. Maybe we could check for the existence of the > PG_VERSION file or something like that? Of course, that wouldn't help if > the tablespace were subsequently lost, but it doesn't seem like a > terribly expensive or unreasonable sanity check for startup. I agree entirely with doing this. Now that we've got tablespaces and our users are using them to split things on to seperate partitions which could be accessed in different ways (eg: a tablespace on local storage and one on a SAN) it makes alot of sense to make sure everything is in order before allowing the system to completely come up. I'd much rather PG not start when the SAN mappings aren't in place. Thanks, Stephen
Andrew Dunstan <andrew@dunslane.net> writes: > I am in the middle of helping a customer recover from a situation where > a tablespace was missing when a machine was rebooted and postgres > restarted, and I'm wondering if we should not have some sort of check > for this on startup. Maybe we could check for the existence of the > PG_VERSION file or something like that? ... and do what? What exactly went wrong so badly with the missing tablespace? Doesn't seem like it should be any worse than if individual table files went missing. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I am in the middle of helping a customer recover from a situation where >> a tablespace was missing when a machine was rebooted and postgres >> restarted, and I'm wondering if we should not have some sort of check >> for this on startup. Maybe we could check for the existence of the >> PG_VERSION file or something like that? >> > > ... and do what? > > What exactly went wrong so badly with the missing tablespace? Doesn't > seem like it should be any worse than if individual table files went > missing. > > > Well, in the present instance probably nothing drastic went wrong. The tablespace only contained indexes - if an index is missing then updates to the base table will fail, right? In general, I think I'd probably prefer normal database startup to fail if a tablespace is missing. That way I will know about it right then and can remedy it. This is something that is much more likely to happen than an arbitrary missing file, ISTM, and at the same time the check would be much easier than looking for an arbitrary file. At the very least we could issue a warning. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> ... and do what? > In general, I think I'd probably prefer normal database startup to fail > if a tablespace is missing. That way I will know about it right then and > can remedy it. This is something that is much more likely to happen than > an arbitrary missing file, ISTM, and at the same time the check would be > much easier than looking for an arbitrary file. At the very least we > could issue a warning. So what you're imagining is * iterate through each symlink in $PGDATA/pg_tblspc * check that PG_VERSION exists (and has the right contents??) in each pointed-to directory * fail if not I guess this is reasonable, since we make a similar check for the core data directory itself. Watch out for the initdb sequence though. regards, tom lane
Tom Lane wrote: > So what you're imagining is > > * iterate through each symlink in $PGDATA/pg_tblspc > * check that PG_VERSION exists (and has the right contents??) in > each pointed-to directory > * fail if not > > I guess this is reasonable, since we make a similar check for the core > data directory itself. Watch out for the initdb sequence though. > > > I'll put this on my TODO list. I'm not sure about the initdb reference - there won't be any tablespaces to check for during initdb, will there? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I'm not sure about the initdb reference - there won't be any tablespaces > to check for during initdb, will there? No, but I think pg_tblspc/ itself might not be there either. Just a case to test your patch on ... regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I'm not sure about the initdb reference - there won't be any tablespaces >> to check for during initdb, will there? >> > > No, but I think pg_tblspc/ itself might not be there either. Just a > case to test your patch on ... > > > Sure it is, it's set up by initdb along with the other subdirectories before it creates the PG_VERSION files. Anyway, I don't know that there's any urgency about this. cheers andrew
Andrew Dunstan wrote: > > I am in the middle of helping a customer recover from a situation > where a tablespace was missing when a machine was rebooted and > postgres restarted, Have you uncovered why the tablespace when missing? > and I'm wondering if we should not have some sort of check for this on > startup +1 Andrew Chernow
Andrew Dunstan wrote: > > I am in the middle of helping a customer recover from a situation > where a tablespace was missing when a machine was rebooted and > postgres restarted, Have you uncovered why the tablespace went missing? > and I'm wondering if we should not have some sort of check for this on > startup +1 Andrew Chernow
Andrew Chernow wrote: > Andrew Dunstan wrote: >> >> I am in the middle of helping a customer recover from a situation >> where a tablespace was missing when a machine was rebooted and >> postgres restarted, > Have you uncovered why the tablespace went missing? No. It's on a SAN, and I understand our hardware guys are talking to the SAN vendor to get to the bottom of it. The network in the data centre was reconfigured last night, but that should not in fact have affected the SAN AFAIK. Other SAN devices mounted quite happily, including the one containing the main data directory. Anyway, from this POV all we really need to know is that the device hosting this tablespace failed to mount when the machine was rebooted, and then postgres restarted. cheers andrew
> Anyway, from this POV all we really need to know is that the device > hosting this tablespace failed to mount when the machine was rebooted, > and then postgres restarted. > Good to know postgresql had nothing to do with the missing data. I wasn't sure if it was user error, config problem or hardware. From my experience, it doesn't really matter if you have a sophisticated SAN or put together an ad-hoc nas/das solution, storage likes to bite you. The only thing that helps me sleep at night is sound replication and backups (in some cases triple redundancy with aging data sets). Andrew