Thread: check for missing tablespaces?

check for missing tablespaces?

From
Andrew Dunstan
Date:
I am in the middle of helping a customer recover from a situation where 
a tablespace was missing when a machine was rebooted and postgres 
restarted, and I'm wondering if we should not have some sort of check 
for this on startup. Maybe we could check for the existence of the 
PG_VERSION file or something like that? Of course, that wouldn't help if 
the tablespace were subsequently lost, but it doesn't seem like a 
terribly expensive or unreasonable sanity check for startup.

cheers

andrew


Re: check for missing tablespaces?

From
Stephen Frost
Date:
* Andrew Dunstan (andrew@dunslane.net) wrote:
> I am in the middle of helping a customer recover from a situation where
> a tablespace was missing when a machine was rebooted and postgres
> restarted, and I'm wondering if we should not have some sort of check
> for this on startup. Maybe we could check for the existence of the
> PG_VERSION file or something like that? Of course, that wouldn't help if
> the tablespace were subsequently lost, but it doesn't seem like a
> terribly expensive or unreasonable sanity check for startup.

I agree entirely with doing this.  Now that we've got tablespaces and
our users are using them to split things on to seperate partitions which
could be accessed in different ways (eg: a tablespace on local storage
and one on a SAN) it makes alot of sense to make sure everything is in
order before allowing the system to completely come up.  I'd much rather
PG not start when the SAN mappings aren't in place.
Thanks,
    Stephen

Re: check for missing tablespaces?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I am in the middle of helping a customer recover from a situation where 
> a tablespace was missing when a machine was rebooted and postgres 
> restarted, and I'm wondering if we should not have some sort of check 
> for this on startup. Maybe we could check for the existence of the 
> PG_VERSION file or something like that?

... and do what?

What exactly went wrong so badly with the missing tablespace?  Doesn't
seem like it should be any worse than if individual table files went
missing.
        regards, tom lane


Re: check for missing tablespaces?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I am in the middle of helping a customer recover from a situation where 
>> a tablespace was missing when a machine was rebooted and postgres 
>> restarted, and I'm wondering if we should not have some sort of check 
>> for this on startup. Maybe we could check for the existence of the 
>> PG_VERSION file or something like that?
>>     
>
> ... and do what?
>
> What exactly went wrong so badly with the missing tablespace?  Doesn't
> seem like it should be any worse than if individual table files went
> missing.
>
>             
>   

Well, in the present instance probably nothing drastic went wrong. The 
tablespace only contained indexes - if an index is missing then updates 
to the base table will fail, right?

In general, I think I'd probably prefer normal database startup to fail 
if a tablespace is missing. That way I will know about it right then and 
can remedy it. This is something that is much more likely to happen than 
an arbitrary missing file, ISTM, and at the same time the check would be 
much easier than looking for an arbitrary file. At the very least we 
could issue a warning.

cheers

andrew




Re: check for missing tablespaces?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> ... and do what?

> In general, I think I'd probably prefer normal database startup to fail 
> if a tablespace is missing. That way I will know about it right then and 
> can remedy it. This is something that is much more likely to happen than 
> an arbitrary missing file, ISTM, and at the same time the check would be 
> much easier than looking for an arbitrary file. At the very least we 
> could issue a warning.

So what you're imagining is

* iterate through each symlink in $PGDATA/pg_tblspc
* check that PG_VERSION exists (and has the right contents??) in each pointed-to directory
* fail if not

I guess this is reasonable, since we make a similar check for the core
data directory itself.  Watch out for the initdb sequence though.
        regards, tom lane


Re: check for missing tablespaces?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> So what you're imagining is
>
> * iterate through each symlink in $PGDATA/pg_tblspc
> * check that PG_VERSION exists (and has the right contents??) in
>   each pointed-to directory
> * fail if not
>
> I guess this is reasonable, since we make a similar check for the core
> data directory itself.  Watch out for the initdb sequence though.
>
>             
>   

I'll put this on my TODO list.

I'm not sure about the initdb reference - there won't be any tablespaces 
to check for during initdb, will there?

cheers

andrew


Re: check for missing tablespaces?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I'm not sure about the initdb reference - there won't be any tablespaces 
> to check for during initdb, will there?

No, but I think pg_tblspc/ itself might not be there either.  Just a
case to test your patch on ...
        regards, tom lane


Re: check for missing tablespaces?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I'm not sure about the initdb reference - there won't be any tablespaces 
>> to check for during initdb, will there?
>>     
>
> No, but I think pg_tblspc/ itself might not be there either.  Just a
> case to test your patch on ...
>
>             
>   

Sure  it is, it's set up by initdb along with the other subdirectories 
before it creates the PG_VERSION files.

Anyway, I don't know that there's any urgency about this.

cheers

andrew


Re: check for missing tablespaces?

From
Andrew Chernow
Date:
Andrew Dunstan wrote:
>
> I am in the middle of helping a customer recover from a situation 
> where a tablespace was missing when a machine was rebooted and 
> postgres restarted, 
Have you uncovered why the tablespace when missing?
> and I'm wondering if we should not have some sort of check for this on 
> startup
+1

Andrew Chernow


Re: check for missing tablespaces?

From
Andrew Chernow
Date:
Andrew Dunstan wrote:
>
> I am in the middle of helping a customer recover from a situation 
> where a tablespace was missing when a machine was rebooted and 
> postgres restarted, 
Have you uncovered why the tablespace went missing?
> and I'm wondering if we should not have some sort of check for this on 
> startup
+1

Andrew Chernow


Re: check for missing tablespaces?

From
Andrew Dunstan
Date:

Andrew Chernow wrote:
> Andrew Dunstan wrote:
>>
>> I am in the middle of helping a customer recover from a situation 
>> where a tablespace was missing when a machine was rebooted and 
>> postgres restarted, 
> Have you uncovered why the tablespace went missing?

No. It's on a SAN, and I understand our hardware guys are talking to the 
SAN vendor to get to the bottom of it. The network in the data centre 
was reconfigured last night, but that should not in fact have affected 
the SAN AFAIK. Other SAN devices mounted quite happily, including the 
one containing the main data directory.

Anyway, from this POV all we really need to know is that the device 
hosting this tablespace failed to mount when the machine was rebooted, 
and then postgres restarted.

cheers

andrew



Re: check for missing tablespaces?

From
Andrew Chernow
Date:
> Anyway, from this POV all we really need to know is that the device 
> hosting this tablespace failed to mount when the machine was rebooted, 
> and then postgres restarted.
>
Good to know postgresql had nothing to do with the missing data.  I 
wasn't sure if it was user error, config problem or hardware.
From my experience, it doesn't really matter if you have a 
sophisticated SAN or put together an ad-hoc nas/das solution, storage 
likes to bite you.  The only thing that helps me sleep at night is sound 
replication and backups (in some cases triple redundancy with aging data 
sets).

Andrew