Thread: 8.4.4 locked after power failure

8.4.4 locked after power failure

From
Grzegorz Jaśkiewicz
Date:
So here's the thing. I got a message from one of the developers, that
running 'create temporary sequence xyz;' hangs on the database.
That seemed suspicious. I tried running any ddl command, and that hang.
No other connections to the database.

It turned out that it had a power failure earlier in the morning. That
seems ok, but in the past postgresql will always recover fine (at
least 8.3.x).
This time I had to reindex, and vacuum all user and system catalogues
to get system back in order.

Any ideas ? Is that something new, is it fixed in any newer releases ?


Unfortunately it had to be done rather quick - so I couldn't salvage any data.


--
GJ

Re: 8.4.4 locked after power failure

From
Merlin Moncure
Date:
2011/9/12 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> So here's the thing. I got a message from one of the developers, that
> running 'create temporary sequence xyz;' hangs on the database.
> That seemed suspicious. I tried running any ddl command, and that hang.
> No other connections to the database.
>
> It turned out that it had a power failure earlier in the morning. That
> seems ok, but in the past postgresql will always recover fine (at
> least 8.3.x).
> This time I had to reindex, and vacuum all user and system catalogues
> to get system back in order.
>
> Any ideas ? Is that something new, is it fixed in any newer releases ?
>
>
> Unfortunately it had to be done rather quick - so I couldn't salvage any data.

It seems odd that you could not create a temp sequence but you were
able to reindex the entire database.  did you confirm you were
blocking on a non-granted lock?

merlin

Re: 8.4.4 locked after power failure

From
Grzegorz Jaśkiewicz
Date:
2011/9/12 Merlin Moncure <mmoncure@gmail.com>:
>
> It seems odd that you could not create a temp sequence but you were
> able to reindex the entire database.  did you confirm you were
> blocking on a non-granted lock?


I could revacuum/reindex all stuff, only if I had to do the system
catalogues first. That seemed a bit suspicious.

Also, I had to stop all other connections - because any other locked
transaction would prevent it from going on.
Oh, the last thing - pg_cancel_backend() couldn't kill the locked
backend. I had to restart the whole postgresql using -m immediate.

Odd stuff indeed, as I was hoping postgresql will clean things like
locks on recovery from crash. Never happened to me with 8.3

--
GJ

Re: 8.4.4 locked after power failure

From
Grzegorz Jaśkiewicz
Date:
I'm guessing I won't get much more from devs , without providing more
info here which unfortunately has been lost.

Re: 8.4.4 locked after power failure

From
Merlin Moncure
Date:
2011/9/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> I'm guessing I won't get much more from devs , without providing more
> info here which unfortunately has been lost.

yup -- you destroyed all the evidence.  if it happens again, try
posting some more info, particularly what's going on with locks,
pg_stat_activity, maybe an strace of any stuck processes, etc.  also
irc can be a good place to get some quick help if its time sensitive.

merlin

Re: 8.4.4 locked after power failure

From
Grzegorz Jaśkiewicz
Date:
2011/9/13 Merlin Moncure <mmoncure@gmail.com>:
> 2011/9/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>> I'm guessing I won't get much more from devs , without providing more
>> info here which unfortunately has been lost.
>
> yup -- you destroyed all the evidence.  if it happens again, try
> posting some more info, particularly what's going on with locks,
> pg_stat_activity, maybe an strace of any stuck processes, etc.  also
> irc can be a good place to get some quick help if its time sensitive.

See, I was hoping that just the fact that it occurred might trigger
some suspicions on the -devel side of things. I know it would in my
product.



--
GJ

Re: 8.4.4 locked after power failure

From
Merlin Moncure
Date:
2011/9/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> 2011/9/13 Merlin Moncure <mmoncure@gmail.com>:
>> 2011/9/13 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>>> I'm guessing I won't get much more from devs , without providing more
>>> info here which unfortunately has been lost.
>>
>> yup -- you destroyed all the evidence.  if it happens again, try
>> posting some more info, particularly what's going on with locks,
>> pg_stat_activity, maybe an strace of any stuck processes, etc.  also
>> irc can be a good place to get some quick help if its time sensitive.
>
> See, I was hoping that just the fact that it occurred might trigger
> some suspicions on the -devel side of things. I know it would in my
> product.

sure.  In lieu of useful and/or specific information, the #1 suspicion
is always going to be 'operator error', because, more often than not,
that's what it is.  Having seen and read your posts on the various
lists for some time now I know that if you report a problem then there
likely is a problem, but without more detail this is essentially a
fishing expedition :-).

merlin

Re: 8.4.4 locked after power failure

From
Craig Ringer
Date:
On 09/13/2011 11:04 PM, Grzegorz Jaśkiewicz wrote:
> 2011/9/13 Merlin Moncure<mmoncure@gmail.com>:
>> 2011/9/13 Grzegorz Jaśkiewicz<gryzman@gmail.com>:
>>> I'm guessing I won't get much more from devs , without providing more
>>> info here which unfortunately has been lost.
>>
>> yup -- you destroyed all the evidence.  if it happens again, try
>> posting some more info, particularly what's going on with locks,
>> pg_stat_activity, maybe an strace of any stuck processes, etc.  also
>> irc can be a good place to get some quick help if its time sensitive.
>
> See, I was hoping that just the fact that it occurred might trigger
> some suspicions on the -devel side of things. I know it would in my
> product.

Of course it does. How would _you_ go about reproducing it or tracing it
so you could even begin to track it down, though? There just isn't
enough information to work with.

It's like someone contacting you and explaining that they saved a file
containing a huge embedded image from your office suite software, but
when they went to open the file later it crashed the office suite. They
were in a hurry and didn't have the disk space to keep the old one, so
they had to redo the work and save over the file. It's good to get the
report, but ... what do you do about it? They don't have a faulty file,
a reproducible test case, any or any good prospects for working on
creating a test case.

--
Craig Ringer