Thread: BUG #9756: Inconsistent database after OS restart

BUG #9756: Inconsistent database after OS restart

From

sdl@mnppsaturn.ru

Date:

28 March 2014, 15:30:13

The following bug has been logged on the website:

Bug reference:      9756
Logged by:          Dmitry Samokhin
Email address:      sdl@mnppsaturn.ru
PostgreSQL version: 9.2.8
Operating system:   Windows XP, 7
Description:

After every Windows restart, database couldn't be shut down properly and
recovering actions are performed on the next startup. But if we control the
service manually by the Windows Services snap-in, no problem with the
correct DB cluster shutdown. Looks like there's something wrong with the
correct autovacuum process shutdown which leads to the DB inconsistency.

Custom configuration options:
autovacuum_naptime = '1d'
The others are default.

Here is the log entries:

*** Manual service start ***

2014-03-28 11:36:05 MSK LOG:  database system was shut down at 2014-03-28
11:35:39 MSK
2014-03-28 11:36:05 MSK LOG:  database system is ready to accept
connections
2014-03-28 11:36:05 MSK LOG:  autovacuum launcher started

*** Manual service stop ***

2014-03-28 11:36:41 MSK LOG:  received fast shutdown request
2014-03-28 11:36:41 MSK LOG:  aborting any active transactions
2014-03-28 11:36:41 MSK LOG:  autovacuum launcher shutting down
2014-03-28 11:36:41 MSK LOG:  shutting down
2014-03-28 11:36:41 MSK LOG:  database system is shut down

*** Manual service start again***

2014-03-28 11:37:02 MSK LOG:  database system was shut down at 2014-03-28
11:36:41 MSK
2014-03-28 11:37:02 MSK LOG:  database system is ready to accept
connections
2014-03-28 11:37:02 MSK LOG:  autovacuum launcher started

*** Windows restart: ***

-- Stopping:
2014-03-28 11:37:34 MSK LOG:  received fast shutdown request
2014-03-28 11:37:34 MSK LOG:  aborting any active transactions
2014-03-28 11:37:34 MSK LOG:  autovacuum launcher shutting down
2014-03-28 11:37:34 MSK ERROR:  canceling statement due to user request

-- Starting:
2014-03-28 11:38:08 MSK LOG:  database system was interrupted; last known up
at 2014-03-28 11:37:02 MSK
2014-03-28 11:38:08 MSK LOG:  database system was not properly shut down;
automatic recovery in progress
2014-03-28 11:38:08 MSK FATAL:  the database system is starting up
2014-03-28 11:38:08 MSK LOG:  record with zero length at 0/1B44FF0
2014-03-28 11:38:08 MSK LOG:  redo is not required
2014-03-28 11:38:09 MSK LOG:  database system is ready to accept
connections
2014-03-28 11:38:09 MSK LOG:  autovacuum launcher started

Thanks,
Dmitry

Re: BUG #9756: Inconsistent database after OS restart

From

Jeff Janes

Date:

28 March 2014, 17:39:56

On Fri, Mar 28, 2014 at 12:58 AM, <sdl@mnppsaturn.ru> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      9756
> Logged by:          Dmitry Samokhin
> Email address:      sdl@mnppsaturn.ru
> PostgreSQL version: 9.2.8
> Operating system:   Windows XP, 7
> Description:
>
> After every Windows restart, database couldn't be shut down properly and
> recovering actions are performed on the next startup. But if we control the
> service manually by the Windows Services snap-in, no problem with the
> correct DB cluster shutdown. Looks like there's something wrong with the
> correct autovacuum process shutdown which leads to the DB inconsistency.
>
>
> *** Windows restart: ***
>
> -- Stopping:
> 2014-03-28 11:37:34 MSK LOG:  received fast shutdown request
> 2014-03-28 11:37:34 MSK LOG:  aborting any active transactions
> 2014-03-28 11:37:34 MSK LOG:  autovacuum launcher shutting down
> 2014-03-28 11:37:34 MSK ERROR:  canceling statement due to user request
>

It looks like Windows told the server to shutdown, but did not give it
enough time to complete the process before killing it uncleanly.

Can you make Windows shutdown more patient?

Cheers,

Jeff

Re: BUG #9756: Inconsistent database after OS restart

From

"Samokhin, Dmitry [MNPP Saturn]"

Date:

31 March 2014, 13:21:19

Jeff,
=20
Due to a huge variety of services and their functionality, it's almost
impossible to know (or predict) how much time every service needs to
complete its init and deinit operations. Giving any "reasonable", =
"average",
"sufficient" time to all of them to do this work would result in
unreliability, on one hand, and waste of hardware and human resources, =
on
the other hand. Instead, the service workflow in Windows is designed so
services receive control events from the Service Control Manager and =
then
they report when they are actually up or down, so Windows is able to
continue the required either startup or shutdown process. For details, =
see
http://msdn.microsoft.com/en-us/library/windows/desktop/ee126211%28v=3Dvs=
.85%2
9.aspx
=20
Regards,
Dmitry.
=20

-----Original Message-----
From: Jeff Janes [mailto:jeff.janes@gmail.com]=20
Sent: Friday, March 28, 2014 9:40 PM
To: sdl@mnppsaturn.ru
Cc: Pg Bugs
Subject: Re: [BUGS] BUG #9756: Inconsistent database after OS restart


On Fri, Mar 28, 2014 at 12:58 AM, <sdl@mnppsaturn.ru> wrote:


The following bug has been logged on the website:

Bug reference:      9756
Logged by:          Dmitry Samokhin
Email address:      sdl@mnppsaturn.ru
PostgreSQL version: 9.2.8
Operating system:   Windows XP, 7
Description:

After every Windows restart, database couldn't be shut down properly and
recovering actions are performed on the next startup. But if we control =
the
service manually by the Windows Services snap-in, no problem with the
correct DB cluster shutdown. Looks like there's something wrong with the
correct autovacuum process shutdown which leads to the DB inconsistency.




*** Windows restart: ***

-- Stopping:
2014-03-28 11:37:34 MSK LOG:  received fast shutdown request
2014-03-28 11:37:34 MSK LOG:  aborting any active transactions
2014-03-28 11:37:34 MSK LOG:  autovacuum launcher shutting down
2014-03-28 11:37:34 MSK ERROR:  canceling statement due to user request



It looks like Windows told the server to shutdown, but did not give it
enough time to complete the process before killing it uncleanly.

Can you make Windows shutdown more patient?

Cheers,

Jeff

Re: BUG #9756: Inconsistent database after OS restart

From

Jeff Janes

Date:

31 March 2014, 15:56:01

On Mon, Mar 31, 2014 at 1:15 AM, Samokhin, Dmitry [MNPP Saturn] <
sdl@mnppsaturn.ru> wrote:

>  Jeff,
>
> Due to a huge variety of services and their functionality, it's almost
> impossible to know (or predict) how much time every service needs to
> complete its init and deinit operations. Giving any "reasonable",
> "average", "sufficient" time to all of them to do this work would result in
> unreliability, on one hand, and waste of hardware and human resources, on
> the other hand. Instead, the service workflow in Windows is designed
> so services receive control events from the Service Control Manager and
> then they report when they are actually up or down, so Windows is able
> to continue the required either startup or shutdown process. For details,
> see
> http://msdn.microsoft.com/en-us/library/windows/desktop/ee126211%28v=vs.85%29.aspx
>

Sorry, I don't understand most of that, and don't see anything there about
what happens when Windows itself is shutting down.  But from the log, it is
clear that PostgreSQL did get the notification it was supposed to exit
(however it received that notification), but that it was not allowed to
complete the process.  Maybe Windows was told to reboot with a "force" flag
that causes it to not wait, or maybe Windows decided that PostgreSQL was
not communicating with the SCM properly and therefore deserved no mercy.

Are you using the EDB installers, or compiling it yourself?

Cheers,

Jeff

Re: BUG #9756: Inconsistent database after OS restart

From

"Samokhin, Dmitry [MNPP Saturn]"

Date:

01 April 2014, 15:01:56

> -----Original Message-----
> From: Jeff Janes [mailto:jeff.janes@gmail.com]
> Sent: Monday, March 31, 2014 7:56 PM
> To: sdl@mnppsaturn.ru
> Cc: Pg Bugs
> Subject: Re: [BUGS] BUG #9756: Inconsistent database after OS restart
>
> Are you using the EDB installers, or compiling it yourself?
>
>
> Cheers,
>
>
> Jeff

Only prebuilt EDB installer, postgresql-9.2.8-1-windows.exe.
I can produce a more verbose log, if that helps.

Dmitry.

Re: BUG #9756: Inconsistent database after OS restart

From

"Samokhin, Dmitry [MNPP Saturn]"

Date:

01 April 2014, 15:02:20

> -----Original Message-----
> From: Jeff Janes [mailto:jeff.janes@gmail.com]=20
> Sent: Monday, March 31, 2014 7:56 PM
> To: sdl@mnppsaturn.ru
> Cc: Pg Bugs
> Subject: Re: [BUGS] BUG #9756: Inconsistent database after OS restart
>
> Sorry, I don't understand most of that, and don't see anything there =
about
what happens when Windows itself is shutting down.
> But from the log, it is clear that PostgreSQL did get the notification =
it
was supposed to exit (however it received that
> notification), but that it was not allowed to complete the process.  =
Maybe
Windows was told to reboot with a "force" flag
> that causes it to not wait, or maybe Windows decided that PostgreSQL =
was
not communicating with the SCM properly and
> therefore deserved no mercy. =20

Some more useful info:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms685149%28v=3Dvs=
.85%2
9.aspx

This article says that there is a registry key

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTime=
out

which controls the time limit for services cleanup when the operating =
system
is rebooting (see also
http://technet.microsoft.com/en-us/library/cc976045.aspx). Its default =
value
is 20000 (20 s), while Postgres usually needs 1-3 seconds to completely
stop.


Maybe there are certain cases when the autovacuum subprocess is somehow =
shut
down by the Postgres control module after the whole cluster is done and =
this
behaviour affects the DB integrity.

Dmitry.