Thread: BUG #9756: Inconsistent database after OS restart
The following bug has been logged on the website: Bug reference: 9756 Logged by: Dmitry Samokhin Email address: sdl@mnppsaturn.ru PostgreSQL version: 9.2.8 Operating system: Windows XP, 7 Description: After every Windows restart, database couldn't be shut down properly and recovering actions are performed on the next startup. But if we control the service manually by the Windows Services snap-in, no problem with the correct DB cluster shutdown. Looks like there's something wrong with the correct autovacuum process shutdown which leads to the DB inconsistency. Custom configuration options: autovacuum_naptime = '1d' The others are default. Here is the log entries: *** Manual service start *** 2014-03-28 11:36:05 MSK LOG: database system was shut down at 2014-03-28 11:35:39 MSK 2014-03-28 11:36:05 MSK LOG: database system is ready to accept connections 2014-03-28 11:36:05 MSK LOG: autovacuum launcher started *** Manual service stop *** 2014-03-28 11:36:41 MSK LOG: received fast shutdown request 2014-03-28 11:36:41 MSK LOG: aborting any active transactions 2014-03-28 11:36:41 MSK LOG: autovacuum launcher shutting down 2014-03-28 11:36:41 MSK LOG: shutting down 2014-03-28 11:36:41 MSK LOG: database system is shut down *** Manual service start again*** 2014-03-28 11:37:02 MSK LOG: database system was shut down at 2014-03-28 11:36:41 MSK 2014-03-28 11:37:02 MSK LOG: database system is ready to accept connections 2014-03-28 11:37:02 MSK LOG: autovacuum launcher started *** Windows restart: *** -- Stopping: 2014-03-28 11:37:34 MSK LOG: received fast shutdown request 2014-03-28 11:37:34 MSK LOG: aborting any active transactions 2014-03-28 11:37:34 MSK LOG: autovacuum launcher shutting down 2014-03-28 11:37:34 MSK ERROR: canceling statement due to user request -- Starting: 2014-03-28 11:38:08 MSK LOG: database system was interrupted; last known up at 2014-03-28 11:37:02 MSK 2014-03-28 11:38:08 MSK LOG: database system was not properly shut down; automatic recovery in progress 2014-03-28 11:38:08 MSK FATAL: the database system is starting up 2014-03-28 11:38:08 MSK LOG: record with zero length at 0/1B44FF0 2014-03-28 11:38:08 MSK LOG: redo is not required 2014-03-28 11:38:09 MSK LOG: database system is ready to accept connections 2014-03-28 11:38:09 MSK LOG: autovacuum launcher started Thanks, Dmitry
On Fri, Mar 28, 2014 at 12:58 AM, <sdl@mnppsaturn.ru> wrote: > The following bug has been logged on the website: > > Bug reference: 9756 > Logged by: Dmitry Samokhin > Email address: sdl@mnppsaturn.ru > PostgreSQL version: 9.2.8 > Operating system: Windows XP, 7 > Description: > > After every Windows restart, database couldn't be shut down properly and > recovering actions are performed on the next startup. But if we control the > service manually by the Windows Services snap-in, no problem with the > correct DB cluster shutdown. Looks like there's something wrong with the > correct autovacuum process shutdown which leads to the DB inconsistency. > > > *** Windows restart: *** > > -- Stopping: > 2014-03-28 11:37:34 MSK LOG: received fast shutdown request > 2014-03-28 11:37:34 MSK LOG: aborting any active transactions > 2014-03-28 11:37:34 MSK LOG: autovacuum launcher shutting down > 2014-03-28 11:37:34 MSK ERROR: canceling statement due to user request > It looks like Windows told the server to shutdown, but did not give it enough time to complete the process before killing it uncleanly. Can you make Windows shutdown more patient? Cheers, Jeff
Jeff, =20 Due to a huge variety of services and their functionality, it's almost impossible to know (or predict) how much time every service needs to complete its init and deinit operations. Giving any "reasonable", = "average", "sufficient" time to all of them to do this work would result in unreliability, on one hand, and waste of hardware and human resources, = on the other hand. Instead, the service workflow in Windows is designed so services receive control events from the Service Control Manager and = then they report when they are actually up or down, so Windows is able to continue the required either startup or shutdown process. For details, = see http://msdn.microsoft.com/en-us/library/windows/desktop/ee126211%28v=3Dvs= .85%2 9.aspx =20 Regards, Dmitry. =20 -----Original Message----- From: Jeff Janes [mailto:jeff.janes@gmail.com]=20 Sent: Friday, March 28, 2014 9:40 PM To: sdl@mnppsaturn.ru Cc: Pg Bugs Subject: Re: [BUGS] BUG #9756: Inconsistent database after OS restart On Fri, Mar 28, 2014 at 12:58 AM, <sdl@mnppsaturn.ru> wrote: The following bug has been logged on the website: Bug reference: 9756 Logged by: Dmitry Samokhin Email address: sdl@mnppsaturn.ru PostgreSQL version: 9.2.8 Operating system: Windows XP, 7 Description: After every Windows restart, database couldn't be shut down properly and recovering actions are performed on the next startup. But if we control = the service manually by the Windows Services snap-in, no problem with the correct DB cluster shutdown. Looks like there's something wrong with the correct autovacuum process shutdown which leads to the DB inconsistency. *** Windows restart: *** -- Stopping: 2014-03-28 11:37:34 MSK LOG: received fast shutdown request 2014-03-28 11:37:34 MSK LOG: aborting any active transactions 2014-03-28 11:37:34 MSK LOG: autovacuum launcher shutting down 2014-03-28 11:37:34 MSK ERROR: canceling statement due to user request It looks like Windows told the server to shutdown, but did not give it enough time to complete the process before killing it uncleanly. Can you make Windows shutdown more patient? Cheers, Jeff
On Mon, Mar 31, 2014 at 1:15 AM, Samokhin, Dmitry [MNPP Saturn] < sdl@mnppsaturn.ru> wrote: > Jeff, > > Due to a huge variety of services and their functionality, it's almost > impossible to know (or predict) how much time every service needs to > complete its init and deinit operations. Giving any "reasonable", > "average", "sufficient" time to all of them to do this work would result in > unreliability, on one hand, and waste of hardware and human resources, on > the other hand. Instead, the service workflow in Windows is designed > so services receive control events from the Service Control Manager and > then they report when they are actually up or down, so Windows is able > to continue the required either startup or shutdown process. For details, > see > http://msdn.microsoft.com/en-us/library/windows/desktop/ee126211%28v=vs.85%29.aspx > Sorry, I don't understand most of that, and don't see anything there about what happens when Windows itself is shutting down. But from the log, it is clear that PostgreSQL did get the notification it was supposed to exit (however it received that notification), but that it was not allowed to complete the process. Maybe Windows was told to reboot with a "force" flag that causes it to not wait, or maybe Windows decided that PostgreSQL was not communicating with the SCM properly and therefore deserved no mercy. Are you using the EDB installers, or compiling it yourself? Cheers, Jeff
> -----Original Message----- > From: Jeff Janes [mailto:jeff.janes@gmail.com] > Sent: Monday, March 31, 2014 7:56 PM > To: sdl@mnppsaturn.ru > Cc: Pg Bugs > Subject: Re: [BUGS] BUG #9756: Inconsistent database after OS restart > > Are you using the EDB installers, or compiling it yourself? > > > Cheers, > > > Jeff Only prebuilt EDB installer, postgresql-9.2.8-1-windows.exe. I can produce a more verbose log, if that helps. Dmitry.
> -----Original Message----- > From: Jeff Janes [mailto:jeff.janes@gmail.com]=20 > Sent: Monday, March 31, 2014 7:56 PM > To: sdl@mnppsaturn.ru > Cc: Pg Bugs > Subject: Re: [BUGS] BUG #9756: Inconsistent database after OS restart > > Sorry, I don't understand most of that, and don't see anything there = about what happens when Windows itself is shutting down. > But from the log, it is clear that PostgreSQL did get the notification = it was supposed to exit (however it received that > notification), but that it was not allowed to complete the process. = Maybe Windows was told to reboot with a "force" flag > that causes it to not wait, or maybe Windows decided that PostgreSQL = was not communicating with the SCM properly and > therefore deserved no mercy. =20 Some more useful info: http://msdn.microsoft.com/en-us/library/windows/desktop/ms685149%28v=3Dvs= .85%2 9.aspx This article says that there is a registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTime= out which controls the time limit for services cleanup when the operating = system is rebooting (see also http://technet.microsoft.com/en-us/library/cc976045.aspx). Its default = value is 20000 (20 s), while Postgres usually needs 1-3 seconds to completely stop. Maybe there are certain cases when the autovacuum subprocess is somehow = shut down by the Postgres control module after the whole cluster is done and = this behaviour affects the DB integrity. Dmitry.