Thread: BUG #3427: Autovacuum crashed server
The following bug has been logged online: Bug reference: 3427 Logged by: Rainer Bauer Email address: usenet@munnin.com PostgreSQL version: 8.2.4 Operating system: WinXP SP2 Description: Autovacuum crashed server Details: Hello, My server was running now without any problems since Mid-May: "PostgreSQL 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)" Today was the first time it crashed (see log file below). Unfortunatelly I changed log_min_messages to info yesterday, otherwise it would contain more info. At the time of the crash there were 4 connections open to the database and at least one was "probably" doing an update. I searched the archive and only found these threads, which are not directly connected to Autovacuum: <http://archives.postgresql.org/pgsql-bugs/2006-12/msg00163.php> <http://archives.postgresql.org/pgsql-jdbc/2006-10/msg00109.php> <http://archives.postgresql.org/pgsql-hackers/2006-10/msg01306.php> One thing to note is that I have enabled "UseServerSidePrepare" for ODBC connections last week, because Hiroshi Inoue recommended it: <http://archives.postgresql.org/pgsql-odbc/2007-06/msg00052.php> Since the server never crashed before I thought this might have something to do with the failure. Rainer ============================================== These are my autovacuum settings: autovacuum = on autovacuum_vacuum_scale_factor = 0.15 autovacuum_analyze_scale_factor = 0.07 ============================================== 2007-07-04 09:21:19 LOG: database system was shut down at 2007-07-03 23:45:14 Central Europe Standard Time 2007-07-04 09:21:19 LOG: checkpoint record is at A/31B884C8 2007-07-04 09:21:19 LOG: redo record is at A/31B884C8; undo record is at 0/0; shutdown TRUE 2007-07-04 09:21:19 LOG: next transaction ID: 0/27694933; next OID: 127547 2007-07-04 09:21:19 LOG: next MultiXactId: 1; next MultiXactOffset: 0 2007-07-04 09:21:20 LOG: database system is ready 2007-07-04 10:53:00 LOG: autovacuum process (PID 1712) exited with exit code -1073741819 2007-07-04 10:53:00 LOG: terminating any other active server processes 2007-07-04 10:53:00 WARNING: terminating connection because of crash of another server process 2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-07-04 10:53:00 WARNING: terminating connection because of crash of another server process 2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-07-04 10:53:00 WARNING: terminating connection because of crash of another server process 2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-07-04 10:53:00 WARNING: terminating connection because of crash of another server process 2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2007-07-04 10:53:01 LOG: all server processes terminated; reinitializing 2007-07-04 10:53:01 LOG: database system was interrupted at 2007-07-04 10:53:00 Central Europe Standard Time 2007-07-04 10:53:01 LOG: checkpoint record is at A/3C572C78 2007-07-04 10:53:01 LOG: redo record is at A/3C3AD8F8; undo record is at 0/0; shutdown FALSE 2007-07-04 10:53:01 LOG: next transaction ID: 0/27849484; next OID: 135739 2007-07-04 10:53:01 LOG: next MultiXactId: 1; next MultiXactOffset: 0 2007-07-04 10:53:01 LOG: database system was not properly shut down; automatic recovery in progress 2007-07-04 10:53:01 LOG: redo starts at A/3C3AD8F8 2007-07-04 10:53:01 LOG: record with zero length at A/3D1F1E60 2007-07-04 10:53:01 LOG: redo done at A/3D1F1E30 2007-07-04 10:53:06 LOG: database system is ready
Rainer Bauer wrote: > 2007-07-04 09:21:19 LOG: database system was shut down at 2007-07-03 > 23:45:14 Central Europe Standard Time > 2007-07-04 09:21:19 LOG: checkpoint record is at A/31B884C8 > 2007-07-04 09:21:19 LOG: redo record is at A/31B884C8; undo record is at > 0/0; shutdown TRUE > 2007-07-04 09:21:19 LOG: next transaction ID: 0/27694933; next OID: 127547 > 2007-07-04 09:21:19 LOG: next MultiXactId: 1; next MultiXactOffset: 0 > 2007-07-04 09:21:20 LOG: database system is ready > 2007-07-04 10:53:00 LOG: autovacuum process (PID 1712) exited with exit > code -1073741819 Is there any way to figure out what "exit code -1073741819" means? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
spam@munnin.com wrote: > Hello Alvaro, > > sorry to responding directly, but I haven't subscribed to the bugs > mailing list. Please CC the list anyway, so that 1. it is classified correctly on my end, and 2. so that somebody approves it and it ends up being posted to the list eventually. Note: I changed the reply address to the original one you used. spam@munnin.com sounds suspicious. > AH> Is there any way to figure out what "exit code -1073741819" means? > > Yes, -1073741819 is 0xc0000005 aka "Access Violation". Oh, OK. How can this happen? I don't know much about Windows. Maybe you have some antivirus is checking the file at the same time Autovac is checking it? Also, it would be good to have a more complete report of what happenned. Can you reproduce this problem? If so, please provide the whole ERROR report (change log_min_messages I guess). Did it fail trying to open a file? -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en La Feria de las Tinieblas, R. Bradbury)
Alvaro Herrera wrote: >> AH> Is there any way to figure out what "exit code -1073741819" means? >> >> Yes, -1073741819 is 0xc0000005 aka "Access Violation". > > Oh, OK. How can this happen? I don't know much about Windows. Maybe > you have some antivirus is checking the file at the same time Autovac is > checking it? Note that Windows can (and often will) give you Access Violation when you try to access any memory outside of your address-space (since that's technically an access violation), so it can be whatever would cause segmentation fault on unix. //Magnus
Hello Alvaro, >> AH> Is there any way to figure out what "exit code -1073741819" means? >> >> Yes, -1073741819 is 0xc0000005 aka "Access Violation". > >Oh, OK. How can this happen? I don't know much about Windows. Like Magnus already wrote: accessing memory outside your adress space gives you an Access Violation on Windows. However, most of the time this means a NULL pointer was dereferenced. >Maybe you have some antivirus is checking the file at the same time Autovac is checking it? Yes this could be possible. The postgres server is running on my developer machine together with an Antivirus software. But like I said this machine was running now since May without any problems (i.e. Postgres never crashed). >Also, it would be good to have a more complete report of what happenned. >Can you reproduce this problem? No. The server and the 2 programs accessing the server were happily running for several weeks without any problem. >If so, please provide the whole ERROR report (change log_min_messages I guess). >Did it fail trying to open a file? What I can do is adjust log_min_messages and report back if Autovacuum should ever crash again. What value do you want it to be? And should I also adjust log_min_error_statement or any other setting (I have enough space left on the log partition, so there shouldn't be a problem). Rainer