Thread: BUG #2419: could not reattach to shared memory
The following bug has been logged online:

Bug reference: 2419
Logged by: Andy Male
Email address: andy@ubic.co.uk
PostgreSQL version: 8.1
Operating system: Windows 2003 Server (standard) SP1
Description: could not reattach to shared memory
Details:

FULL ERROR IN WINDOWS EVENT LOG -

The description for Event ID ( 0 ) in Source ( PostgreSQL ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: FATAL: could not reattach to shared memory (key=5432001, addr=01960000): Invalid argument

There is no corresponding error in pg_log.

Once the error happens the database becomes unresponsive: current connections stop responding and you cannot make new connections. Stopping and starting the database clears the error and normal operation can continue.

The error happens intermittently (three or four times a day) and doesn't seem to have a specific cause. The database is not heavily used and is processing around 100 transactions per minute.
I wish I had more to suggest to you. We are working on a few problems with semaphores on Win2003 SP1, but nothing related to shared memory.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
Hi,

Thanks for your response.

I have enabled additional logging and have reduced max_connections to 50 to reduce the memory overhead. There should not be more than 8 connections at once, and I wanted to rule out some issue where the client wasn't releasing connections. The error has reduced in frequency but still happens.

The following log shows that there was a problem writing to the log file due to a permissions problem. Obviously this is not the case, because the log file was accessible both before and after the error occurred: it contains further logging about the problem. The subsequent message in the log suggests something is wrong with the client application (or I may be misinterpreting this message) - "This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information." Then there are various other server errors and warnings, and a report of "possibly corrupted shared memory", which is the original problem reported.

I am not sure what to do. I do not know how to debug the problem, and currently Postgres is unusable in a production environment. I am considering recoding the app to use an alternate database, but that is going to be a fairly lengthy process just to prove the problem is with the database rather than the client application. I can't actually see how a client application could or should be able to cause a memory error in the database.

Any help would be most appreciated.
Thanks
Andy

****** LOG FILE SNIP ******

2006-05-07 23:44:19 10.10.12.100(4018)LOG: 00000: statement: EXECUTE npgsqlportal1 [PREPARE: select * from fn_Driver_Session_Updated($1::int8,$2::bool)]
2006-05-07 23:44:19 10.10.12.100(4018)LOCATION: exec_execute_message, postgres.c:1718
2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log file 0, segment 90 at offset 2998272, length 8192: Permission denied
2006-05-07 23:44:20 10.10.12.100(4018)CONTEXT: writing block 75 of relation 1663/20632/100738
    SQL statement "update tbl_query_ui_consumer_session_mapping set ui_consumer_session_data_action_type_id = 3, client_received = false where session_id = $1 and client_session_id = $2 "
    PL/pgSQL function "fn_driver_session_updated" line 58 at SQL statement
2006-05-07 23:44:20 10.10.12.100(4018)LOCATION: XLogWrite, xlog.c:1474
2006-05-07 23:44:20 10.10.12.100(4018)STATEMENT: select * from fn_Driver_Session_Updated($1::int8,$2::bool)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2006-05-07 23:44:21 LOG: 00000: server process (PID 5924) was terminated by signal 3
2006-05-07 23:44:21 LOCATION: LogChildExit, postmaster.c:2425
2006-05-07 23:44:21 LOG: 00000: terminating any other active server processes
2006-05-07 23:44:21 LOCATION: HandleChildCrash, postmaster.c:2306
2006-05-07 23:44:21 10.10.10.100(2467)WARNING: 57P02: terminating connection because of crash of another server process
2006-05-07 23:44:21 10.10.10.100(2467)DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-05-07 23:44:21 10.10.10.100(2467)HINT: In a moment you should be able to reconnect to the database and repeat your command.
2006-05-07 23:44:21 10.10.10.100(2467)LOCATION: quickdie, postgres.c:2103
2006-05-07 23:44:21 10.10.10.100(2466)WARNING: 57P02: terminating connection because of crash of another server process
2006-05-07 23:44:21 10.10.10.100(2466)DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-05-07 23:44:21 10.10.10.100(2466)HINT: In a moment you should be able to reconnect to the database and repeat your command.
2006-05-07 23:44:21 10.10.10.100(2466)LOCATION: quickdie, postgres.c:2103
2006-05-07 23:44:21 10.10.12.100(4021)WARNING: 57P02: terminating connection because of crash of another server process

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: 07 May 2006 00:16
To: Andy Male
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #2419: could not reattach to shared memory

I wish I had more to suggest to you. We are working on a few problems with semaphores on Win2003 SP1, but nothing related to shared memory.
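When triaging a log like the excerpt above, it can help to separate the first PANIC or FATAL entry from the cascade of WARNING lines that follows it. The following is only a rough sketch: the regular expression is inferred from the lines quoted in this thread (timestamp, optional client address, severity, SQLSTATE), and the names `ENTRY`, `parse_entries` and `first_crash_cause` are hypothetical helpers, not part of PostgreSQL. A different log_line_prefix would need a different pattern.

```python
import re

# Assumed entry shape, inferred only from the excerpt in this thread:
#   <date> <time> [<client-ip>(<port>)]<SEVERITY>: <SQLSTATE>: <message>
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<client>[\d.]+\(\d+\))?\s*"
    r"(?P<severity>LOG|WARNING|ERROR|FATAL|PANIC):\s+"
    r"(?P<sqlstate>[0-9A-Z]{5}):\s+(?P<message>.*)$"
)


def parse_entries(lines):
    """Yield a dict for each line that starts a log entry; skip the rest
    (CONTEXT, DETAIL, HINT, LOCATION, STATEMENT and free text)."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            yield match.groupdict()


def first_crash_cause(lines):
    """Return the earliest PANIC or FATAL entry, or None if there is none."""
    for entry in parse_entries(lines):
        if entry["severity"] in ("PANIC", "FATAL"):
            return entry
    return None


# Lines copied from the log excerpt in this thread.
sample = [
    "2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to "
    "log file 0, segment 90 at offset 2998272, length 8192: Permission denied",
    "2006-05-07 23:44:20 10.10.12.100(4018)LOCATION: XLogWrite, xlog.c:1474",
    "2006-05-07 23:44:21 LOG: 00000: server process (PID 5924) was "
    "terminated by signal 3",
    "2006-05-07 23:44:21 10.10.10.100(2467)WARNING: 57P02: terminating "
    "connection because of crash of another server process",
]

cause = first_crash_cause(sample)
print(cause["ts"], cause["severity"], cause["sqlstate"], cause["message"])
```

Run against this excerpt, the first such entry is the PANIC with SQLSTATE 42501; the 57P02 warnings are the other backends being told to exit afterwards.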
Andy Male wrote:

Hi Andy,

This is your problem:

> 2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log file 0, segment 90 at offset 2998272, length 8192: Permission denied
> 2006-05-07 23:44:20 10.10.12.100(4018)CONTEXT: writing block 75 of relation 1663/20632/100738
> SQL statement "update tbl_query_ui_consumer_session_mapping set ui_consumer_session_data_action_type_id = 3, client_received = false where session_id = $1 and client_session_id = $2 "
> PL/pgSQL function "fn_driver_session_updated" line 58 at SQL statement
> 2006-05-07 23:44:20 10.10.12.100(4018)LOCATION: XLogWrite, xlog.c:1474
> 2006-05-07 23:44:20 10.10.12.100(4018)STATEMENT: select * from fn_Driver_Session_Updated($1::int8,$2::bool)

The "Permission denied" message is a report Postgres is getting from the operating system. Notice it is marked as PANIC -- an unrecoverable error.

What you should be investigating is why the operating system rejects the write to that file. It clearly is a database file; try looking at the files named $PGDATA/base/20632/100738 or possibly $PGDATA/pg_tblspc/1663/20632/100738. What permissions do those files have? Who owns them? If it's not the user who runs the Postgres processes, or they are not accessible to it, then something else in the system changed that, which is what you need to figure out and disable.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
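The file check suggested above can be sketched as a small script. This is an illustration under assumptions, not a PostgreSQL tool: `candidate_paths` and `describe` are hypothetical helper names, the `C:/pgdata` directory is a placeholder, and the two paths are simply the ones named in the message above, built from the `tablespace/database/relfilenode` triple in the CONTEXT line. Note also that `os.access` is a coarse check (on Windows it may not reflect ACLs), so a "writable" answer here does not rule out a permissions problem.

```python
import os


def candidate_paths(pgdata, relation):
    """Build the two candidate file paths for a relation triple such as
    '1663/20632/100738' (tablespace OID / database OID / relfilenode)."""
    tablespace, database, relfilenode = relation.split("/")
    return [
        os.path.join(pgdata, "base", database, relfilenode),
        os.path.join(pgdata, "pg_tblspc", tablespace, database, relfilenode),
    ]


def describe(path):
    """Return a short status string for one path without opening the file."""
    if not os.path.exists(path):
        return "missing"
    # Coarse check only: this may not reflect Windows ACLs.
    writable = os.access(path, os.W_OK)
    size = os.stat(path).st_size
    return "exists, %d bytes, %s" % (
        size, "writable" if writable else "NOT writable")


if __name__ == "__main__":
    # "C:/pgdata" is a placeholder; substitute the real data directory.
    pgdata = os.environ.get("PGDATA", "C:/pgdata")
    for path in candidate_paths(pgdata, "1663/20632/100738"):
        print(path, "->", describe(path))
```

The script should be run as the account that runs the Postgres service, since the question is what that account is allowed to do.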
On Mon, 2006-05-08 at 08:31 -0400, Alvaro Herrera wrote:
> > 2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log
> > file 0, segment 90 at offset 2998272, length 8192: Permission denied

This is a pg_xlog error, so it looks like you have a whole-system issue, not just isolated tables.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
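To find which file under pg_xlog the PANIC refers to, the "log file 0, segment 90" pair can be turned into a segment file name. The sketch below assumes the 8.x naming scheme of three 8-digit uppercase hexadecimal fields (timeline, log id, segment) and assumes timeline 1, which the log does not state; the function names are hypothetical.

```python
XLOG_BLCKSZ = 8192  # write length reported in the PANIC message


def wal_segment_name(timeline, log_id, segment):
    """Format a WAL segment file name as three 8-digit hex fields."""
    return "%08X%08X%08X" % (timeline, log_id, segment)


def page_in_segment(offset, page_size=XLOG_BLCKSZ):
    """Return (page number, byte offset within the page) for a segment offset."""
    return divmod(offset, page_size)


# Values from the PANIC line; the timeline (1) is an assumption.
name = wal_segment_name(1, 0, 90)
page, remainder = page_in_segment(2998272)
print("pg_xlog/" + name, "page", page, "remainder", remainder)
```

Under those assumptions the failing write was to page 366 of the segment named 00000001000000000000005A, so that is the file whose ownership and open handles are worth checking.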
Hi,

Thank you for your replies.

I accept that the "Permission denied" problem does suggest that the DB error may be caused by the OS somehow. There is no problem with the permissions or ownership of the files, because the Postgres account created and owns those files; this rules out any sort of security problem. It is possible that the file is inaccessible for some other reason and that Postgres is merely reporting that it cannot access the file; that is, the failure 'could' be caused by a permissions problem rather than 'is' a permissions problem. The error log information in this case isn't really very useful, since it doesn't accurately report the real cause of the error. The OS doesn't report any other errors and there are no other system problems or file access problems; the only problem lies within the database.

To try to reproduce the problem on another machine, I did a new install of the same version of Postgres (8.1.3) and dump/restored the database onto this new server. So far it has been running with the same load and activity for almost 30 hours and the problem has not surfaced. In theory, Postgres and the database are identical, and therefore the fact that it doesn't error in the same way does confirm this is an OS problem (assuming the problem doesn't occur at some point in the future). The two servers are identical hardware and have the same version of OS, Windows Server 2003 SP1.

None of this helps me, because I still have a production server on which I can't run the database since I can't debug the error.

Any suggestions?

Thank you for your assistance.

Andy
"Andy" <andy@otelex.co.uk> writes:
> To try and reproduce the problem on another machine, I did a new install of
> the same version of Postgres (8.1.3) and dump/restored the database onto
> this new server. So far it has been running with the same load and activity
> for almost 30 hours and the problem has not surfaced. In theory, Postgres
> and the database are identical and therefore, the fact that it doesn't error
> in the same way does confirm this is an OS problem (assuming the problem
> doesn't occur at some point in the future). The two servers are identical
> hardware and have the same version of OS, Windows Server 2003 SP1.

We've seen reports of intermittent permission failures on Windows being caused by broken antivirus software. What security software have you got on those machines, and does the failure go away if you remove it?

regards, tom lane
I have removed the AV software and the problem has not gone away; it does seem to happen less frequently, but it is still there. It is the same error:

"FATAL: could not reattach to shared memory (key=5432001, addr=01960000): Invalid argument"

The referenced key and address are always the same. Is there some way we can view the data being stored in the memory, or look at what data was stored in the memory, and where, to see if the reference is valid?

Regards
Andy

Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us> writes:
> We've seen reports of intermittent permission failures on Windows being
> caused by broken antivirus software. What security software have you
> got on those machines, and does the failure go away if you remove it?