Thread: BUG #2419: could not reattach to shared memory

BUG #2419: could not reattach to shared memory

From: Andy Male

The following bug has been logged online:

Bug reference:      2419
Logged by:          Andy Male
Email address:      andy@ubic.co.uk
PostgreSQL version: 8.1
Operating system:   Windows 2003 Server (standard) SP1
Description:        could not reattach to shared memory
Details:

FULL ERROR IN WINDOWS EVENT LOG -

The description for Event ID ( 0 ) in Source ( PostgreSQL ) cannot be found.
The local computer may not have the necessary registry information or
message DLL files to display messages from a remote computer. You may be
able to use the /AUXSOURCE= flag to retrieve this description; see Help and
Support for details. The following information is part of the event: FATAL:
could not reattach to shared memory (key=5432001, addr=01960000): Invalid
argument

There is no corresponding error in pg_log.

Once the error happens the database becomes unresponsive: current connections
stop responding and you cannot make new connections.  Stopping and starting
the database removes the error and normal operation can continue.

The error happens intermittently (three or four times a day) and doesn't
seem to have a specific cause.  The database is not heavily used and is
processing around 100 transactions per minute.

Re: BUG #2419: could not reattach to shared memory

From: Bruce Momjian

I wish I had more to suggest to you.  We are working on a few problems
with semaphores on Win2003 SP1, but nothing related to shared memory.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: BUG #2419: could not reattach to shared memory

From: Andy Male

Hi,

Thanks for your response.  I have enabled additional logging and have
reduced the connections (max_connections) to 50 to reduce the memory
overhead.  There should not be more than 8 connections at once and I wanted
to rule out some issue where the client wasn't releasing the connections.

The error has reduced in frequency but still happens.  The following log
shows that there was a problem writing to the log file due to a permissions
problem.  Obviously this is not the case, because the log file was accessible
before and after the error occurred, since it contains further logging about
the problem.

The subsequent message in the log suggests something is wrong with the
client application (or I may be misinterpreting this message) -

"This application has requested the Runtime to terminate it in an unusual
way. Please contact the application's support team for more information."

Then there are various other server errors/warnings and a report of "possibly
corrupted shared memory", which is the original problem reported.

I am not sure what to do.  I do not know how to debug the problem and
currently Postgres is unusable in a production environment.  I am
considering recoding the app to use an alternate database, but that is going
to be a fairly lengthy process just to prove the problem is with the
database rather than the client application.  I can't actually see how a
client application could or should be able to cause a memory error in the
database.  Any help would be most appreciated.

Thanks
Andy

****** LOG FILE SNIP ******

2006-05-07 23:44:19 10.10.12.100(4018)LOG:  00000: statement: EXECUTE npgsqlportal1  [PREPARE:  select * from fn_Driver_Session_Updated($1::int8,$2::bool)]
2006-05-07 23:44:19 10.10.12.100(4018)LOCATION:  exec_execute_message, postgres.c:1718
2006-05-07 23:44:20 10.10.12.100(4018)PANIC:  42501: could not write to log file 0, segment 90 at offset 2998272, length 8192: Permission denied
2006-05-07 23:44:20 10.10.12.100(4018)CONTEXT:  writing block 75 of relation 1663/20632/100738
    SQL statement "update tbl_query_ui_consumer_session_mapping set ui_consumer_session_data_action_type_id = 3, client_received = false where session_id =  $1  and client_session_id =  $2 "
    PL/pgSQL function "fn_driver_session_updated" line 58 at SQL statement
2006-05-07 23:44:20 10.10.12.100(4018)LOCATION:  XLogWrite, xlog.c:1474
2006-05-07 23:44:20 10.10.12.100(4018)STATEMENT:  select * from fn_Driver_Session_Updated($1::int8,$2::bool)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2006-05-07 23:44:21 LOG:  00000: server process (PID 5924) was terminated by signal 3
2006-05-07 23:44:21 LOCATION:  LogChildExit, postmaster.c:2425
2006-05-07 23:44:21 LOG:  00000: terminating any other active server processes
2006-05-07 23:44:21 LOCATION:  HandleChildCrash, postmaster.c:2306
2006-05-07 23:44:21 10.10.10.100(2467)WARNING:  57P02: terminating connection because of crash of another server process
2006-05-07 23:44:21 10.10.10.100(2467)DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-05-07 23:44:21 10.10.10.100(2467)HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2006-05-07 23:44:21 10.10.10.100(2467)LOCATION:  quickdie, postgres.c:2103
2006-05-07 23:44:21 10.10.10.100(2466)WARNING:  57P02: terminating connection because of crash of another server process
2006-05-07 23:44:21 10.10.10.100(2466)DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-05-07 23:44:21 10.10.10.100(2466)HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2006-05-07 23:44:21 10.10.10.100(2466)LOCATION:  quickdie, postgres.c:2103
2006-05-07 23:44:21 10.10.12.100(4021)WARNING:  57P02: terminating connection because of crash of another server process
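
When a crash produces this many follow-on messages, it can help to pull out the first PANIC or FATAL entry, since everything after it is usually a consequence.  Below is a minimal sketch in Python, assuming each log entry sits on one line (mail wrapping undone) and has the shape seen in the snippet above: timestamp, optional client address and port, severity, optional five-character SQLSTATE.  The names LINE_RE, parse_line and first_severe are made up for this illustration and are not part of Postgres.

```python
import re

# Assumed entry shape, taken from the log snippet in this message:
#   "<date> <time> [<client ip>(<port>)]<SEVERITY>:  [<SQLSTATE>: ]<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?:(?P<client>\d+\.\d+\.\d+\.\d+)\((?P<port>\d+)\))?"
    r"(?P<severity>[A-Z]+):\s+"
    r"(?:(?P<sqlstate>[0-9A-Z]{5}): )?"
    r"(?P<message>.*)$"
)

SEVERE = {"PANIC", "FATAL"}


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def first_severe(lines):
    """Return the first PANIC or FATAL entry, or None if there is none."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry["severity"] in SEVERE:
            return entry
    return None


if __name__ == "__main__":
    sample = [
        "2006-05-07 23:44:19 10.10.12.100(4018)LOCATION:  exec_execute_message, postgres.c:1718",
        "2006-05-07 23:44:20 10.10.12.100(4018)PANIC:  42501: could not write to log file 0, "
        "segment 90 at offset 2998272, length 8192: Permission denied",
        "2006-05-07 23:44:21 LOG:  00000: server process (PID 5924) was terminated by signal 3",
    ]
    print(first_severe(sample))
```

Indented continuation lines (the SQL statement and PL/pgSQL context) do not match the pattern and are skipped, which is acceptable when the goal is only to locate the first severe entry.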


Re: BUG #2419: could not reattach to shared memory

From: Alvaro Herrera

Andy Male wrote:

Hi Andy,

This is your problem:

> 2006-05-07 23:44:20 10.10.12.100(4018)PANIC:  42501: could not write to log
> file 0, segment 90 at offset 2998272, length 8192: Permission denied
> 2006-05-07 23:44:20 10.10.12.100(4018)CONTEXT:  writing block 75 of relation
> 1663/20632/100738
>     SQL statement "update tbl_query_ui_consumer_session_mapping set
> ui_consumer_session_data_action_type_id = 3, client_received = false where
> session_id =  $1  and client_session_id =  $2 "
>     PL/pgSQL function "fn_driver_session_updated" line 58 at SQL
> statement
> 2006-05-07 23:44:20 10.10.12.100(4018)LOCATION:  XLogWrite, xlog.c:1474
> 2006-05-07 23:44:20 10.10.12.100(4018)STATEMENT:  select * from
> fn_Driver_Session_Updated($1::int8,$2::bool)

The "Permission denied" message is a report Postgres is getting from the
operating system.  Notice it is marked as PANIC -- an unrecoverable
error.  What you should be investigating is why the operating
system rejects the write to that file.  It clearly is a database file;
try looking at the files named $PGDATA/base/20632/100738 or possibly
$PGDATA/pg_tblspc/1663/20632/100738.  What permissions do those files
have?  Who owns them?  If it's not the user who runs the Postgres
processes, or they are not accessible to it, then something else in the
system changed that, which is what you need to figure out and disable.
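
The check can also be scripted.  The following is only a sketch in Python, not a Postgres tool: candidate_paths and probe_file are hypothetical helper names, and the data directory passed in is whatever $PGDATA is on the affected machine.  It builds the two candidate paths named above and tries to open each one for update without writing anything.  It would need to run as the same account that runs the Postgres processes for the result to mean anything.

```python
import os


def candidate_paths(pgdata, tablespace_oid, database_oid, relfilenode):
    """Return the two places the relation file might live, as named in this message."""
    return [
        os.path.join(pgdata, "base", str(database_oid), str(relfilenode)),
        os.path.join(pgdata, "pg_tblspc", str(tablespace_oid),
                     str(database_oid), str(relfilenode)),
    ]


def probe_file(path):
    """Report whether `path` exists and can be opened for read/write."""
    result = {"path": path, "exists": os.path.exists(path),
              "openable": False, "error": None}
    if not result["exists"]:
        return result
    try:
        # "r+b" opens for update without truncating; nothing is written.
        with open(path, "r+b"):
            result["openable"] = True
    except OSError as exc:
        # Keep the operating system's own wording, e.g. "Permission denied".
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


if __name__ == "__main__":
    # PGDATA must be set in the environment; the numbers come from the CONTEXT line.
    pgdata = os.environ.get("PGDATA", ".")
    for path in candidate_paths(pgdata, 1663, 20632, 100738):
        print(probe_file(path))
```

Because the failure is intermittent, a single successful probe proves little; running it repeatedly and recording when the open fails would be more telling.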

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #2419: could not reattach to shared memory

From: Simon Riggs

On Mon, 2006-05-08 at 08:31 -0400, Alvaro Herrera wrote:
> > 2006-05-07 23:44:20 10.10.12.100(4018)PANIC:  42501: could not write to log
> > file 0, segment 90 at offset 2998272, length 8192: Permission denied

This is a pg_xlog error, so it looks like you have a whole-system issue,
not just isolated tables.
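
The file under pg_xlog that the PANIC refers to can be worked out from the numbers in the message.  A minimal sketch in Python, assuming the usual WAL file naming of three 8-digit hexadecimal fields (timeline, log, segment) and assuming timeline 1, which the log does not state; the function names are made up for this illustration:

```python
XLOG_BLCKSZ = 8192  # assumed WAL page size; matches "length 8192" in the PANIC


def wal_segment_name(log_id, seg_id, timeline_id=1):
    """Build the 24-hex-digit WAL file name for (timeline, log, segment)."""
    return f"{timeline_id:08X}{log_id:08X}{seg_id:08X}"


def page_in_segment(offset, page_size=XLOG_BLCKSZ):
    """Return (page index, byte offset inside that page) for a segment offset."""
    return divmod(offset, page_size)


if __name__ == "__main__":
    # "could not write to log file 0, segment 90 at offset 2998272, length 8192"
    print(wal_segment_name(log_id=0, seg_id=90))  # 00000001000000000000005A
    print(page_in_segment(2998272))               # (366, 0)
```

So under those assumptions the failed write was one whole page, number 366, of $PGDATA/pg_xlog/00000001000000000000005A, and that is the file whose permissions and open handles are worth checking.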

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #2419: could not reattach to shared memory

From: Andy

Hi,

Thank you for your replies.

I accept that the "Permission denied" problem does suggest that the DB error
may be caused by the OS somehow.

There is no problem with the permissions/ownership of the files because the
Postgres account created and owns those files; this rules out any sort
of security problem.  It is possible that the file is inaccessible for
some other reason and that Postgres is merely reporting that it can't access
the file, so that it 'could' be caused by a permissions problem rather than
it 'is' a permissions problem.  The error log information in this case isn't
really very useful since it doesn't accurately report the real cause of the
error.  The OS doesn't report any other errors and there are no other
system problems or file access problems; the only problem lies within the
database.

To try and reproduce the problem on another machine, I did a new install of
the same version of Postgres (8.1.3) and dump/restored the database onto
this new server.  So far it has been running with the same load and activity
for almost 30 hours and the problem has not surfaced.  In theory, Postgres
and the database are identical and therefore, the fact that it doesn't error
in the same way does confirm this is an OS problem (assuming the problem
doesn't occur at some point in the future).  The two servers are identical
hardware and have the same version of OS, Windows Server 2003 SP1.

None of this helps me because I still have a production server on which I
can't run the database since I can't debug the error.  Any suggestions?

Thank you for your assistance.

Andy

Re: BUG #2419: could not reattach to shared memory

From: Tom Lane

"Andy" <andy@otelex.co.uk> writes:
> To try and reproduce the problem on another machine, I did a new install of
> the same version of Postgres (8.1.3) and dump/restored the database onto
> this new server.  So far it has been running with the same load and activity
> for almost 30 hours and the problem has not surfaced.  In theory, Postgres
> and the database are identical and therefore, the fact that it doesn't error
> in the same way does confirm this is an OS problem (assuming the problem
> doesn't occur at some point in the future).  The two servers are identical
> hardware and have the same version of OS, Windows Server 2003 SP1.

We've seen reports of intermittent permission failures on Windows being
caused by broken antivirus software.  What security software have you
got on those machines, and does the failure go away if you remove it?

            regards, tom lane

Re: BUG #2419: could not reattach to shared memory

From: Andy

I have removed the AV software and the problem doesn't go away; it does seem
to happen less frequently but is still there.  It is the same error -

"FATAL:  could not reattach to shared memory (key=5432001, addr=01960000):
Invalid argument"

The referenced key and address are always the same.  Is there some way we can
view the data being stored in the memory, or look at what data was stored in
the memory, and where, to see if the reference is valid?
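
On the key always being the same: as far as I understand the server's shared memory setup, the key is not data but an identifier derived from the port number (port * 1000, then incremented until a free key is found), so 5432001 would simply be the first candidate for the default port 5432.  Treat that as an assumption about the server internals rather than something this thread confirms.  A small sketch of the arithmetic in Python, with hypothetical function names:

```python
def shmem_key_for_port(port, attempt=1):
    """Candidate shared memory key for a port (assumed rule: port * 1000 + attempt)."""
    return port * 1000 + attempt


def port_from_shmem_key(key):
    """Recover the port number from a key built by the rule above."""
    return key // 1000


if __name__ == "__main__":
    print(shmem_key_for_port(5432))      # 5432001
    print(port_from_shmem_key(5432001))  # 5432
```

If that is right, a constant key only says that the server is on port 5432.  The address is where the postmaster mapped the segment, and a constant value there likewise says nothing about whether the contents are valid.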



Regards
Andy