Thread: FATAL: could not reattach to shared memory (Win32)

FATAL: could not reattach to shared memory (Win32)

From
Terry Yapt
Date:
Hello all,

I am having problems with the following PostgreSQL version:

pg version: 8.2.4
OS: Win32 (windows xp sp2)
FS: NTFS

It is a production server, but suddenly the DB stops answering any SQL
command.  It seems dead.  After restarting the server, everything works again.

I have looked for system errors and found nothing.  But there are a lot
of messages in the application error log.  The same error repeats every
ten seconds or so.

This is the main error:
* FATAL:  could not reattach to shared memory (key=5432001,
addr=01D80000): Invalid argument

It is always followed by this other application-log error:
* LOG:  unrecognized win32 error code: 487

I found this during an intensive internet search:
http://archives.postgresql.org/pgsql-bugs/2007-01/msg00032.php

I need to solve this ASAP.  Does anybody have any idea about this?

Thanks.


Re: FATAL: could not reattach to shared memory (Win32)

From
Alvaro Herrera
Date:
Terry Yapt wrote:

> I am looking for system errors and nothing is there.  But I have a lot of
> messages on system APP errors.  The error is the same every ten seconds or
> so.
>
> This is the main error:
> * FATAL:  could not reattach to shared memory (key=5432001, addr=01D80000):
> Invalid argument

Please run "ipcs" on a command line window and paste the results.

I see a minor problem in that code: we are invoking two system calls
(shmget and shmat) but the log does not say which one failed.  However
in this case it seems only shmget could be returning EINVAL.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: FATAL: could not reattach to shared memory (Win32)

From
Terry Yapt
Date:
Sorry, I have not been able to execute "ipcs" on Windows; it doesn't
exist there.  I have tried to find a utility that gives the same
information, or a port of ipcs to Win32, but I haven't had any luck.

If I can do something more to get help, please tell me.

Greetings.


Alvaro Herrera wrote:
> Terry Yapt wrote:
>
>
>> I am looking for system errors and nothing is there.  But I have a lot of
>> messages on system APP errors.  The error is the same every ten seconds or
>> so.
>>
>> This is the main error:
>> * FATAL:  could not reattach to shared memory (key=5432001, addr=01D80000):
>> Invalid argument
>>
>
> Please run "ipcs" on a command line window and paste the results.
>
> I see a minor problem in that code: we are invoking two system calls
> (shmget and shmat) but the log does not say which one failed.  However
> in this case it seems only shmget could be returning EINVAL.
>
>


Re: FATAL: could not reattach to shared memory (Win32)

From
Alvaro Herrera
Date:
Terry Yapt wrote:

> This is the main error:
> * FATAL:  could not reattach to shared memory (key=5432001, addr=01D80000):
> Invalid argument
>
> It is always followed by this another system-app error:
> * LOG:  unrecognized win32 error code: 487

FWIW,
http://help.netop.com/support/errorcodes/win32_error_codes.htm

says
487     Attempt to access invalid address.     ERROR_INVALID_ADDRESS

This problem has been reported before, for example in

http://bbs.chinaunix.net/thread-973003-1-1.html
(not that I can read it very well)

and

http://lists.pgfoundry.org/pipermail/brasil-usuarios/20061127/003150.html

No resolution seems to have been found.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
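
As an aside for readers following along, the numeric code in the LOG line can be looked up by hand. The following is a minimal Python sketch and not PostgreSQL code: the table is a small hand-picked subset of the Win32 system error codes, and the names `WIN32_ERROR_NAMES` and `describe_win32_error` are made up for illustration.

```python
# Hand-picked subset of Win32 system error codes; the full list is much longer.
WIN32_ERROR_NAMES = {
    5: "ERROR_ACCESS_DENIED",
    8: "ERROR_NOT_ENOUGH_MEMORY",
    87: "ERROR_INVALID_PARAMETER",
    487: "ERROR_INVALID_ADDRESS",  # the code reported in this thread
}


def describe_win32_error(code):
    """Return 'code (NAME)' for known codes, or 'code (unknown)' otherwise."""
    name = WIN32_ERROR_NAMES.get(code, "unknown")
    return "%d (%s)" % (code, name)
```

Calling `describe_win32_error(487)` gives the symbolic name that the backend could not print itself, which is why the log shows only "unrecognized win32 error code".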

Re: FATAL: could not reattach to shared memory (Win32)

From
Terry Yapt
Date:
Alvaro Herrera wrote:
> Terry Yapt wrote:
>
>
>> This is the main error:
>> * FATAL:  could not reattach to shared memory (key=5432001, addr=01D80000):
>> Invalid argument
>>
>> It is always followed by this another system-app error:
>> * LOG:  unrecognized win32 error code: 487
>>
>
> This problem has been reported before, for example in
>
> http://bbs.chinaunix.net/thread-973003-1-1.html
> (not that I can read it very well)
>
> and
>
> http://lists.pgfoundry.org/pipermail/brasil-usuarios/20061127/003150.html
>
>
Yes, those describe the same problem as this one:
http://archives.postgresql.org/pgsql-bugs/2007-01/msg00032.php

> No resolution seems to have been found.
>
Then, I am very worried now.   :-|

Thanks Alvaro.

Re: FATAL: could not reattach to shared memory (Win32)

From
Magnus Hagander
Date:
Alvaro Herrera wrote:
> Terry Yapt wrote:
>
>> This is the main error:
>> * FATAL:  could not reattach to shared memory (key=5432001, addr=01D80000):
>> Invalid argument
>>
>> It is always followed by this another system-app error:
>> * LOG:  unrecognized win32 error code: 487
>
> FWIW,
> http://help.netop.com/support/errorcodes/win32_error_codes.htm
>
> says
> 487     Attempt to access invalid address.     ERROR_INVALID_ADDRESS
>
> This problem has been reported before, for example in
>
> http://bbs.chinaunix.net/thread-973003-1-1.html
> (not that I can read it very well)
>
> and
>
> http://lists.pgfoundry.org/pipermail/brasil-usuarios/20061127/003150.html
>
> No resolution seems to have been found.

8.3 will have a new way to deal with shared mem on win32. It's the same
underlying tech, but we're no longer trying to squeeze it into an
emulation of sysv. With a bit of luck, that'll help :-)

//Magnus


Re: FATAL: could not reattach to shared memory (Win32)

From
Alvaro Herrera
Date:
Magnus Hagander wrote:
> Alvaro Herrera wrote:

> > No resolution seems to have been found.
>
> 8.3 will have a new way to deal with shared mem on win32. It's the same
> underlying tech, but we're no longer trying to squeeze it into an
> emulation of sysv. With a bit of luck, that'll help :-)

So you're saying we won't fix this bug in 8.2?  That seems unfortunate,
given that 8.2 is still supposed to be supported on Windows.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: FATAL: could not reattach to shared memory (Win32)

From
Shelby Cain
Date:
>----- Original Message ----
>From: Magnus Hagander <magnus@hagander.net>
>To: Alvaro Herrera <alvherre@commandprompt.com>
>Cc: Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
>Sent: Thursday, August 23, 2007 3:43:32 PM
>Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
>
>
>8.3 will have a new way to deal with shared mem on win32. It's the same
>underlying tech, but we're no longer trying to squeeze it into an
>emulation of sysv. With a bit of luck, that'll help :-)
>
>//Magnus
>

Wild guess on my part... could that error be the result of an attempt to map shared memory into a process at a fixed
location that just happens to already be occupied by a dll that Windows had decided to relocate?

Regards,

Shelby Cain





Re: FATAL: could not reattach to shared memory (Win32)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Magnus Hagander wrote:
>> 8.3 will have a new way to deal with shared mem on win32. It's the same
>> underlying tech, but we're no longer trying to squeeze it into an
>> emulation of sysv. With a bit of luck, that'll help :-)

> So you're saying we won't fix this bug in 8.2?

Well, we certainly aren't going to back-patch a major rewrite that
(1) hasn't made it through beta testing, and (2) is not actually known
to fix the bug.  When and if those gating conditions stop being true,
maybe we could consider a back-patch.

But at the moment this is all speculation ... I counsel concentrating
on finding out what's really happening on Terry's machine, before trying
to guess whether we already have a fix written.

            regards, tom lane

Re: FATAL: could not reattach to shared memory (Win32)

From
Magnus Hagander
Date:
Shelby Cain wrote:
>> ----- Original Message ---- From: Magnus Hagander
>> <magnus@hagander.net> To: Alvaro Herrera
>> <alvherre@commandprompt.com> Cc: Terry Yapt <yapt@technovell.com>;
>> pgsql-general@postgresql.org Sent: Thursday, August 23, 2007
>> 3:43:32 PM Subject: Re: [GENERAL] FATAL: could not reattach to
>> shared memory (Win32)
>>
>>
>> 8.3 will have a new way to deal with shared mem on win32. It's the
>> same underlying tech, but we're no longer trying to squeeze it into
>> an emulation of sysv. With a bit of luck, that'll help :-)
>>
>> //Magnus
>>
>
> Wild guess on my part... could that error be the result of an attempt
> to map shared memory into a process at a fixed location that just
> happens to already be occupied by a dll that Windows had decided to
> relocate?

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

//Magnus


Re: FATAL: could not reattach to shared memory (Win32)

From
"Trevor Talbot"
Date:
On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:
> Shelby Cain wrote:

> > Wild guess on my part... could that error be the result of an attempt
> > to map shared memory into a process at a fixed location that just
> > happens to already be occupied by a dll that Windows had decided to
> > relocate?
>
> Not that wild a guess, really :-) I'd say it's a very good possibility -
> but I have no idea why it'd do that, since all backends load the same
> DLLs at that stage.

Not a valid assumption; you can't rely on consistent VM space among
multiple [non-cloned] processes without a serious amount of effort.
Anything can use that space, it's not just file views.  Obviously it
happens to work some of the time, but when it doesn't, it doesn't.  I
gather postgres depends on it being at the same address, and fixing
that isn't trivial?

If everything relevant is going through the intriguing
internal_forkexec(), you could probably reserve address space there
before resuming the thread.  You'd want to combine this with picking
address space that's less likely to be used before creating the shared
memory section.  (Actually, if you're doing that, you might as well
just inject the backend variables too instead of going through the
mapped file gymnastics.)

Not a simple change, but would likely make this particular problem go
away (assuming this is the problem).  It's also the first time I've
looked at the source, so perhaps I missed something.

Re: FATAL: could not reattach to shared memory (Win32)

From
Gregory Stark
Date:
"Trevor Talbot" <quension@gmail.com> writes:

> I gather postgres depends on it being at the same address, and fixing that
> isn't trivial?

I haven't been following the rest of the thread so I'm not sure if this is
important. But no, fixing that should be relatively trivial as there are
already some configurations where it's not the case (the EXEC_BACKEND case I
believe). The rest of the system uses a shared memory base pointer and
references everything relative to that.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: FATAL: could not reattach to shared memory (Win32)

From
Bruce Momjian
Date:
Trevor Talbot wrote:
> On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:
> > Shelby Cain wrote:
>
> > > Wild guess on my part... could that error be the result of an attempt
> > > to map shared memory into a process at a fixed location that just
> > > happens to already be occupied by a dll that Windows had decided to
> > > relocate?
> >
> > Not that wild a guess, really :-) I'd say it's a very good possibility -
> > but I have no idea why it'd do that, since all backends load the same
> > DLLs at that stage.
>
> Not a valid assumption; you can't rely on consistent VM space among
> multiple [non-cloned] processes without a serious amount of effort.
> Anything can use that space, it's not just file views.  Obviously it
> happens to work some of the time, but when it doesn't, it doesn't.  I
> gather postgres depends on it being at the same address, and fixing
> that isn't trivial?
>
> If everything relevant is going through the intriguing
> internal_forkexec(), you could probably reserve address space there
> before resuming the thread.  You'd want to combine this with picking
> address space that's less likely to be used before creating the shared
> memory section.  (Actually, if you're doing that, you might as well
> just inject the backend variables too instead of going through the
> mapped file gymnastics.)
>
> Not a simple change, but would likely make this particular problem go
> away (assuming this is the problem).  It's also the first time I've
> looked at the source, so perhaps I missed something.

I think this is accurate.  When we created the Win32 native port there
was a lot of concern about how to handle shared memory in an EXEC_BACKEND
case, namely that postmaster children were not copies which had the same
shared memory mappings, but rather were new processes that had to attach
to shared memory at a fixed address.

The WIN32 solution was to create the shared memory in the parent, and
then pass that address value down to the children to use in attaching to
the existing segment.  We expected all sorts of problems with this but
in fact it seemed to work fine (most of the time).

As you can see it doesn't work 100% of the time, but it worked more
reliably than we expected.  What we have been waiting for is someone
who can recreate a failure so we can track down how to best make it 100%
reliable, and as you can see, we haven't had a flood of problem reports
to track this down.

If you want to help make it 100% we will work with you to find the
solution.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FATAL: could not reattach to shared memory (Win32)

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Trevor Talbot" <quension@gmail.com> writes:
>
> > I gather postgres depends on it being at the same address, and fixing that
> > isn't trivial?
>
> I haven't been following the rest of the thread so I'm not sure if this is
> important. But no, fixing that should be relatively trivial as there are
> already some configurations where it's not the case (the EXEC_BACKEND case I
> believe). The rest of the system uses a shared memory base pointer and
> references everything relative to that.

This is inaccurate, I believe.  The original Berkeley code did exec()
for backends and hence allowed shared memory to be at different
addresses for different backends, but we started using fork() and
eliminated much of that capability for performance and clarity reasons,
so right now all backends have to have shared memory at the same
address, and changing this will not be simple.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FATAL: could not reattach to shared memory (Win32)

From
Tom Lane
Date:
"Trevor Talbot" <quension@gmail.com> writes:
> On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:
>> Not that wild a guess, really :-) I'd say it's a very good possibility -
>> but I have no idea why it'd do that, since all backends load the same
>> DLLs at that stage.

> Not a valid assumption; you can't rely on consistent VM space among
> multiple [non-cloned] processes without a serious amount of effort.

I'm not sure if you have a specific technical meaning of "clone" in mind
here, but these processes are all executing the identical executable,
and taking care to map the shmem early in execution *before* they load
any DLLs.  So it should work.  Apparently, it *does* work for awhile for
the OP, and then stops working, which is even odder.

> I gather postgres depends on it being at the same address, and fixing
> that isn't trivial?

That's correct, and not having to change it is not negotiable ---
finding a way to make this work was one of the gating factors that
made it practical to have a Windows port at all.

If you've got a specific suggestion for making it more reliable,
we're all ears.

            regards, tom lane

Re: FATAL: could not reattach to shared memory (Win32)

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Trevor Talbot" <quension@gmail.com> writes:
>> I gather postgres depends on it being at the same address, and fixing that
>> isn't trivial?

> I haven't been following the rest of the thread so I'm not sure if this is
> important. But no, fixing that should be relatively trivial as there are
> already some configurations where it's not the case (the EXEC_BACKEND case I
> believe). The rest of the system uses a shared memory base pointer and
> references everything relative to that.

That hasn't been the case for quite a few years, and we're not going back.
The pointer-to-offset-and-back gymnastics that that required were
utterly destructive to code readability and maintainability, mainly
because if everything stored in shmem data structures is an "offset"
then you can't get any useful error checking from the compiler about how
you are using the fields.  It's like decreeing that every pointer
must be declared "void *" and cast to something else when it's used.

There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
but I think it's mostly just that no one's bothered to rewrite the code
for SHM_QUEUE linked lists.  The vast majority of our shmem structures
use regular pointers, and have for years.

            regards, tom lane
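
To make the address dependence concrete, here is a toy Python sketch of a linked list stored inside a "segment", once with offsets relative to the segment start and once with raw addresses. It is only an illustration: the names `build_segment` and `walk`, the node layout, and the second mapping address are all assumptions, and nothing here is taken from the PostgreSQL source. The compile-time type checking that raw pointers give in C cannot be shown in Python; the sketch covers only the question of where the segment is mapped.

```python
import struct

NODE = struct.Struct("<iI")  # (value, next); next == 0 marks the end of the list
HEADER = 4                   # skip a few bytes so that offset 0 can mean "null"


def build_segment(values, base, absolute):
    """Lay out a singly linked list in a fresh segment.

    With absolute=True the 'next' field holds base + offset (a raw pointer
    that is only valid when the segment is mapped at `base`).  With
    absolute=False it holds the offset from the start of the segment.
    """
    seg = bytearray(HEADER + NODE.size * len(values))
    for i, value in enumerate(values):
        off = HEADER + i * NODE.size
        if i + 1 < len(values):
            next_off = off + NODE.size
            nxt = base + next_off if absolute else next_off
        else:
            nxt = 0
        NODE.pack_into(seg, off, value, nxt)
    return seg


def walk(seg, mapped_at, absolute):
    """Traverse the list as a process that mapped the segment at `mapped_at`."""
    out = []
    addr = mapped_at + HEADER  # address of the first node in this process
    while True:
        off = addr - mapped_at  # translate the address to an index into seg
        if not 0 <= off <= len(seg) - NODE.size:
            raise ValueError("pointer 0x%08X is outside the mapped segment" % addr)
        value, nxt = NODE.unpack_from(seg, off)
        out.append(value)
        if nxt == 0:
            return out
        addr = nxt if absolute else mapped_at + nxt
```

With offsets, `walk` succeeds wherever the segment is mapped. With raw addresses, it succeeds only when `mapped_at` equals the `base` used at build time, which is the constraint the backends live with.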

Re: FATAL: could not reattach to shared memory (Win32)

From
Alvaro Herrera
Date:
Tom Lane wrote:

> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
> but I think it's mostly just that no one's bothered to rewrite the code
> for SHM_QUEUE linked lists.  The vast majority of our shmem structures
> use regular pointers, and have for years.

... except that, not knowing that, I wrote part of the new autovac code
using MAKE_PTR/OFFSET, and it needs to be rewritten eventually :-(

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: FATAL: could not reattach to shared memory (Win32)

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
> but I think it's mostly just that no one's bothered to rewrite the code
> for SHM_QUEUE linked lists.  The vast majority of our shmem structures
> use regular pointers, and have for years.

Ah, I happened to be recently in that code so I was misled.

So even in EXEC_BACKEND we require that we can attach to the shared memory at
a specified location. hm.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: FATAL: could not reattach to shared memory (Win32)

From
Shelby Cain
Date:
>----- Original Message ----
>From: Magnus Hagander <magnus@hagander.net>
>To: Shelby Cain <alyandon@yahoo.com>
>Cc: Alvaro Herrera <alvherre@commandprompt.com>; Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
>Sent: Friday, August 24, 2007 1:08:44 AM
>Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
>
>Not that wild a guess, really :-) I'd say it's a very good possibility -
>but I have no idea why it'd do that, since all backends load the same
>DLLs at that stage.
>
>//Magnus
>

Assuming this is an issue with shared libraries, I think it would
have more to do with the way Windows resolves address conflicts on process
startup than anything caused by explicit calls to LoadLibrary().  Looking
at postgres.exe with the dependency viewer from
Visual Studio 6, I see that the following shared library dependencies
embedded in the executable image have conflicting base
addresses.  If I'm not mistaken, Windows will automatically relocate
these libraries prior to actual code execution so there would be no
opportunity for that particular instance of postgres.exe to map the shared
memory if the address space is already in use by a relocated dll.

libeay32.dll - 0x10000000
libiconv-2.dll - 0x10000000
libintl-2.dll - 0x10000000
ssleay32.dll - 0x10000000
comerr32.dll - 0x1c000000
krb5_32.dll - 0x1c000000

I also found a KB article that specifically addresses ERROR_INVALID_ADDRESS being returned from MapViewOfFileEx().

http://support.microsoft.com/kb/125713

The article specifically addresses the concern where multiple processes
must use the same address for mappings and how to accomplish that under
Windows.  Search for "Addresses of Mapped Views".  The only thing that
really gives me any pause is the fact the article hasn't been updated
past the NT 3.51/Windows 9x era but the underlying behavior might not have
been changed in Windows 2000/XP/etc.

Regards,

Shelby Cain


Re: FATAL: could not reattach to shared memory (Win32)

From
Shelby Cain
Date:
I apologize for resending this but my editor in combination with Yahoo's web mail interface horribly mangled it...

>----- Original Message ----
>From: Magnus Hagander <magnus@hagander.net>
>To: Shelby Cain <alyandon@yahoo.com>
>Cc: Alvaro Herrera <alvherre@commandprompt.com>; Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
>Sent: Friday, August 24, 2007 1:08:44 AM
>Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
>
>Not that wild a guess, really :-) I'd say it's a very good possibility -
>but I have no idea why it'd do that, since all backends load the same
>DLLs at that stage.
>
>//Magnus
>

Assuming this is an issue with shared libraries, I think it would have more to do with the way Windows resolves address
conflicts on process startup than anything caused by explicit calls to LoadLibrary().  Looking at postgres.exe with the
dependency viewer from Visual Studio 6, I see that the following shared library dependencies embedded in the executable
image have conflicting base addresses.  If I'm not mistaken, Windows will automatically relocate these libraries
prior to actual code execution so there would be no opportunity for that particular instance of postgres.exe to map the
shared memory if the address space is already in use by a relocated dll.

libeay32.dll - 0x10000000
libiconv-2.dll - 0x10000000
libintl-2.dll - 0x10000000
ssleay32.dll - 0x10000000
comerr32.dll - 0x1c000000
krb5_32.dll - 0x1c000000

I also found a KB article that addresses ERROR_INVALID_ADDRESS being returned from MapViewOfFileEx().

http://support.microsoft.com/kb/125713

The article specifically addresses the concern where multiple processes must use the same address for mappings and how
to accomplish that under Windows.  Search for "Addresses of Mapped Views".  The only thing that really gives me any
pause is the fact the article hasn't been updated past the NT 3.51/Windows 9x era but the underlying behavior might not
have been changed in Windows 2000/XP/etc.

Regards,

Shelby Cain
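
The observation about the DLL list above can be reproduced mechanically. This is a minimal Python sketch that groups the six libraries by the base address quoted in the message; the function name `find_base_conflicts` is made up for illustration. It only detects identical preferred bases. Whether two images really overlap also depends on their sizes, which are not given in the thread.

```python
from collections import defaultdict

# Preferred base addresses exactly as listed in the message above.
PREFERRED_BASES = {
    "libeay32.dll": 0x10000000,
    "libiconv-2.dll": 0x10000000,
    "libintl-2.dll": 0x10000000,
    "ssleay32.dll": 0x10000000,
    "comerr32.dll": 0x1C000000,
    "krb5_32.dll": 0x1C000000,
}


def find_base_conflicts(bases):
    """Map each preferred base shared by two or more DLLs to their sorted names."""
    by_base = defaultdict(list)
    for name, base in bases.items():
        by_base[base].append(name)
    return {base: sorted(names) for base, names in by_base.items() if len(names) > 1}


if __name__ == "__main__":
    for base, names in sorted(find_base_conflicts(PREFERRED_BASES).items()):
        print("0x%08X: %s" % (base, ", ".join(names)))
```

At most one DLL in each group can load at its preferred base; the loader has to relocate the others, and where they end up is not something the list tells us.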








Re: FATAL: could not reattach to shared memory (Win32)

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
>> but I think it's mostly just that no one's bothered to rewrite the code
>> for SHM_QUEUE linked lists.  The vast majority of our shmem structures
>> use regular pointers, and have for years.

> Ah, I happened to be recently in that code so I was misled.

IIRC, the reason for not bothering to change the SHM_QUEUE code (other
than inertia) was that it's a generic linked list package, and so if
it wasn't storing SHMEM_OFFSETs it'd be storing "void *"'s, and so there
didn't seem to be any traction to be gained in terms of compiler error
detection capability.  However, if both you and Alvaro were confused
about the liveness of that coding convention, maybe it'd be worth making
a push to eliminate all trace of MAKE_PTR/MAKE_OFFSET.  TODO for 8.4?

            regards, tom lane

Re: FATAL: could not reattach to shared memory (Win32)

From
Terry Yapt
Date:
Tom Lane wrote:
> I'm not sure if you have a specific technical meaning of "clone" in mind
> here, but these processes are all executing the identical executable,
> and taking care to map the shmem early in execution *before* they load
> any DLLs.  So it should work.  Apparently, it *does* work for awhile for
> the OP, and then stops working, which is even odder.
>
>
Yes, the Windows system log (application log section) can go several
days without showing any error.  Then the errors suddenly come back and
repeat at short intervals.  The errors then disappear again and return
a few hours later.  After a few more hours the system stops responding.

A curiosity:
======
In the log lines I sent to the list:   * FATAL:  could not
reattach to shared memory (key=5432001, addr=01D80000): Invalid argument
the "addr=01D80000" value is always the same, even though the system
has been shut down and restarted and the error has gone away for days at a time.

Greetings.
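
The claim that the address never changes can be checked across a whole log rather than by eye. Here is a minimal Python sketch, assuming the log text is available as a string; the regular expression is derived from the FATAL line quoted in this thread, and the name `reattach_addresses` is made up for illustration.

```python
import re

# Matches the FATAL line quoted in this thread.  "\s*" after the comma
# tolerates the line wrap that mail clients introduced between key and addr.
REATTACH_RE = re.compile(
    r"could not reattach to shared memory \(key=(\d+),\s*addr=([0-9A-Fa-f]+)\)"
)


def reattach_addresses(log_text):
    """Return the set of distinct (key, addr) pairs seen in reattach failures."""
    return {(int(key), int(addr, 16)) for key, addr in REATTACH_RE.findall(log_text)}
```

If the returned set has exactly one element, every failure in the log involved the same key and the same address, which matches what is reported above.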

Re: FATAL: could not reattach to shared memory (Win32)

From
Tom Lane
Date:
Shelby Cain <alyandon@yahoo.com> writes:
> Assuming this is an issue with shared libraries, I think it would have more
> to do with the way Windows resolves address conflicts on process startup
> than anything caused by explicit calls to LoadLibrary().  Looking at
> postgres.exe with the dependency viewer from Visual Studio 6, I see that the
> following shared library dependencies embedded in the executable image
> have conflicting base addresses.  If I'm not mistaken, Windows will
> automatically relocate these libraries prior to actual code execution so there
> would be no opportunity for that particular instance of postgres.exe to map the
> shared memory if the address space is already in use by a relocated dll.

But the shmem was originally allocated in the postmaster process, which
is the identical executable with the identical set of linked-in DLLs.
So it's really unclear why the child processes would be unable to
reattach at the same address.

            regards, tom lane

Re: FATAL: could not reattach to shared memory (Win32)

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
>>> but I think it's mostly just that no one's bothered to rewrite the code
>>> for SHM_QUEUE linked lists.  The vast majority of our shmem structures
>>> use regular pointers, and have for years.
>
>> Ah, I happened to be recently in that code so I was misled.
>
> IIRC, the reason for not bothering to change the SHM_QUEUE code (other
> than inertia) was that it's a generic linked list package, and so if
> it wasn't storing SHMEM_OFFSETs it'd be storing "void *"'s, and so there
> didn't seem to be any traction to be gained in terms of compiler error
> detection capability.  However, if both you and Alvaro were confused
> about the liveness of that coding convention, maybe it'd be worth making
> a push to eliminate all trace of MAKE_PTR/MAKE_OFFSET.  TODO for 8.4?

It would also make using gdb to look at the lock queues a bit less of a pain.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: FATAL: could not reattach to shared memory (Win32)

From
"Trevor Talbot"
Date:
On 8/24/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Trevor Talbot" <quension@gmail.com> writes:
> > On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:
> >> Not that wild a guess, really :-) I'd say it's a very good possibility -
> >> but I have no idea why it'd do that, since all backends load the same
> >> DLLs at that stage.
>
> > Not a valid assumption; you can't rely on consistent VM space among
> > multiple [non-cloned] processes without a serious amount of effort.
>
> I'm not sure if you have a specific technical meaning of "clone" in mind
> here, but these processes are all executing the identical executable,
> and taking care to map the shmem early in execution *before* they load
> any DLLs.  So it should work.  Apparently, it *does* work for awhile for
> the OP, and then stops working, which is even odder.

"Clone" in the same sense as fork(): duplicating a process instead of
regenerating it.  Even ignoring things like DLL replacement and
LD_PRELOAD-style options, there's still a lot of opportunity for
dynamic behavior.  All DLLs have an initialization routine called by
the loader (and on thread creation), which tends to be used to set up
things you don't want the caller to have to explicitly initialize.
DLLs that maintain global state they share with copies of themselves
in other processes can set up shared memory etc to do that.  They can
easily change their behavior based on the environment at the time of
process start.

There are also all the hooks for extension points, such as Winsock
LSPs.  Most such things happen only after an explicit initialization
(e.g. WSAStartup() or socket creation in the Winsock case), but
between the C runtime and third-party libraries, it may be happening
when you don't expect it.

All that said, I don't actually have a real-world example of process
VM layout changing like this, especially since you are using it early
to avoid this very problem.  I'd love to find out exactly what's going
on in Terry's case, but I haven't come up with a good way to do it
that doesn't disturb his production environment.

> If you've got a specific suggestion for making it more reliable,
> we're all ears.

To elaborate on what I said earlier, internal_forkexec() creates the
process suspended; while it has an execution environment set up, the
loader hasn't done all the DLL linking and initialization yet, so the
address space is relatively untouched.  At that point you could use
VirtualAllocEx() to reserve VM space for the shared memory at the
right address, and proceed with the rest of the setup.  When the new
backend starts up, it would then VirtualFree() that space immediately
before calling MapViewOfFileEx() on it.

I can probably set up with the 8.3 tree and MSVC to create an
artificial failure, and play with the above as a fix, but I'm not
quite sure when that will be.  There's still the issue of verifying it
is the problem on Terry's machine, and figuring out a fix for him.
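
The reserve-then-free idea above can be pictured with a toy model. The Python sketch below does not call VirtualAllocEx(), VirtualFree() or MapViewOfFileEx(); it only simulates an address space as a list of claimed ranges. The class `AddressSpace`, the function `start_backend`, the segment size and the "stray.dll" region are all assumptions made up for illustration, and whether anything like the stray region exists on the affected machine is exactly what has not been established.

```python
class InvalidAddress(Exception):
    """Stands in for Win32 error 487 in this toy model."""


class AddressSpace:
    """Toy model of one process's virtual address space (not a Win32 API)."""

    def __init__(self):
        self.regions = []  # list of (start, size, owner)

    def _overlaps(self, start, size):
        return any(start < s + n and s < start + size for s, n, _ in self.regions)

    def claim(self, start, size, owner):
        """Claim [start, start + size) or fail if any part is already in use."""
        if self._overlaps(start, size):
            raise InvalidAddress("0x%08X is already in use" % start)
        self.regions.append((start, size, owner))

    def release(self, owner):
        self.regions = [r for r in self.regions if r[2] != owner]

    def place_anywhere(self, preferred, size, owner, step=0x10000):
        """Place a region at `preferred`, sliding upward until it fits
        (a crude stand-in for the loader relocating a DLL)."""
        start = preferred
        while self._overlaps(start, size):
            start += step
        self.regions.append((start, size, owner))
        return start


SHMEM_ADDR = 0x01D80000  # address from the log lines in this thread
SHMEM_SIZE = 0x00800000  # hypothetical segment size


def start_backend(reserve_early):
    vm = AddressSpace()
    if reserve_early:
        # Parent reserves the range while the child is still suspended.
        vm.claim(SHMEM_ADDR, SHMEM_SIZE, "reservation")
    # Something in the child grabs address space before shmem is attached.
    vm.place_anywhere(SHMEM_ADDR, 0x00100000, "stray.dll")
    if reserve_early:
        # Child frees the reservation immediately before mapping.
        vm.release("reservation")
    vm.claim(SHMEM_ADDR, SHMEM_SIZE, "shmem")
    return vm
```

In the model, `start_backend(reserve_early=False)` fails because the stray region already sits at the shared memory address, while `start_backend(reserve_early=True)` succeeds because the reservation pushes the stray region elsewhere. A real implementation would still have a brief window between freeing the reservation and mapping the view.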


On 8/24/07, Terry Yapt <yapt@technovell.com> wrote:

> Yes, the windows system log (application log section) doesn't show any
> error in several days.  Suddenly errors bring back to life and syslog
> errors repeats every few time.  But again errors disappears and return
> in a few hours.  After few hours the system goes out.
>
> Curiosity:
> ======
> On the log lines I have and I sent to the list:   * FATAL:  could not
> reattach to shared memory (key=5432001, addr=01D80000): Invalid argument
> , this one: "addr=01D80000" is always the same in spite of  the system
> have been shutting down and restarted or the error was out for a days.

The environment is consistent then.  Whatever is going on, when
postgres first starts things are normal, something just changes later
and the change is temporary.  As vague guides, I would look at some
kind of global resource usage/tracking, and scheduled tasks.  Do you
see any patterns about WHEN this happens?  During high load periods?
Any antivirus or other security type tasks running on the machine?
Any third-party VPN type software?  Fast User Switching or Remote
Desktop use?

Re: FATAL: could not reattach to shared memory (Win32)

From
Terry Yapt
Date:
Trevor Talbot wrote:
> The environment is consistent then.  Whatever is going on, when
> postgres first starts things are normal, something just changes later
> and the change is temporary.  As vague guides, I would look at some
> kind of global resource usage/tracking, and scheduled tasks.  Do you
> see any patterns about WHEN this happens?  During high load periods?
> Any antivirus or other security type tasks running on the machine?
> Any third-party VPN type software?  Fast User Switching or Remote
> Desktop use?
I have spent a lot of time looking for patterns in the system logs,
Apache logs, postgres logs, etc., but I have not found any conclusive
clue.

I can only say that I have this kind of error in the PostgreSQL logs:
'2007-08-21 15:19:21 ERROR:  could not open relation 16692/16694/17295:
Invalid argument'
The next log line(s) show the statement (call it statement X).  But
statement X runs fine and gives the right results when I copy and paste
it into any SQL editor connected to that DB.

Those errors are not correlated in time with the FATAL errors we are
discussing in this thread.

I am trying to get the opportunity to migrate that DB to another server
and use that server to test anything we want, but the customer is
reluctant to let me use it for trial-and-error testing because it is
their mail and web server too.  :-(

Although that server is far away from my location, I have a remote
console (UltraVNC) to it if you need me to look for something.

Greetings.

Re: FATAL: could not reattach to shared memory (Win32)

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Magnus Hagander wrote:
> Shelby Cain wrote:
> >> ----- Original Message ---- From: Magnus Hagander
> >> <magnus@hagander.net> To: Alvaro Herrera
> >> <alvherre@commandprompt.com> Cc: Terry Yapt <yapt@technovell.com>;
> >> pgsql-general@postgresql.org Sent: Thursday, August 23, 2007
> >> 3:43:32 PM Subject: Re: [GENERAL] FATAL: could not reattach to
> >> shared memory (Win32)
> >>
> >>
> >> 8.3 will have a new way to deal with shared mem on win32. It's the
> >> same underlying tech, but we're no longer trying to squeeze it into
> >> an emulation of sysv. With a bit of luck, that'll help :-)
> >>
> >> //Magnus
> >>
> >
> > Wild guess on my part... could that error be the result of an attempt
> > to map shared memory into a process at a fixed location that just
> > happens to already be occupied by a dll that Windows had decided to
> > relocate?
>
> Not that wild a guess, really :-) I'd say it's a very good possibility -
> but I have no idea why it'd do that, since all backends load the same
> DLLs at that stage.
>
> //Magnus
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FATAL: could not reattach to shared memory

From
Terry Yapt
Date:
Bruce Momjian wrote:
> This has been saved for the 8.4 release:
>
>     http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Update:

I have installed PostgreSQL 8.2.5 and moved the database from the old
server to a new one.  This was 2 weeks ago.

The new server is a Windows 2003 Server running other services too.

So far the problem has not recurred, and PostgreSQL is running like a
charm on the new server.  :-)

Greetings.

Re: FATAL: could not reattach to shared memory (Win32)

From
Bruce Momjian
Date:
Added to TODO:

* Remove use of MAKE_PTR and MAKE_OFFSET macros

  http://archives.postgresql.org/pgsql-general/2007-08/msg01510.php


---------------------------------------------------------------------------

Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > "Trevor Talbot" <quension@gmail.com> writes:
> >> I gather postgres depends on it being at the same address, and fixing that
> >> isn't trivial?
>
> > I haven't been following the rest of the thread so I'm not sure if this is
> > important. But no, fixing that should be relatively trivial as there are
> > already some configurations where it's not the case (the EXEC_BACKEND case I
> > believe). The rest of the system uses a shared memory base pointer and
> > references everything relative to that.
>
> That hasn't been the case for quite a few years, and we're not going back.
> The pointer-to-offset-and-back gymnastics that that required were
> utterly destructive to code readability and maintainability, mainly
> because if everything stored in shmem data structures is an "offset"
> then you can't get any useful error checking from the compiler about how
> you are using the fields.  It's like decreeing that every pointer
> must be declared "void *" and cast to something else when it's used.
>
> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
> but I think it's mostly just that no one's bothered to rewrite the code
> for SHM_QUEUE linked lists.  The vast majority of our shmem structures
> use regular pointers, and have for years.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
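
The trade-off Tom Lane describes in the quoted text can be sketched in a few lines of C. This is only an illustration under stated assumptions, not PostgreSQL's code: Node, shm_offset, to_offset() and to_ptr() are hypothetical stand-ins for the kind of conversion MAKE_OFFSET/MAKE_PTR perform, and an ordinary buffer stands in for the shared memory segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: an untyped byte offset from the start of the segment. */
typedef size_t shm_offset;

typedef struct Node
{
    int          value;
    shm_offset   next_off;  /* offset style: works at any base, but untyped */
    struct Node *next_ptr;  /* pointer style: typed, valid at one base only */
} Node;

/* Convert a pointer inside the segment to an offset from its base. */
static shm_offset to_offset(const void *base, const void *p)
{
    return (shm_offset) ((const char *) p - (const char *) base);
}

/*
 * Convert an offset back to a pointer.  The result is void *, so the
 * compiler cannot check that the caller assigns it to the right type.
 */
static void *to_ptr(void *base, shm_offset off)
{
    return (char *) base + off;
}

/* Build two nodes at the start of seg, linked in both styles. */
static void build(void *seg)
{
    Node *a = (Node *) seg;
    Node *b = a + 1;

    a->value = 1;
    b->value = 2;
    a->next_off = to_offset(seg, b);
    a->next_ptr = b;
    b->next_off = 0;        /* 0 used as "no next" in this sketch */
    b->next_ptr = NULL;
}
```

If the same bytes appear at a different address (simulated by copying the buffer), the offset link still resolves inside the new copy while the plain pointer still refers to the old location. That is why plain pointers require every process to attach the segment at the same address, and why offsets, which avoid that requirement, give up the compiler's type checking.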

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FATAL: could not reattach to shared memory (Win32)

From
Bruce Momjian
Date:
Added to Win32 TODO:

        o Diagnose problem where shared memory can sometimes not be
          attached by postmaster children

          http://archives.postgresql.org/pgsql-general/2007-08/msg01377.php



---------------------------------------------------------------------------

Magnus Hagander wrote:
> Shelby Cain wrote:
> >> ----- Original Message ---- From: Magnus Hagander
> >> <magnus@hagander.net> To: Alvaro Herrera
> >> <alvherre@commandprompt.com> Cc: Terry Yapt <yapt@technovell.com>;
> >> pgsql-general@postgresql.org Sent: Thursday, August 23, 2007
> >> 3:43:32 PM Subject: Re: [GENERAL] FATAL: could not reattach to
> >> shared memory (Win32)
> >>
> >>
> >> 8.3 will have a new way to deal with shared mem on win32. It's the
> >> same underlying tech, but we're no longer trying to squeeze it into
> >> an emulation of sysv. With a bit of luck, that'll help :-)
> >>
> >> //Magnus
> >>
> >
> > Wild guess on my part... could that error be the result of an attempt
> > to map shared memory into a process at a fixed location that just
> > happens to already be occupied by a dll that Windows had decided to
> > relocate?
>
> Not that wild a guess, really :-) I'd say it's a very good possibility -
> but I have no idea why it'd do that, since all backends load the same
> DLLs at that stage.
>
> //Magnus

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +