Re: FATAL: could not reattach to shared memory (Win32) - Mailing list pgsql-general
From | Trevor Talbot |
---|---|
Subject | Re: FATAL: could not reattach to shared memory (Win32) |
Date | |
Msg-id | 90bce5730708260806o2b8afa60q3dd5a33567e2848a@mail.gmail.com |
In response to | Re: FATAL: could not reattach to shared memory (Win32) (Terry Yapt <yapt@technovell.com>) |
Responses | Re: FATAL: could not reattach to shared memory (Win32) |
List | pgsql-general |
On 8/24/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Trevor Talbot" <quension@gmail.com> writes:
> > On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:
> >> Not that wild a guess, really :-) I'd say it's a very good possibility -
> >> but I have no idea why it'd do that, since all backends load the same
> >> DLLs at that stage.
>
> > Not a valid assumption; you can't rely on consistent VM space among
> > multiple [non-cloned] processes without a serious amount of effort.
>
> I'm not sure if you have a specific technical meaning of "clone" in mind
> here, but these processes are all executing the identical executable,
> and taking care to map the shmem early in execution *before* they load
> any DLLs. So it should work. Apparently, it *does* work for awhile for
> the OP, and then stops working, which is even odder.

"Clone" in the same sense as fork(): duplicating a process instead of
regenerating it. Even ignoring things like DLL replacement and
LD_PRELOAD-style options, there's still a lot of opportunity for dynamic
behavior.

All DLLs have an initialization routine called by the loader (and on
thread creation), which tends to be used to set up things you don't want
the caller to have to explicitly initialize. DLLs that maintain global
state they share with copies of themselves in other processes can set up
shared memory etc to do that. They can easily change their behavior
based on the environment at the time of process start.

There are also all the hooks for extension points, such as Winsock LSPs.
Most such things happen only after an explicit initialization (e.g.
WSAStartup() or socket creation in the Winsock case), but between the C
runtime and third-party libraries, it may be happening when you don't
expect it.

All that said, I don't actually have a real-world example of process VM
layout changing like this, especially since you are using it early to
avoid this very problem.
I'd love to find out exactly what's going on in Terry's case, but I
haven't come up with a good way to do it that doesn't disturb his
production environment.

> If you've got a specific suggestion for making it more reliable,
> we're all ears.

To elaborate on what I said earlier, internal_forkexec() creates the
process suspended; while it has an execution environment set up, the
loader hasn't done all the DLL linking and initialization yet, so the
address space is relatively untouched. At that point you could use
VirtualAllocEx() to reserve VM space for the shared memory at the right
address, and proceed with the rest of the setup. When the new backend
starts up, it would then VirtualFree() that space immediately before
calling MapViewOfFileEx() on it.

I can probably set up with the 8.3 tree and MSVC to create an artificial
failure, and play with the above as a fix, but I'm not quite sure when
that will be.

There's still the issue of verifying it is the problem on Terry's
machine, and figuring out a fix for him.

On 8/24/07, Terry Yapt <yapt@technovell.com> wrote:
> Yes, the windows system log (application log section) doesn't show any
> error in several days. Suddenly errors bring back to life and syslog
> errors repeats every few time. But again errors disappears and return
> in a few hours. After few hours the system goes out.
>
> Curiosity:
> ======
> On the log lines I have and I sent to the list: * FATAL: could not
> reattach to shared memory (key=5432001, addr=01D80000): Invalid argument
> , this one: "addr=01D80000" is always the same in spite of the system
> have been shutting down and restarted or the error was out for a days.

The environment is consistent then. Whatever is going on, when postgres
first starts things are normal, something just changes later and the
change is temporary.

As vague guides, I would look at some kind of global resource
usage/tracking, and scheduled tasks. Do you see any patterns about WHEN
this happens? During high load periods? Any antivirus or other security
type tasks running on the machine? Any third-party VPN type software?
Fast User Switching or Remote Desktop use?
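
For reference, the reserve-then-remap idea described earlier in this
message could look roughly like the following. This is only a sketch
under stated assumptions, not code from the PostgreSQL tree: the function
names (is_granularity_aligned, reserve_shmem_in_child, reattach_shmem)
are hypothetical, error reporting is omitted, and it has not been tried
against the failure discussed in this thread. It assumes the parent
holds a handle to the still-suspended child and that the child knows
the mapping handle and target address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable helper (hypothetical name): MapViewOfFileEx() only accepts a
 * base address that is a multiple of the allocation granularity, which
 * is commonly 64 KB on Windows. */
int is_granularity_aligned(uintptr_t addr, uintptr_t granularity)
{
    return granularity != 0 && (addr % granularity) == 0;
}

#ifdef _WIN32
#include <windows.h>

/* Parent side (hypothetical name): call while the child is still
 * suspended, before the loader has linked and initialized any DLLs.
 * Reserves, without committing, the range the shared memory must
 * occupy, so nothing else in the child can claim it. */
BOOL reserve_shmem_in_child(HANDLE child, void *addr, SIZE_T size)
{
    SYSTEM_INFO si;
    void *p;

    GetSystemInfo(&si);
    assert(is_granularity_aligned((uintptr_t) addr,
                                  si.dwAllocationGranularity));

    p = VirtualAllocEx(child, addr, size, MEM_RESERVE, PAGE_NOACCESS);
    return p == addr;           /* NULL or a different base means failure */
}

/* Child side (hypothetical name): release the reservation and map the
 * shared memory at the same address right away. */
void *reattach_shmem(HANDLE mapping, void *addr)
{
    /* MEM_RELEASE requires a size of 0 and the original base address. */
    if (!VirtualFree(addr, 0, MEM_RELEASE))
        return NULL;

    /* 0 bytes to map means "map the whole section". */
    return MapViewOfFileEx(mapping, FILE_MAP_READ | FILE_MAP_WRITE,
                           0, 0, 0, addr);
}
#endif /* _WIN32 */
```

There is still a short window between VirtualFree() and
MapViewOfFileEx() in which another thread in the child could grab the
range, which is why the two calls should be back to back. The address
from the log above, 01D80000, is a multiple of 64 KB, so it would pass
the alignment check on a system with that granularity.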