Thread: [HACKERS] OK, so culicidae is *still* broken
Per https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-04-15%2004%3A00%3A02 2017-04-15 04:31:21.657 GMT [16792] FATAL: could not reattach to shared memory (key=6280001, addr=0x7f692fece000): Invalidargument Presumably, this is the same issue we've seen on Windows where the shmem address range gets overlapped by code loaded at a randomized address. Is there any real hope of making that work? regards, tom lane
On April 14, 2017 9:42:41 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Per >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-04-15%2004%3A00%3A02 > >2017-04-15 04:31:21.657 GMT [16792] FATAL: could not reattach to >shared memory (key=6280001, addr=0x7f692fece000): Invalid argument > >Presumably, this is the same issue we've seen on Windows where the >shmem address range gets overlapped by code loaded at a randomized >address. Is there any real hope of making that work? Seems to work reasonably regularly on other branches... On phone only, so can't dig into details, but it seems there's somechance involved. Let's see what the next few runs will do. Will crank frequency once home. Andres Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On April 14, 2017 9:42:41 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 2017-04-15 04:31:21.657 GMT [16792] FATAL: could not reattach to >> shared memory (key=6280001, addr=0x7f692fece000): Invalid argument >> >> Presumably, this is the same issue we've seen on Windows where the >> shmem address range gets overlapped by code loaded at a randomized >> address. Is there any real hope of making that work? > Seems to work reasonably regularly on other branches... On phone only, so can't dig into details, but it seems there'ssome chance involved. Let's see what the next few runs will do. Will crank frequency once home. I poked at this on a Fedora 25 box, and was able to reproduce failures at a rate of one every half dozen or so runs of the core regression tests, which seems to about match what is happening on culicidae. Looking at the postmaster's memory map, it seems that shmem segments get mapped in the same part of the address space as shared libraries, ie they all end up in 0x00007Fxxxxxxxxxx. So it's not terribly surprising that there's a risk of collision with a shared library. I think what may be the most effective way to proceed is to provide a way to force the shmem segment to be mapped at a chosen address. It looks like, at least on x86_64 Linux, mapping shmem at 0x00007E0000000000 would work reliably. Since we only care about this for testing purposes, I don't think it has to be done in any very clean or even documented way. I'm inclined to propose that we put something into sysv_shmem.c that will check for an environment variable named, say, PG_SHMEM_ADDR, and if it's set will use the value as the address in the initial shmat() call. For a bit of extra safety we could do that only in EXEC_BACKEND builds. Then you'd just need to add PG_SHMEM_ADDR=0x7E0000000000 to culicidae's build_env and you'd be good to go. regards, tom lane
I wrote: > I think what may be the most effective way to proceed is to provide > a way to force the shmem segment to be mapped at a chosen address. > It looks like, at least on x86_64 Linux, mapping shmem at > 0x00007E0000000000 would work reliably. > Since we only care about this for testing purposes, I don't think > it has to be done in any very clean or even documented way. > I'm inclined to propose that we put something into sysv_shmem.c > that will check for an environment variable named, say, PG_SHMEM_ADDR, > and if it's set will use the value as the address in the initial > shmat() call. For a bit of extra safety we could do that only in > EXEC_BACKEND builds. Concretely, I propose the attached patch. We'd have to put it into all supported branches, since culicidae is showing intermittent "could not reattach to shared memory" failures in all the branches. regards, tom lane diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index 77cd8f3..15ae672 100644 *** a/src/backend/port/sysv_shmem.c --- b/src/backend/port/sysv_shmem.c *************** static void * *** 102,109 **** --- 102,129 ---- InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size) { IpcMemoryId shmid; + void *requestedAddress = NULL; void *memAddress; + /* + * Normally we just pass requestedAddress = NULL to shmat(), allowing the + * system to choose where the segment gets mapped. But in an EXEC_BACKEND + * build, it's possible for whatever is chosen in the postmaster to not + * work for backends, due to variations in address space layout. As a + * rather klugy workaround, allow the user to specify the address to use + * via setting the environment variable PG_SHMEM_ADDR. (If this were of + * interest for anything except debugging, we'd probably create a cleaner + * and better-documented way to set it, such as a GUC.) + */ + #ifdef EXEC_BACKEND + { + char *pg_shmem_addr = getenv("PG_SHMEM_ADDR"); + + if (pg_shmem_addr) + requestedAddress = (void *) strtoul(pg_shmem_addr, NULL, 0); + } + #endif + shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection); if (shmid < 0) *************** InternalIpcMemoryCreate(IpcMemoryKey mem *** 203,209 **** on_shmem_exit(IpcMemoryDelete, Int32GetDatum(shmid)); /* OK, should be able to attach to the segment */ ! memAddress = shmat(shmid, NULL, PG_SHMAT_FLAGS); if (memAddress == (void *) -1) elog(FATAL, "shmat(id=%d) failed: %m", shmid); --- 223,229 ---- on_shmem_exit(IpcMemoryDelete, Int32GetDatum(shmid)); /* OK, should be able to attach to the segment */ ! memAddress = shmat(shmid, requestedAddress, PG_SHMAT_FLAGS); if (memAddress == (void *) -1) elog(FATAL, "shmat(id=%d) failed: %m", shmid); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-04-15 16:48:05 -0400, Tom Lane wrote: > I wrote: > > I think what may be the most effective way to proceed is to provide > > a way to force the shmem segment to be mapped at a chosen address. > > It looks like, at least on x86_64 Linux, mapping shmem at > > 0x00007E0000000000 would work reliably. > > > Since we only care about this for testing purposes, I don't think > > it has to be done in any very clean or even documented way. > > I'm inclined to propose that we put something into sysv_shmem.c > > that will check for an environment variable named, say, PG_SHMEM_ADDR, > > and if it's set will use the value as the address in the initial > > shmat() call. For a bit of extra safety we could do that only in > > EXEC_BACKEND builds. > > Concretely, I propose the attached patch. We'd have to put it into > all supported branches, since culicidae is showing intermittent > "could not reattach to shared memory" failures in all the branches. That seems quite reasonable. I'm afraid we're going to have to figure out something similar, but more robust, for windows soon-ish :/ > /* OK, should be able to attach to the segment */ > - memAddress = shmat(shmid, NULL, PG_SHMAT_FLAGS); > + memAddress = shmat(shmid, requestedAddress, PG_SHMAT_FLAGS); > > if (memAddress == (void *) -1) > elog(FATAL, "shmat(id=%d) failed: %m", shmid); As a minor point, it'd probably be good to add addr=%zu, requestedAddress or such. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > That seems quite reasonable. I'm afraid we're going to have to figure > out something similar, but more robust, for windows soon-ish :/ Why doesn't Windows' ability to map the segment into the new process before it executes take care of that? > As a minor point, it'd probably be good to add addr=%zu, requestedAddress > or such. Yeah, I'd come to the same conclusion right after sending that. regards, tom lane
On 2017-04-15 17:09:38 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > That seems quite reasonable. I'm afraid we're going to have to figure > > out something similar, but more robust, for windows soon-ish :/ > > Why doesn't Windows' ability to map the segment into the new process > before it executes take care of that? Because of ASLR of the main executable (i.e. something like PIE). It'll supposedly become harder (as in only running in compatibility modes) if binaries don't enable that. It's currently disabled somewhere in the VC project generated. Besides that, it'd also be good for security purposes if we didn't have to disable PIE (which also prevents it from being used for the initial backend). Andres
Andres Freund <andres@anarazel.de> writes: > On 2017-04-15 17:09:38 -0400, Tom Lane wrote: >> Why doesn't Windows' ability to map the segment into the new process >> before it executes take care of that? > Because of ASLR of the main executable (i.e. something like PIE). Not following. Are you saying that the main executable gets mapped into the process address space immediately, but shared libraries are not? I wonder whether we could work around that by just destroying the created process and trying again if we get a collision. It'd be a tad inefficient, but hopefully collisions wouldn't happen often enough to be a big problem. regards, tom lane
On 2017-04-15 17:24:54 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-04-15 17:09:38 -0400, Tom Lane wrote: > >> Why doesn't Windows' ability to map the segment into the new process > >> before it executes take care of that? > > > Because of ASLR of the main executable (i.e. something like PIE). > > Not following. Are you saying that the main executable gets mapped into > the process address space immediately, but shared libraries are not? Without PIE/ASLR we can somewhat rely on pgwin32_ReserveSharedMemoryRegion to find the space that PGSharedMemoryCreate allocated still unoccupied. If the main binary also uses ASLR, not just libraries/stack/other mappings, that's not guaranteed to be the case. But this probably needs somebody with actual windows expertise commenting. > I wonder whether we could work around that by just destroying the created > process and trying again if we get a collision. It'd be a tad > inefficient, but hopefully collisions wouldn't happen often enough to be a > big problem. That might work, although it's obviously not pretty. We could also just default to some out-of-the-way address for MapViewOfFileEx, that might also work. - Andres
Andres Freund <andres@anarazel.de> writes: > On 2017-04-15 16:48:05 -0400, Tom Lane wrote: >> Concretely, I propose the attached patch. We'd have to put it into >> all supported branches, since culicidae is showing intermittent >> "could not reattach to shared memory" failures in all the branches. > That seems quite reasonable. Pushed, please update culicidae's settings. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On 2017-04-15 17:24:54 -0400, Tom Lane wrote: >> I wonder whether we could work around that by just destroying the created >> process and trying again if we get a collision. It'd be a tad >> inefficient, but hopefully collisions wouldn't happen often enough to be a >> big problem. > That might work, although it's obviously not pretty. We could also just > default to some out-of-the-way address for MapViewOfFileEx, that might > also work. Could be. Does Microsoft publish any documentation about the range of addresses their ASLR uses? Obviously, any such fix would be a lot more likely to be reliable in 64-bit machines. There's probably not enough daylight to be sure of making it work in 32-bit Windows, so I suspect we'd need some retry logic anyway for that case. regards, tom lane
On 2017-04-15 17:30:21 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-04-15 16:48:05 -0400, Tom Lane wrote: > >> Concretely, I propose the attached patch. We'd have to put it into > >> all supported branches, since culicidae is showing intermittent > >> "could not reattach to shared memory" failures in all the branches. > > > That seems quite reasonable. > > Pushed, please update culicidae's settings. Done, although there were already builds running by the time I got to it, so there'll be a few unaffected runs. I've increased build frequency of all branches to be forced once-an-hour, so we can quickly see how reliable it is. Will turn down Monday or such, if everything looks good. - Andres
On 2017-04-15 14:34:28 -0700, Andres Freund wrote: > On 2017-04-15 17:30:21 -0400, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > On 2017-04-15 16:48:05 -0400, Tom Lane wrote: > > >> Concretely, I propose the attached patch. We'd have to put it into > > >> all supported branches, since culicidae is showing intermittent > > >> "could not reattach to shared memory" failures in all the branches. > > > > > That seems quite reasonable. > > > > Pushed, please update culicidae's settings. > > Done, although there were already builds running by the time I got to > it, so there'll be a few unaffected runs. I've increased build > frequency of all branches to be forced once-an-hour, so we can quickly > see how reliable it is. Will turn down Monday or such, if everything > looks good. Looking through all the branches, it seems to have done the trick - the previous irregular failures seem to be gone. Nice. Unless somebody complains, I'll reduce the forced build frequency to something more normal again. - Andres
On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2017-04-15 17:24:54 -0400, Tom Lane wrote: >>> I wonder whether we could work around that by just destroying the created >>> process and trying again if we get a collision. It'd be a tad >>> inefficient, but hopefully collisions wouldn't happen often enough to be a >>> big problem. > >> That might work, although it's obviously not pretty. We could also just >> default to some out-of-the-way address for MapViewOfFileEx, that might >> also work. > > Could be. Does Microsoft publish any documentation about the range of > addresses their ASLR uses? > I have look around to find some information to see if there is any such address range which could be used for our purpose. I am not able to see any such predictable address range. You might want to read the article [1] especially the text around "What is the memory address space range in virtual memory map where system DLLs and user DLLs could load?" It seems to indicate that there is no such address unless I have misunderstood it. I don't deny the possibility of having such an address range, but I could not find any info on the same. > Obviously, any such fix would be a lot more likely to be reliable in > 64-bit machines. There's probably not enough daylight to be sure of > making it work in 32-bit Windows, so I suspect we'd need some retry > logic anyway for that case. > Yeah, that kind of thing can work assuming we don't get conflicts too often, but it could be possible that conflicts are not reported from ASLR enabled environments because of commit 7f3e17b4. [1] - https://blogs.msdn.microsoft.com/winsdk/2009/11/30/how-to-disable-address-space-layout-randomization-aslr/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Obviously, any such fix would be a lot more likely to be reliable in >> 64-bit machines. There's probably not enough daylight to be sure of >> making it work in 32-bit Windows, so I suspect we'd need some retry >> logic anyway for that case. > Yeah, that kind of thing can work assuming we don't get conflicts too > often, but it could be possible that conflicts are not reported from > ASLR enabled environments because of commit 7f3e17b4. Right, but Andres' point is that we should make an effort to undo that hack and instead allow ASLR to happen. Not just because it's allegedly more secure, but because we may have no choice in future Windows versions. regards, tom lane
On 2017-04-19 10:15:31 -0400, Tom Lane wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: > > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Obviously, any such fix would be a lot more likely to be reliable in > >> 64-bit machines. There's probably not enough daylight to be sure of > >> making it work in 32-bit Windows, so I suspect we'd need some retry > >> logic anyway for that case. > > > Yeah, that kind of thing can work assuming we don't get conflicts too > > often, but it could be possible that conflicts are not reported from > > ASLR enabled environments because of commit 7f3e17b4. > > Right, but Andres' point is that we should make an effort to undo that > hack and instead allow ASLR to happen. Not just because it's allegedly > more secure, but because we may have no choice in future Windows versions. FWIW, I think it *also* might make us more secure, because addresses in the postgres binary won't be predictable anymore. Since most of the windows binaries are built by one source, that's some advantage on its own. - Andres
On Wed, Apr 19, 2017 at 9:01 PM, Andres Freund <andres@anarazel.de> wrote: > On 2017-04-19 10:15:31 -0400, Tom Lane wrote: >> Amit Kapila <amit.kapila16@gmail.com> writes: >> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Obviously, any such fix would be a lot more likely to be reliable in >> >> 64-bit machines. There's probably not enough daylight to be sure of >> >> making it work in 32-bit Windows, so I suspect we'd need some retry >> >> logic anyway for that case. >> >> > Yeah, that kind of thing can work assuming we don't get conflicts too >> > often, but it could be possible that conflicts are not reported from >> > ASLR enabled environments because of commit 7f3e17b4. >> >> Right, but Andres' point is that we should make an effort to undo that >> hack and instead allow ASLR to happen. Not just because it's allegedly >> more secure, but because we may have no choice in future Windows versions. > > FWIW, I think it *also* might make us more secure, because addresses in > the postgres binary won't be predictable anymore. > Agreed. I have done some further study by using VMMap tool in Windows and it seems to me that all 64-bit processes use address range (0000000000010000 ~ 000007FFFFFE0000). I have attached two screen shots to show the address range (lower range and upper range). You need to refer the lower half of the window in attached screenshots. At this stage, I am not completely sure whether we can assume some address out of this range to use in MapViewOfFileEx. Let me know your thoughts. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 2017-04-20 16:57:03 +0530, Amit Kapila wrote: > On Wed, Apr 19, 2017 at 9:01 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2017-04-19 10:15:31 -0400, Tom Lane wrote: > >> Amit Kapila <amit.kapila16@gmail.com> writes: > >> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> >> Obviously, any such fix would be a lot more likely to be reliable in > >> >> 64-bit machines. There's probably not enough daylight to be sure of > >> >> making it work in 32-bit Windows, so I suspect we'd need some retry > >> >> logic anyway for that case. > >> > >> > Yeah, that kind of thing can work assuming we don't get conflicts too > >> > often, but it could be possible that conflicts are not reported from > >> > ASLR enabled environments because of commit 7f3e17b4. > >> > >> Right, but Andres' point is that we should make an effort to undo that > >> hack and instead allow ASLR to happen. Not just because it's allegedly > >> more secure, but because we may have no choice in future Windows versions. > > > > FWIW, I think it *also* might make us more secure, because addresses in > > the postgres binary won't be predictable anymore. > > > > Agreed. I have done some further study by using VMMap tool in Windows > and it seems to me that all 64-bit processes use address range > (0000000000010000 ~ 000007FFFFFE0000). I have attached two screen > shots to show the address range (lower range and upper range). You > need to refer the lower half of the window in attached screenshots. > At this stage, I am not completely sure whether we can assume some > address out of this range to use in MapViewOfFileEx. Let me know your > thoughts. That seems like a fairly restricted range (good!), and squares with memories of reading about this (can't find the reference anymore). Just using something like 0x00000F0000000000 as the address might work sufficiently well. What happens if you just hardcode that in the first MapViewOfFileEx call, and change RandomizedBaseAddress="FALSE" in src/tools/msvc/VCBuildProject.pm to true? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-04-20 16:57:03 +0530, Amit Kapila wrote: >> Agreed. I have done some further study by using VMMap tool in Windows >> and it seems to me that all 64-bit processes use address range >> (0000000000010000 ~ 000007FFFFFE0000). I have attached two screen >> shots to show the address range (lower range and upper range). You >> need to refer the lower half of the window in attached screenshots. >> At this stage, I am not completely sure whether we can assume some >> address out of this range to use in MapViewOfFileEx. Let me know your >> thoughts. > That seems like a fairly restricted range (good!), and squares with > memories of reading about this (can't find the reference anymore). Just > using something like 0x00000F0000000000 as the address might work > sufficiently well. What happens if you just hardcode that in the first > MapViewOfFileEx call, and change RandomizedBaseAddress="FALSE" in > src/tools/msvc/VCBuildProject.pm to true? The material I found about Linux x86_64 addressing explains that the underlying hardware address space is only 48 bits. Linux chooses to normalize this by sign-extending into the upper 16 bits, so that valid 64-bit addresses are0000000000000000 .. 00007FFFFFFFFFFF andFFFF800000000000 .. FFFFFFFFFFFFFFFF If I'm counting the bits correctly, Windows is choosing to use only 1/16th of the lower half of the available address space, which seems a bit odd. However, the main point here is that there should be quite a bit of available daylight, if they will let us use addresses above 000007FFFFFE0000 at all. I'd be inclined to do our own low-tech ASLR by using addresses with a random component, say 00000Fxxxxxx0000. regards, tom lane
On Fri, Apr 21, 2017 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote: > On 2017-04-20 16:57:03 +0530, Amit Kapila wrote: >> On Wed, Apr 19, 2017 at 9:01 PM, Andres Freund <andres@anarazel.de> wrote: >> > On 2017-04-19 10:15:31 -0400, Tom Lane wrote: >> >> Amit Kapila <amit.kapila16@gmail.com> writes: >> >> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> >> Obviously, any such fix would be a lot more likely to be reliable in >> >> >> 64-bit machines. There's probably not enough daylight to be sure of >> >> >> making it work in 32-bit Windows, so I suspect we'd need some retry >> >> >> logic anyway for that case. >> >> >> >> > Yeah, that kind of thing can work assuming we don't get conflicts too >> >> > often, but it could be possible that conflicts are not reported from >> >> > ASLR enabled environments because of commit 7f3e17b4. >> >> >> >> Right, but Andres' point is that we should make an effort to undo that >> >> hack and instead allow ASLR to happen. Not just because it's allegedly >> >> more secure, but because we may have no choice in future Windows versions. >> > >> > FWIW, I think it *also* might make us more secure, because addresses in >> > the postgres binary won't be predictable anymore. >> > >> >> Agreed. I have done some further study by using VMMap tool in Windows >> and it seems to me that all 64-bit processes use address range >> (0000000000010000 ~ 000007FFFFFE0000). I have attached two screen >> shots to show the address range (lower range and upper range). You >> need to refer the lower half of the window in attached screenshots. >> At this stage, I am not completely sure whether we can assume some >> address out of this range to use in MapViewOfFileEx. Let me know your >> thoughts. > > That seems like a fairly restricted range (good!), and squares with > memories of reading about this (can't find the reference anymore). Just > using something like 0x00000F0000000000 as the address might work > sufficiently well. What happens if you just hardcode that in the first > MapViewOfFileEx call, and change RandomizedBaseAddress="FALSE" in > src/tools/msvc/VCBuildProject.pm to true? > The server start generates an error: 2017-04-24 12:02:05.771 IST [8940] FATAL: could not create shared memory segment: error code 87 2017-04-24 12:02:05.771 IST [8940] DETAIL: Failed system call was MapViewOfFileEx. Error code 87 means "invalid parameter". Some googling [1] indicates such an error occurs if we pass the out-of-range address to MapViewOfFileEx. Another possible theory is that we must pass the address as multiple of the system's memory allocation granularity as that is expected by specs of MapViewOfFileEx. I can try doing that but I am not confident that this is the right approach because of below text mentioned in docs (msdn) of MapViewOfFileEx. "While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address.". I have printed the addresses that system automatically picks for MapViewOfFileEx (3a20000, 3770000, 3520000, 3720000, 3730000) and they seem to be in a range which I have posted up thread for 64-bit processes. Another thing I have tried is to just start the server by setting RandomizedBaseAddress="TRUE". I have tried about 15-20 times but could not reproduce the problem related to shared memory attach. We have tried the same on one of my colleagues (Ashutosh Sharma) machine as well, there we could see that error once or twice out of many tries but couldn't get it consistently. I think if the chances of this problem to occur are so less, then probably the suggestion made by Tom to retry if we get collision doesn't sound bad. [1] - https://support.microsoft.com/en-us/help/125713/common-file-mapping-problems-and-platform-differences -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 16 April 2017 at 05:18, Andres Freund <andres@anarazel.de> wrote: > Because of ASLR of the main executable (i.e. something like PIE). It'll > supposedly become harder (as in only running in compatibility modes) if > binaries don't enable that. It's currently disabled somewhere in the VC > project generated. I thought we passed /DYNAMICBASE:NO directly , but I don't see that in the code. A look at the git logs shows that we disabled it in 7f3e17b48 by emitting <RandomizedBaseAddress>false</RandomizedBaseAddress> in the MSBuild project. That'll pass /DYNAMICBASE:NO to the linker. See https://msdn.microsoft.com/en-us/library/bb384887.aspx It's rather better than the old registry hack, but it's a compat option we're likely to lose at some point. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24 April 2017 at 16:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > Another thing I have tried is to just start the server by setting > RandomizedBaseAddress="TRUE". I have tried about 15-20 times but > could not reproduce the problem related to shared memory attach. We > have tried the same on one of my colleagues (Ashutosh Sharma) machine > as well, there we could see that error once or twice out of many tries > but couldn't get it consistently. I think if the chances of this > problem to occur are so less, then probably the suggestion made by Tom > to retry if we get collision doesn't sound bad. It's pretty uncommon, and honestly, we might well be best off just trying again if we lose the lottery. Most of what I read last time I looked into this essentially assumed that you'd "fix" your code by reinventing far pointers[1], like the good old Win16 days. Assume all addresses in shmem are relative to the shmem base, and offset them when accessing/storing them. Fun and efficient for everyone! That seems to be what Boost recommends[2]. Given that Pg doesn't make much effort to differentiate between pointers to shmem and local memory, and has no pointer transformations between shared and local pointers, adopting that would be a horrifyingly intrusive change as well as incredibly tedious to implement. We'd probably land up using size_t or ptrdiff_t for shmem pointers and some kind of macro that was a noop on !windows. For once I'd be thoroughly in agreement with Tom's complaints about Windows-droppings. Other people who've faced and worked around this[3] have come up with solutions that look way scarier than just retrying if we lose the random numbers game. BTW, some Windows users face issues with large contiguous allocations/mappings even without the process sharing side[4] due to memory fragmentation created by ASLR, though this may only be a concern for 32-bit executables. The relevant option /LARGEADDRESSAWARE is enabled by default for 64-bit builds. We might want to /DELAYLOAD [5] DLLs where possible to improve our chances of winning the dice roll, but if we're going to support retrying at all we don't need to care too much. I looked at image binding (prelinking), but it's disabled if ASLR is in use. In the long run we'll probably be forced toward threading or far pointers. [1] https://en.wikipedia.org/wiki/Far_pointer, https://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer%5Fsizes [2] http://www.boost.org/doc/libs/1_64_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_region.mapped_region_address_mapping [3] http://stackoverflow.com/a/36145019/398670 [4] https://github.com/golang/go/issues/2323 [5] On 24 April 2017 at 16:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > Another thing I have tried is to just start the server by setting > RandomizedBaseAddress="TRUE". I have tried about 15-20 times but > could not reproduce the problem related to shared memory attach. We > have tried the same on one of my colleagues (Ashutosh Sharma) machine > as well, there we could see that error once or twice out of many tries > but couldn't get it consistently. I think if the chances of this > problem to occur are so less, then probably the suggestion made by Tom > to retry if we get collision doesn't sound bad. It's pretty uncommon, and honestly, we might well be best off just trying again if we lose the lottery. Most of what I read last time I looked into this essentially assumed that you'd "fix" your code by reinventing far pointers[1], like the good old Win16 days. Assume all addresses in shmem are relative to the shmem base, and offset them when accessing/storing them. Fun and efficient for everyone! That seems to be what Boost recommends[2]. Given that Pg doesn't make much effort to differentiate between pointers to shmem and local memory, and has no pointer transformations between shared and local pointers, adopting that would be a horrifyingly intrusive change as well as incredibly tedious to implement. We'd probably land up using size_t or ptrdiff_t for shmem pointers and some kind of macro that was a noop on !windows. For once I'd be thoroughly in agreement with Tom's complaints about Windows-droppings. Other people who've faced and worked around this[3] have come up with solutions that look way scarier than just retrying if we lose the random numbers game. BTW, some Windows users face issues with large contiguous allocations/mappings even without the process sharing side[4] due to memory fragmentation created by ASLR, though this may only be a concern for 32-bit executables. The relevant option /LARGEADDRESSAWARE is enabled by default for 64-bit builds. We should /DELAYLOAD as many DDLs as possible to improve our chances. [1] https://en.wikipedia.org/wiki/Far_pointer, https://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer%5Fsizes [2] http://www.boost.org/doc/libs/1_64_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_region.mapped_region_address_mapping [3]http://stackoverflow.com/a/36145019/398670 [4] https://github.com/golang/go/issues/2323 -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2017-04-24 14:25:34 +0530, Amit Kapila wrote: > Error code 87 means "invalid parameter". Some googling [1] indicates > such an error occurs if we pass the out-of-range address to > MapViewOfFileEx. Another possible theory is that we must pass the > address as multiple of the system's memory allocation granularity as > that is expected by specs of MapViewOfFileEx. I can try doing that > but I am not confident that this is the right approach because of > below text mentioned in docs (msdn) of MapViewOfFileEx. > "While it is possible to specify an address that is safe now (not used > by the operating system), there is no guarantee that the address will > remain safe over time. Therefore, it is better to let the operating > system choose the address.". Sure, that's the same as mmap() etc say. I'd not be overly deterred by that. On 2017-04-24 23:14:40 +0800, Craig Ringer wrote: > In the long run we'll probably be forced toward threading or far pointers. I'll vote for removing the windows port, before going for that. And I'm not even joking. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-04-24 23:14:40 +0800, Craig Ringer wrote: >> In the long run we'll probably be forced toward threading or far pointers. > I'll vote for removing the windows port, before going for that. And I'm > not even joking. Me too. We used to *have* that kind of code, ie relative pointers into the shmem segment, and it was a tremendous notational mess and very bug-prone. I do not wish to go back. (We have accepted that kind of overhead for DSM segments, but the intention I think is to allow only very trivial data structures in the DSM segments. Losing compiler pointer type checking for data structures like the lock or PGPROC tables would be horrid.) regards, tom lane
On 2017-04-24 14:43:11 -0400, Tom Lane wrote: > (We have accepted that kind of overhead for DSM segments, but the > intention I think is to allow only very trivial data structures in > the DSM segments. Losing compiler pointer type checking for data > structures like the lock or PGPROC tables would be horrid.) The relptr.h infrastructure brings some of the type-checking back, but it's still pretty painful. And just as important, it's quite noticeable performance-wise. So we have to do it for dynamic shm (until/unless we go to using threads), but that doesn't mean we should do it for some of the most performance critical data structures in PG... - Andres
On 2017-04-24 11:08:48 -0700, Andres Freund wrote: > On 2017-04-24 23:14:40 +0800, Craig Ringer wrote: > > In the long run we'll probably be forced toward threading or far pointers. > > I'll vote for removing the windows port, before going for that. And I'm > not even joking. Just to clarify: I'm talking about far pointers here, not threading.
On 25 Apr. 2017 02:51, "Andres Freund" <andres@anarazel.de> wrote:
On 2017-04-24 11:08:48 -0700, Andres Freund wrote:Just to clarify: I'm talking about far pointers here, not threading.
> On 2017-04-24 23:14:40 +0800, Craig Ringer wrote:
> > In the long run we'll probably be forced toward threading or far pointers.
>
> I'll vote for removing the windows port, before going for that. And I'm
> not even joking.
Yeah, I'm pretty unimpressed that the accepted solution seems to be to return to the early '90s.
Why don't platforms offer any sensible way to reserve a virtual address range at process creation time?
It looks like you need a minimal loader process that rebases its self in memory if it finds its self loaded in the desired area, then maps the required memory range and loads the real process. Hard to imagine that not causing more problems than it solves.
On 04/24/2017 09:50 PM, Andres Freund wrote: > On 2017-04-24 14:43:11 -0400, Tom Lane wrote: >> (We have accepted that kind of overhead for DSM segments, but the >> intention I think is to allow only very trivial data structures in >> the DSM segments. Losing compiler pointer type checking for data >> structures like the lock or PGPROC tables would be horrid.) > > The relptr.h infrastructure brings some of the type-checking back, but > it's still pretty painful. And just as important, it's quite noticeable > performance-wise. So we have to do it for dynamic shm (until/unless we > go to using threads), but that doesn't mean we should do it for some of > the most performance critical data structures in PG... Agreed. For some data shared memory structures, that store no pointers, we wouldn't need to insist that they are mapped to the same address in every backend, though. In particular, shared_buffers. It wouldn't eliminate the problem, though, only make it less likely, so we'd still need to retry when it does happen. - Heikki
On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
For some data shared memory structures, that store no pointers, we wouldn't need to insist that they are mapped to the same address in every backend, though. In particular, shared_buffers. It wouldn't eliminate the problem, though, only make it less likely, so we'd still need to retry when it does happen.
Good point. Simply splitting out shared_buffers into a moveable segment would make a massive difference. Much less chance of losing the dice roll for mapping the fixed segment.
Should look at what else could be made cheaply relocatable too.
Craig Ringer <craig@2ndquadrant.com> writes: > On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote: >> For some data shared memory structures, that store no pointers, we wouldn't >> need to insist that they are mapped to the same address in every backend, >> though. In particular, shared_buffers. It wouldn't eliminate the problem, >> though, only make it less likely, so we'd still need to retry when it does >> happen. > Good point. Simply splitting out shared_buffers into a moveable segment > would make a massive difference. Much less chance of losing the dice roll > for mapping the fixed segment. > Should look at what else could be made cheaply relocatable too. I don't think it's worth spending any effort on. We need the retry code anyway, and making it near impossible to hit that would only mean that it's very poorly tested. The results upthread say that it isn't going to be hit often enough to be a performance problem, so why worry? regards, tom lane
On 25 April 2017 at 22:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote: >>> For some data shared memory structures, that store no pointers, we wouldn't >>> need to insist that they are mapped to the same address in every backend, >>> though. In particular, shared_buffers. It wouldn't eliminate the problem, >>> though, only make it less likely, so we'd still need to retry when it does >>> happen. > >> Good point. Simply splitting out shared_buffers into a moveable segment >> would make a massive difference. Much less chance of losing the dice roll >> for mapping the fixed segment. > >> Should look at what else could be made cheaply relocatable too. > > I don't think it's worth spending any effort on. We need the retry > code anyway, and making it near impossible to hit that would only mean > that it's very poorly tested. The results upthread say that it isn't > going to be hit often enough to be a performance problem, so why worry? Good point. Deal with it if it becomes an issue. That said, I didn't see if any of those tests covered really big shared_buffers. That could become an issue down the track at least. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 25, 2017 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote: >>> For some data shared memory structures, that store no pointers, we wouldn't >>> need to insist that they are mapped to the same address in every backend, >>> though. In particular, shared_buffers. It wouldn't eliminate the problem, >>> though, only make it less likely, so we'd still need to retry when it does >>> happen. > >> Good point. Simply splitting out shared_buffers into a moveable segment >> would make a massive difference. Much less chance of losing the dice roll >> for mapping the fixed segment. > >> Should look at what else could be made cheaply relocatable too. > > I don't think it's worth spending any effort on. We need the retry > code anyway, and making it near impossible to hit that would only mean > that it's very poorly tested. The results upthread say that it isn't > going to be hit often enough to be a performance problem, so why worry? > Yeah, that's right. Today, I have spent some time to analyze how and where retry logic is required. I think there are two places where we need this retry logic, one is if we fail to reserve the memory (pgwin32_ReserveSharedMemoryRegion) and second is if we fail to reattach (PGSharedMemoryReAttach). If we fail during reserve memory operation, then we can simply terminate the process and recreate it again, basically, add some kind of loop in internal_forkexec(). OTOH, if we fail during reattach, I think we need an exit from the process which means it can lead to a restart of all the processes unless we want to add some special logic for handling process exit or alternatively we might want to just try reattach operation in a loop before giving up. Do we want to keep this retry logic for a certain number of times say (10 times) or we want to try indefinitely? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > Yeah, that's right. Today, I have spent some time to analyze how and > where retry logic is required. I think there are two places where we > need this retry logic, one is if we fail to reserve the memory > (pgwin32_ReserveSharedMemoryRegion) and second is if we fail to > reattach (PGSharedMemoryReAttach). I'm not following. The point of the reserve operation is to ensure that the reattach will work. What makes you think we need to add more code at that end? regards, tom lane
On Mon, May 1, 2017 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> Yeah, that's right. Today, I have spent some time to analyze how and >> where retry logic is required. I think there are two places where we >> need this retry logic, one is if we fail to reserve the memory >> (pgwin32_ReserveSharedMemoryRegion) and second is if we fail to >> reattach (PGSharedMemoryReAttach). > > I'm not following. The point of the reserve operation is to ensure > that the reattach will work. What makes you think we need to add more > code at that end? > We use VirtualAllocEx to allocate memory at a pre-defined address (reserve operation) which can fail due to ASLR. One recent example of such a symptom is seen on pgsql-bugs [1], that failure is during reserve operation and seems like something related to ASLR. Another point is the commit 7f3e17b4827b61ad84e0774e3e43da4c57c4487f (It doesn't fail every time (which is explained by the Random part in ASLR), but can fail with errors abut failing to reserve shared memory region) also indicates that we can fail to reserve memory due to ASLR. [1] - https://www.postgresql.org/message-id/14121.1485360296%40sss.pgh.pa.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 15, 2017 at 02:30:18PM -0700, Andres Freund wrote: > On 2017-04-15 17:24:54 -0400, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > On 2017-04-15 17:09:38 -0400, Tom Lane wrote: > > >> Why doesn't Windows' ability to map the segment into the new process > > >> before it executes take care of that? > > > > > Because of ASLR of the main executable (i.e. something like PIE). > > > > Not following. Are you saying that the main executable gets mapped into > > the process address space immediately, but shared libraries are not? At the time of the pgwin32_ReserveSharedMemoryRegion() call, the child process contains only ntdll.dll and the executable. > Without PIE/ASLR we can somewhat rely on pgwin32_ReserveSharedMemoryRegion > to find the space that PGSharedMemoryCreate allocated still unoccupied. I've never had access to a Windows system that can reproduce the fork failures. My best theory is that antivirus or similar software injects an additional DLL at that early stage. > > I wonder whether we could work around that by just destroying the created > > process and trying again if we get a collision. It'd be a tad > > inefficient, but hopefully collisions wouldn't happen often enough to be a > > big problem. > > That might work, although it's obviously not pretty. I didn't like that idea when Michael proposed it in 2015. Since disabling ASLR on the exe proved insufficient, I do like it now. It degrades nicely; if something raises the collision rate from 1% to 10%, that just looks like fork latency degrading. > We could also just > default to some out-of-the-way address for MapViewOfFileEx, that might > also work. Dave Vitek proposed that: https://www.postgresql.org/message-id/flat/5046CAEB.4010600%40grammatech.com I estimate more risk than retrying forks. There's no guarantee that a fixed address helpful today won't make the collisions worse after the next Windows security patch.