Thread: [HACKERS] OK, so culicidae is *still* broken

[HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Per
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-04-15%2004%3A00%3A02

2017-04-15 04:31:21.657 GMT [16792] FATAL:  could not reattach to shared memory (key=6280001, addr=0x7f692fece000):
Invalidargument 

Presumably, this is the same issue we've seen on Windows where the
shmem address range gets overlapped by code loaded at a randomized
address.  Is there any real hope of making that work?
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:

On April 14, 2017 9:42:41 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Per
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-04-15%2004%3A00%3A02
>
>2017-04-15 04:31:21.657 GMT [16792] FATAL:  could not reattach to
>shared memory (key=6280001, addr=0x7f692fece000): Invalid argument
>
>Presumably, this is the same issue we've seen on Windows where the
>shmem address range gets overlapped by code loaded at a randomized
>address.  Is there any real hope of making that work?

Seems to work reasonably regularly on other branches... On phone only, so can't dig into details, but it seems there's
somechance involved.  Let's see what the next few runs will do.  Will crank frequency once home. 

Andres

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On April 14, 2017 9:42:41 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2017-04-15 04:31:21.657 GMT [16792] FATAL:  could not reattach to
>> shared memory (key=6280001, addr=0x7f692fece000): Invalid argument
>>
>> Presumably, this is the same issue we've seen on Windows where the
>> shmem address range gets overlapped by code loaded at a randomized
>> address.  Is there any real hope of making that work?

> Seems to work reasonably regularly on other branches... On phone only, so can't dig into details, but it seems
there'ssome chance involved.  Let's see what the next few runs will do.  Will crank frequency once home. 

I poked at this on a Fedora 25 box, and was able to reproduce failures at
a rate of one every half dozen or so runs of the core regression tests,
which seems to about match what is happening on culicidae.

Looking at the postmaster's memory map, it seems that shmem segments
get mapped in the same part of the address space as shared libraries,
ie they all end up in 0x00007Fxxxxxxxxxx.  So it's not terribly
surprising that there's a risk of collision with a shared library.

I think what may be the most effective way to proceed is to provide
a way to force the shmem segment to be mapped at a chosen address.
It looks like, at least on x86_64 Linux, mapping shmem at
0x00007E0000000000 would work reliably.

Since we only care about this for testing purposes, I don't think
it has to be done in any very clean or even documented way.
I'm inclined to propose that we put something into sysv_shmem.c
that will check for an environment variable named, say, PG_SHMEM_ADDR,
and if it's set will use the value as the address in the initial
shmat() call.  For a bit of extra safety we could do that only in
EXEC_BACKEND builds.

Then you'd just need to add PG_SHMEM_ADDR=0x7E0000000000 to
culicidae's build_env and you'd be good to go.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
I wrote:
> I think what may be the most effective way to proceed is to provide
> a way to force the shmem segment to be mapped at a chosen address.
> It looks like, at least on x86_64 Linux, mapping shmem at
> 0x00007E0000000000 would work reliably.

> Since we only care about this for testing purposes, I don't think
> it has to be done in any very clean or even documented way.
> I'm inclined to propose that we put something into sysv_shmem.c
> that will check for an environment variable named, say, PG_SHMEM_ADDR,
> and if it's set will use the value as the address in the initial
> shmat() call.  For a bit of extra safety we could do that only in
> EXEC_BACKEND builds.

Concretely, I propose the attached patch.  We'd have to put it into
all supported branches, since culicidae is showing intermittent
"could not reattach to shared memory" failures in all the branches.

            regards, tom lane

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 77cd8f3..15ae672 100644
*** a/src/backend/port/sysv_shmem.c
--- b/src/backend/port/sysv_shmem.c
*************** static void *
*** 102,109 ****
--- 102,129 ----
  InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size)
  {
      IpcMemoryId shmid;
+     void       *requestedAddress = NULL;
      void       *memAddress;

+     /*
+      * Normally we just pass requestedAddress = NULL to shmat(), allowing the
+      * system to choose where the segment gets mapped.  But in an EXEC_BACKEND
+      * build, it's possible for whatever is chosen in the postmaster to not
+      * work for backends, due to variations in address space layout.  As a
+      * rather klugy workaround, allow the user to specify the address to use
+      * via setting the environment variable PG_SHMEM_ADDR.  (If this were of
+      * interest for anything except debugging, we'd probably create a cleaner
+      * and better-documented way to set it, such as a GUC.)
+      */
+ #ifdef EXEC_BACKEND
+     {
+         char       *pg_shmem_addr = getenv("PG_SHMEM_ADDR");
+
+         if (pg_shmem_addr)
+             requestedAddress = (void *) strtoul(pg_shmem_addr, NULL, 0);
+     }
+ #endif
+
      shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection);

      if (shmid < 0)
*************** InternalIpcMemoryCreate(IpcMemoryKey mem
*** 203,209 ****
      on_shmem_exit(IpcMemoryDelete, Int32GetDatum(shmid));

      /* OK, should be able to attach to the segment */
!     memAddress = shmat(shmid, NULL, PG_SHMAT_FLAGS);

      if (memAddress == (void *) -1)
          elog(FATAL, "shmat(id=%d) failed: %m", shmid);
--- 223,229 ----
      on_shmem_exit(IpcMemoryDelete, Int32GetDatum(shmid));

      /* OK, should be able to attach to the segment */
!     memAddress = shmat(shmid, requestedAddress, PG_SHMAT_FLAGS);

      if (memAddress == (void *) -1)
          elog(FATAL, "shmat(id=%d) failed: %m", shmid);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-15 16:48:05 -0400, Tom Lane wrote:
> I wrote:
> > I think what may be the most effective way to proceed is to provide
> > a way to force the shmem segment to be mapped at a chosen address.
> > It looks like, at least on x86_64 Linux, mapping shmem at
> > 0x00007E0000000000 would work reliably.
> 
> > Since we only care about this for testing purposes, I don't think
> > it has to be done in any very clean or even documented way.
> > I'm inclined to propose that we put something into sysv_shmem.c
> > that will check for an environment variable named, say, PG_SHMEM_ADDR,
> > and if it's set will use the value as the address in the initial
> > shmat() call.  For a bit of extra safety we could do that only in
> > EXEC_BACKEND builds.
> 
> Concretely, I propose the attached patch.  We'd have to put it into
> all supported branches, since culicidae is showing intermittent
> "could not reattach to shared memory" failures in all the branches.

That seems quite reasonable.  I'm afraid we're going to have to figure
out something similar, but more robust, for windows soon-ish :/


>      /* OK, should be able to attach to the segment */
> -    memAddress = shmat(shmid, NULL, PG_SHMAT_FLAGS);
> +    memAddress = shmat(shmid, requestedAddress, PG_SHMAT_FLAGS);
>  
>      if (memAddress == (void *) -1)
>          elog(FATAL, "shmat(id=%d) failed: %m", shmid);

As a minor point, it'd probably be good to add addr=%zu, requestedAddress
or such.

Greetings,

Andres Freund



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> That seems quite reasonable.  I'm afraid we're going to have to figure
> out something similar, but more robust, for windows soon-ish :/

Why doesn't Windows' ability to map the segment into the new process
before it executes take care of that?

> As a minor point, it'd probably be good to add addr=%zu, requestedAddress
> or such.

Yeah, I'd come to the same conclusion right after sending that.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-15 17:09:38 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > That seems quite reasonable.  I'm afraid we're going to have to figure
> > out something similar, but more robust, for windows soon-ish :/
> 
> Why doesn't Windows' ability to map the segment into the new process
> before it executes take care of that?

Because of ASLR of the main executable (i.e. something like PIE).  It'll
supposedly become harder (as in only running in compatibility modes) if
binaries don't enable that.  It's currently disabled somewhere in the VC
project generated.  Besides that, it'd also be good for security
purposes if we didn't have to disable PIE (which also prevents it from
being used for the initial backend).

Andres



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-15 17:09:38 -0400, Tom Lane wrote:
>> Why doesn't Windows' ability to map the segment into the new process
>> before it executes take care of that?

> Because of ASLR of the main executable (i.e. something like PIE).

Not following.  Are you saying that the main executable gets mapped into
the process address space immediately, but shared libraries are not?

I wonder whether we could work around that by just destroying the created
process and trying again if we get a collision.  It'd be a tad
inefficient, but hopefully collisions wouldn't happen often enough to be a
big problem.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-15 17:24:54 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-04-15 17:09:38 -0400, Tom Lane wrote:
> >> Why doesn't Windows' ability to map the segment into the new process
> >> before it executes take care of that?
> 
> > Because of ASLR of the main executable (i.e. something like PIE).
> 
> Not following.  Are you saying that the main executable gets mapped into
> the process address space immediately, but shared libraries are not?

Without PIE/ASLR we can somewhat rely on pgwin32_ReserveSharedMemoryRegion
to find the space that PGSharedMemoryCreate allocated still unoccupied.
If the main binary also uses ASLR, not just libraries/stack/other
mappings, that's not guaranteed to be the case.

But this probably needs somebody with actual windows expertise
commenting.


> I wonder whether we could work around that by just destroying the created
> process and trying again if we get a collision.  It'd be a tad
> inefficient, but hopefully collisions wouldn't happen often enough to be a
> big problem.

That might work, although it's obviously not pretty.  We could also just
default to some out-of-the-way address for MapViewOfFileEx, that might
also work.

- Andres



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-15 16:48:05 -0400, Tom Lane wrote:
>> Concretely, I propose the attached patch.  We'd have to put it into
>> all supported branches, since culicidae is showing intermittent
>> "could not reattach to shared memory" failures in all the branches.

> That seems quite reasonable.

Pushed, please update culicidae's settings.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-15 17:24:54 -0400, Tom Lane wrote:
>> I wonder whether we could work around that by just destroying the created
>> process and trying again if we get a collision.  It'd be a tad
>> inefficient, but hopefully collisions wouldn't happen often enough to be a
>> big problem.

> That might work, although it's obviously not pretty.  We could also just
> default to some out-of-the-way address for MapViewOfFileEx, that might
> also work.

Could be.  Does Microsoft publish any documentation about the range of
addresses their ASLR uses?

Obviously, any such fix would be a lot more likely to be reliable in
64-bit machines.  There's probably not enough daylight to be sure of
making it work in 32-bit Windows, so I suspect we'd need some retry
logic anyway for that case.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-15 17:30:21 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-04-15 16:48:05 -0400, Tom Lane wrote:
> >> Concretely, I propose the attached patch.  We'd have to put it into
> >> all supported branches, since culicidae is showing intermittent
> >> "could not reattach to shared memory" failures in all the branches.
> 
> > That seems quite reasonable.
> 
> Pushed, please update culicidae's settings.

Done, although there were already builds running by the time I got to
it, so there'll be a few unaffected runs.  I've increased build
frequency of all branches to be forced once-an-hour, so we can quickly
see how reliable it is.  Will turn down Monday or such, if everything
looks good.

- Andres



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-15 14:34:28 -0700, Andres Freund wrote:
> On 2017-04-15 17:30:21 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2017-04-15 16:48:05 -0400, Tom Lane wrote:
> > >> Concretely, I propose the attached patch.  We'd have to put it into
> > >> all supported branches, since culicidae is showing intermittent
> > >> "could not reattach to shared memory" failures in all the branches.
> > 
> > > That seems quite reasonable.
> > 
> > Pushed, please update culicidae's settings.
> 
> Done, although there were already builds running by the time I got to
> it, so there'll be a few unaffected runs.  I've increased build
> frequency of all branches to be forced once-an-hour, so we can quickly
> see how reliable it is.  Will turn down Monday or such, if everything
> looks good.

Looking through all the branches, it seems to have done the trick - the
previous irregular failures seem to be gone.  Nice.

Unless somebody complains, I'll reduce the forced build frequency to
something more normal again.

- Andres



Re: [HACKERS] OK, so culicidae is *still* broken

From
Amit Kapila
Date:
On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-04-15 17:24:54 -0400, Tom Lane wrote:
>>> I wonder whether we could work around that by just destroying the created
>>> process and trying again if we get a collision.  It'd be a tad
>>> inefficient, but hopefully collisions wouldn't happen often enough to be a
>>> big problem.
>
>> That might work, although it's obviously not pretty.  We could also just
>> default to some out-of-the-way address for MapViewOfFileEx, that might
>> also work.
>
> Could be.  Does Microsoft publish any documentation about the range of
> addresses their ASLR uses?
>

I have look around to find some information to see if there is any
such address range which could be used for our purpose.  I am not able
to see any such predictable address range.  You might want to read the
article [1] especially the text around "What is the memory address
space range in virtual memory map where system DLLs and user DLLs
could load?"   It seems to indicate that there is no such address
unless I have misunderstood it.  I don't deny the possibility of
having such an address range, but I could not find any info on the
same.

> Obviously, any such fix would be a lot more likely to be reliable in
> 64-bit machines.  There's probably not enough daylight to be sure of
> making it work in 32-bit Windows, so I suspect we'd need some retry
> logic anyway for that case.
>

Yeah, that kind of thing can work assuming we don't get conflicts too
often, but it could be possible that conflicts are not reported from
ASLR enabled environments because of commit 7f3e17b4.

[1] - https://blogs.msdn.microsoft.com/winsdk/2009/11/30/how-to-disable-address-space-layout-randomization-aslr/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Obviously, any such fix would be a lot more likely to be reliable in
>> 64-bit machines.  There's probably not enough daylight to be sure of
>> making it work in 32-bit Windows, so I suspect we'd need some retry
>> logic anyway for that case.

> Yeah, that kind of thing can work assuming we don't get conflicts too
> often, but it could be possible that conflicts are not reported from
> ASLR enabled environments because of commit 7f3e17b4.

Right, but Andres' point is that we should make an effort to undo that
hack and instead allow ASLR to happen.  Not just because it's allegedly
more secure, but because we may have no choice in future Windows versions.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-19 10:15:31 -0400, Tom Lane wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Obviously, any such fix would be a lot more likely to be reliable in
> >> 64-bit machines.  There's probably not enough daylight to be sure of
> >> making it work in 32-bit Windows, so I suspect we'd need some retry
> >> logic anyway for that case.
> 
> > Yeah, that kind of thing can work assuming we don't get conflicts too
> > often, but it could be possible that conflicts are not reported from
> > ASLR enabled environments because of commit 7f3e17b4.
> 
> Right, but Andres' point is that we should make an effort to undo that
> hack and instead allow ASLR to happen.  Not just because it's allegedly
> more secure, but because we may have no choice in future Windows versions.

FWIW, I think it *also* might make us more secure, because addresses in
the postgres binary won't be predictable anymore.  Since most of the
windows binaries are built by one source, that's some advantage on its
own.

- Andres



Re: [HACKERS] OK, so culicidae is *still* broken

From
Amit Kapila
Date:
On Wed, Apr 19, 2017 at 9:01 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-19 10:15:31 -0400, Tom Lane wrote:
>> Amit Kapila <amit.kapila16@gmail.com> writes:
>> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> Obviously, any such fix would be a lot more likely to be reliable in
>> >> 64-bit machines.  There's probably not enough daylight to be sure of
>> >> making it work in 32-bit Windows, so I suspect we'd need some retry
>> >> logic anyway for that case.
>>
>> > Yeah, that kind of thing can work assuming we don't get conflicts too
>> > often, but it could be possible that conflicts are not reported from
>> > ASLR enabled environments because of commit 7f3e17b4.
>>
>> Right, but Andres' point is that we should make an effort to undo that
>> hack and instead allow ASLR to happen.  Not just because it's allegedly
>> more secure, but because we may have no choice in future Windows versions.
>
> FWIW, I think it *also* might make us more secure, because addresses in
> the postgres binary won't be predictable anymore.
>

Agreed.  I have done some further study by using VMMap tool in Windows
and it seems to me that all 64-bit processes use address range
(0000000000010000 ~ 000007FFFFFE0000).  I have attached two screen
shots to show the address range (lower range and upper range).  You
need to refer the lower half of the window in attached screenshots.
At this stage, I am not completely sure whether we can assume some
address out of this range to use in MapViewOfFileEx.  Let me know your
thoughts.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-20 16:57:03 +0530, Amit Kapila wrote:
> On Wed, Apr 19, 2017 at 9:01 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-04-19 10:15:31 -0400, Tom Lane wrote:
> >> Amit Kapila <amit.kapila16@gmail.com> writes:
> >> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> >> Obviously, any such fix would be a lot more likely to be reliable in
> >> >> 64-bit machines.  There's probably not enough daylight to be sure of
> >> >> making it work in 32-bit Windows, so I suspect we'd need some retry
> >> >> logic anyway for that case.
> >>
> >> > Yeah, that kind of thing can work assuming we don't get conflicts too
> >> > often, but it could be possible that conflicts are not reported from
> >> > ASLR enabled environments because of commit 7f3e17b4.
> >>
> >> Right, but Andres' point is that we should make an effort to undo that
> >> hack and instead allow ASLR to happen.  Not just because it's allegedly
> >> more secure, but because we may have no choice in future Windows versions.
> >
> > FWIW, I think it *also* might make us more secure, because addresses in
> > the postgres binary won't be predictable anymore.
> >
> 
> Agreed.  I have done some further study by using VMMap tool in Windows
> and it seems to me that all 64-bit processes use address range
> (0000000000010000 ~ 000007FFFFFE0000).  I have attached two screen
> shots to show the address range (lower range and upper range).  You
> need to refer the lower half of the window in attached screenshots.
> At this stage, I am not completely sure whether we can assume some
> address out of this range to use in MapViewOfFileEx.  Let me know your
> thoughts.

That seems like a fairly restricted range (good!), and squares with
memories of reading about this (can't find the reference anymore).  Just
using something like 0x00000F0000000000 as the address might work
sufficiently well.  What happens if you just hardcode that in the first
MapViewOfFileEx call, and change RandomizedBaseAddress="FALSE" in
src/tools/msvc/VCBuildProject.pm to true?

Greetings,

Andres Freund



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-20 16:57:03 +0530, Amit Kapila wrote:
>> Agreed.  I have done some further study by using VMMap tool in Windows
>> and it seems to me that all 64-bit processes use address range
>> (0000000000010000 ~ 000007FFFFFE0000).  I have attached two screen
>> shots to show the address range (lower range and upper range).  You
>> need to refer the lower half of the window in attached screenshots.
>> At this stage, I am not completely sure whether we can assume some
>> address out of this range to use in MapViewOfFileEx.  Let me know your
>> thoughts.

> That seems like a fairly restricted range (good!), and squares with
> memories of reading about this (can't find the reference anymore).  Just
> using something like 0x00000F0000000000 as the address might work
> sufficiently well.  What happens if you just hardcode that in the first
> MapViewOfFileEx call, and change RandomizedBaseAddress="FALSE" in
> src/tools/msvc/VCBuildProject.pm to true?

The material I found about Linux x86_64 addressing explains that the
underlying hardware address space is only 48 bits.  Linux chooses to
normalize this by sign-extending into the upper 16 bits, so that
valid 64-bit addresses are0000000000000000 .. 00007FFFFFFFFFFF
andFFFF800000000000 .. FFFFFFFFFFFFFFFF

If I'm counting the bits correctly, Windows is choosing to use only
1/16th of the lower half of the available address space, which seems
a bit odd.  However, the main point here is that there should be
quite a bit of available daylight, if they will let us use addresses
above 000007FFFFFE0000 at all.  I'd be inclined to do our own low-tech
ASLR by using addresses with a random component, say 00000Fxxxxxx0000.
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Amit Kapila
Date:
On Fri, Apr 21, 2017 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-20 16:57:03 +0530, Amit Kapila wrote:
>> On Wed, Apr 19, 2017 at 9:01 PM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2017-04-19 10:15:31 -0400, Tom Lane wrote:
>> >> Amit Kapila <amit.kapila16@gmail.com> writes:
>> >> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> >> Obviously, any such fix would be a lot more likely to be reliable in
>> >> >> 64-bit machines.  There's probably not enough daylight to be sure of
>> >> >> making it work in 32-bit Windows, so I suspect we'd need some retry
>> >> >> logic anyway for that case.
>> >>
>> >> > Yeah, that kind of thing can work assuming we don't get conflicts too
>> >> > often, but it could be possible that conflicts are not reported from
>> >> > ASLR enabled environments because of commit 7f3e17b4.
>> >>
>> >> Right, but Andres' point is that we should make an effort to undo that
>> >> hack and instead allow ASLR to happen.  Not just because it's allegedly
>> >> more secure, but because we may have no choice in future Windows versions.
>> >
>> > FWIW, I think it *also* might make us more secure, because addresses in
>> > the postgres binary won't be predictable anymore.
>> >
>>
>> Agreed.  I have done some further study by using VMMap tool in Windows
>> and it seems to me that all 64-bit processes use address range
>> (0000000000010000 ~ 000007FFFFFE0000).  I have attached two screen
>> shots to show the address range (lower range and upper range).  You
>> need to refer the lower half of the window in attached screenshots.
>> At this stage, I am not completely sure whether we can assume some
>> address out of this range to use in MapViewOfFileEx.  Let me know your
>> thoughts.
>
> That seems like a fairly restricted range (good!), and squares with
> memories of reading about this (can't find the reference anymore).  Just
> using something like 0x00000F0000000000 as the address might work
> sufficiently well.  What happens if you just hardcode that in the first
> MapViewOfFileEx call, and change RandomizedBaseAddress="FALSE" in
> src/tools/msvc/VCBuildProject.pm to true?
>

The server start generates an error:
2017-04-24 12:02:05.771 IST [8940] FATAL:  could not create shared
memory segment: error code 87
2017-04-24 12:02:05.771 IST [8940] DETAIL:  Failed system call was
MapViewOfFileEx.

Error code 87 means "invalid parameter".  Some googling [1] indicates
such an error occurs if we pass the out-of-range address to
MapViewOfFileEx. Another possible theory is that we must pass the
address as multiple of the system's memory allocation granularity as
that is expected by specs of MapViewOfFileEx.  I can try doing that
but I am not confident that this is the right approach because of
below text mentioned in docs (msdn) of MapViewOfFileEx.
"While it is possible to specify an address that is safe now (not used
by the operating system), there is no guarantee that the address will
remain safe over time. Therefore, it is better to let the operating
system choose the address.".

I have printed the addresses that system automatically picks for
MapViewOfFileEx (3a20000, 3770000, 3520000, 3720000, 3730000) and they
seem to be in a range which I have posted up thread for 64-bit
processes.

Another thing I have tried is to just start the server by setting
RandomizedBaseAddress="TRUE".  I have tried about 15-20 times but
could not reproduce the problem related to shared memory attach.  We
have tried the same on one of my colleagues (Ashutosh Sharma) machine
as well, there we could see that error once or twice out of many tries
but couldn't get it consistently.  I think if the chances of this
problem to occur are so less, then probably the suggestion made by Tom
to retry if we get collision doesn't sound bad.

[1] - https://support.microsoft.com/en-us/help/125713/common-file-mapping-problems-and-platform-differences

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] OK, so culicidae is *still* broken

From
Craig Ringer
Date:
On 16 April 2017 at 05:18, Andres Freund <andres@anarazel.de> wrote:

> Because of ASLR of the main executable (i.e. something like PIE).  It'll
> supposedly become harder (as in only running in compatibility modes) if
> binaries don't enable that.  It's currently disabled somewhere in the VC
> project generated.

I thought we passed /DYNAMICBASE:NO directly , but I don't see that in
the code. A look at the git logs shows that we disabled it in
7f3e17b48 by emitting
<RandomizedBaseAddress>false</RandomizedBaseAddress> in the MSBuild
project. That'll pass /DYNAMICBASE:NO to the linker.

See https://msdn.microsoft.com/en-us/library/bb384887.aspx

It's rather better than the old registry hack, but it's a compat
option we're likely to lose at some point.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] OK, so culicidae is *still* broken

From
Craig Ringer
Date:
On 24 April 2017 at 16:55, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Another thing I have tried is to just start the server by setting
> RandomizedBaseAddress="TRUE".  I have tried about 15-20 times but
> could not reproduce the problem related to shared memory attach.  We
> have tried the same on one of my colleagues (Ashutosh Sharma) machine
> as well, there we could see that error once or twice out of many tries
> but couldn't get it consistently.  I think if the chances of this
> problem to occur are so less, then probably the suggestion made by Tom
> to retry if we get collision doesn't sound bad.

It's pretty uncommon, and honestly, we might well be best off just
trying again if we lose the lottery.

Most of what I read last time I looked into this essentially assumed
that you'd "fix" your code by reinventing far pointers[1], like the
good old Win16 days. Assume all addresses in shmem are relative to the
shmem base, and offset them when accessing/storing them. Fun and
efficient for everyone! That seems to be what Boost recommends[2].

Given that Pg doesn't make much effort to differentiate between
pointers to shmem and local memory, and has no pointer transformations
between shared and local pointers, adopting that would be a
horrifyingly intrusive change as well as incredibly tedious to
implement. We'd probably land up using size_t or ptrdiff_t for shmem
pointers and some kind of macro that was a noop on !windows. For once
I'd be thoroughly in agreement with Tom's complaints about
Windows-droppings.

Other people who've faced and worked around this[3] have come up with
solutions that look way scarier than just retrying if we lose the
random numbers game.

BTW, some Windows users face issues with large contiguous
allocations/mappings even without the process sharing side[4] due to
memory fragmentation created by ASLR, though this may only be a
concern for 32-bit executables. The relevant option /LARGEADDRESSAWARE
is enabled by default for 64-bit builds.

We might want to /DELAYLOAD [5] DLLs where possible to improve our
chances of winning the dice roll, but if we're going to support
retrying at all we don't need to care too much.

I looked at image binding (prelinking), but it's disabled if ASLR is in use.

In the long run we'll probably be forced toward threading or far pointers.



[1] https://en.wikipedia.org/wiki/Far_pointer,
https://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer%5Fsizes

[2]
http://www.boost.org/doc/libs/1_64_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_region.mapped_region_address_mapping

[3] http://stackoverflow.com/a/36145019/398670

[4] https://github.com/golang/go/issues/2323

[5] On 24 April 2017 at 16:55, Amit Kapila <amit.kapila16@gmail.com>
wrote:   > Another thing I have tried is to just start the server by
setting  > RandomizedBaseAddress="TRUE".  I have tried about 15-20
times but  > could not reproduce the problem related to shared memory
attach.  We  > have tried the same on one of my colleagues (Ashutosh
Sharma) machine  > as well, there we could see that error once or
twice out of many tries  > but couldn't get it consistently.  I think
if the chances of this  > problem to occur are so less, then probably
the suggestion made by Tom  > to retry if we get collision doesn't
sound bad.   It's pretty uncommon, and honestly, we might well be best
off just trying again if we lose the lottery.   Most of what I read
last time I looked into this essentially assumed that you'd "fix" your
code by reinventing far pointers[1], like the good old Win16 days.
Assume all addresses in shmem are relative to the shmem base, and
offset them when accessing/storing them. Fun and efficient for
everyone! That seems to be what Boost recommends[2].   Given that Pg
doesn't make much effort to differentiate between pointers to shmem
and local memory, and has no pointer transformations between shared
and local pointers, adopting that would be a horrifyingly intrusive
change as well as incredibly tedious to implement. We'd probably land
up using size_t or ptrdiff_t for shmem pointers and some kind of macro
that was a noop on !windows. For once I'd be thoroughly in agreement
with Tom's complaints about Windows-droppings.   Other people who've
faced and worked around this[3] have come up with solutions that look
way scarier than just retrying if we lose the random numbers game.
BTW, some Windows users face issues with large contiguous
allocations/mappings even without the process sharing side[4] due to
memory fragmentation created by ASLR, though this may only be a
concern for 32-bit executables. The relevant option /LARGEADDRESSAWARE
is enabled by default for 64-bit builds.  We should /DELAYLOAD as many
DDLs as possible to improve our chances.   [1]
https://en.wikipedia.org/wiki/Far_pointer,
https://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer%5Fsizes   [2]

http://www.boost.org/doc/libs/1_64_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_region.mapped_region_address_mapping
[3]http://stackoverflow.com/a/36145019/398670   [4]
 
https://github.com/golang/go/issues/2323   --  Craig Ringer      http://www.2ndQuadrant.com/  PostgreSQL Development,
24x7
Support, Training & Services

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
Hi,

On 2017-04-24 14:25:34 +0530, Amit Kapila wrote:
> Error code 87 means "invalid parameter".  Some googling [1] indicates
> such an error occurs if we pass the out-of-range address to
> MapViewOfFileEx. Another possible theory is that we must pass the
> address as multiple of the system's memory allocation granularity as
> that is expected by specs of MapViewOfFileEx.  I can try doing that
> but I am not confident that this is the right approach because of
> below text mentioned in docs (msdn) of MapViewOfFileEx.
> "While it is possible to specify an address that is safe now (not used
> by the operating system), there is no guarantee that the address will
> remain safe over time. Therefore, it is better to let the operating
> system choose the address.".

Sure, that's the same as mmap() etc say. I'd not be overly deterred by
that.


On 2017-04-24 23:14:40 +0800, Craig Ringer wrote:
> In the long run we'll probably be forced toward threading or far pointers.

I'll vote for removing the windows port, before going for that.  And I'm
not even joking.


Greetings,

Andres Freund



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-24 23:14:40 +0800, Craig Ringer wrote:
>> In the long run we'll probably be forced toward threading or far pointers.

> I'll vote for removing the windows port, before going for that.  And I'm
> not even joking.

Me too.  We used to *have* that kind of code, ie relative pointers into
the shmem segment, and it was a tremendous notational mess and very
bug-prone.  I do not wish to go back.

(We have accepted that kind of overhead for DSM segments, but the
intention I think is to allow only very trivial data structures in
the DSM segments.  Losing compiler pointer type checking for data
structures like the lock or PGPROC tables would be horrid.)
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-24 14:43:11 -0400, Tom Lane wrote:
> (We have accepted that kind of overhead for DSM segments, but the
> intention I think is to allow only very trivial data structures in
> the DSM segments.  Losing compiler pointer type checking for data
> structures like the lock or PGPROC tables would be horrid.)

The relptr.h infrastructure brings some of the type-checking back, but
it's still pretty painful.  And just as important, it's quite noticeable
performance-wise.  So we have to do it for dynamic shm (until/unless we
go to using threads), but that doesn't mean we should do it for some of
the most performance critical data structures in PG...

- Andres



Re: [HACKERS] OK, so culicidae is *still* broken

From
Andres Freund
Date:
On 2017-04-24 11:08:48 -0700, Andres Freund wrote:
> On 2017-04-24 23:14:40 +0800, Craig Ringer wrote:
> > In the long run we'll probably be forced toward threading or far pointers.
> 
> I'll vote for removing the windows port, before going for that.  And I'm
> not even joking.

Just to clarify: I'm talking about far pointers here, not threading.



Re: [HACKERS] OK, so culicidae is *still* broken

From
Craig Ringer
Date:


On 25 Apr. 2017 02:51, "Andres Freund" <andres@anarazel.de> wrote:
On 2017-04-24 11:08:48 -0700, Andres Freund wrote:
> On 2017-04-24 23:14:40 +0800, Craig Ringer wrote:
> > In the long run we'll probably be forced toward threading or far pointers.
>
> I'll vote for removing the windows port, before going for that.  And I'm
> not even joking.

Just to clarify: I'm talking about far pointers here, not threading.

Yeah, I'm pretty unimpressed that the accepted solution seems to be to return to the early '90s.

Why don't platforms offer any sensible way to reserve a virtual address range at process creation time?

It looks like you need a minimal loader process that rebases its self in memory if it finds its self loaded in the desired area, then maps the required memory range and loads the real process. Hard to imagine that not causing more problems than it solves.

Re: [HACKERS] OK, so culicidae is *still* broken

From
Heikki Linnakangas
Date:
On 04/24/2017 09:50 PM, Andres Freund wrote:
> On 2017-04-24 14:43:11 -0400, Tom Lane wrote:
>> (We have accepted that kind of overhead for DSM segments, but the
>> intention I think is to allow only very trivial data structures in
>> the DSM segments.  Losing compiler pointer type checking for data
>> structures like the lock or PGPROC tables would be horrid.)
>
> The relptr.h infrastructure brings some of the type-checking back, but
> it's still pretty painful.  And just as important, it's quite noticeable
> performance-wise.  So we have to do it for dynamic shm (until/unless we
> go to using threads), but that doesn't mean we should do it for some of
> the most performance critical data structures in PG...

Agreed.

For some data shared memory structures, that store no pointers, we 
wouldn't need to insist that they are mapped to the same address in 
every backend, though. In particular, shared_buffers. It wouldn't 
eliminate the problem, though, only make it less likely, so we'd still 
need to retry when it does happen.

- Heikki




Re: [HACKERS] OK, so culicidae is *still* broken

From
Craig Ringer
Date:


On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:

For some data shared memory structures, that store no pointers, we wouldn't need to insist that they are mapped to the same address in every backend, though. In particular, shared_buffers. It wouldn't eliminate the problem, though, only make it less likely, so we'd still need to retry when it does happen.

Good point. Simply splitting out shared_buffers into a moveable segment would make a massive difference. Much less chance of losing the dice roll for mapping the fixed segment. 

Should look at what else could be made cheaply relocatable too.

Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
>> For some data shared memory structures, that store no pointers, we wouldn't
>> need to insist that they are mapped to the same address in every backend,
>> though. In particular, shared_buffers. It wouldn't eliminate the problem,
>> though, only make it less likely, so we'd still need to retry when it does
>> happen.

> Good point. Simply splitting out shared_buffers into a moveable segment
> would make a massive difference. Much less chance of losing the dice roll
> for mapping the fixed segment.

> Should look at what else could be made cheaply relocatable too.

I don't think it's worth spending any effort on.  We need the retry
code anyway, and making it near impossible to hit that would only mean
that it's very poorly tested.  The results upthread say that it isn't
going to be hit often enough to be a performance problem, so why worry?
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Craig Ringer
Date:
On 25 April 2017 at 22:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig Ringer <craig@2ndquadrant.com> writes:
>> On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
>>> For some data shared memory structures, that store no pointers, we wouldn't
>>> need to insist that they are mapped to the same address in every backend,
>>> though. In particular, shared_buffers. It wouldn't eliminate the problem,
>>> though, only make it less likely, so we'd still need to retry when it does
>>> happen.
>
>> Good point. Simply splitting out shared_buffers into a moveable segment
>> would make a massive difference. Much less chance of losing the dice roll
>> for mapping the fixed segment.
>
>> Should look at what else could be made cheaply relocatable too.
>
> I don't think it's worth spending any effort on.  We need the retry
> code anyway, and making it near impossible to hit that would only mean
> that it's very poorly tested.  The results upthread say that it isn't
> going to be hit often enough to be a performance problem, so why worry?

Good point. Deal with it if it becomes an issue.

That said, I didn't see if any of those tests covered really big
shared_buffers. That could become an issue down the track at least.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] OK, so culicidae is *still* broken

From
Amit Kapila
Date:
On Tue, Apr 25, 2017 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig Ringer <craig@2ndquadrant.com> writes:
>> On 25 Apr. 2017 13:37, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
>>> For some data shared memory structures, that store no pointers, we wouldn't
>>> need to insist that they are mapped to the same address in every backend,
>>> though. In particular, shared_buffers. It wouldn't eliminate the problem,
>>> though, only make it less likely, so we'd still need to retry when it does
>>> happen.
>
>> Good point. Simply splitting out shared_buffers into a moveable segment
>> would make a massive difference. Much less chance of losing the dice roll
>> for mapping the fixed segment.
>
>> Should look at what else could be made cheaply relocatable too.
>
> I don't think it's worth spending any effort on.  We need the retry
> code anyway, and making it near impossible to hit that would only mean
> that it's very poorly tested.  The results upthread say that it isn't
> going to be hit often enough to be a performance problem, so why worry?
>

Yeah, that's right.  Today, I have spent some time to analyze how and
where retry logic is required.  I think there are two places where we
need this retry logic, one is if we fail to reserve the memory
(pgwin32_ReserveSharedMemoryRegion) and second is if we fail to
reattach (PGSharedMemoryReAttach). If we fail during reserve memory
operation, then we can simply terminate the process and recreate it
again, basically, add some kind of loop in internal_forkexec(). OTOH,
if we fail during reattach, I think we need an exit from the process
which means it can lead to a restart of all the processes unless we
want to add some special logic for handling process exit or
alternatively we might want to just try reattach operation in a loop
before giving up. Do we want to keep this retry logic for a certain
number of times say (10 times) or we want to try indefinitely?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] OK, so culicidae is *still* broken

From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Yeah, that's right.  Today, I have spent some time to analyze how and
> where retry logic is required.  I think there are two places where we
> need this retry logic, one is if we fail to reserve the memory
> (pgwin32_ReserveSharedMemoryRegion) and second is if we fail to
> reattach (PGSharedMemoryReAttach).

I'm not following.  The point of the reserve operation is to ensure
that the reattach will work.  What makes you think we need to add more
code at that end?
        regards, tom lane



Re: [HACKERS] OK, so culicidae is *still* broken

From
Amit Kapila
Date:
On Mon, May 1, 2017 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> Yeah, that's right.  Today, I have spent some time to analyze how and
>> where retry logic is required.  I think there are two places where we
>> need this retry logic, one is if we fail to reserve the memory
>> (pgwin32_ReserveSharedMemoryRegion) and second is if we fail to
>> reattach (PGSharedMemoryReAttach).
>
> I'm not following.  The point of the reserve operation is to ensure
> that the reattach will work.  What makes you think we need to add more
> code at that end?
>

We use VirtualAllocEx to allocate memory at a pre-defined address
(reserve operation) which can fail due to ASLR.  One recent example of
such a symptom is seen on pgsql-bugs [1], that failure is during
reserve operation and seems like something related to ASLR.  Another
point is the commit 7f3e17b4827b61ad84e0774e3e43da4c57c4487f (It
doesn't fail every time (which is explained by the Random part in
ASLR), but can fail with errors abut failing to reserve shared memory
region) also indicates that we can fail to reserve memory due to ASLR.


[1] - https://www.postgresql.org/message-id/14121.1485360296%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] OK, so culicidae is *still* broken

From
Noah Misch
Date:
On Sat, Apr 15, 2017 at 02:30:18PM -0700, Andres Freund wrote:
> On 2017-04-15 17:24:54 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2017-04-15 17:09:38 -0400, Tom Lane wrote:
> > >> Why doesn't Windows' ability to map the segment into the new process
> > >> before it executes take care of that?
> > 
> > > Because of ASLR of the main executable (i.e. something like PIE).
> > 
> > Not following.  Are you saying that the main executable gets mapped into
> > the process address space immediately, but shared libraries are not?

At the time of the pgwin32_ReserveSharedMemoryRegion() call, the child process
contains only ntdll.dll and the executable.

> Without PIE/ASLR we can somewhat rely on pgwin32_ReserveSharedMemoryRegion
> to find the space that PGSharedMemoryCreate allocated still unoccupied.

I've never had access to a Windows system that can reproduce the fork
failures.  My best theory is that antivirus or similar software injects an
additional DLL at that early stage.

> > I wonder whether we could work around that by just destroying the created
> > process and trying again if we get a collision.  It'd be a tad
> > inefficient, but hopefully collisions wouldn't happen often enough to be a
> > big problem.
> 
> That might work, although it's obviously not pretty.

I didn't like that idea when Michael proposed it in 2015.  Since disabling
ASLR on the exe proved insufficient, I do like it now.  It degrades nicely; if
something raises the collision rate from 1% to 10%, that just looks like fork
latency degrading.

> We could also just
> default to some out-of-the-way address for MapViewOfFileEx, that might
> also work.

Dave Vitek proposed that:
https://www.postgresql.org/message-id/flat/5046CAEB.4010600%40grammatech.com

I estimate more risk than retrying forks.  There's no guarantee that a fixed
address helpful today won't make the collisions worse after the next Windows
security patch.