Thread: Windows service is not starting so there’smessage in log: FATAL: "could not create shared memory segment“Global/PostgreSQL.851401618”: Permissiondenied”

 I’ve created 2 unprivileged users: user1 and user2; and registered 2
services thorough pg_ctl register respectively under these 2 users. If
first service is running second could not start and vice versa.


----
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




I think that function dsm_impl_windows() with EACCES error should not
do ereport() with FATAL level. It works, but it is likely to make an
infinite loop if the user will receive EACCES error.

On Чт, 2015-10-15 at 17:46 +0300, Dmitry Vasilyev wrote:
>  I’ve created 2 unprivileged users: user1 and user2; and registered 2
> services thorough pg_ctl register respectively under these 2 users.
> If
> first service is running second could not start and vice versa.
>
>
> ----
> Dmitry Vasilyev
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>
>
>
--


----
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment
On Thu, Oct 15, 2015 at 8:35 PM, Dmitry Vasilyev <d.vasilyev@postgrespro.ru> wrote:
I think that function dsm_impl_windows() with EACCES error should not
do ereport() with FATAL level. It works, but it is likely to make an
infinite loop if the user will receive EACCES error.


Currently we are using error level as ERROR for creating dsm during
postmaster startup which is not right and rather we should use error
level as LOG.  Can you please try with the attached patch and see
if the issue is fixed for you.

Another some what related point is currently we are using random()
function to ensure a unique name for dsm and it seems to me that
it is always going to generate same number on first invocation (at least
thats what happening on windows) due to which you are seeing the
error.  Another options could be to append current pid or data directory
path as we are doing in win32_shmem.c.  I think this could be an
optimization which can be done in addition to the fix attached (we can
do this as a separate patch as well, if we agreed to do anything).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Hello,

> Currently we are using error level as ERROR for creating dsm during
> postmaster startup which is not right and rather we should use error
> level as LOG.  Can you please try with the attached patch and see
> if the issue is fixed for you.

It seems somewhat strange. Looking into the create operation in
dms_impl_posix(), it should return false but not emit error when
the shared memory object already exists.

So, to make the windows version behave as the same,
dsm_impl_windows should return false if GetLastError() ==
ERROR_ALREADY_EXISTS regardless of hmap is valid. The current
behavior is wrong because two or more postmaster *share* the same
DSM segment instead of having their own segments.

On the other hand, in the case of GetLastError() ==
ERROR_ACCESS_DENIED, which occurs between postmasters exexuted as
service, it can reasonablly be assumed that the segment is
already created by someone else. I think this is no problem
because postgres processes don't need to access someone else's
segments.

Finally, the valid fix would be that make it return false if
GetLastError() == ERROR_ALREADY_EXISTS or ERROR_ACCESS_DENIED.

The patch attached will fix *both of* the problems.


regards,


At Fri, 16 Oct 2015 09:02:41 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+pjtDouF-Lc9y0UgFDmWHnaf4KwiM1Y3Anq0wZ1gwsAA@mail.gmail.com>
> > I think that function dsm_impl_windows() with EACCES error should not
> > do ereport() with FATAL level. It works, but it is likely to make an
> > infinite loop if the user will receive EACCES error.
> >
> >
> Currently we are using error level as ERROR for creating dsm during
> postmaster startup which is not right and rather we should use error
> level as LOG.  Can you please try with the attached patch and see
> if the issue is fixed for you.
>
> Another some what related point is currently we are using random()
> function to ensure a unique name for dsm and it seems to me that
> it is always going to generate same number on first invocation (at least
> thats what happening on windows) due to which you are seeing the
> error.  Another options could be to append current pid or data directory
> path as we are doing in win32_shmem.c.  I think this could be an
> optimization which can be done in addition to the fix attached (we can
> do this as a separate patch as well, if we agreed to do anything).

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 921f029..1070812 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -671,6 +671,7 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
     {
         DWORD        size_high;
         DWORD        size_low;
+        DWORD        errcode;

         /* Shifts >= the width of the type are undefined. */
 #ifdef _WIN64
@@ -686,9 +687,21 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
                                  size_high,        /* Upper 32 bits of size */
                                  size_low,        /* Lower 32 bits of size */
                                  name);
+
+        /*
+         * If the mapping already exists, CreateFileMapping returns a valid
+         * handle but sets ERROR_ALREADY_EXISTS. Return false for the case.
+         * CreateFileMapping may return ERROR_ACCESS_DENIED for the cases when
+         * running as a service. It also tells that the mapping is already
+         * created by someone else, so return false.
+         */
+        errcode = GetLastError();
+        if (errcode == ERROR_ALREADY_EXISTS || errcode == ERROR_ACCESS_DENIED)
+          return false;
+
         if (!hmap)
         {
-            _dosmaperr(GetLastError());
+            _dosmaperr(errcode);
             ereport(elevel,
                     (errcode_for_dynamic_shared_memory(),
                   errmsg("could not create shared memory segment \"%s\": %m",

Sorry, forgot to close the valid handle on return.



> > Currently we are using error level as ERROR for creating dsm during
> > postmaster startup which is not right and rather we should use error
> > level as LOG.  Can you please try with the attached patch and see
> > if the issue is fixed for you.
>
> It seems somewhat strange. Looking into the create operation in
> dms_impl_posix(), it should return false but not emit error when
> the shared memory object already exists.
>
> So, to make the windows version behave as the same,
> dsm_impl_windows should return false if GetLastError() ==
> ERROR_ALREADY_EXISTS regardless of hmap is valid. The current
> behavior is wrong because two or more postmaster *share* the same
> DSM segment instead of having their own segments.
>
> On the other hand, in the case of GetLastError() ==
> ERROR_ACCESS_DENIED, which occurs between postmasters exexuted as
> service, it can reasonablly be assumed that the segment is
> already created by someone else. I think this is no problem
> because postgres processes don't need to access someone else's
> segments.
>
> Finally, the valid fix would be that make it return false if
> GetLastError() == ERROR_ALREADY_EXISTS or ERROR_ACCESS_DENIED.
>
> The patch attached will fix *both of* the problems.
>
>
> regards,
>
>
> At Fri, 16 Oct 2015 09:02:41 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+pjtDouF-Lc9y0UgFDmWHnaf4KwiM1Y3Anq0wZ1gwsAA@mail.gmail.com>
> > > I think that function dsm_impl_windows() with EACCES error should not
> > > do ereport() with FATAL level. It works, but it is likely to make an
> > > infinite loop if the user will receive EACCES error.
> > >
> > >
> > Currently we are using error level as ERROR for creating dsm during
> > postmaster startup which is not right and rather we should use error
> > level as LOG.  Can you please try with the attached patch and see
> > if the issue is fixed for you.
> >
> > Another some what related point is currently we are using random()
> > function to ensure a unique name for dsm and it seems to me that
> > it is always going to generate same number on first invocation (at least
> > thats what happening on windows) due to which you are seeing the
> > error.  Another options could be to append current pid or data directory
> > path as we are doing in win32_shmem.c.  I think this could be an
> > optimization which can be done in addition to the fix attached (we can
> > do this as a separate patch as well, if we agreed to do anything).

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 921f029..8ef9898 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -671,6 +671,7 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
     {
         DWORD        size_high;
         DWORD        size_low;
+        DWORD        errcode;

         /* Shifts >= the width of the type are undefined. */
 #ifdef _WIN64
@@ -686,27 +687,31 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
                                  size_high,        /* Upper 32 bits of size */
                                  size_low,        /* Lower 32 bits of size */
                                  name);
+
+        /*
+         * If the mapping already exists, CreateFileMapping returns a valid
+         * handle but sets ERROR_ALREADY_EXISTS. Return false for the case.
+         * CreateFileMapping may return ERROR_ACCESS_DENIED for the cases when
+         * running as a service. It also tells that the mapping is already
+         * created by someone else, so return false.
+         */
+        errcode = GetLastError();
+        if (errcode == ERROR_ALREADY_EXISTS || errcode == ERROR_ACCESS_DENIED)
+        {
+            if (hmap)
+                CloseHandle(hmap);
+          return false;
+        }
+
         if (!hmap)
         {
-            _dosmaperr(GetLastError());
+            _dosmaperr(errcode);
             ereport(elevel,
                     (errcode_for_dynamic_shared_memory(),
                   errmsg("could not create shared memory segment \"%s\": %m",
                          name)));
             return false;
         }
-        _dosmaperr(GetLastError());
-        if (errno == EEXIST)
-        {
-            /*
-             * On Windows, when the segment already exists, a handle for the
-             * existing segment is returned.  We must close it before
-             * returning.  We don't do _dosmaperr here, so errno won't be
-             * modified.
-             */
-            CloseHandle(hmap);
-            return false;
-        }
     }
     else
     {

This is wrong, current code does well for this case. I should
broke the code during investigating the problem.

> > So, to make the windows version behave as the same,
> > dsm_impl_windows should return false if GetLastError() ==
> > ERROR_ALREADY_EXISTS regardless of hmap is valid. The current
> > behavior is wrong because two or more postmaster *share* the same
> > DSM segment instead of having their own segments.

> > The patch attached will fix *both of* the problems.

So, the patch fixes only the "Permission denied" case.

regards,


At Fri, 16 Oct 2015 14:37:47 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20151016.143747.121283630.horiguchi.kyotaro@lab.ntt.co.jp>
> > > Currently we are using error level as ERROR for creating dsm during
> > > postmaster startup which is not right and rather we should use error
> > > level as LOG.  Can you please try with the attached patch and see
> > > if the issue is fixed for you.
> >
> > It seems somewhat strange. Looking into the create operation in
> > dms_impl_posix(), it should return false but not emit error when
> > the shared memory object already exists.
> >
> > So, to make the windows version behave as the same,
> > dsm_impl_windows should return false if GetLastError() ==
> > ERROR_ALREADY_EXISTS regardless of hmap is valid. The current
> > behavior is wrong because two or more postmaster *share* the same
> > DSM segment instead of having their own segments.
> >
> > On the other hand, in the case of GetLastError() ==
> > ERROR_ACCESS_DENIED, which occurs between postmasters exexuted as
> > service, it can reasonablly be assumed that the segment is
> > already created by someone else. I think this is no problem
> > because postgres processes don't need to access someone else's
> > segments.
> >
> > Finally, the valid fix would be that make it return false if
> > GetLastError() == ERROR_ALREADY_EXISTS or ERROR_ACCESS_DENIED.
> >
> > The patch attached will fix *both of* the problems.
> >
> >
> > regards,
> >
> >
> > At Fri, 16 Oct 2015 09:02:41 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+pjtDouF-Lc9y0UgFDmWHnaf4KwiM1Y3Anq0wZ1gwsAA@mail.gmail.com>
> > > > I think that function dsm_impl_windows() with EACCES error should not
> > > > do ereport() with FATAL level. It works, but it is likely to make an
> > > > infinite loop if the user will receive EACCES error.
> > > >
> > > >
> > > Currently we are using error level as ERROR for creating dsm during
> > > postmaster startup which is not right and rather we should use error
> > > level as LOG.  Can you please try with the attached patch and see
> > > if the issue is fixed for you.
> > >
> > > Another some what related point is currently we are using random()
> > > function to ensure a unique name for dsm and it seems to me that
> > > it is always going to generate same number on first invocation (at least
> > > thats what happening on windows) due to which you are seeing the
> > > error.  Another options could be to append current pid or data directory
> > > path as we are doing in win32_shmem.c.  I think this could be an
> > > optimization which can be done in addition to the fix attached (we can
> > > do this as a separate patch as well, if we agreed to do anything).
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center




On Fri, Oct 16, 2015 at 12:16 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> This is wrong, current code does well for this case. I should
> broke the code during investigating the problem.
>
> > > So, to make the windows version behave as the same,
> > > dsm_impl_windows should return false if GetLastError() ==
> > > ERROR_ALREADY_EXISTS regardless of hmap is valid. The current
> > > behavior is wrong because two or more postmaster *share* the same
> > > DSM segment instead of having their own segments.
>
> > > The patch attached will fix *both of* the problems.
>
> So, the patch fixes only the "Permission denied" case.
>

Why do you think it is bad to display even log for "Permission denied"
case?
It seems all other implementations does the same and I think it is
useful information and we should log it.  Don't you think the patch
sent by me is good enough to fix the reported issue?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Пт, 2015-10-16 at 09:02 +0530, Amit Kapila wrote:
> On Thu, Oct 15, 2015 at 8:35 PM, Dmitry Vasilyev <d.vasilyev@postgres
> pro.ru> wrote:
> > I think that function dsm_impl_windows() with EACCES error should
> > not
> > do ereport() with FATAL level. It works, but it is likely to make
> > an
> > infinite loop if the user will receive EACCES error.
> >
> >
> Currently we are using error level as ERROR for creating dsm during
> postmaster startup which is not right and rather we should use error
> level as LOG.  Can you please try with the attached patch and see
> if the issue is fixed for you.
>
> Another some what related point is currently we are using random()
> function to ensure a unique name for dsm and it seems to me that
> it is always going to generate same number on first invocation (at
> least
> thats what happening on windows) due to which you are seeing the
> error.  Another options could be to append current pid or data
> directory
> path as we are doing in win32_shmem.c.  I think this could be an
> optimization which can be done in addition to the fix attached (we
> can
> do this as a separate patch as well, if we agreed to do anything).
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com


In that case your patch is working. There was LOG level message with Permission denied (from the operation point it’s
notclear how to respond to message like this). 
But as I understand if operating system can’t create shared memory we will try to distinguish that chunk up to
infinity.
As I see it’s better to stop instance in this case.
So I guess it’s a good idea to remain all as it is now (FATAL level) but add PostmasterPid to name (as you suggest).
There’spatch in attachments. 

----
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment
On Thu, Oct 15, 2015 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Another some what related point is currently we are using random()
> function to ensure a unique name for dsm and it seems to me that
> it is always going to generate same number on first invocation (at least
> thats what happening on windows) due to which you are seeing the
> error.  Another options could be to append current pid or data directory
> path as we are doing in win32_shmem.c.  I think this could be an
> optimization which can be done in addition to the fix attached (we can
> do this as a separate patch as well, if we agreed to do anything).

Maybe we need to be using PostmasterRandom() rather than random() for
the control segment name.

But regardless, this patch isn't the right fix.
dsm_impl_op(DSM_OP_CREATE, ...) is supposed to return false in the
event of a segment-already-exists condition, without ereporting.  If
it hits any OTHER error, then it should ereport().  In the Windows
implementation, the code that caters to this is here:

        if (errno == EEXIST)
        {
            /*
             * On Windows, when the segment already exists, a handle for the
             * existing segment is returned.  We must close it before
             * returning.  We don't do _dosmaperr here, so errno won't be
             * modified.
             */
            CloseHandle(hmap);
            return false;
        }

Kyotaro Horiguchi's analysis seems to me to be going in about the
right direction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Sat, Oct 17, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Oct 15, 2015 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Another some what related point is currently we are using random()
> > function to ensure a unique name for dsm and it seems to me that
> > it is always going to generate same number on first invocation (at least
> > thats what happening on windows) due to which you are seeing the
> > error.  Another options could be to append current pid or data directory
> > path as we are doing in win32_shmem.c.  I think this could be an
> > optimization which can be done in addition to the fix attached (we can
> > do this as a separate patch as well, if we agreed to do anything).
>
> Maybe we need to be using PostmasterRandom() rather than random() for
> the control segment name.
>

+1.  Though I think it is better to investigate the actual cause before doing
this.

> But regardless, this patch isn't the right fix.
> dsm_impl_op(DSM_OP_CREATE, ...) is supposed to return false in the
> event of a segment-already-exists condition, without ereporting.
>

Thats right, but in this case, it seems from what is reported, that we are
hitting Access Denied error which on retry seems to disappear (which
ideally shouldn't happen).  It's not clear to me why that is happening.

By reading code, it appears that previously when we get such errors
for main shared memory, we replaced the usage of 'Global\PostreSQL:..'
to 'Global/PostgreSQL:.. (refer GetSharedMemName()), but in case of
dsm we have already taken care of same, so not sure what else could
be reason for Access Denied error.

Dmitry, can we try to see what is the exact value of GetLastError()
when it fails (one way is to check the logs at DEBUG5 level,
_dosmaperr() will print that value or you can modify the code to see it.)


Which Windows version you are using?

With Regards,
Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Sat, Oct 17, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Maybe we need to be using PostmasterRandom() rather than random() for
>> the control segment name.

> +1.  Though I think it is better to investigate the actual cause before
> doing this.

BackendRun() deliberately prevents that from working.  And it also sets
srandom() to a new value for each subprocess, so that AFAICS this idea
would be a net negative.  If you are seeing duplicate key values getting
selected, the problem is elsewhere.

            regards, tom lane



On Sun, Oct 18, 2015 at 1:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Sat, Oct 17, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Maybe we need to be using PostmasterRandom() rather than random() for
>>> the control segment name.
>
>> +1.  Though I think it is better to investigate the actual cause before
>> doing this.
>
> BackendRun() deliberately prevents that from working.  And it also sets
> srandom() to a new value for each subprocess, so that AFAICS this idea
> would be a net negative.  If you are seeing duplicate key values getting
> selected, the problem is elsewhere.

Coming back to an old thread, recently I got a problem in starting two
PostgreSQL services with a user that is not an administrator. The error
message is as follows.

FATAL:  could not create shared memory segment
"Global/PostgreSQL.851401618": Permission denied

The issue is happening only with the processes that are running as service.
I observed that the handle received in creating the dynamic shared memory
is same for two services, because of which the Access denied error is thrown
by the operating system and thus it leads to failure.

The PG shared memory name is always includes the data directory path as
below, because of which it doesn't match with two services.

"Global/PostgreSQL:C:/work/FEP/installation/bin/data"

But whereas the dynamic shared memory is formed with a random number
and this number getting generated same for two service thus it leads to
failure.

"Global/PostgreSQL.85140161"

I tried replacing the random() with PostmaterRandom() for a test and it worked.
This is generating different random values, so the issue is not occurring.

"Global/PostgreSQL.2115609797"

I feel, we should add the the data directory path + the random number to
generate the name for dynamic shared memory, this can fix problem.

comments?

Regards,
Hari Babu
Fujitsu Australia



On Wed, Mar 9, 2016 at 11:46 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> I tried replacing the random() with PostmaterRandom() for a test and it worked.
> This is generating different random values, so the issue is not occurring.
>
> "Global/PostgreSQL.2115609797"
>
> I feel, we should add the the data directory path + the random number to
> generate the name for dynamic shared memory, this can fix problem.
>

As mentioned above, I think if we can investigate why this error is generated, that will be helpful.  Currently the code ensures that if the segment already exists, it should retry to create a segment with other name (refer dsm_impl_windows()), so the point of investigation is, why it is not going via that path?  I am guessing due to some reason CreateFileMapping() is returning NULL in this case whereas ideally it should return the existing handle with an error ERROR_ALREADY_EXISTS.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 9, 2016 at 10:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Mar 9, 2016 at 11:46 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>>
>> I tried replacing the random() with PostmaterRandom() for a test and it
>> worked.
>> This is generating different random values, so the issue is not occurring.
>>
>> "Global/PostgreSQL.2115609797"
>>
>> I feel, we should add the the data directory path + the random number to
>> generate the name for dynamic shared memory, this can fix problem.
>>
>
> As mentioned above, I think if we can investigate why this error is
> generated, that will be helpful.  Currently the code ensures that if the
> segment already exists, it should retry to create a segment with other name
> (refer dsm_impl_windows()), so the point of investigation is, why it is not
> going via that path?  I am guessing due to some reason CreateFileMapping()
> is returning NULL in this case whereas ideally it should return the existing
> handle with an error ERROR_ALREADY_EXISTS.

DEBUG:  mapped win32 error code 5 to 13

Yes, the CreateFileMapping() is returning NULL with an error of
ERROR_ACCESS_DENIED.
I am not able to find the reason for this error. This error is occurring only
when the PostgreSQL is started as a service only.

Regards,
Hari Babu
Fujitsu Australia



On Wed, Mar 9, 2016 at 7:16 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> On Wed, Mar 9, 2016 at 10:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Mar 9, 2016 at 11:46 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
>> wrote:
>>>
>>>
>>> I tried replacing the random() with PostmaterRandom() for a test and it
>>> worked.
>>> This is generating different random values, so the issue is not occurring.
>>>
>>> "Global/PostgreSQL.2115609797"
>>>
>>> I feel, we should add the the data directory path + the random number to
>>> generate the name for dynamic shared memory, this can fix problem.
>>>
>>
>> As mentioned above, I think if we can investigate why this error is
>> generated, that will be helpful.  Currently the code ensures that if the
>> segment already exists, it should retry to create a segment with other name
>> (refer dsm_impl_windows()), so the point of investigation is, why it is not
>> going via that path?  I am guessing due to some reason CreateFileMapping()
>> is returning NULL in this case whereas ideally it should return the existing
>> handle with an error ERROR_ALREADY_EXISTS.
>
> DEBUG:  mapped win32 error code 5 to 13
>
> Yes, the CreateFileMapping() is returning NULL with an error of
> ERROR_ACCESS_DENIED.
> I am not able to find the reason for this error. This error is occurring only
> when the PostgreSQL is started as a service only.

Another question is: why are both postmasters returning the same
random number?  That's not very, uh, random.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Mar 10, 2016 at 5:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 9, 2016 at 7:16 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> On Wed, Mar 9, 2016 at 10:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Wed, Mar 9, 2016 at 11:46 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
>>> wrote:
>>>>
>>>>
>>>> I tried replacing the random() with PostmaterRandom() for a test and it
>>>> worked.
>>>> This is generating different random values, so the issue is not occurring.
>>>>
>>>> "Global/PostgreSQL.2115609797"
>>>>
>>>> I feel, we should add the the data directory path + the random number to
>>>> generate the name for dynamic shared memory, this can fix problem.
>>>>
>>>
>>> As mentioned above, I think if we can investigate why this error is
>>> generated, that will be helpful.  Currently the code ensures that if the
>>> segment already exists, it should retry to create a segment with other name
>>> (refer dsm_impl_windows()), so the point of investigation is, why it is not
>>> going via that path?  I am guessing due to some reason CreateFileMapping()
>>> is returning NULL in this case whereas ideally it should return the existing
>>> handle with an error ERROR_ALREADY_EXISTS.
>>
>> DEBUG:  mapped win32 error code 5 to 13
>>
>> Yes, the CreateFileMapping() is returning NULL with an error of
>> ERROR_ACCESS_DENIED.
>> I am not able to find the reason for this error. This error is occurring only
>> when the PostgreSQL is started as a service only.
>
> Another question is: why are both postmasters returning the same
> random number?  That's not very, uh, random.

The random number is generated from our own implementation of
random function. The random function internally calls the pg_lrand48
function to get the random value. This function generates the random
number based on specified random seed and pre-defined calculations.
Because of this reason, the same random number is getting generated
every time.

In LInux, the random function is used from the glibc, there also it is
generating the same random number as the first number, but if the
number is used by some process then it is generating a different random
number for the next PostgreSQL process.

Regards,
Hari Babu
Fujitsu Australia



On Wed, Mar 9, 2016 at 5:46 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> On Wed, Mar 9, 2016 at 10:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Mar 9, 2016 at 11:46 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
> > wrote:
> >>
> >>
> >> I tried replacing the random() with PostmaterRandom() for a test and it
> >> worked.
> >> This is generating different random values, so the issue is not occurring.
> >>
> >> "Global/PostgreSQL.2115609797"
> >>
> >> I feel, we should add the the data directory path + the random number to
> >> generate the name for dynamic shared memory, this can fix problem.
> >>
> >
> > As mentioned above, I think if we can investigate why this error is
> > generated, that will be helpful.  Currently the code ensures that if the
> > segment already exists, it should retry to create a segment with other name
> > (refer dsm_impl_windows()), so the point of investigation is, why it is not
> > going via that path?  I am guessing due to some reason CreateFileMapping()
> > is returning NULL in this case whereas ideally it should return the existing
> > handle with an error ERROR_ALREADY_EXISTS.
>
> DEBUG:  mapped win32 error code 5 to 13
>
> Yes, the CreateFileMapping() is returning NULL with an error of
> ERROR_ACCESS_DENIED.
>

Okay, so one probable theory for such an error could be that when there is already an object with same name exists, this API requests access to the that existing object and found that it can't access it due to some reason.  On googling, I found some people suggesting to try by disabling UAC [1] on your m/c, can you once try that to see what is the result (this experiment is just to find out the actual reason of failure, rather than a permanent change suggestion).

 
>
> I am not able to find the reason for this error. This error is occurring only
> when the PostgreSQL is started as a service only.
>

Did you use pg_ctl register/unregister to register different services.  Can you share the detail steps and OS version on which you saw this behaviour?

On Fri, Mar 11, 2016 at 12:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Mar 9, 2016 at 5:46 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>> On Wed, Mar 9, 2016 at 10:06 PM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > On Wed, Mar 9, 2016 at 11:46 AM, Haribabu Kommi
>> > <kommi.haribabu@gmail.com>
>> > wrote:
>> >>
>> >>
>> >> I tried replacing the random() with PostmaterRandom() for a test and it
>> >> worked.
>> >> This is generating different random values, so the issue is not
>> >> occurring.
>> >>
>> >> "Global/PostgreSQL.2115609797"
>> >>
>> >> I feel, we should add the the data directory path + the random number
>> >> to
>> >> generate the name for dynamic shared memory, this can fix problem.
>> >>
>> >
>> > As mentioned above, I think if we can investigate why this error is
>> > generated, that will be helpful.  Currently the code ensures that if the
>> > segment already exists, it should retry to create a segment with other
>> > name
>> > (refer dsm_impl_windows()), so the point of investigation is, why it is
>> > not
>> > going via that path?  I am guessing due to some reason
>> > CreateFileMapping()
>> > is returning NULL in this case whereas ideally it should return the
>> > existing
>> > handle with an error ERROR_ALREADY_EXISTS.
>>
>> DEBUG:  mapped win32 error code 5 to 13
>>
>> Yes, the CreateFileMapping() is returning NULL with an error of
>> ERROR_ACCESS_DENIED.
>>
>
> Okay, so one probable theory for such an error could be that when there is
> already an object with same name exists, this API requests access to the
> that existing object and found that it can't access it due to some reason.
> On googling, I found some people suggesting to try by disabling UAC [1] on
> your m/c, can you once try that to see what is the result (this experiment
> is just to find out the actual reason of failure, rather than a permanent
> change suggestion).

Thanks for the details. Currently I am unable to change the UAC settings in my
laptop. I will try to do it in a different system and let you know the
result later.


>> I am not able to find the reason for this error. This error is occurring
>> only
>> when the PostgreSQL is started as a service only.
>>
>
> Did you use pg_ctl register/unregister to register different services.  Can
> you share the detail steps and OS version on which you saw this behaviour?

Operating system - windows 7
Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the problem)

1. Create two standard users in the system (test_user1 and test_user2)
2. Create two databases belongs each user listed above.
3. Now using pg_ctl register the services for the two users.
4. Provide logon permissions to these users to run the services by changing
service properties.
5. Now try to start the services, the second service fails with the
error message.
6. Error details can be found out in Event log viewer.

Regards,
Hari Babu
Fujitsu Australia



On Fri, Mar 11, 2016 at 5:21 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Fri, Mar 11, 2016 at 12:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Okay, so one probable theory for such an error could be that when there is
> > already an object with same name exists, this API requests access to the
> > that existing object and found that it can't access it due to some reason.
> > On googling, I found some people suggesting to try by disabling UAC [1] on
> > your m/c, can you once try that to see what is the result (this experiment
> > is just to find out the actual reason of failure, rather than a permanent
> > change suggestion).
>
> Thanks for the details. Currently I am unable to change the UAC settings in my
> laptop. I will try to do it in a different system and let you know the
> result later.
>  
>
>
> >> I am not able to find the reason for this error. This error is occurring
> >> only
> >> when the PostgreSQL is started as a service only.
> >>
> >
> > Did you use pg_ctl register/unregister to register different services.  Can
> > you share the detail steps and OS version on which you saw this behaviour?
>
> Operating system - windows 7
> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the problem)
>
> 1. Create two standard users in the system (test_user1 and test_user2)

I think one possibility is that one user is not able to access the object created by another user, if possible can you as well try with just one user (Have same user for both the services).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Mar 11, 2016 at 11:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 11, 2016 at 5:21 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Fri, Mar 11, 2016 at 12:00 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> >
>> > Okay, so one probable theory for such an error could be that when there
>> > is
>> > already an object with same name exists, this API requests access to the
>> > that existing object and found that it can't access it due to some
>> > reason.
>> > On googling, I found some people suggesting to try by disabling UAC [1]
>> > on
>> > your m/c, can you once try that to see what is the result (this
>> > experiment
>> > is just to find out the actual reason of failure, rather than a
>> > permanent
>> > change suggestion).
>>
>> Thanks for the details. Currently I am unable to change the UAC settings
>> in my
>> laptop. I will try to do it in a different system and let you know the
>> result later.
>>
>>
>>
>> >> I am not able to find the reason for this error. This error is
>> >> occurring
>> >> only
>> >> when the PostgreSQL is started as a service only.
>> >>
>> >
>> > Did you use pg_ctl register/unregister to register different services.
>> > Can
>> > you share the detail steps and OS version on which you saw this
>> > behaviour?
>>
>> Operating system - windows 7
>> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the
>> problem)
>>
>> 1. Create two standard users in the system (test_user1 and test_user2)
>
> I think one possibility is that one user is not able to access the object
> created by another user, if possible can you as well try with just one user
> (Have same user for both the services).

Yes, it is working as same user services. The main problem is, PostgreSQL
as a service for two different users in the same system is not working because
of same random getting generated for two services.

Regards,
Hari Babu
Fujitsu Australia



On Fri, Mar 11, 2016 at 5:21 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Fri, Mar 11, 2016 at 12:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> >> I am not able to find the reason for this error. This error is occurring
> >> only
> >> when the PostgreSQL is started as a service only.
> >>
> >
> > Did you use pg_ctl register/unregister to register different services.  Can
> > you share the detail steps and OS version on which you saw this behaviour?
>
> Operating system - windows 7
> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the problem)
>
> 1. Create two standard users in the system (test_user1 and test_user2)
> 2. Create two databases belongs each user listed above.
> 3. Now using pg_ctl register the services for the two users.
> 4. Provide logon permissions to these users to run the services by changing
> service properties.

Did you mean to say that you changed Log on as: Local System Account in service properties or something else?

> 5. Now try to start the services, the second service fails with the
> error message.
> 6. Error details can be found out in Event log viewer.
>

If I follow above steps and do as I mentioned for step-4, I am not able to reproduce the issue on Windows-7 m/c using code of HEAD.

> Yes, it is working as same user services. The main problem is, PostgreSQL
> as a service for two different users in the same system is not working because
> of same random getting generated for two services.
>

I am not sure why you think same random number is problem, as mentioned above, even if the dsm name is same due to same random number, the code has logic to process it appropriately (regenerate the name of dsm).  Having said that, I don't mean to say that we shouldn't have logic to generate unique name and I think we might want to add data dir path to name generation as we do for main shared memory, however it is better to first completely understand the underneath issue.

If I understand correctly, here the problem is due to the reason that the second user doesn't have appropriate access rights to access the object created by first user.  On reading the documentation of CreateFileMapping(), it seems that user should have SeCreateGlobalPrivilege privilege to create an object in Global namespace.  Can you once try giving that privilege to the users created by you?  To give this privilege, go to control panel->System And Security->Administrative Tools->Local Security Policy->Local Policies->User Rights Assignment, in the right window, select Create global objects and double-click the same and add the newly created users. Rerun your test after these steps.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 14, 2016 at 4:51 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 11, 2016 at 5:21 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Fri, Mar 11, 2016 at 12:00 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>
>>
>> >> I am not able to find the reason for this error. This error is
>> >> occurring
>> >> only
>> >> when the PostgreSQL is started as a service only.
>> >>
>> >
>> > Did you use pg_ctl register/unregister to register different services.
>> > Can
>> > you share the detail steps and OS version on which you saw this
>> > behaviour?
>>
>> Operating system - windows 7
>> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the
>> problem)
>>
>> 1. Create two standard users in the system (test_user1 and test_user2)
>> 2. Create two databases belongs each user listed above.
>> 3. Now using pg_ctl register the services for the two users.
>> 4. Provide logon permissions to these users to run the services by
>> changing
>> service properties.
>
> Did you mean to say that you changed Log on as: Local System Account in
> service properties or something else?

No. Not as local service. The user should be the new standard user
that is created
in the system.

>> 5. Now try to start the services, the second service fails with the
>> error message.
>> 6. Error details can be found out in Event log viewer.
>>
>
> If I follow above steps and do as I mentioned for step-4, I am not able to
> reproduce the issue on Windows-7 m/c using code of HEAD.

I am not able to start a service with HEAD code in the same machine, where
as it is working for 9.5. I will look into it later and update it.

>> Yes, it is working as same user services. The main problem is, PostgreSQL
>> as a service for two different users in the same system is not working
>> because
>> of same random getting generated for two services.
>>
>
> I am not sure why you think same random number is problem, as mentioned
> above, even if the dsm name is same due to same random number, the code has
> logic to process it appropriately (regenerate the name of dsm).  Having said
> that, I don't mean to say that we shouldn't have logic to generate unique
> name and I think we might want to add data dir path to name generation as we
> do for main shared memory, however it is better to first completely
> understand the underneath issue.

Yes, same random number generation is not the problem. In windows apart
from EEXIST error, EACCES also needs to be validated and returned for
new random number generation, instead of throwing an error.

> If I understand correctly, here the problem is due to the reason that the
> second user doesn't have appropriate access rights to access the object
> created by first user.  On reading the documentation of CreateFileMapping(),
> it seems that user should have SeCreateGlobalPrivilege privilege to create
> an object in Global namespace.  Can you once try giving that privilege to
> the users created by you?  To give this privilege, go to control
> panel->System And Security->Administrative Tools->Local Security
> Policy->Local Policies->User Rights Assignment, in the right window, select
> Create global objects and double-click the same and add the newly created
> users. Rerun your test after these steps.

Thanks for providing details. I added the two newly created objects into
create global objects, still the same error occurred.


Regards,
Hari Babu
Fujitsu Australia



On Mon, Mar 21, 2016 at 6:16 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Mon, Mar 14, 2016 at 4:51 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> Operating system - windows 7
> >> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the
> >> problem)
> >>
> >> 1. Create two standard users in the system (test_user1 and test_user2)
> >> 2. Create two databases belongs each user listed above.
> >> 3. Now using pg_ctl register the services for the two users.
> >> 4. Provide logon permissions to these users to run the services by
> >> changing
> >> service properties.
> >
> > Did you mean to say that you changed Log on as: Local System Account in
> > service properties or something else?
>
> No. Not as local service. The user should be the new standard user
> that is created
> in the system.
>

So what do you exactly mean by "Provide logon permissions to these users", can you describe in detail what exactly you have done to give those permissions.  If I try to do with a new user, it gives me error "could not open service manager"  at start of service.

>
> >> 5. Now try to start the services, the second service fails with the
> >> error message.
> >> 6. Error details can be found out in Event log viewer.
> >>
> >
> > If I follow above steps and do as I mentioned for step-4, I am not able to
> > reproduce the issue on Windows-7 m/c using code of HEAD.
>
> I am not able to start a service with HEAD code in the same machine, where
> as it is working for 9.5. I will look into it later and update it.
>

Okay.  But it is confusing for me because you told earlier that you are able to reproduce problem in 9.5.

> >> Yes, it is working as same user services. The main problem is, PostgreSQL
> >> as a service for two different users in the same system is not working
> >> because
> >> of same random getting generated for two services.
> >>
> >
> > I am not sure why you think same random number is problem, as mentioned
> > above, even if the dsm name is same due to same random number, the code has
> > logic to process it appropriately (regenerate the name of dsm).  Having said
> > that, I don't mean to say that we shouldn't have logic to generate unique
> > name and I think we might want to add data dir path to name generation as we
> > do for main shared memory, however it is better to first completely
> > understand the underneath issue.
>
> Yes, same random number generation is not the problem. In windows apart
> from EEXIST error, EACCES also needs to be validated and returned for
> new random number generation, instead of throwing an error.
>

Doing the same handling for EACCES doesn't seem to be sane because if EACCES came for reason other than duplicate dsm name, then we want to report the error instead of trying to regenerate the name.  I think here fix should be to append data_dir path as we do for main shared memory. 

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 22, 2016 at 2:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 21, 2016 at 6:16 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Mon, Mar 14, 2016 at 4:51 PM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> >> Operating system - windows 7
>> >> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the
>> >> problem)
>> >>
>> >> 1. Create two standard users in the system (test_user1 and test_user2)
>> >> 2. Create two databases belongs each user listed above.
>> >> 3. Now using pg_ctl register the services for the two users.
>> >> 4. Provide logon permissions to these users to run the services by
>> >> changing
>> >> service properties.
>> >
>> > Did you mean to say that you changed Log on as: Local System Account in
>> > service properties or something else?
>>
>> No. Not as local service. The user should be the new standard user
>> that is created
>> in the system.
>>
>
> So what do you exactly mean by "Provide logon permissions to these users",
> can you describe in detail what exactly you have done to give those
> permissions.  If I try to do with a new user, it gives me error "could not
> open service manager"  at start of service.

1. Start the cmd with administrator user and add the new postgresql service
with a standard user that is created.
2. Start the services window with the user having administrator privileges and
go to the corresponding added service.
3. Right click on the service provides an properties option.
4. In the properties, there is an logon tab. Click it
5. Provide the password for the new user that is used for creating the service.
6. This adds the user to log on permissions.


>>
>> >> 5. Now try to start the services, the second service fails with the
>> >> error message.
>> >> 6. Error details can be found out in Event log viewer.
>> >>
>> >
>> > If I follow above steps and do as I mentioned for step-4, I am not able
>> > to
>> > reproduce the issue on Windows-7 m/c using code of HEAD.
>>
>> I am not able to start a service with HEAD code in the same machine, where
>> as it is working for 9.5. I will look into it later and update it.
>>
>
> Okay.  But it is confusing for me because you told earlier that you are able
> to reproduce problem in 9.5.

I am able to reproduce the problem with 9.5 binary. I am getting Access Denied
problem when i try to start the 9.6 binary service with the local user.


>> >> Yes, it is working as same user services. The main problem is,
>> >> PostgreSQL
>> >> as a service for two different users in the same system is not working
>> >> because
>> >> of same random getting generated for two services.
>> >>
>> >
>> > I am not sure why you think same random number is problem, as mentioned
>> > above, even if the dsm name is same due to same random number, the code
>> > has
>> > logic to process it appropriately (regenerate the name of dsm).  Having
>> > said
>> > that, I don't mean to say that we shouldn't have logic to generate
>> > unique
>> > name and I think we might want to add data dir path to name generation
>> > as we
>> > do for main shared memory, however it is better to first completely
>> > understand the underneath issue.
>>
>> Yes, same random number generation is not the problem. In windows apart
>> from EEXIST error, EACCES also needs to be validated and returned for
>> new random number generation, instead of throwing an error.
>>
>
> Doing the same handling for EACCES doesn't seem to be sane because if EACCES
> came for reason other than duplicate dsm name, then we want to report the
> error instead of trying to regenerate the name.  I think here fix should be
> to append data_dir path as we do for main shared memory.

Yes, EACCES may be possible other than duplicate dsm name.


Regards,
Hari Babu
Fujitsu Australia



On 21 March 2016 at 20:46, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
 

No. Not as local service. The user should be the new standard user
that is created
in the system.

Which was done how, exactly?

Commands run? Steps taken?

PostgreSQL drops privileges once it starts, so it's actually pretty OK to run it as an admin user, NetworkService, etc.

Otherwise you should really make a service user, not a regular user account. Don't allow the account to log in interactively, do allow it to log in as a service. Don't make it a domain account unless you need domain integration for SSPI etc.

The best option on newer Windows should be to use a managed service account (https://technet.microsoft.com/en-us/library/ff641731(v=ws.10).aspx) or virtual account. Eventually the installer should switch to doing that automatically instead of using NetworkService.
 
>> 5. Now try to start the services, the second service fails with the
>> error message.
>> 6. Error details can be found out in Event log viewer.

Can you get a Process Monitor trace of startup and check exactly where it's getting access denied, doing what?

You may have to dig through a *lot* of output to find it.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 22, 2016 at 9:13 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Tue, Mar 22, 2016 at 2:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Mar 21, 2016 at 6:16 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> > wrote:
> >>
> >> On Mon, Mar 14, 2016 at 4:51 PM, Amit Kapila <amit.kapila16@gmail.com>
> >> wrote:
> >> >> Operating system - windows 7
> >> >> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the
> >> >> problem)
> >> >>
> >> >> 1. Create two standard users in the system (test_user1 and test_user2)
> >> >> 2. Create two databases belongs each user listed above.
> >> >> 3. Now using pg_ctl register the services for the two users.
> >> >> 4. Provide logon permissions to these users to run the services by
> >> >> changing
> >> >> service properties.
> >> >
> >> > Did you mean to say that you changed Log on as: Local System Account in
> >> > service properties or something else?
> >>
> >> No. Not as local service. The user should be the new standard user
> >> that is created
> >> in the system.
> >>
> >
> > So what do you exactly mean by "Provide logon permissions to these users",
> > can you describe in detail what exactly you have done to give those
> > permissions.  If I try to do with a new user, it gives me error "could not
> > open service manager"  at start of service.
>
> 1. Start the cmd with administrator user and add the new postgresql service
> with a standard user that is created.
> 2. Start the services window with the user having administrator privileges and
> go to the corresponding added service.
> 3. Right click on the service provides an properties option.
> 4. In the properties, there is an logon tab. Click it
> 5. Provide the password for the new user that is used for creating the service.
> 6. This adds the user to log on permissions.
>

I am also able to reproduce the issue with these steps.
 
> >>
> >> Yes, same random number generation is not the problem. In windows apart
> >> from EEXIST error, EACCES also needs to be validated and returned for
> >> new random number generation, instead of throwing an error.
> >>
> >
> > Doing the same handling for EACCES doesn't seem to be sane because if EACCES
> > came for reason other than duplicate dsm name, then we want to report the
> > error instead of trying to regenerate the name.  I think here fix should be
> > to append data_dir path as we do for main shared memory.
>
> Yes, EACCES may be possible other than duplicate dsm name.
>

So as far as I can see there are two ways to resolve this issue, one is to retry generation of dsm name if CreateFileMapping returns EACCES and second is to append data_dir name to dsm name as the same is done for main shared memory, that will avoid the error to occur.  First approach has minor flaw that if CreateFileMapping returns EACCES due to reason other then duplicate dsm name which I am not sure is possible to identify, then we should report error instead try to regenerate the name

Robert and or others, can you share your opinion on what is the best way to proceed for this issue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 22, 2016 at 1:56 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> So as far as I can see there are two ways to resolve this issue, one is to
> retry generation of dsm name if CreateFileMapping returns EACCES and second
> is to append data_dir name to dsm name as the same is done for main shared
> memory, that will avoid the error to occur.  First approach has minor flaw
> that if CreateFileMapping returns EACCES due to reason other then duplicate
> dsm name which I am not sure is possible to identify, then we should report
> error instead try to regenerate the name
>
> Robert and or others, can you share your opinion on what is the best way to
> proceed for this issue.

For my 2c here, the approach using GetSharedMemName to identify the
origin of a dynamic shared memory segment looks more solid in terms of
robustness and collision handling. Retrying a segment is never going
to be completely water-proof.
--
Michael



On Mon, May 9, 2016 at 4:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Mar 22, 2016 at 1:56 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> So as far as I can see there are two ways to resolve this issue, one is to
>> retry generation of dsm name if CreateFileMapping returns EACCES and second
>> is to append data_dir name to dsm name as the same is done for main shared
>> memory, that will avoid the error to occur.  First approach has minor flaw
>> that if CreateFileMapping returns EACCES due to reason other then duplicate
>> dsm name which I am not sure is possible to identify, then we should report
>> error instead try to regenerate the name
>>
>> Robert and or others, can you share your opinion on what is the best way to
>> proceed for this issue.
>
> For my 2c here, the approach using GetSharedMemName to identify the
> origin of a dynamic shared memory segment looks more solid in terms of
> robustness and collision handling. Retrying a segment is never going
> to be completely water-proof.

So, I have been hacking that a bit more and finished with the
attached, which seem to address the issue here. Some of the code paths
of dsm_impl.c are done in such a way that we can fail a dsm allocation
and still continue to move on. I have taken that into account by using
palloc_extended with NO_OOM. palloc instead of malloc is a good fit
anyway to prevent leaks in error code paths.

Thoughts?
--
Michael

Attachment
On Tue, Mar 22, 2016 at 12:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >> Yes, same random number generation is not the problem. In windows apart
>> >> from EEXIST error, EACCES also needs to be validated and returned for
>> >> new random number generation, instead of throwing an error.
>> >
>> > Doing the same handling for EACCES doesn't seem to be sane because if
>> > EACCES
>> > came for reason other than duplicate dsm name, then we want to report
>> > the
>> > error instead of trying to regenerate the name.  I think here fix should
>> > be
>> > to append data_dir path as we do for main shared memory.
>>
>> Yes, EACCES may be possible other than duplicate dsm name.
>
> So as far as I can see there are two ways to resolve this issue, one is to
> retry generation of dsm name if CreateFileMapping returns EACCES and second
> is to append data_dir name to dsm name as the same is done for main shared
> memory, that will avoid the error to occur.  First approach has minor flaw
> that if CreateFileMapping returns EACCES due to reason other then duplicate
> dsm name which I am not sure is possible to identify, then we should report
> error instead try to regenerate the name
>
> Robert and or others, can you share your opinion on what is the best way to
> proceed for this issue.

I think we should retry on EACCES.  Possibly we should do other things
too, but at least that.  It completely misses the point of retrying on
EEXIST if we don't retry on other error codes that can also be
generated when the segment already exists.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, May 9, 2016 at 3:17 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Mar 22, 2016 at 1:56 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> So as far as I can see there are two ways to resolve this issue, one is to
>> retry generation of dsm name if CreateFileMapping returns EACCES and second
>> is to append data_dir name to dsm name as the same is done for main shared
>> memory, that will avoid the error to occur.  First approach has minor flaw
>> that if CreateFileMapping returns EACCES due to reason other then duplicate
>> dsm name which I am not sure is possible to identify, then we should report
>> error instead try to regenerate the name
>>
>> Robert and or others, can you share your opinion on what is the best way to
>> proceed for this issue.
>
> For my 2c here, the approach using GetSharedMemName to identify the
> origin of a dynamic shared memory segment looks more solid in terms of
> robustness and collision handling. Retrying a segment is never going
> to be completely water-proof.

Why not?  I mean, there are ~2^32 possible segment handles, and not
all that many of them can be in use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Sat, May 14, 2016 at 7:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Mar 22, 2016 at 12:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> >> Yes, same random number generation is not the problem. In windows apart
> >> >> from EEXIST error, EACCES also needs to be validated and returned for
> >> >> new random number generation, instead of throwing an error.
> >> >
> >> > Doing the same handling for EACCES doesn't seem to be sane because if
> >> > EACCES
> >> > came for reason other than duplicate dsm name, then we want to report
> >> > the
> >> > error instead of trying to regenerate the name.  I think here fix should
> >> > be
> >> > to append data_dir path as we do for main shared memory.
> >>
> >> Yes, EACCES may be possible other than duplicate dsm name.
> >
> > So as far as I can see there are two ways to resolve this issue, one is to
> > retry generation of dsm name if CreateFileMapping returns EACCES and second
> > is to append data_dir name to dsm name as the same is done for main shared
> > memory, that will avoid the error to occur.  First approach has minor flaw
> > that if CreateFileMapping returns EACCES due to reason other then duplicate
> > dsm name which I am not sure is possible to identify, then we should report
> > error instead try to regenerate the name
> >
> > Robert and or others, can you share your opinion on what is the best way to
> > proceed for this issue.
>
> I think we should retry on EACCES.  Possibly we should do other things
> too, but at least that.  It completely misses the point of retrying on
> EEXIST if we don't retry on other error codes that can also be
> generated when the segment already exists.
>

Sounds sensible, but if we want to that route, shall we have some mechanism such that if retrying it for 10 times (10 is somewhat arbitrary, but we retry 10 times in PGSharedMemoryCreate, so may be there is some consistency) doesn't give us unique name and we are getting EACCES error, then just throw the error instead of more retries.  This is to ensure that if the API is returning EACCES due to reason other than duplicate handle, then we won't retry indefinitely.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, May 15, 2016 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, May 14, 2016 at 7:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Tue, Mar 22, 2016 at 12:56 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> >> >> Yes, same random number generation is not the problem. In windows
>> >> >> apart
>> >> >> from EEXIST error, EACCES also needs to be validated and returned
>> >> >> for
>> >> >> new random number generation, instead of throwing an error.
>> >> >
>> >> > Doing the same handling for EACCES doesn't seem to be sane because if
>> >> > EACCES
>> >> > came for reason other than duplicate dsm name, then we want to report
>> >> > the
>> >> > error instead of trying to regenerate the name.  I think here fix
>> >> > should
>> >> > be
>> >> > to append data_dir path as we do for main shared memory.
>> >>
>> >> Yes, EACCES may be possible other than duplicate dsm name.
>> >
>> > So as far as I can see there are two ways to resolve this issue, one is
>> > to
>> > retry generation of dsm name if CreateFileMapping returns EACCES and
>> > second
>> > is to append data_dir name to dsm name as the same is done for main
>> > shared
>> > memory, that will avoid the error to occur.  First approach has minor
>> > flaw
>> > that if CreateFileMapping returns EACCES due to reason other then
>> > duplicate
>> > dsm name which I am not sure is possible to identify, then we should
>> > report
>> > error instead try to regenerate the name
>> >
>> > Robert and or others, can you share your opinion on what is the best way
>> > to
>> > proceed for this issue.
>>
>> I think we should retry on EACCES.  Possibly we should do other things
>> too, but at least that.  It completely misses the point of retrying on
>> EEXIST if we don't retry on other error codes that can also be
>> generated when the segment already exists.
>>

Well, if we don't care about segment uniqueness more than that... I
guess I will just throw the white flag. By retrying with a new segment
name at each loop, there is no risk to retry infinitely and remain
stuck, so let's just use something like
1444921511.3661.13.camel@postgrespro.ru as a fix and call that a deal
(with a fatter comment). CreateFileMapping would return a handle only
with ERROR_ALREADY_EXISTS per the docs.

> Sounds sensible, but if we want to that route, shall we have some mechanism
> such that if retrying it for 10 times (10 is somewhat arbitrary, but we
> retry 10 times in PGSharedMemoryCreate, so may be there is some consistency)
> doesn't give us unique name and we are getting EACCES error, then just throw
> the error instead of more retries.  This is to ensure that if the API is
> returning EACCES due to reason other than duplicate handle, then we won't
> retry indefinitely.

The logic in win32_shmem.c relies on the fact that a segment will be
recycled, and the retry is here because it may take time at OS level.
On top of that it relies on the segment names being unique across
systems. So it seems to me that it is not worth the complication to
duplicate that logic in the dsm implementation.
--
Michael



On Mon, May 16, 2016 at 9:45 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Sun, May 15, 2016 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Sounds sensible, but if we want to that route, shall we have some mechanism
> > such that if retrying it for 10 times (10 is somewhat arbitrary, but we
> > retry 10 times in PGSharedMemoryCreate, so may be there is some consistency)
> > doesn't give us unique name and we are getting EACCES error, then just throw
> > the error instead of more retries.  This is to ensure that if the API is
> > returning EACCES due to reason other than duplicate handle, then we won't
> > retry indefinitely.
>
> The logic in win32_shmem.c relies on the fact that a segment will be
> recycled, and the retry is here because it may take time at OS level.
> On top of that it relies on the segment names being unique across
> systems. So it seems to me that it is not worth the complication to
> duplicate that logic in the dsm implementation.

If we don't do retry for fixed number of times, then how will we handle the case if EACCES is due to the reason other than duplicate handle?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, May 17, 2016 at 4:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, May 16, 2016 at 9:45 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> On Sun, May 15, 2016 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > Sounds sensible, but if we want to that route, shall we have some
>> > mechanism
>> > such that if retrying it for 10 times (10 is somewhat arbitrary, but we
>> > retry 10 times in PGSharedMemoryCreate, so may be there is some
>> > consistency)
>> > doesn't give us unique name and we are getting EACCES error, then just
>> > throw
>> > the error instead of more retries.  This is to ensure that if the API is
>> > returning EACCES due to reason other than duplicate handle, then we
>> > won't
>> > retry indefinitely.
>>
>> The logic in win32_shmem.c relies on the fact that a segment will be
>> recycled, and the retry is here because it may take time at OS level.
>> On top of that it relies on the segment names being unique across
>> systems. So it seems to me that it is not worth the complication to
>> duplicate that logic in the dsm implementation.
>
> If we don't do retry for fixed number of times, then how will we handle the
> case if EACCES is due to the reason other than duplicate handle?

EACCES is a bit too low-level... I had in mind to check GetLastError
with only ERROR_ACCESS_DENIED, and retry only in this case, which is
the case where one postmaster is trying to access the segment of
another.
--
Michael



On Tue, May 17, 2016 at 6:01 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> EACCES is a bit too low-level... I had in mind to check GetLastError
> with only ERROR_ACCESS_DENIED, and retry only in this case, which is
> the case where one postmaster is trying to access the segment of
> another.

s/low/high/.
--
Michael



On Tue, May 17, 2016 at 2:31 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, May 17, 2016 at 4:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, May 16, 2016 at 9:45 AM, Michael Paquier <michael.paquier@gmail.com>
> > wrote:
> >>
> >> On Sun, May 15, 2016 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com>
> >> wrote:
> >> > Sounds sensible, but if we want to that route, shall we have some
> >> > mechanism
> >> > such that if retrying it for 10 times (10 is somewhat arbitrary, but we
> >> > retry 10 times in PGSharedMemoryCreate, so may be there is some
> >> > consistency)
> >> > doesn't give us unique name and we are getting EACCES error, then just
> >> > throw
> >> > the error instead of more retries.  This is to ensure that if the API is
> >> > returning EACCES due to reason other than duplicate handle, then we
> >> > won't
> >> > retry indefinitely.
> >>
> >> The logic in win32_shmem.c relies on the fact that a segment will be
> >> recycled, and the retry is here because it may take time at OS level.
> >> On top of that it relies on the segment names being unique across
> >> systems. So it seems to me that it is not worth the complication to
> >> duplicate that logic in the dsm implementation.
> >
> > If we don't do retry for fixed number of times, then how will we handle the
> > case if EACCES is due to the reason other than duplicate handle?
>
> EACCES is a bit too low-level... I had in mind to check GetLastError
> with only ERROR_ACCESS_DENIED, and retry only in this case, which is
> the case where one postmaster is trying to access the segment of
> another.
>

Okay, attached patch just does that and I have verified that it allows to start multiple services in windows.  In off list discussion with Robert, he suggested not to complicate the patch by retrying for fixed number of times as there is no proof that ERROR_ACCESS_DENIED can occur due to any other reason in this code path.  This patch is based on Kyotaro san's patch posted upthread with just minor changes in comments and indentation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, May 25, 2016 at 12:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, May 17, 2016 at 2:31 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> On Tue, May 17, 2016 at 4:16 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > On Mon, May 16, 2016 at 9:45 AM, Michael Paquier
>> > <michael.paquier@gmail.com>
>> > wrote:
>> >>
>> >> On Sun, May 15, 2016 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com>
>> >> wrote:
>> >> > Sounds sensible, but if we want to that route, shall we have some
>> >> > mechanism
>> >> > such that if retrying it for 10 times (10 is somewhat arbitrary, but
>> >> > we
>> >> > retry 10 times in PGSharedMemoryCreate, so may be there is some
>> >> > consistency)
>> >> > doesn't give us unique name and we are getting EACCES error, then
>> >> > just
>> >> > throw
>> >> > the error instead of more retries.  This is to ensure that if the API
>> >> > is
>> >> > returning EACCES due to reason other than duplicate handle, then we
>> >> > won't
>> >> > retry indefinitely.
>> >>
>> >> The logic in win32_shmem.c relies on the fact that a segment will be
>> >> recycled, and the retry is here because it may take time at OS level.
>> >> On top of that it relies on the segment names being unique across
>> >> systems. So it seems to me that it is not worth the complication to
>> >> duplicate that logic in the dsm implementation.
>> >
>> > If we don't do retry for fixed number of times, then how will we handle
>> > the
>> > case if EACCES is due to the reason other than duplicate handle?
>>
>> EACCES is a bit too low-level... I had in mind to check GetLastError
>> with only ERROR_ACCESS_DENIED, and retry only in this case, which is
>> the case where one postmaster is trying to access the segment of
>> another.
>>
>
> Okay, attached patch just does that and I have verified that it allows to
> start multiple services in windows.  In off list discussion with Robert, he
> suggested not to complicate the patch by retrying for fixed number of times
> as there is no proof that ERROR_ACCESS_DENIED can occur due to any other
> reason in this code path.  This patch is based on Kyotaro san's patch posted
> upthread with just minor changes in comments and indentation.

Thanks for catching Robert and getting confirmation on the matter. In
the same line of thoughts, I think as well that it is definitely not
worth complicating the retry logic in dsm.c, but you knew that already
per last week's discussion.

Regarding the patch, this looks good to me.
--
Michael



On Wed, May 25, 2016 at 9:44 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Wed, May 25, 2016 at 12:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Okay, attached patch just does that and I have verified that it allows to
> > start multiple services in windows.  In off list discussion with Robert, he
> > suggested not to complicate the patch by retrying for fixed number of times
> > as there is no proof that ERROR_ACCESS_DENIED can occur due to any other
> > reason in this code path.  This patch is based on Kyotaro san's patch posted
> > upthread with just minor changes in comments and indentation.
>
> Thanks for catching Robert and getting confirmation on the matter. In
> the same line of thoughts, I think as well that it is definitely not
> worth complicating the retry logic in dsm.c, but you knew that already
> per last week's discussion.
>
> Regarding the patch, this looks good to me.
>

Thanks for reviewing the patch.  I have added the entry for this patch in next CF (https://commitfest.postgresql.org/10/636/),  feel free to mark it as Ready for committer if you think patch is good.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 26, 2016 at 3:01 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, May 25, 2016 at 9:44 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> On Wed, May 25, 2016 at 12:11 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> >
>> > Okay, attached patch just does that and I have verified that it allows
>> > to
>> > start multiple services in windows.  In off list discussion with Robert,
>> > he
>> > suggested not to complicate the patch by retrying for fixed number of
>> > times
>> > as there is no proof that ERROR_ACCESS_DENIED can occur due to any other
>> > reason in this code path.  This patch is based on Kyotaro san's patch
>> > posted
>> > upthread with just minor changes in comments and indentation.
>>
>> Thanks for catching Robert and getting confirmation on the matter. In
>> the same line of thoughts, I think as well that it is definitely not
>> worth complicating the retry logic in dsm.c, but you knew that already
>> per last week's discussion.
>>
>> Regarding the patch, this looks good to me.
>>
>
> Thanks for reviewing the patch.  I have added the entry for this patch in
> next CF (https://commitfest.postgresql.org/10/636/),  feel free to mark it
> as Ready for committer if you think patch is good.

Yeah, it is definitely better to register it. And I have switched the
patch as ready for committer just now.
--
Michael



On Thu, May 26, 2016 at 7:44 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> Thanks for reviewing the patch.  I have added the entry for this patch in
>> next CF (https://commitfest.postgresql.org/10/636/),  feel free to mark it
>> as Ready for committer if you think patch is good.
>
> Yeah, it is definitely better to register it. And I have switched the
> patch as ready for committer just now.

Committed and back-patched to 9.4, where DSM was introduced.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Oct 15, 2015 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 15, 2015 at 8:35 PM, Dmitry Vasilyev <d.vasilyev@postgrespro.ru>
> wrote:
>>
>> I think that function dsm_impl_windows() with EACCES error should not
>> do ereport() with FATAL level. It works, but it is likely to make an
>> infinite loop if the user will receive EACCES error.
>>
>
> Currently we are using error level as ERROR for creating dsm during
> postmaster startup which is not right and rather we should use error
> level as LOG.  Can you please try with the attached patch and see
> if the issue is fixed for you.
>
> Another some what related point is currently we are using random()
> function to ensure a unique name for dsm and it seems to me that
> it is always going to generate same number on first invocation (at least
> thats what happening on windows) due to which you are seeing the
> error.

Yeah, random() is the wrong thing.  It should use PostmasterRandom().
Fixed to do that instead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> Yeah, random() is the wrong thing.  It should use PostmasterRandom().
> Fixed to do that instead.

I am not very happy about this patch; have you considered the security
implications of what you just did?  If you haven't, I'll tell you:
you just made the postmaster's selection of "random" cancel keys and
password salts a lot more predictable.  Formerly, the srandom() seed
for those depended on both the postmaster start time and the time of
the first connection request, but this change removes the first
connection request from the equation.  If you know the postmaster start
time --- which we will happily tell any asker --- it will not take too
many trials to find the seed that's in use.

I'd be the first to agree that this point is inadequately documented
in the code, but PostmasterRandom should be reserved for its existing
security-related uses, not exposed to the world for (ahem) random other
uses.

            regards, tom lane



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 26, 2016 at 7:44 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Yeah, it is definitely better to register it. And I have switched the
>> patch as ready for committer just now.

> Committed and back-patched to 9.4, where DSM was introduced.

ISTM both the previous coding and this version can fail for no good
reason, that is what if GetLastError() happens to return one of these
error codes as a result of some unrelated failure from before this
subroutine is entered?  That is, wouldn't it be a good idea to
do SetLastError(0) before calling CreateFileMapping?  Or is the latter
guaranteed to do that on success?  I don't see that stated in its
man page.

            regards, tom lane



On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yeah, random() is the wrong thing.  It should use PostmasterRandom().
>> Fixed to do that instead.
>
> I am not very happy about this patch; have you considered the security
> implications of what you just did?

Hmm. No.

> If you haven't, I'll tell you:
> you just made the postmaster's selection of "random" cancel keys and
> password salts a lot more predictable.  Formerly, the srandom() seed
> for those depended on both the postmaster start time and the time of
> the first connection request, but this change removes the first
> connection request from the equation.  If you know the postmaster start
> time --- which we will happily tell any asker --- it will not take too
> many trials to find the seed that's in use.

Realistically, in some large percentage of the real-world installs,
that's not going to take too many trials anyway.  People don't
generally start a postmaster so that they can NOT connect to it, and
there are plenty of real-world installations where you can count on
the first connection happening in well under 1s.  I'd suggest that if
you're relying on that time being a secret for anything very
security-critical, you're already in trouble.

> I'd be the first to agree that this point is inadequately documented
> in the code, but PostmasterRandom should be reserved for its existing
> security-related uses, not exposed to the world for (ahem) random other
> uses.

So, we could have dsm_postmaster_startup() seed the random number
generator itself, and then let PostmasterRandom() override the seed
later.  Like maybe:

struct timeval tv;
gettimeofday(&tv, NULL);
srandom(tv.tv_sec);
...
dsm_control_handle = random();

dsm_postmaster_startup() doesn't care very much about whether an
adversary can predict the chosen DSM control segment ID, but it
doesn't want to keep picking the same one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Sep 20, 2016 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, May 26, 2016 at 7:44 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> Yeah, it is definitely better to register it. And I have switched the
>>> patch as ready for committer just now.
>
>> Committed and back-patched to 9.4, where DSM was introduced.
>
> ISTM both the previous coding and this version can fail for no good
> reason, that is what if GetLastError() happens to return one of these
> error codes as a result of some unrelated failure from before this
> subroutine is entered?  That is, wouldn't it be a good idea to
> do SetLastError(0) before calling CreateFileMapping?  Or is the latter
> guaranteed to do that on success?  I don't see that stated in its
> man page.

I'll leave it to people who know more about Windows than I do to opine
on that.  I suspect it's not too seriously broken because we've
managed to cut two (almost three) major releases since this code was
written without any complaints about that.  But there may well be
something that can be tightened up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Sep 20, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, May 26, 2016 at 7:44 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> Yeah, it is definitely better to register it. And I have switched the
>>> patch as ready for committer just now.
>
>> Committed and back-patched to 9.4, where DSM was introduced.
>
> ISTM both the previous coding and this version can fail for no good
> reason, that is what if GetLastError() happens to return one of these
> error codes as a result of some unrelated failure from before this
> subroutine is entered?  That is, wouldn't it be a good idea to
> do SetLastError(0) before calling CreateFileMapping?
>

Yes, that seems like a good idea.  Do you need a patch with some
testing on windows environment?

>  Or is the latter
> guaranteed to do that on success?
>

I don't see any such guarantee from the msdn page and it appears from
GetLastError()/SetLastError() specs [1][2] that functions that set
last error code to zero on success, do mention it in their
documentation.


[1] - https://msdn.microsoft.com/en-us/library/windows/desktop/ms679360(v=vs.85).aspx
[2] - https://msdn.microsoft.com/en-us/library/windows/desktop/ms680627(v=vs.85).aspx



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Amit Kapila <amit.kapila16@gmail.com> writes:
> On Tue, Sep 20, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ISTM both the previous coding and this version can fail for no good
>> reason, that is what if GetLastError() happens to return one of these
>> error codes as a result of some unrelated failure from before this
>> subroutine is entered?  That is, wouldn't it be a good idea to
>> do SetLastError(0) before calling CreateFileMapping?

> Yes, that seems like a good idea.  Do you need a patch with some
> testing on windows environment?

Please; I can't test it.

            regards, tom lane



Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd be the first to agree that this point is inadequately documented
>> in the code, but PostmasterRandom should be reserved for its existing
>> security-related uses, not exposed to the world for (ahem) random other
>> uses.

> So, we could have dsm_postmaster_startup() seed the random number
> generator itself, and then let PostmasterRandom() override the seed
> later.  Like maybe:

Works for me, at least as a temporary solution.  The disturbing thing
here is that this still only does what we want if dsm_postmaster_startup
happens before the first PostmasterRandom call --- which is OK today but
seems pretty fragile.  Still, redesigning PostmasterRandom's seeding
technique is not something to do right before 9.6 release.  Let's revert
the prior patch and do it as you have here:

> struct timeval tv;
> gettimeofday(&tv, NULL);
> srandom(tv.tv_sec);
> ...
> dsm_control_handle = random();

for the time being.

            regards, tom lane



On Fri, Sep 23, 2016 at 1:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'd be the first to agree that this point is inadequately documented
>>> in the code, but PostmasterRandom should be reserved for its existing
>>> security-related uses, not exposed to the world for (ahem) random other
>>> uses.
>
>> So, we could have dsm_postmaster_startup() seed the random number
>> generator itself, and then let PostmasterRandom() override the seed
>> later.  Like maybe:
>
> Works for me, at least as a temporary solution.  The disturbing thing
> here is that this still only does what we want if dsm_postmaster_startup
> happens before the first PostmasterRandom call --- which is OK today but
> seems pretty fragile.  Still, redesigning PostmasterRandom's seeding
> technique is not something to do right before 9.6 release.  Let's revert
> the prior patch and do it as you have here:
>
>> struct timeval tv;
>> gettimeofday(&tv, NULL);
>> srandom(tv.tv_sec);
>> ...
>> dsm_control_handle = random();
>
> for the time being.
>

Isn't it better if we use the same technique in dsm_create() as well
which uses random() for handle?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Thu, Sep 22, 2016 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Tue, Sep 20, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ISTM both the previous coding and this version can fail for no good
>>> reason, that is what if GetLastError() happens to return one of these
>>> error codes as a result of some unrelated failure from before this
>>> subroutine is entered?  That is, wouldn't it be a good idea to
>>> do SetLastError(0) before calling CreateFileMapping?
>
>> Yes, that seems like a good idea.  Do you need a patch with some
>> testing on windows environment?
>
> Please; I can't test it.
>

Attached patch tightens the error handling.  However, on debugging, I
found that CreateFileMapping() always set error code to 0 on success.
Basically, before calling CreateFileMapping(), I have set the error
code as 10 (SetLastError(10)) and then after CreateFileMapping(), it
sets the error code to 0 on success and appropriate error code on
failure.  I also verified that error code is set to 10 by calling
GetLastError() before CreateFileMapping().  Now, it is quite possible
that error code is set to 0 on success in my windows environment
(Win7) and doesn't work in some other environment.  In any case, if we
want to go ahead and don't want to rely on CreateFileMapping(), then
attached patch should suffice the need.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Fri, Sep 23, 2016 at 1:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> So, we could have dsm_postmaster_startup() seed the random number
>>> generator itself, and then let PostmasterRandom() override the seed
>>> later.  Like maybe:
>>
>> Works for me, at least as a temporary solution.

> Isn't it better if we use the same technique in dsm_create() as well
> which uses random() for handle?

dsm_create() is executed in backends, not the postmaster, and they
already have their own random seeds (cf BackendRun).  Adding more
srandom calls at random places will *not* make things better.

However, it's certainly true that dsm_postmaster_startup() might not
be the only place in postmaster startup that wants to use random().
What seems like the best thing after sleeping on it is to put
"srandom(time(NULL))" somewhere early in PostmasterMain, so that one
such call suffices for all uses during postmaster startup.

            regards, tom lane



Amit Kapila <amit.kapila16@gmail.com> writes:
> Attached patch tightens the error handling.  However, on debugging, I
> found that CreateFileMapping() always set error code to 0 on success.

Ah, interesting.

> Now, it is quite possible
> that error code is set to 0 on success in my windows environment
> (Win7) and doesn't work in some other environment.  In any case, if we
> want to go ahead and don't want to rely on CreateFileMapping(), then
> attached patch should suffice the need.

Yeah, seeing that this is not mentioned in MS' documentation I don't
think we should rely on it.  The patch looks fine to me, pushed.

            regards, tom lane



On Fri, Sep 23, 2016 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> Now, it is quite possible
>> that error code is set to 0 on success in my windows environment
>> (Win7) and doesn't work in some other environment.  In any case, if we
>> want to go ahead and don't want to rely on CreateFileMapping(), then
>> attached patch should suffice the need.
>
> Yeah, seeing that this is not mentioned in MS' documentation I don't
> think we should rely on it.  The patch looks fine to me, pushed.
>

Thanks.



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com