Thread: BUG #16259: Cannot Use "pg_ctl start -l logfile" on Clean Install on Windows Server 2012/2016

BUG #16259: Cannot Use "pg_ctl start -l logfile" on Clean Install on Windows Server 2012/2016

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16259
Logged by:          Jonathan Katz
Email address:      jkatz@postgresql.org
PostgreSQL version: 12.2
Operating system:   Windows Server 2012, Windows Server 2016
Description:

Hi,

When testing PostgreSQL 12.2 on Windows built using MSVC (this potentially
affects mingw too, though untested) on both Windows Server 2012 & 2016, we
are having issues with a newly generated log file on a fresh PostgreSQL
cluster when PostgreSQL is started with pg_ctl. If the log file was created
with a prior version of PostgreSQL, or if you manually create the log file
and grant the executing user the appropriate file access permissions,
everything works fine.

In other words, the issue occurs when passing a file name to the `-l` flag
where the log file does not already exist. The steps to reproduce this
are:

1. Install a PostgreSQL 12.2 Windows build on a Windows Server 2012 or a
Windows Server 2016 machien
2. Run initdb to initialize a new PostgreSQL cluster
3. Run `pg_ctl start -l logfile` where "logfile" does not currently exist

The resulting output should look like:

  Access is denied.
  waiting for server to start... stopped waiting
  pg_ctl: could not start server
  Examine the log output.

If you try to look at the logfile you will see that it is completely empty.
If you then change the permissions to grant the current user full access to
the logfile and run the above command again it will start without any
issues.

We think this was introduced in 0da33c76. One approach for a fix would be to
add a condition so the added code only executes if the file already exists,
and if the file does not exist it executes the "snprintf" command like
before.

Please let me know if you need any clarifications. Thanks!

Jonathan


PG Bug reporting form <noreply@postgresql.org> writes:
> When testing PostgreSQL 12.2 on Windows built using MSVC (this potentially
> affects mingw too, though untested) on both Windows Server 2012 & 2016, we
> are having issues with a newly generated log file on a fresh PostgreSQL
> cluster when PostgreSQL is started with pg_ctl. If the log file was created
> with a prior version of PostgreSQL, or if you manually create the log file
> and grant the executing user the appropriate file access permissions,
> everything works fine.
> In other words, the issue occurs when passing a file name to the `-l` flag
> where the log file does not already exist.

So ... why aren't we seeing that in the buildfarm?  The TAP tests
certainly do that, and I think the core regression tests do too.

> We think this was introduced in 0da33c76. One approach for a fix would be to
> add a condition so the added code only executes if the file already exists,
> and if the file does not exist it executes the "snprintf" command like
> before.

Without an explanation for why it's not failing everywhere, I don't
have a lot of faith in that being an improvement.

            regards, tom lane



+Heath

On 2/14/20 12:45 PM, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> When testing PostgreSQL 12.2 on Windows built using MSVC (this potentially
>> affects mingw too, though untested) on both Windows Server 2012 & 2016, we
>> are having issues with a newly generated log file on a fresh PostgreSQL
>> cluster when PostgreSQL is started with pg_ctl. If the log file was created
>> with a prior version of PostgreSQL, or if you manually create the log file
>> and grant the executing user the appropriate file access permissions,
>> everything works fine.
>> In other words, the issue occurs when passing a file name to the `-l` flag
>> where the log file does not already exist.
>
> So ... why aren't we seeing that in the buildfarm?  The TAP tests
> certainly do that, and I think the core regression tests do too.

Heath can speak to this one. He's currently looking at Dory to see why
that is.

Per my quick look, there's only one animal that has Windows Server 2016
(Dory) and none with 2012.

>> We think this was introduced in 0da33c76. One approach for a fix would be to
>> add a condition so the added code only executes if the file already exists,
>> and if the file does not exist it executes the "snprintf" command like
>> before.
>
> Without an explanation for why it's not failing everywhere, I don't
> have a lot of faith in that being an improvement.

Sure, mostly was to find a starting point.

Jonathan


Attachment
On Fri, Feb 14, 2020 at 1:04 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> +Heath
>
> On 2/14/20 12:45 PM, Tom Lane wrote:
> > PG Bug reporting form <noreply@postgresql.org> writes:
> >> When testing PostgreSQL 12.2 on Windows built using MSVC (this potentially
> >> affects mingw too, though untested) on both Windows Server 2012 & 2016, we
> >> are having issues with a newly generated log file on a fresh PostgreSQL
> >> cluster when PostgreSQL is started with pg_ctl. If the log file was created
> >> with a prior version of PostgreSQL, or if you manually create the log file
> >> and grant the executing user the appropriate file access permissions,
> >> everything works fine.
> >> In other words, the issue occurs when passing a file name to the `-l` flag
> >> where the log file does not already exist.
> >
> > So ... why aren't we seeing that in the buildfarm?  The TAP tests
> > certainly do that, and I think the core regression tests do too.
>
> Heath can speak to this one. He's currently looking at Dory to see why
> that is.
>
> Per my quick look, there's only one animal that has Windows Server 2016
> (Dory) and none with 2012.
>
> >> We think this was introduced in 0da33c76. One approach for a fix would be to
> >> add a condition so the added code only executes if the file already exists,
> >> and if the file does not exist it executes the "snprintf" command like
> >> before.
> >
> > Without an explanation for why it's not failing everywhere, I don't
> > have a lot of faith in that being an improvement.
>
> Sure, mostly was to find a starting point.
>
> Jonathan
>

All,
   So we had a lot of issues when setting up Dory to work with the
build farm and we had to
open up some permissions to fix these issues.  We added the "CREATOR
OWNER" to have
permissions on any files created under the build farm directory.  This
is to fix broken inheritance on
Windows Server machines when using a directory created at the root
(C:\, D:\, etc).  With this change, when the
logfile gets created by pg_ctl it will default to have the permissions
necessary to write to it.
   On a clean install of Server 2016 or Server 2012 this is not the
case and without modifying the default
permissions on an install done anywhere other than in the user's home
directory.  In the PG 12.1 release
we do not see this issue and the logfile is created with the necessary
permissions for the creator to write
to this file.
   We have now reproduced this issue using 3 different versions of
Server 2016 and a version of Server 2012.
If we change the location of the logfile to be anywhere inside of the
current user's home directory, then
we do not see the issue described.  However, if you try to put the
logfile into any directory outside of the
current user's home directory this issue will manifest.

Thanks,
   Heath



Heath Lord <heath.lord@crunchydata.com> writes:
>    So we had a lot of issues when setting up Dory to work with the
> build farm and we had to
> open up some permissions to fix these issues.  We added the "CREATOR
> OWNER" to have
> permissions on any files created under the build farm directory.  This
> is to fix broken inheritance on
> Windows Server machines when using a directory created at the root
> (C:\, D:\, etc).  With this change, when the
> logfile gets created by pg_ctl it will default to have the permissions
> necessary to write to it.
>    On a clean install of Server 2016 or Server 2012 this is not the
> case and without modifying the default
> permissions on an install done anywhere other than in the user's home
> directory.  In the PG 12.1 release
> we do not see this issue and the logfile is created with the necessary
> permissions for the creator to write
> to this file.
>    We have now reproduced this issue using 3 different versions of
> Server 2016 and a version of Server 2012.
> If we change the location of the logfile to be anywhere inside of the
> current user's home directory, then
> we do not see the issue described.  However, if you try to put the
> logfile into any directory outside of the
> current user's home directory this issue will manifest.

Hm.  So that maybe explains why dory isn't showing this problem,
but it's still pretty unclear exactly what the problem *is*,
or what we should do about it.  It sounds like 'fopen(..., "a")'
is creating the file with different permissions than CMD.EXE
would do, but why should that be?  And if that is what's happening,
why aren't lots of other places falling over in a similar fashion?
It's not to be credited that the postmaster logfile is magically
different from every other file we create.

I'm inclined to guess that maybe what we need is some different
parameters to the CreateFile call in pgwin32_open() (which is
what that fopen devolves to).  The lpSecurityAttributes parameter
looks suspicious --- I see that we're just setting

    sa.lpSecurityDescriptor = NULL;

and maybe that's not sufficient in whatever environment you're
testing in?  There's some stuff in

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea

about inherited attributes, which I don't understand, but perhaps
that's related to what you're talking about.

            regards, tom lane



On Fri, Feb 14, 2020 at 2:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Heath Lord <heath.lord@crunchydata.com> writes:
> >    So we had a lot of issues when setting up Dory to work with the
> > build farm and we had to
> > open up some permissions to fix these issues.  We added the "CREATOR
> > OWNER" to have
> > permissions on any files created under the build farm directory.  This
> > is to fix broken inheritance on
> > Windows Server machines when using a directory created at the root
> > (C:\, D:\, etc).  With this change, when the
> > logfile gets created by pg_ctl it will default to have the permissions
> > necessary to write to it.
> >    On a clean install of Server 2016 or Server 2012 this is not the
> > case and without modifying the default
> > permissions on an install done anywhere other than in the user's home
> > directory.  In the PG 12.1 release
> > we do not see this issue and the logfile is created with the necessary
> > permissions for the creator to write
> > to this file.
> >    We have now reproduced this issue using 3 different versions of
> > Server 2016 and a version of Server 2012.
> > If we change the location of the logfile to be anywhere inside of the
> > current user's home directory, then
> > we do not see the issue described.  However, if you try to put the
> > logfile into any directory outside of the
> > current user's home directory this issue will manifest.
>
> Hm.  So that maybe explains why dory isn't showing this problem,
> but it's still pretty unclear exactly what the problem *is*,
> or what we should do about it.  It sounds like 'fopen(..., "a")'
> is creating the file with different permissions than CMD.EXE
> would do, but why should that be?  And if that is what's happening,
> why aren't lots of other places falling over in a similar fashion?
> It's not to be credited that the postmaster logfile is magically
> different from every other file we create.
>
> I'm inclined to guess that maybe what we need is some different
> parameters to the CreateFile call in pgwin32_open() (which is
> what that fopen devolves to).  The lpSecurityAttributes parameter
> looks suspicious --- I see that we're just setting
>
>         sa.lpSecurityDescriptor = NULL;
>
> and maybe that's not sufficient in whatever environment you're
> testing in?  There's some stuff in
>
> https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea
>
> about inherited attributes, which I don't understand, but perhaps
> that's related to what you're talking about.
>
>                         regards, tom lane

   Another interesting thing that we have found is that this issue
only manifests
itself if you are trying to launch the pg_ctl command as a privileged
user.  If you
try to run this command as a user who is not an Administrator, then we do not
see this issue.  This is another reason why Dory does not exhibit this behavior.
  With this finding, it has to be something with how pg_ctl is restricting the
user when running the postgres.exe command from the CMD.exe process,
so we are not trying to launch postgres as a privileged user.
  The fact that I am able to launch the postgres processes using the pg_ctl
command as a privileged user is what seems to be causing this issue.
The fopen appears to be running as the elevated Administrator, while the
postgres.exe is being run with restricted privileges.
   In the pg_ctl code, I see that when we spawn the CMD process we are
doing it by restricting the users privileges.  So, what I think is happening
is that the fopen is being run by the Administrator with full privileges,
and then the CMD process is being run with restricted privileges so it
can run the postgres process.  It does not have the necessary privileges
to write to the logfile that was created before.

-Heath



Heath Lord <heath.lord@crunchydata.com> writes:
>    Another interesting thing that we have found is that this issue
> only manifests
> itself if you are trying to launch the pg_ctl command as a privileged
> user.  If you
> try to run this command as a user who is not an Administrator, then we do not
> see this issue.  This is another reason why Dory does not exhibit this behavior.
>   With this finding, it has to be something with how pg_ctl is restricting the
> user when running the postgres.exe command from the CMD.exe process,
> so we are not trying to launch postgres as a privileged user.

OOOHHH ... I think the light just went on, then.  The reason we have
an issue is that we drop admin privileges (via CreateRestrictedProcess)
when we launch CMD.EXE, just below this.  So we are still admin when
the new code touches the log file, and that's why it's getting created
with different privileges from files that the postmaster (or CMD.EXE)
would create.

The idea I'd had for a fix is that we don't need to actually create the
file if it's not there yet.  All we need is to delay if it's there and
can't be opened.  So rather than open for writing, I think we could open
for reading, along the lines of

        FILE       *fd = fopen(log_file, "r");

        if (fd != NULL)     /* we may just ignore any error */
            fclose(fd);

The looping logic in pgwin32_open doesn't really care which kind
of access we're asking for.

BTW, we could make this at least slightly cheaper by using
open() not fopen().

Alexander, this was your patch to start with ... any thoughts?

            regards, tom lane



Hello Tom,
15.02.2020 00:46, Tom Lane wrote:
Heath Lord <heath.lord@crunchydata.com> writes:
   Another interesting thing that we have found is that this issue
only manifests
itself if you are trying to launch the pg_ctl command as a privileged
user.  If you
try to run this command as a user who is not an Administrator, then we do not
see this issue.  This is another reason why Dory does not exhibit this behavior. With this finding, it has to be something with how pg_ctl is restricting the
user when running the postgres.exe command from the CMD.exe process,
so we are not trying to launch postgres as a privileged user.
OOOHHH ... I think the light just went on, then.  The reason we have
an issue is that we drop admin privileges (via CreateRestrictedProcess)
when we launch CMD.EXE, just below this.  So we are still admin when
the new code touches the log file, and that's why it's getting created
with different privileges from files that the postmaster (or CMD.EXE)
would create.
I can easily reproduce this by opening "x64 Native Tools Command Prompt for VS 2019" with "Run as Administrator". When executing `vcregress taptest src/bin/pg_basebackup` in this command prompt, I get:
t/010_pg_basebackup.pl ... 1/106 Bailout called.  Further testing stopped:  pg_ctl start failed
FAILED--Further testing stopped: pg_ctl start failed
The idea I'd had for a fix is that we don't need to actually create the
file if it's not there yet.  All we need is to delay if it's there and
can't be opened.  So rather than open for writing, I think we could open
for reading, along the lines of
       FILE       *fd = fopen(log_file, "r");
       if (fd != NULL)     /* we may just ignore any error */           fclose(fd);

The looping logic in pgwin32_open doesn't really care which kind
of access we're asking for.
With the proposed change and the delay_after_unlink_pid.patch [1] applied I get the same fail on `vcregress taptest src/bin/pg_basebackup` as before 0da33c76:
waiting for server to shut down.... done
server stopped
waiting for server to start....The process cannot access the file because it is being used by another process.
 stopped waiting
pg_ctl: could not start server

So just the read permission is not sufficient for that check.
BTW, we could make this at least slightly cheaper by using
open() not fopen().

Alexander, this was your patch to start with ... any thoughts?
Yes, it was my omission. Please look at the improved patch.
It works for me in both command prompts and still gives a meaningful message on error.
 
In fact running taptests (pg_basebackup, pg_ctl) in an elevated prompt fails anyway, but later and with other errors, e.g.:

t/001_start_stop.pl .. ok
t/002_status.pl ...... ok
t/003_promote.pl ..... 5/12 Bailout called.  Further testing stopped:  pg_ctl start failed
FAILED--Further testing stopped: pg_ctl start failed
...
pg_ctl: could not start server
Examine the log output.
# pg_ctl start failed; logfile:
2020-02-15 07:16:05.588 MSK [1784] PANIC:  could not open file "global/pg_control": Permission denied
Bail out!  pg_ctl start failed

But I think it's not related to this bug.

[1] https://www.postgresql.org/message-id/e5179494-715e-f8a3-266b-0cf52adac8f4%40gmail.com

Best regards,
Alexander
Attachment
Alexander Lakhin <exclusion@gmail.com> writes:
>> Alexander, this was your patch to start with ... any thoughts?

> Yes, it was my omission. Please look at the improved patch.
> It works for me in both command prompts and still gives a meaningful
> message on error.

This seems basically reasonable but I'm not in a position to test it.
Heath, can you see if this improves matters for you?

            regards, tom lane



On Sat, Feb 15, 2020 at 12:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alexander Lakhin <exclusion@gmail.com> writes:
> >> Alexander, this was your patch to start with ... any thoughts?
>
> > Yes, it was my omission. Please look at the improved patch.
> > It works for me in both command prompts and still gives a meaningful
> > message on error.
>
> This seems basically reasonable but I'm not in a position to test it.
> Heath, can you see if this improves matters for you?
>
>                         regards, tom lane

   I am sorry it took me so long to get back to you on this, but I was finally
able to stand up a fresh VM, install and compile all of the Windows
dependencies, and compile everything from source with the provided
patch.
  This patch fixes the issue that I was seeing with the 12.2 release. I was
able to run through all of my normal test scenarios successfully using
the patch.
I appreciate you looking into this and getting me a patch so quickly.

-Heath



Heath Lord <heath.lord@crunchydata.com> writes:
> On Sat, Feb 15, 2020 at 12:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This seems basically reasonable but I'm not in a position to test it.
>> Heath, can you see if this improves matters for you?

>    I am sorry it took me so long to get back to you on this, but I was finally
> able to stand up a fresh VM, install and compile all of the Windows
> dependencies, and compile everything from source with the provided
> patch.
>   This patch fixes the issue that I was seeing with the 12.2 release. I was
> able to run through all of my normal test scenarios successfully using
> the patch.

Cool, pushed then.

            regards, tom lane