Thread: windows / initdb oddness

windows / initdb oddness

From
Andrew Dunstan
Date:
This took me hours to find  ...

On my Windows box, CVS HEAD gets an execution failure on "initdb foo" 
but succeeds happily with "initdb -D foo".

This is not true for REL8_1_STABLE, nor is it true for all Windows 
machines/environments, apparently, otherwise we would be seeing failures 
from the buildfarm member snake, since the buildfarm script does "initdb 
data".

My suspicion is that this is some side effect of the restricted-exec 
patch, but I don't have time to dig further right now.

cheers

andrew


Re: windows / initdb oddness

From
"Magnus Hagander"
Date:
> This took me hours to find  ...
>
> On my Windows box, CVS HEAD gets an execution failure on "initdb foo"
> but succeeds happily with "initdb -D foo".
>
> This is not true for REL8_1_STABLE, nor is it true for all
> Windows machines/environments, apparently, otherwise we would
> be seeing failures from the buildfarm member snake, since the
> buildfarm script does "initdb data".
>
> My suspicion is that this is some side effect of the
> restricted-exec patch, but I don't have time to dig further right now.

Um, so what error msg do you get when it's failing?

//Magnus


Re: windows / initdb oddness

From
Andrew Dunstan
Date:
Magnus Hagander wrote:

>>This took me hours to find  ...
>>
>>On my Windows box, CVS HEAD gets an execution failure on "initdb foo" 
>>but succeeds happily with "initdb -D foo".
>>
>>This is not true for REL8_1_STABLE, nor is it true for all 
>>Windows machines/environments, apparently, otherwise we would 
>>be seeing failures from the buildfarm member snake, since the 
>>buildfarm script does "initdb data".
>>
>>My suspicion is that this is some side effect of the 
>>restricted-exec patch, but I don't have time to dig further right now.
>>    
>>
>
>Um, so what error msg do you get when it's failing?
>  
>


I get a popup box that says:

initdb.exe has encountered a problem and needs to close. 
We are sorry for the inconvenience.

Clicking a link gives this info:

AppName: initdb.exe      AppVer: 8.2.0.6051      ModName: msvcrt.dll
ModVer: 7.0.2600.1106    Offset: 00033830

It wouldn't let me copy the rest of the info ;-(

cheers

andrew


Re: windows / initdb oddness

From
"Magnus Hagander"
Date:
> >>This took me hours to find  ...
> >>
> >>On my Windows box, CVS HEAD gets an execution failure on
> "initdb foo"
> >>but succeeds happily with "initdb -D foo".
> >>
> >>This is not true for REL8_1_STABLE, nor is it true for all Windows
> >>machines/environments, apparently, otherwise we would be seeing
> >>failures from the buildfarm member snake, since the
> buildfarm script
> >>does "initdb data".
> >>
> >>My suspicion is that this is some side effect of the
> restricted-exec
> >>patch, but I don't have time to dig further right now.
> >>
> >>
> >
> >Um, so what error msg do you get when it's failing?
> >
> >
>
>
> I get a popup box that says:
>
> initdb.exe has encountered a problem and needs to close.
> We are sorry for the inconvenience.
>
> Clicking a link gives this info:
>
> AppName: initdb.exe      AppVer: 8.2.0.6051      ModName: msvcrt.dll
> ModVer: 7.0.2600.1106    Offset: 00033830
>
> It wouldn't let me copy the rest of the info ;-(

Hm. Crap. (For those not familiar with this, that's a coredump without a
core:-P)

Does it give you an error code? (Nevermind the stackdump etc, just the
code)

Are you running this with an admin account or a non-admin account? If
admin, what are the permissions on the initdb.exe file and libpq.dll?

Anything weird in how you run it - do you specify a path to initdb, or
run it from current directory for example?

And finally, can you check with process explorer if it's the first or
second initdb that dies? (With this patch, initdb will re-exec itself
with lower privs)

//Magnus


Re: windows / initdb oddness

From
Andrew Dunstan
Date:
Magnus Hagander wrote:

>>I get a popup box that says:
>>
>>initdb.exe has encountered a problem and needs to close. 
>>We are sorry for the inconvenience.
>>
>>Clicking a link gives this info:
>>
>>AppName: initdb.exe      AppVer: 8.2.0.6051      ModName: msvcrt.dll
>>ModVer: 7.0.2600.1106    Offset: 00033830
>>
>>It wouldn't let me copy the rest of the info ;-(
>>    
>>
>
>Hm. Crap. (For those not familiar with this, that's a coredump without a
>core:-P)
>
>Does it give you an error code? (Nevermind the stackdump etc, just the
>code)
>  
>

I'll have a look when I get time to reboot the machine into Windows.

>Are you running this with an admin account or a non-admin account? If
>admin, what are the permissions on the initdb.exe file and libpq.dll?
>  
>

Should be unprivileged - it's the account I use to run buildfarm. (and 
which has therefore in each case just successfully run "make check" with 
the identical binaries).

>Anything weird in how you run it - do you specify a path to initdb, or
>run it from current directory for example?
>  
>

relative path: bin/initdb foo

(bin has libpq.dll as well as initdb.exe).

>And finally, can you check with process explorer if it's the first or
>second initdb that dies? (With this patch, initdb will re-exec itself
>with lower privs)
>
>  
>

I will add some trace writes when I get a chance. I was rather hoping 
something would jump out at you, but obviously it hasn't, so I'll have 
to dig into it the slow way. *sigh*

cheers

andrew


Re: windows / initdb oddness

From
"Magnus Hagander"
Date:
> >Are you running this with an admin account or a non-admin
> account? If
> >admin, what are the permissions on the initdb.exe file and libpq.dll?
> >
> >
>
> Should be unprivileged - it's the account I use to run
> buildfarm. (and which has therefore in each case just
> successfully run "make check" with the identical binaries).

Hm. Ok. Is that part running under msys or "plain windows"? I haven't
done much testing under msys. (Though it's clearly not always broken)


> >Anything weird in how you run it - do you specify a path to
> initdb, or
> >run it from current directory for example?
> >
>
> relative path: bin/initdb foo
>
> (bin has libpq.dll as well as initdb.exe).

But if this is from buildfarm, it should be the same on snake, right?


> >And finally, can you check with process explorer if it's the
> first or
> >second initdb that dies? (With this patch, initdb will
> re-exec itself
> >with lower privs)
> >
> >
> >
>
> I will add some trace writes when I get a chance. I was
> rather hoping something would jump out at you, but obviously
> it hasn't, so I'll have to dig into it the slow way. *sigh*

Ok. Sorry, can't help much. It's definitly quite possible it's somewhere
around the new priv code. My first guess would be something with the
build-new-commandline pieces, but I don't see what it should be...

//Magnus


Re: windows / initdb oddness

From
Andrew Dunstan
Date:
I wrote:

>
> I will add some trace writes when I get a chance. I was rather hoping 
> something would jump out at you, but obviously it hasn't, so I'll have 
> to dig into it the slow way. *sigh*


Just eyeballing the code it looks to me like the problem is this line:
       strcat(cmdline, *" --restrictedexec"*);


which is appending an option type argument after the non-option argument.


That would exactly account for the failure when we call "initdb foo" but 
not "initdb -D foo".

The solution would be put --restrictedexec earlier on the new command 
line. I'll work on that.

cheers

andrew


Re: windows / initdb oddness

From
Andrew Dunstan
Date:


Andrew Dunstan wrote:

> I wrote:
>
>>
>> I will add some trace writes when I get a chance. I was rather hoping
>> something would jump out at you, but obviously it hasn't, so I'll
>> have to dig into it the slow way. *sigh*
>
>
>
> Just eyeballing the code it looks to me like the problem is this line:
>
>        strcat(cmdline, *" --restrictedexec"*);
>
>
> which is appending an option type argument after the non-option argument.
>
>
> That would exactly account for the failure when we call "initdb foo"
> but not "initdb -D foo".
>
> The solution would be put --restrictedexec earlier on the new command
> line. I'll work on that.


The probem is apparently the one I identified above, and is fixed by the
attached patch, which I will apply soon unless there are objections.

As for why we saw this on loris but not snake, I suspect they might have
different getopt libraries installed.

cheers

andrew
Index: initdb.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.110
diff -c -r1.110 initdb.c
*** initdb.c    18 Feb 2006 16:15:23 -0000    1.110
--- initdb.c    22 Feb 2006 02:21:59 -0000
***************
*** 2450,2455 ****
--- 2450,2460 ----
                                   * environment */
      char        bin_dir[MAXPGPATH];
      char       *pg_data_native;
+
+ #ifdef WIN32
+     char      *orig_pgdata = NULL;
+ #endif
+
      static const char *subdirs[] = {
          "global",
          "pg_xlog",
***************
*** 2560,2565 ****
--- 2565,2573 ----
      if (optind < argc)
      {
          pg_data = xstrdup(argv[optind]);
+ #ifdef WIN32
+         orig_pgdata = xstrdup(pg_data);
+ #endif
          optind++;
      }

***************
*** 2648,2660 ****
      {
          PROCESS_INFORMATION pi;
          char *cmdline;
!
          ZeroMemory(&pi, sizeof(pi));

!         cmdline = pg_malloc(strlen(GetCommandLine()) + 19);
          strcpy(cmdline, GetCommandLine());
!         strcat(cmdline, " --restrictedexec");

          if (!CreateRestrictedProcess(cmdline, &pi))
          {
              fprintf(stderr,"Failed to re-exec with restricted token: %lu.\n", GetLastError());
--- 2656,2687 ----
      {
          PROCESS_INFORMATION pi;
          char *cmdline;
!
          ZeroMemory(&pi, sizeof(pi));

!         cmdline = pg_malloc(strlen(GetCommandLine()) + 20);
          strcpy(cmdline, GetCommandLine());
!
!         if (orig_pgdata  != NULL)
!         {
!             /* find the LAST occurrence of the data arg on the command line */
!             char *data_arg;
!             char *next_pos;
!             size_t orig_len = strlen(orig_pgdata);
!
!             data_arg=strstr(cmdline,orig_pgdata);
!             while ((next_pos=strstr(data_arg+1,orig_pgdata)) != NULL)
!                 data_arg = next_pos;
!             /* wipe it out so we can add the extra arg */
!             *data_arg = '\0';
!         }
!
!         strcat(cmdline, " --restrictedexec ");

+         /* put back original arg if we wiped it out above */
+         if (orig_pgdata  != NULL)
+             strcat(cmdline,orig_pgdata);
+
          if (!CreateRestrictedProcess(cmdline, &pi))
          {
              fprintf(stderr,"Failed to re-exec with restricted token: %lu.\n", GetLastError());

Re: windows / initdb oddness

From
"Magnus Hagander"
Date:
> >> I will add some trace writes when I get a chance. I was
> rather hoping
> >> something would jump out at you, but obviously it hasn't, so I'll
> >> have to dig into it the slow way. *sigh*
> >
> >
> >
> > Just eyeballing the code it looks to me like the problem is
> this line:
> >
> >        strcat(cmdline, *" --restrictedexec"*);
> >
> >
> > which is appending an option type argument after the
> non-option argument.
> >
> >
> > That would exactly account for the failure when we call
> "initdb foo"
> > but not "initdb -D foo".
> >
> > The solution would be put --restrictedexec earlier on the
> new command
> > line. I'll work on that.
>
>
> The probem is apparently the one I identified above, and is
> fixed by the attached patch, which I will apply soon unless
> there are objections.
>
> As for why we saw this on loris but not snake, I suspect they
> might have different getopt libraries installed.

Isn't that just fixing the symptom and not the actual bug? In this case,
if we cause the bug, we should do this as well, but doesn't it crash the
same way if you *manually* put arguments in the "wrong order" on the
commandline? Like "inidb foo --no-locale" or somehting like that?

(I still can't reproduce it on my machines, so I guess I have a better
getopt as well.)

//Magnus

Re: windows / initdb oddness

From
Andrew Dunstan
Date:

Magnus Hagander wrote:

>>>
>>>The solution would be put --restrictedexec earlier on the 
>>>      
>>>
>>new command 
>>    
>>
>>>line. I'll work on that.
>>>      
>>>
>>The probem is apparently the one I identified above, and is 
>>fixed by the attached patch, which I will apply soon unless 
>>there are objections.
>>
>>As for why we saw this on loris but not snake, I suspect they 
>>might have different getopt libraries installed.
>>    
>>
>
>Isn't that just fixing the symptom and not the actual bug? In this case,
>if we cause the bug, we should do this as well, but doesn't it crash the
>same way if you *manually* put arguments in the "wrong order" on the
>commandline? Like "inidb foo --no-locale" or somehting like that?
>
>(I still can't reproduce it on my machines, so I guess I have a better
>getopt as well.)
>
>
>  
>

We don't promise that you can put the pgdata argument anywhere except at 
the end of the command line. In fact, our manual page requires it at the 
end. Even on systems with GNU getopt, if POSIXLY_CORRECT is set then 
processing would stop at the first non-getopt argument.

So I can live with bombing, even if it's a bit unpleasant, in the case 
of  "initdb foo --no-locale", but we cannot *cause* that by appending a 
secret argument ourselves, so that "initdb foo" also bombs.

The logic to detect and correct this in the general case before getopt 
is called is not worth the pain.

cheers

andrew


Re: windows / initdb oddness

From
Andrew Dunstan
Date:
Andrew Dunstan wrote:

>
>
> Magnus Hagander wrote:
>
>>>>
>>>> The solution would be put --restrictedexec earlier on the
>>>
>>> new command
>>>
>>>> line. I'll work on that.
>>>>
>>>
>>> The probem is apparently the one I identified above, and is fixed by
>>> the attached patch, which I will apply soon unless there are
>>> objections.
>>>
>>> As for why we saw this on loris but not snake, I suspect they might
>>> have different getopt libraries installed.
>>>
>>
>>
>> Isn't that just fixing the symptom and not the actual bug? In this case,
>> if we cause the bug, we should do this as well, but doesn't it crash the
>> same way if you *manually* put arguments in the "wrong order" on the
>> commandline? Like "inidb foo --no-locale" or somehting like that?
>>
>> (I still can't reproduce it on my machines, so I guess I have a better
>> getopt as well.)
>>
>>
>>
>>
>
> We don't promise that you can put the pgdata argument anywhere except
> at the end of the command line. In fact, our manual page requires it
> at the end. Even on systems with GNU getopt, if POSIXLY_CORRECT is set
> then processing would stop at the first non-getopt argument.
>
> So I can live with bombing, even if it's a bit unpleasant, in the case
> of  "initdb foo --no-locale", but we cannot *cause* that by appending
> a secret argument ourselves, so that "initdb foo" also bombs.
>
> The logic to detect and correct this in the general case before getopt
> is called is not worth the pain.
>


Thinking about this a tiny bit more, it struck me that by far the best
way to do this is to stop using a magic argument and use the environment
instead. Then we don't need to mangle the command line at all. This
actually results in less code, and should be more robust (mangling the
command line in Windows is dangerous and difficult because of quotes).

Trial patch below, although I don't have a Windows box handy to test it on.


cheers

andrew




? .deps
? initdb
Index: initdb.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.110
diff -c -r1.110 initdb.c
*** initdb.c    18 Feb 2006 16:15:23 -0000    1.110
--- initdb.c    22 Feb 2006 16:10:28 -0000
***************
*** 95,103 ****
  static bool debug = false;
  static bool noclean = false;
  static bool show_setting = false;
- #ifdef WIN32
- static bool restricted_exec = false;
- #endif


  /* internal vars */
--- 95,100 ----
***************
*** 2427,2437 ****
          {"lc-messages", required_argument, NULL, 7},
          {"no-locale", no_argument, NULL, 8},
          {"auth", required_argument, NULL, 'A'},
!         {"pwprompt", no_argument, NULL, 'W'},
          {"pwfile", required_argument, NULL, 9},
- #ifdef WIN32
-         {"restrictedexec", no_argument, NULL, 10},
- #endif
          {"username", required_argument, NULL, 'U'},
          {"help", no_argument, NULL, '?'},
          {"version", no_argument, NULL, 'V'},
--- 2424,2431 ----
          {"lc-messages", required_argument, NULL, 7},
          {"no-locale", no_argument, NULL, 8},
          {"auth", required_argument, NULL, 'A'},
!         {"pwprompt", no_argument, NULL, 'W'},
          {"pwfile", required_argument, NULL, 9},
          {"username", required_argument, NULL, 'U'},
          {"help", no_argument, NULL, '?'},
          {"version", no_argument, NULL, 'V'},
***************
*** 2450,2455 ****
--- 2444,2452 ----
                                   * environment */
      char        bin_dir[MAXPGPATH];
      char       *pg_data_native;
+ #ifdef WIN32
+     char       *restrict_env;
+ #endif
      static const char *subdirs[] = {
          "global",
          "pg_xlog",
***************
*** 2540,2550 ****
              case 9:
                  pwfilename = xstrdup(optarg);
                  break;
- #ifdef WIN32
-             case 10:
-                 restricted_exec = true;
-                 break;
- #endif
              case 's':
                  show_setting = true;
                  break;
--- 2537,2542 ----
***************
*** 2556,2561 ****
--- 2548,2554 ----
          }
      }

+
      /* Non-option argument specifies data directory */
      if (optind < argc)
      {
***************
*** 2644,2659 ****
       * Before we execute another program, make sure that we are running with a
       * restricted token. If not, re-execute ourselves with one.
       */
!     if (!restricted_exec)
      {
          PROCESS_INFORMATION pi;
          char *cmdline;

          ZeroMemory(&pi, sizeof(pi));

!         cmdline = pg_malloc(strlen(GetCommandLine()) + 19);
!         strcpy(cmdline, GetCommandLine());
!         strcat(cmdline, " --restrictedexec");

          if (!CreateRestrictedProcess(cmdline, &pi))
          {
--- 2637,2654 ----
       * Before we execute another program, make sure that we are running with a
       * restricted token. If not, re-execute ourselves with one.
       */
!
!     if ((restrict_env = getenv("PG_RESTRICT_EXEC")) == NULL
!         || strcmp(restrict_env,"1") != 0)
      {
          PROCESS_INFORMATION pi;
          char *cmdline;

          ZeroMemory(&pi, sizeof(pi));

!         cmdline = x_strdup(GetCommandLine());
!
!         putenv("PG_RESTRICT_EXEC=1");

          if (!CreateRestrictedProcess(cmdline, &pi))
          {

Re: windows / initdb oddness

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Thinking about this a tiny bit more, it struck me that by far the best 
> way to do this is to stop using a magic argument and use the environment 
> instead. Then we don't need to mangle the command line at all. This 
> actually results in less code, and should be more robust (mangling the 
> command line in Windows is dangerous and difficult because of quotes).

This seems like a good idea.

Is there any reason to worry about an accidental environment conflict?
If someone mistakenly did "export PG_RESTRICT_EXEC=1", it looks to me
like this would cause the re-exec bit to be skipped, but I suppose the
worst possible consequence is that the postmaster would refuse to start.
Is there anything I don't see?  (Of course, the magic argument method
can be broken manually in just the same way...)
        regards, tom lane


Re: windows / initdb oddness

From
Andrew Dunstan
Date:
Tom Lane wrote:

>
>Is there any reason to worry about an accidental environment conflict?
>If someone mistakenly did "export PG_RESTRICT_EXEC=1", it looks to me
>like this would cause the re-exec bit to be skipped, but I suppose the
>worst possible consequence is that the postmaster would refuse to start.
>Is there anything I don't see?  (Of course, the magic argument method
>can be broken manually in just the same way...)
>
>    
>  
>

Yes. The effect would be that we just do exactly what we do today 
anyway.  We could make the value some more obscure token, but I don't 
see much point.

cheers

andrew



Re: windows / initdb oddness

From
"Magnus Hagander"
Date:
> > Thinking about this a tiny bit more, it struck me that by
> far the best
> > way to do this is to stop using a magic argument and use the
> > environment instead. Then we don't need to mangle the
> command line at
> > all. This actually results in less code, and should be more robust
> > (mangling the command line in Windows is dangerous and
> difficult because of quotes).
>
> This seems like a good idea.
>
> Is there any reason to worry about an accidental environment conflict?
> If someone mistakenly did "export PG_RESTRICT_EXEC=1", it
> looks to me like this would cause the re-exec bit to be
> skipped, but I suppose the worst possible consequence is that
> the postmaster would refuse to start.
> Is there anything I don't see?  (Of course, the magic
> argument method can be broken manually in just the same way...)

This only affects initdb, not postmaster.

I don't see the risk being bigger with environment than commandline at
all.

//Magnus


Re: windows / initdb oddness

From
"Magnus Hagander"
Date:
> >Is there any reason to worry about an accidental environment
> conflict?
> >If someone mistakenly did "export PG_RESTRICT_EXEC=1", it
> looks to me
> >like this would cause the re-exec bit to be skipped, but I
> suppose the
> >worst possible consequence is that the postmaster would
> refuse to start.
> >Is there anything I don't see?  (Of course, the magic
> argument method
> >can be broken manually in just the same way...)
> >
> >
> >
> >
>
> Yes. The effect would be that we just do exactly what we do
> today anyway.  We could make the value some more obscure
> token, but I don't see much point.

No, if the user wants to break it, go ahead. They're just going to break
things for themselves (since the execuited postgres.exe still retains
the admin check and will bail out). I see no reason to make it obscure.

//Magnus