Thread: windows / initdb oddness
This took me hours to find ... On my Windows box, CVS HEAD gets an execution failure on "initdb foo" but succeeds happily with "initdb -D foo". This is not true for REL8_1_STABLE, nor is it true for all Windows machines/environments, apparently, otherwise we would be seeing failures from the buildfarm member snake, since the buildfarm script does "initdb data". My suspicion is that this is some side effect of the restricted-exec patch, but I don't have time to dig further right now. cheers andrew
> This took me hours to find ... > > On my Windows box, CVS HEAD gets an execution failure on "initdb foo" > but succeeds happily with "initdb -D foo". > > This is not true for REL8_1_STABLE, nor is it true for all > Windows machines/environments, apparently, otherwise we would > be seeing failures from the buildfarm member snake, since the > buildfarm script does "initdb data". > > My suspicion is that this is some side effect of the > restricted-exec patch, but I don't have time to dig further right now. Um, so what error msg do you get when it's failing? //Magnus
Magnus Hagander wrote: >>This took me hours to find ... >> >>On my Windows box, CVS HEAD gets an execution failure on "initdb foo" >>but succeeds happily with "initdb -D foo". >> >>This is not true for REL8_1_STABLE, nor is it true for all >>Windows machines/environments, apparently, otherwise we would >>be seeing failures from the buildfarm member snake, since the >>buildfarm script does "initdb data". >> >>My suspicion is that this is some side effect of the >>restricted-exec patch, but I don't have time to dig further right now. >> >> > >Um, so what error msg do you get when it's failing? > > I get a popup box that says: initdb.exe has encountered a problem and needs to close. We are sorry for the inconvenience. Clicking a link gives this info: AppName: initdb.exe AppVer: 8.2.0.6051 ModName: msvcrt.dll ModVer: 7.0.2600.1106 Offset: 00033830 It wouldn't let me copy the rest of the info ;-( cheers andrew
> >>This took me hours to find ... > >> > >>On my Windows box, CVS HEAD gets an execution failure on > "initdb foo" > >>but succeeds happily with "initdb -D foo". > >> > >>This is not true for REL8_1_STABLE, nor is it true for all Windows > >>machines/environments, apparently, otherwise we would be seeing > >>failures from the buildfarm member snake, since the > buildfarm script > >>does "initdb data". > >> > >>My suspicion is that this is some side effect of the > restricted-exec > >>patch, but I don't have time to dig further right now. > >> > >> > > > >Um, so what error msg do you get when it's failing? > > > > > > > I get a popup box that says: > > initdb.exe has encountered a problem and needs to close. > We are sorry for the inconvenience. > > Clicking a link gives this info: > > AppName: initdb.exe AppVer: 8.2.0.6051 ModName: msvcrt.dll > ModVer: 7.0.2600.1106 Offset: 00033830 > > It wouldn't let me copy the rest of the info ;-( Hm. Crap. (For those not familiar with this, that's a coredump without a core:-P) Does it give you an error code? (Nevermind the stackdump etc, just the code) Are you running this with an admin account or a non-admin account? If admin, what are the permissions on the initdb.exe file and libpq.dll? Anything weird in how you run it - do you specify a path to initdb, or run it from current directory for example? And finally, can you check with process explorer if it's the first or second initdb that dies? (With this patch, initdb will re-exec itself with lower privs) //Magnus
Magnus Hagander wrote: >>I get a popup box that says: >> >>initdb.exe has encountered a problem and needs to close. >>We are sorry for the inconvenience. >> >>Clicking a link gives this info: >> >>AppName: initdb.exe AppVer: 8.2.0.6051 ModName: msvcrt.dll >>ModVer: 7.0.2600.1106 Offset: 00033830 >> >>It wouldn't let me copy the rest of the info ;-( >> >> > >Hm. Crap. (For those not familiar with this, that's a coredump without a >core:-P) > >Does it give you an error code? (Nevermind the stackdump etc, just the >code) > > I'll have a look when I get time to reboot the machine into Windows. >Are you running this with an admin account or a non-admin account? If >admin, what are the permissions on the initdb.exe file and libpq.dll? > > Should be unprivileged - it's the account I use to run buildfarm. (and which has therefore in each case just successfully run "make check" with the identical binaries). >Anything weird in how you run it - do you specify a path to initdb, or >run it from current directory for example? > > relative path: bin/initdb foo (bin has libpq.dll as well as initdb.exe). >And finally, can you check with process explorer if it's the first or >second initdb that dies? (With this patch, initdb will re-exec itself >with lower privs) > > > I will add some trace writes when I get a chance. I was rather hoping something would jump out at you, but obviously it hasn't, so I'll have to dig into it the slow way. *sigh* cheers andrew
> >Are you running this with an admin account or a non-admin > account? If > >admin, what are the permissions on the initdb.exe file and libpq.dll? > > > > > > Should be unprivileged - it's the account I use to run > buildfarm. (and which has therefore in each case just > successfully run "make check" with the identical binaries). Hm. Ok. Is that part running under msys or "plain windows"? I haven't done much testing under msys. (Though it's clearly not always broken) > >Anything weird in how you run it - do you specify a path to > initdb, or > >run it from current directory for example? > > > > relative path: bin/initdb foo > > (bin has libpq.dll as well as initdb.exe). But if this is from buildfarm, it should be the same on snake, right? > >And finally, can you check with process explorer if it's the > first or > >second initdb that dies? (With this patch, initdb will > re-exec itself > >with lower privs) > > > > > > > > I will add some trace writes when I get a chance. I was > rather hoping something would jump out at you, but obviously > it hasn't, so I'll have to dig into it the slow way. *sigh* Ok. Sorry, can't help much. It's definitly quite possible it's somewhere around the new priv code. My first guess would be something with the build-new-commandline pieces, but I don't see what it should be... //Magnus
I wrote: > > I will add some trace writes when I get a chance. I was rather hoping > something would jump out at you, but obviously it hasn't, so I'll have > to dig into it the slow way. *sigh* Just eyeballing the code it looks to me like the problem is this line: strcat(cmdline, *" --restrictedexec"*); which is appending an option type argument after the non-option argument. That would exactly account for the failure when we call "initdb foo" but not "initdb -D foo". The solution would be put --restrictedexec earlier on the new command line. I'll work on that. cheers andrew
Andrew Dunstan wrote: > I wrote: > >> >> I will add some trace writes when I get a chance. I was rather hoping >> something would jump out at you, but obviously it hasn't, so I'll >> have to dig into it the slow way. *sigh* > > > > Just eyeballing the code it looks to me like the problem is this line: > > strcat(cmdline, *" --restrictedexec"*); > > > which is appending an option type argument after the non-option argument. > > > That would exactly account for the failure when we call "initdb foo" > but not "initdb -D foo". > > The solution would be put --restrictedexec earlier on the new command > line. I'll work on that. The probem is apparently the one I identified above, and is fixed by the attached patch, which I will apply soon unless there are objections. As for why we saw this on loris but not snake, I suspect they might have different getopt libraries installed. cheers andrew Index: initdb.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/bin/initdb/initdb.c,v retrieving revision 1.110 diff -c -r1.110 initdb.c *** initdb.c 18 Feb 2006 16:15:23 -0000 1.110 --- initdb.c 22 Feb 2006 02:21:59 -0000 *************** *** 2450,2455 **** --- 2450,2460 ---- * environment */ char bin_dir[MAXPGPATH]; char *pg_data_native; + + #ifdef WIN32 + char *orig_pgdata = NULL; + #endif + static const char *subdirs[] = { "global", "pg_xlog", *************** *** 2560,2565 **** --- 2565,2573 ---- if (optind < argc) { pg_data = xstrdup(argv[optind]); + #ifdef WIN32 + orig_pgdata = xstrdup(pg_data); + #endif optind++; } *************** *** 2648,2660 **** { PROCESS_INFORMATION pi; char *cmdline; ! ZeroMemory(&pi, sizeof(pi)); ! cmdline = pg_malloc(strlen(GetCommandLine()) + 19); strcpy(cmdline, GetCommandLine()); ! strcat(cmdline, " --restrictedexec"); if (!CreateRestrictedProcess(cmdline, &pi)) { fprintf(stderr,"Failed to re-exec with restricted token: %lu.\n", GetLastError()); --- 2656,2687 ---- { PROCESS_INFORMATION pi; char *cmdline; ! ZeroMemory(&pi, sizeof(pi)); ! cmdline = pg_malloc(strlen(GetCommandLine()) + 20); strcpy(cmdline, GetCommandLine()); ! ! if (orig_pgdata != NULL) ! { ! /* find the LAST occurrence of the data arg on the command line */ ! char *data_arg; ! char *next_pos; ! size_t orig_len = strlen(orig_pgdata); ! ! data_arg=strstr(cmdline,orig_pgdata); ! while ((next_pos=strstr(data_arg+1,orig_pgdata)) != NULL) ! data_arg = next_pos; ! /* wipe it out so we can add the extra arg */ ! *data_arg = '\0'; ! } ! ! strcat(cmdline, " --restrictedexec "); + /* put back original arg if we wiped it out above */ + if (orig_pgdata != NULL) + strcat(cmdline,orig_pgdata); + if (!CreateRestrictedProcess(cmdline, &pi)) { fprintf(stderr,"Failed to re-exec with restricted token: %lu.\n", GetLastError());
> >> I will add some trace writes when I get a chance. I was > rather hoping > >> something would jump out at you, but obviously it hasn't, so I'll > >> have to dig into it the slow way. *sigh* > > > > > > > > Just eyeballing the code it looks to me like the problem is > this line: > > > > strcat(cmdline, *" --restrictedexec"*); > > > > > > which is appending an option type argument after the > non-option argument. > > > > > > That would exactly account for the failure when we call > "initdb foo" > > but not "initdb -D foo". > > > > The solution would be put --restrictedexec earlier on the > new command > > line. I'll work on that. > > > The probem is apparently the one I identified above, and is > fixed by the attached patch, which I will apply soon unless > there are objections. > > As for why we saw this on loris but not snake, I suspect they > might have different getopt libraries installed. Isn't that just fixing the symptom and not the actual bug? In this case, if we cause the bug, we should do this as well, but doesn't it crash the same way if you *manually* put arguments in the "wrong order" on the commandline? Like "inidb foo --no-locale" or somehting like that? (I still can't reproduce it on my machines, so I guess I have a better getopt as well.) //Magnus
Magnus Hagander wrote: >>> >>>The solution would be put --restrictedexec earlier on the >>> >>> >>new command >> >> >>>line. I'll work on that. >>> >>> >>The probem is apparently the one I identified above, and is >>fixed by the attached patch, which I will apply soon unless >>there are objections. >> >>As for why we saw this on loris but not snake, I suspect they >>might have different getopt libraries installed. >> >> > >Isn't that just fixing the symptom and not the actual bug? In this case, >if we cause the bug, we should do this as well, but doesn't it crash the >same way if you *manually* put arguments in the "wrong order" on the >commandline? Like "inidb foo --no-locale" or somehting like that? > >(I still can't reproduce it on my machines, so I guess I have a better >getopt as well.) > > > > We don't promise that you can put the pgdata argument anywhere except at the end of the command line. In fact, our manual page requires it at the end. Even on systems with GNU getopt, if POSIXLY_CORRECT is set then processing would stop at the first non-getopt argument. So I can live with bombing, even if it's a bit unpleasant, in the case of "initdb foo --no-locale", but we cannot *cause* that by appending a secret argument ourselves, so that "initdb foo" also bombs. The logic to detect and correct this in the general case before getopt is called is not worth the pain. cheers andrew
Andrew Dunstan wrote: > > > Magnus Hagander wrote: > >>>> >>>> The solution would be put --restrictedexec earlier on the >>> >>> new command >>> >>>> line. I'll work on that. >>>> >>> >>> The probem is apparently the one I identified above, and is fixed by >>> the attached patch, which I will apply soon unless there are >>> objections. >>> >>> As for why we saw this on loris but not snake, I suspect they might >>> have different getopt libraries installed. >>> >> >> >> Isn't that just fixing the symptom and not the actual bug? In this case, >> if we cause the bug, we should do this as well, but doesn't it crash the >> same way if you *manually* put arguments in the "wrong order" on the >> commandline? Like "inidb foo --no-locale" or somehting like that? >> >> (I still can't reproduce it on my machines, so I guess I have a better >> getopt as well.) >> >> >> >> > > We don't promise that you can put the pgdata argument anywhere except > at the end of the command line. In fact, our manual page requires it > at the end. Even on systems with GNU getopt, if POSIXLY_CORRECT is set > then processing would stop at the first non-getopt argument. > > So I can live with bombing, even if it's a bit unpleasant, in the case > of "initdb foo --no-locale", but we cannot *cause* that by appending > a secret argument ourselves, so that "initdb foo" also bombs. > > The logic to detect and correct this in the general case before getopt > is called is not worth the pain. > Thinking about this a tiny bit more, it struck me that by far the best way to do this is to stop using a magic argument and use the environment instead. Then we don't need to mangle the command line at all. This actually results in less code, and should be more robust (mangling the command line in Windows is dangerous and difficult because of quotes). Trial patch below, although I don't have a Windows box handy to test it on. cheers andrew ? .deps ? initdb Index: initdb.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/initdb/initdb.c,v retrieving revision 1.110 diff -c -r1.110 initdb.c *** initdb.c 18 Feb 2006 16:15:23 -0000 1.110 --- initdb.c 22 Feb 2006 16:10:28 -0000 *************** *** 95,103 **** static bool debug = false; static bool noclean = false; static bool show_setting = false; - #ifdef WIN32 - static bool restricted_exec = false; - #endif /* internal vars */ --- 95,100 ---- *************** *** 2427,2437 **** {"lc-messages", required_argument, NULL, 7}, {"no-locale", no_argument, NULL, 8}, {"auth", required_argument, NULL, 'A'}, ! {"pwprompt", no_argument, NULL, 'W'}, {"pwfile", required_argument, NULL, 9}, - #ifdef WIN32 - {"restrictedexec", no_argument, NULL, 10}, - #endif {"username", required_argument, NULL, 'U'}, {"help", no_argument, NULL, '?'}, {"version", no_argument, NULL, 'V'}, --- 2424,2431 ---- {"lc-messages", required_argument, NULL, 7}, {"no-locale", no_argument, NULL, 8}, {"auth", required_argument, NULL, 'A'}, ! {"pwprompt", no_argument, NULL, 'W'}, {"pwfile", required_argument, NULL, 9}, {"username", required_argument, NULL, 'U'}, {"help", no_argument, NULL, '?'}, {"version", no_argument, NULL, 'V'}, *************** *** 2450,2455 **** --- 2444,2452 ---- * environment */ char bin_dir[MAXPGPATH]; char *pg_data_native; + #ifdef WIN32 + char *restrict_env; + #endif static const char *subdirs[] = { "global", "pg_xlog", *************** *** 2540,2550 **** case 9: pwfilename = xstrdup(optarg); break; - #ifdef WIN32 - case 10: - restricted_exec = true; - break; - #endif case 's': show_setting = true; break; --- 2537,2542 ---- *************** *** 2556,2561 **** --- 2548,2554 ---- } } + /* Non-option argument specifies data directory */ if (optind < argc) { *************** *** 2644,2659 **** * Before we execute another program, make sure that we are running with a * restricted token. If not, re-execute ourselves with one. */ ! if (!restricted_exec) { PROCESS_INFORMATION pi; char *cmdline; ZeroMemory(&pi, sizeof(pi)); ! cmdline = pg_malloc(strlen(GetCommandLine()) + 19); ! strcpy(cmdline, GetCommandLine()); ! strcat(cmdline, " --restrictedexec"); if (!CreateRestrictedProcess(cmdline, &pi)) { --- 2637,2654 ---- * Before we execute another program, make sure that we are running with a * restricted token. If not, re-execute ourselves with one. */ ! ! if ((restrict_env = getenv("PG_RESTRICT_EXEC")) == NULL ! || strcmp(restrict_env,"1") != 0) { PROCESS_INFORMATION pi; char *cmdline; ZeroMemory(&pi, sizeof(pi)); ! cmdline = x_strdup(GetCommandLine()); ! ! putenv("PG_RESTRICT_EXEC=1"); if (!CreateRestrictedProcess(cmdline, &pi)) {
Andrew Dunstan <andrew@dunslane.net> writes: > Thinking about this a tiny bit more, it struck me that by far the best > way to do this is to stop using a magic argument and use the environment > instead. Then we don't need to mangle the command line at all. This > actually results in less code, and should be more robust (mangling the > command line in Windows is dangerous and difficult because of quotes). This seems like a good idea. Is there any reason to worry about an accidental environment conflict? If someone mistakenly did "export PG_RESTRICT_EXEC=1", it looks to me like this would cause the re-exec bit to be skipped, but I suppose the worst possible consequence is that the postmaster would refuse to start. Is there anything I don't see? (Of course, the magic argument method can be broken manually in just the same way...) regards, tom lane
Tom Lane wrote: > >Is there any reason to worry about an accidental environment conflict? >If someone mistakenly did "export PG_RESTRICT_EXEC=1", it looks to me >like this would cause the re-exec bit to be skipped, but I suppose the >worst possible consequence is that the postmaster would refuse to start. >Is there anything I don't see? (Of course, the magic argument method >can be broken manually in just the same way...) > > > > Yes. The effect would be that we just do exactly what we do today anyway. We could make the value some more obscure token, but I don't see much point. cheers andrew
> > Thinking about this a tiny bit more, it struck me that by > far the best > > way to do this is to stop using a magic argument and use the > > environment instead. Then we don't need to mangle the > command line at > > all. This actually results in less code, and should be more robust > > (mangling the command line in Windows is dangerous and > difficult because of quotes). > > This seems like a good idea. > > Is there any reason to worry about an accidental environment conflict? > If someone mistakenly did "export PG_RESTRICT_EXEC=1", it > looks to me like this would cause the re-exec bit to be > skipped, but I suppose the worst possible consequence is that > the postmaster would refuse to start. > Is there anything I don't see? (Of course, the magic > argument method can be broken manually in just the same way...) This only affects initdb, not postmaster. I don't see the risk being bigger with environment than commandline at all. //Magnus
> >Is there any reason to worry about an accidental environment > conflict? > >If someone mistakenly did "export PG_RESTRICT_EXEC=1", it > looks to me > >like this would cause the re-exec bit to be skipped, but I > suppose the > >worst possible consequence is that the postmaster would > refuse to start. > >Is there anything I don't see? (Of course, the magic > argument method > >can be broken manually in just the same way...) > > > > > > > > > > Yes. The effect would be that we just do exactly what we do > today anyway. We could make the value some more obscure > token, but I don't see much point. No, if the user wants to break it, go ahead. They're just going to break things for themselves (since the execuited postgres.exe still retains the admin check and will bail out). I see no reason to make it obscure. //Magnus