Thread: pg_regress.sh startup failure patch
Unfortunately, pg_regress.sh fails under Cygwin as follows:

    ============== creating database "regression" ==============
    psql: FATAL 1: The database system is starting up
    createdb: database creation failed
    pg_regress: createdb failed

The attached patch "solves" the problem. Would you be willing to accept this patch into 7.2? Or at least one specifically for Cygwin?

Thanks,
Jason
Attachment
Jason Tishler <jason@tishler.net> writes:
> Unfortunately, pg_regress.sh fails under Cygwin as follows:
> The attached patch "solves" the problem.

Why would it take more than 3 seconds to start the postmaster under Cygwin? Something awfully fishy about that, unless you're using a 286 ...

I didn't much care for the arbitrary delay in the first place, and raising it to 10 sec is even less palatable. Perhaps

    until psql ...args... </dev/null 2>/dev/null
    do
        sleep 1
    done

although it might also be prudent to refuse to loop more than a couple dozen times.

regards, tom lane
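The bounded retry suggested above could be sketched roughly as follows. This is an illustration only, not code from the thread: the function name `retry_until` and its `MAX`/`DELAY` parameters are hypothetical, and it assumes a shell that supports functions. It sticks to backticks and `expr` rather than newer arithmetic syntax.

```shell
# retry_until MAX DELAY COMMAND [ARGS...]
# Hypothetical helper (not part of pg_regress.sh): run COMMAND until it
# succeeds, giving up after MAX failed attempts and sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if it gave up.
retry_until() {
    max=$1
    delay=$2
    shift 2
    tries=0
    until "$@" </dev/null >/dev/null 2>&1
    do
        tries=`expr $tries + 1`
        if [ "$tries" -ge "$max" ]
        then
            return 1        # refused to loop any longer
        fi
        sleep "$delay"
    done
    return 0
}
```

It might be invoked as `retry_until 24 1 psql ...args...`, where 24 is just one reading of "a couple dozen".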
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 03 January 2002 15:28
> To: Jason Tishler
> Cc: Pgsql-Patches
> Subject: Re: [PATCHES] pg_regress.sh startup failure patch
>
> Jason Tishler <jason@tishler.net> writes:
> > Unfortunately, pg_regress.sh fails under Cygwin as follows:
> > The attached patch "solves" the problem.
>
> Why would it take more than 3 seconds to start the postmaster
> under Cygwin? Something awfully fishy about that, unless
> you're using a 286 ...

On a Dell Inspiron 8000, PIII 850MHz, 512Mb RAM, Windows XP Pro (kept nice and tidy with no junk wasting resources), 7.2b4 takes about 15 seconds to get to the 'the database system is ready' message. Subsequent startups take about 6 or 7 seconds following a controlled *or* uncontrolled shutdown. I get about 15 seconds again on the first startup after a reboot.

All regression tests pass except the known issues with parallel tests, so I assume everything's OK...

Regards, Dave.
Tom,

On Thu, Jan 03, 2002 at 10:28:05AM -0500, Tom Lane wrote:
> Jason Tishler <jason@tishler.net> writes:
> > Unfortunately, pg_regress.sh fails under Cygwin as follows:
> > The attached patch "solves" the problem.
>
> Why would it take more than 3 seconds to start the postmaster under
> Cygwin? Something awfully fishy about that, unless you're using
> a 286 ...

I never had this problem before on my home server machine (PIII 500 MHz) with previous PostgreSQL versions. However, on my work laptop (also PIII 500 MHz, but virus software, slow disk, etc.), PostgreSQL CVS just needs more time to start up.

> I didn't much care for the arbitrary delay in the first place, and
> raising it to 10 sec is even less palatable.

Agreed on both counts -- I detest open loop solutions myself.

> Perhaps
>
>     until psql ...args... </dev/null 2>/dev/null
>     do
>         sleep 1
>     done
>
> although it might also be prudent to refuse to loop more than a couple
> dozen times.

I was going to suggest the retry strategy, but I wasn't sure that such a patch would be accepted at this time. How should I proceed?

Thanks,
Jason
Jason Tishler <jason@tishler.net> writes:
>> Why would it take more than 3 seconds to start the postmaster under
>> Cygwin? Something awfully fishy about that, unless you're using
>> a 286 ...

> I never had this problem before on my home server machine (PIII 500 MHz)
> with previous PostgreSQL versions. However, on my work laptop (also
> PIII 500 MHz, but virus software, slow disk, etc.), PostgreSQL CVS just
> needs more time to start up.

Hm. That deserves investigation, but it seems not high priority compared to getting a release out.

>> I didn't much care for the arbitrary delay in the first place, and
>> raising it to 10 sec is even less palatable.

> Agreed on both accounts -- I detest open loop solutions myself.

> I was going to suggest the retry strategy, but I wasn't sure that such a
> patch would be accepted at this time. How should I proceed?

Code up a patch, test it, send in a diff ... I think the only real risk here is to be careful not to write anything unportable. I believe that "until" loops exist even in very old Bourne shells; does anyone think differently?

regards, tom lane
> > Agreed on both accounts -- I detest open loop solutions myself.
>
> > I was going to suggest the retry strategy, but I wasn't sure that such a
> > patch would be accepted at this time. How should I proceed?
>
> Code up a patch, test it, send in a diff ... I think the only real risk
> here is to be careful not to write anything unportable. I believe that
> "until" loops exist even in very old Bourne shells, does anyone think
> differently?

Probably true, but I have never seen 'until' used in a script.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom,

On Thu, Jan 03, 2002 at 11:47:06AM -0500, Tom Lane wrote:
> Code up a patch, test it, send in a diff ...

Is the attached patch acceptable?

> I think the only real risk
> here is to be careful not to write anything unportable. I believe that
> "until" loops exist even in very old Bourne shells, does anyone think
> differently?

I just checked one of my old favorites, "The UNIX Programming Environment," by Kernighan and Pike, 1984. It appears that "until" was already understood by the Bourne shell back then, so its use should be OK.

Thanks,
Jason
Attachment
Dave Page <dpage@vale-housing.co.uk> writes:
>> Why would it take more than 3 seconds to start the postmaster
>> under Cygwin? Something awfully fishy about that, unless
>> you're using a 286 ...

> On a Dell Inspiron 8000, PIII 850MHz, 512Mb RAM, Windows XP Pro (kept nice
> and tidy with no junk wasting resources), 7.2b4 takes about 15 seconds to
> get to 'the database system is ready' message. Subsequent startups take
> about 6 or 7 seconds following a controlled *or* uncontrolled shutdown. I
> get about 15 seconds again the first startup after a reboot.

Hm. I'm accustomed to seeing postmaster startup take about one second --- possibly more if recovery from WAL entries is needed, but this wouldn't apply normally. That's on machines a *lot* slower than you two are using. Something is taking an unreasonably long time there. It'd be worth poking into it to try to figure out what.

regards, tom lane
Jason Tishler <jason@tishler.net> writes:
> I just checked one of my old favorites, "The UNIX Programming
> Environment," by Kernighan and Pike, 1984. It appears that "until"
> is understood by the Bourne shell back then, so its use should be OK.

Yeah, that's what I thought. I ended up applying the attached patch; this not only avoids the timing problem but has more reliable detection of postmaster startup failure than the original code.

regards, tom lane

*** src/test/regress/pg_regress.sh.orig	Sun Sep 16 12:11:11 2001
--- src/test/regress/pg_regress.sh	Thu Jan  3 16:52:05 2002
***************
*** 353,358 ****
--- 353,379 ----
      "$bindir/postmaster" -D "$PGDATA" -F $postmaster_options >"$LOGDIR/postmaster.log" 2>&1 &
      postmaster_pid=$!
  
+     # Wait till postmaster is able to accept connections (normally only
+     # a second or so, but Cygwin is reportedly *much* slower).  Don't
+     # wait forever, however.
+     i=0
+     max=60
+     until "$bindir/psql" $psql_options template1 </dev/null 2>/dev/null
+     do
+         i=`expr $i + 1`
+         if [ $i -ge $max ]
+         then
+             break
+         fi
+         if kill -0 $postmaster_pid >/dev/null 2>&1
+         then
+             : still starting up
+         else
+             break
+         fi
+         sleep 1
+     done
+ 
      if kill -0 $postmaster_pid >/dev/null 2>&1
      then
          echo "running on port $PGPORT with pid $postmaster_pid"
***************
*** 363,371 ****
          echo
          (exit 2); exit
      fi
- 
-     # give postmaster some time to pass WAL recovery
-     sleep 3
  
  else # not temp-install
  
--- 384,389 ----
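For readers who want to try the waiting logic outside pg_regress.sh, here is a standalone sketch of the same idea. It is a variation, not the applied patch: the function name `wait_for_server`, its parameters, and the distinct return codes are assumptions made for illustration. Like the patch, it combines a bounded `until` loop with a `kill -0` check, so a server that has already died is noticed at once instead of being polled until the limit is reached.

```shell
# wait_for_server PID MAX DELAY PROBE [ARGS...]
# Hypothetical helper (names and return codes are illustrative only).
# Polls PROBE until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts, and stops early if process PID no longer exists.
# Returns: 0 = probe succeeded, 1 = gave up after MAX tries, 2 = PID is gone.
wait_for_server() {
    pid=$1
    max=$2
    delay=$3
    shift 3
    i=0
    until "$@" </dev/null >/dev/null 2>&1
    do
        if kill -0 "$pid" >/dev/null 2>&1
        then
            : still starting up
        else
            return 2        # process died; no point in waiting
        fi
        i=`expr $i + 1`
        if [ "$i" -ge "$max" ]
        then
            return 1        # don't wait forever
        fi
        sleep "$delay"
    done
    return 0
}
```

In the context of the patch, the call would correspond to something like `wait_for_server $postmaster_pid 60 1 "$bindir/psql" $psql_options template1`. Signal 0 delivers nothing; `kill -0` only reports whether the process can be signalled.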
Tom Lane allegedly said:
> Dave Page <dpage@vale-housing.co.uk> writes:
>>> Why would it take more than 3 seconds to start the postmaster
>>> under Cygwin? Something awfully fishy about that, unless
>>> you're using a 286 ...
>
>> On a Dell Inspiron 8000, PIII 850MHz, 512Mb RAM, Windows XP Pro (kept
>> nice and tidy with no junk wasting resources), 7.2b4 takes about 15
>> seconds to get to 'the database system is ready' message. Subsequent
>> startups take about 6 or 7 seconds following a controlled *or*
>> uncontrolled shutdown. I get about 15 seconds again the first startup
>> after a reboot.
>
> Hm. I'm accustomed to seeing postmaster startup take about one second
> --- possibly more if recovery from WAL entries is needed, but this
> wouldn't apply normally. That's on machines a *lot* slower than you
> two are using. Something is taking an unreasonably long time there.
> It'd be worth poking into it to try to figure out what.

I'd be happy to look into it, but I'll need some guidance - I'm not in the least bit familiar with gdb or any of its friends :-(

Regards, Dave.
Tom,

On Thu, Jan 03, 2002 at 04:55:45PM -0500, Tom Lane wrote:
> I ended up applying the attached patch;
> this not only avoids the timing problem but has more reliable detection
> of postmaster startup failure than the original code.

I just tried the above under Cygwin and it works great.

Thanks,
Jason