Thread: pg_regress.sh startup failure patch

pg_regress.sh startup failure patch

From
Jason Tishler
Date:
Unfortunately, pg_regress.sh fails under Cygwin as follows:

    ============== creating database "regression"         ==============
    psql: FATAL 1:  The database system is starting up

    createdb: database creation failed
    pg_regress: createdb failed

The attached patch "solves" the problem.  Would you be willing to accept
this patch into 7.2?  Or, at least one specifically for Cygwin?

Thanks,
Jason

Re: pg_regress.sh startup failure patch

From
Tom Lane
Date:
Jason Tishler <jason@tishler.net> writes:
> Unfortunately, pg_regress.sh fails under Cygwin as follows:
> The attached patch "solves" the problem.

Why would it take more than 3 seconds to start the postmaster under
Cygwin?  Something awfully fishy about that, unless you're using
a 286 ...

I didn't much care for the arbitrary delay in the first place, and
raising it to 10 sec is even less palatable.  Perhaps

    until psql ...args... </dev/null 2>/dev/null
    do
        sleep 1
    done

although it might also be prudent to refuse to loop more than a couple
dozen times.

            regards, tom lane
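
A minimal sketch of the bounded retry suggested above, written as a shell function. The name wait_until_ready, its argument order, and the limit of 24 attempts in the usage comment (one reading of "a couple dozen") are assumptions made for illustration; none of them comes from a patch in this thread.

```shell
#!/bin/sh
# wait_until_ready MAX CMD [ARGS...]   (hypothetical helper, not from the thread)
# Retry CMD once per second until it succeeds, giving up after MAX attempts.
# Returns 0 if CMD succeeded, 1 if the attempt limit was reached.
# Note: plain sh has no local variables, so "max" and "i" are globals here.
wait_until_ready() {
    max=$1
    shift
    i=0
    # The probe reads from /dev/null and its output is discarded, so a
    # failing attempt stays silent and cannot block waiting for input.
    until "$@" </dev/null >/dev/null 2>&1
    do
        i=`expr $i + 1`
        if [ $i -ge $max ]
        then
            return 1        # refuse to loop forever
        fi
        sleep 1
    done
    return 0
}

# Possible use in a test driver (variable names assumed):
#   wait_until_ready 24 "$bindir/psql" $psql_options template1 || exit 2
```

The closed-loop form returns as soon as the probe succeeds, so a fast machine pays no fixed delay while a slow one gets up to MAX attempts.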

Re: pg_regress.sh startup failure patch

From
Dave Page
Date:

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 03 January 2002 15:28
> To: Jason Tishler
> Cc: Pgsql-Patches
> Subject: Re: [PATCHES] pg_regress.sh startup failure patch
>
>
> Jason Tishler <jason@tishler.net> writes:
> > Unfortunately, pg_regress.sh fails under Cygwin as follows: The
> > attached patch "solves" the problem.
>
> Why would it take more than 3 seconds to start the postmaster
> under Cygwin?  Something awfully fishy about that, unless
> you're using a 286 ...

On a Dell Inspiron 8000, PIII 850MHz, 512Mb RAM, Windows XP Pro (kept nice
and tidy with no junk wasting resources), 7.2b4 takes about 15 seconds to
get to 'the database system is ready' message. Subsequent startups take
about 6 or 7 seconds following a controlled *or* uncontrolled shutdown. I
get about 15 seconds again the first startup after a reboot.

All regression tests pass except the known issues with parallel tests so I
assume everything's OK...

Regards, Dave.


Re: pg_regress.sh startup failure patch

From
Jason Tishler
Date:
Tom,

On Thu, Jan 03, 2002 at 10:28:05AM -0500, Tom Lane wrote:
> Jason Tishler <jason@tishler.net> writes:
> > Unfortunately, pg_regress.sh fails under Cygwin as follows:
> > The attached patch "solves" the problem.
>
> Why would it take more than 3 seconds to start the postmaster under
> Cygwin?  Something awfully fishy about that, unless you're using
> a 286 ...

I never had this problem before on my home server machine (PIII 500 MHz)
with previous PostgreSQL versions.  However, on my work laptop (also
PIII 500 MHz, but virus software, slow disk, etc.), PostgreSQL CVS just
needs more time to start up.

> I didn't much care for the arbitrary delay in the first place, and
> raising it to 10 sec is even less palatable.

Agreed on both accounts -- I detest open loop solutions myself.

> Perhaps
>
>     until psql ...args... </dev/null 2>/dev/null
>     do
>         sleep 1
>     done
>
> although it might also be prudent to refuse to loop more than a couple
> dozen times.

I was going to suggest the retry strategy, but I wasn't sure that such a
patch would be accepted at this time.  How should I proceed?

Thanks,
Jason

Re: pg_regress.sh startup failure patch

From
Tom Lane
Date:
Jason Tishler <jason@tishler.net> writes:
>> Why would it take more than 3 seconds to start the postmaster under
>> Cygwin?  Something awfully fishy about that, unless you're using
>> a 286 ...

> I never had this problem before on my home server machine (PIII 500 MHz)
> with previous PostgreSQL versions.  However, on my work laptop (also
> PIII 500 MHz, but virus software, slow disk, etc.), PostgreSQL CVS just
> needs more time to start up.

Hm.  That deserves investigation, but it seems not high priority
compared to getting a release out.

>> I didn't much care for the arbitrary delay in the first place, and
>> raising it to 10 sec is even less palatable.

> Agreed on both accounts -- I detest open loop solutions myself.
> I was going to suggest the retry strategy, but I wasn't sure that such a
> patch would be accepted at this time.  How should I proceed?

Code up a patch, test it, send in a diff ... I think the only real risk
here is to be careful not to write anything unportable.  I believe that
"until" loops exist even in very old Bourne shells, does anyone think
differently?

            regards, tom lane

Re: pg_regress.sh startup failure patch

From
Bruce Momjian
Date:
> > Agreed on both accounts -- I detest open loop solutions myself.
> > I was going to suggest the retry strategy, but I wasn't sure that such a
> > patch would be accepted at this time.  How should I proceed?
>
> Code up a patch, test it, send in a diff ... I think the only real risk
> here is to be careful not to write anything unportable.  I believe that
> "until" loops exist even in very old Bourne shells, does anyone think
> differently?

Probably true, but I have never seen 'until' used in a script.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: pg_regress.sh startup failure patch

From
Jason Tishler
Date:
Tom,

On Thu, Jan 03, 2002 at 11:47:06AM -0500, Tom Lane wrote:
> Code up a patch, test it, send in a diff ...

Is the attached patch acceptable?

> I think the only real risk
> here is to be careful not to write anything unportable.  I believe that
> "until" loops exist even in very old Bourne shells, does anyone think
> differently?

I just checked one of my old favorites, "The UNIX Programming
Environment," by Kernighan and Pike, 1984.  It appears that "until"
is understood by the Bourne shell back then, so its use should be OK.

Thanks,
Jason

Re: pg_regress.sh startup failure patch

From
Tom Lane
Date:
Dave Page <dpage@vale-housing.co.uk> writes:
>> Why would it take more than 3 seconds to start the postmaster
>> under Cygwin?  Something awfully fishy about that, unless
>> you're using a 286 ...

> On a Dell Inspiron 8000, PIII 850MHz, 512Mb RAM, Windows XP Pro (kept nice
> and tidy with no junk wasting resources), 7.2b4 takes about 15 seconds to
> get to 'the database system is ready' message. Subsequent startups take
> about 6 or 7 seconds following a controlled *or* uncontrolled shutdown. I
> get about 15 seconds again the first startup after a reboot.

Hm.  I'm accustomed to seeing postmaster startup take about one second
--- possibly more if recovery from WAL entries is needed, but this
wouldn't apply normally.  That's on machines a *lot* slower than you
two are using.  Something is taking an unreasonably long time there.
It'd be worth poking into it to try to figure out what.

            regards, tom lane

Re: pg_regress.sh startup failure patch

From
Tom Lane
Date:
Jason Tishler <jason@tishler.net> writes:
> I just checked one of my old favorites, "The UNIX Programming
> Environment," by Kernighan and Pike, 1984.  It appears that "until"
> is understood by the Bourne shell back then, so its use should be OK.

Yeah, that's what I thought.  I ended up applying the attached patch;
this not only avoids the timing problem but has more reliable detection
of postmaster startup failure than the original code.

            regards, tom lane


*** src/test/regress/pg_regress.sh.orig    Sun Sep 16 12:11:11 2001
--- src/test/regress/pg_regress.sh    Thu Jan  3 16:52:05 2002
***************
*** 353,358 ****
--- 353,379 ----
      "$bindir/postmaster" -D "$PGDATA" -F $postmaster_options >"$LOGDIR/postmaster.log" 2>&1 &
      postmaster_pid=$!

+     # Wait till postmaster is able to accept connections (normally only
+     # a second or so, but Cygwin is reportedly *much* slower).  Don't
+     # wait forever, however.
+     i=0
+     max=60
+     until "$bindir/psql" $psql_options template1 </dev/null 2>/dev/null
+     do
+         i=`expr $i + 1`
+         if [ $i -ge $max ]
+         then
+             break
+         fi
+         if kill -0 $postmaster_pid >/dev/null 2>&1
+         then
+             : still starting up
+         else
+             break
+         fi
+         sleep 1
+     done
+
      if kill -0 $postmaster_pid >/dev/null 2>&1
      then
          echo "running on port $PGPORT with pid $postmaster_pid"
***************
*** 363,371 ****
          echo
          (exit 2); exit
      fi
-
-     # give postmaster some time to pass WAL recovery
-     sleep 3

  else # not temp-install

--- 384,389 ----
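
The wait loop in the patch above combines two checks: a connection probe and a `kill -0` test that the postmaster process still exists. The sketch below isolates that idea in a standalone function. The name wait_for_server and the three distinct return codes are assumptions for illustration; the applied patch instead uses `break` and relies on the `kill -0` test that follows the loop.

```shell
#!/bin/sh
# wait_for_server PID MAX CMD [ARGS...]   (hypothetical helper)
# Retry CMD once per second while process PID is still alive.
# Returns: 0 if CMD succeeded,
#          1 if MAX attempts were used up,
#          2 if process PID no longer exists (startup failed).
# Note: plain sh has no local variables, so these names are globals.
wait_for_server() {
    server_pid=$1
    max_tries=$2
    shift; shift
    tries=0
    until "$@" </dev/null >/dev/null 2>&1
    do
        # kill -0 delivers no signal; it only reports whether the
        # process exists and may be signalled by this user.
        if kill -0 "$server_pid" >/dev/null 2>&1
        then
            : still starting up
        else
            return 2
        fi
        tries=`expr $tries + 1`
        if [ $tries -ge $max_tries ]
        then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

Checking the process on every pass means a postmaster that dies during startup is reported at once, rather than after the full timeout.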

Re: pg_regress.sh startup failure patch

From
"Dave Page"
Date:
Tom Lane allegedly said:
> Dave Page <dpage@vale-housing.co.uk> writes:
>>> Why would it take more than 3 seconds to start the postmaster
>>> under Cygwin?  Something awfully fishy about that, unless
>>> you're using a 286 ...
>
>> On a Dell Inspiron 8000, PIII 850MHz, 512Mb RAM, Windows XP Pro (kept
>> nice and tidy with no junk wasting resources), 7.2b4 takes about 15
>> seconds to get to 'the database system is ready' message. Subsequent
>> startups take about 6 or 7 seconds following a controlled *or*
>> uncontrolled shutdown. I get about 15 seconds again the first startup
>> after a reboot.
>
> Hm.  I'm accustomed to seeing postmaster startup take about one second
> --- possibly more if recovery from WAL entries is needed, but this
> wouldn't apply normally.  That's on machines a *lot* slower than you
> two are using.  Something is taking an unreasonably long time there.
> It'd be worth poking into it to try to figure out what.

I'd be happy to look into it, but I'll need some guidance - I'm not in the
least bit familiar with gdb or any of its friends :-(

Regards, Dave.




Re: pg_regress.sh startup failure patch

From
Jason Tishler
Date:
Tom,

On Thu, Jan 03, 2002 at 04:55:45PM -0500, Tom Lane wrote:
> I ended up applying the attached patch;
> this not only avoids the timing problem but has more reliable detection
> of postmaster startup failure than the original code.

I just tried the above under Cygwin and it works great.

Thanks,
Jason