Thread: 8.04 and RedHat/CentOS init script issue

8.04 and RedHat/CentOS init script issue

From

Tony Caduto

Date:

18 October 2005, 21:55:37

Hi,
I installed 8.0.4 via RPM on CentOS 4.2, which is the same as RedHat 4.2.
While booting, the init script reports that the daemon [FAILED], but
after I log on the postmaster is running and I am able to connect
from any client remotely.

I made no modifications to the script and there is nothing out of the
ordinary in the log.

Thanks,

Tony

Re: 8.04 and RedHat/CentOS init script issue

From

Devrim GUNDUZ

Date:

19 October 2005, 17:35:01

Hi,

On Tue, 18 Oct 2005, Tony Caduto wrote:

> I installed 8.0.4 via RPM on CentOS 4.2, which is the same as RedHat 4.2.
> While booting, the init script reports that the daemon [FAILED], but after
> I log on the postmaster is running and I am able to connect from any
> client remotely.
>
> I made no modifications to the script and there is nothing out of the
> ordinary in the log.

Hmm. In the 8.0.4 RPM init script, we used a sleep time of 1 second
(see the "sleep 1" line in the init script). In some cases where the
system is slow, the script reports a startup failure even though the
postmaster has in fact started.

In the 8.1 RPMs, the sleep time was increased to 2 seconds, which we
believe avoids the problem you've reported:

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgsqlrpms/patches/8.1/postgresql.init?rev=1.2&content-type=text/x-cvsweb-markup

So please increase this sleep time and give it another try.

Regards,
--
Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tony Caduto

Date:

20 October 2005, 14:18:25

Hi all,
I tried changing the sleep command in the script to 2, but at boot it
still says [FAILED].
Even though the script reports that it failed, the db is up and running.

System is a Compaq DL380 (2.5 GB RAM, dual 2.4 GHz Xeon) running CentOS 4.2.

I am going to install 8.1beta3 on another box that has the exact same
hardware and OS version; I will report back what happens.

Not sure what is going on. Has anyone else had this problem with CentOS
4.2 or Red Hat EL 4.2?

Thanks,

Tony Caduto
http://www.amsoftwaredesign.com
Home of PG Lightning Admin for Postgresql 8.x

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tom Lane

Date:

20 October 2005, 14:32:45

Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
> I tried changing the sleep command in the script to 2, but at boot it
> still says [FAILED].
> even though the script reports it failed, the db is up and running.

This seems to happen for some people and not others.  I've been wanting
to find out how the heck it can take multiple seconds for the postmaster
to start and create its pid-file ... that shouldn't take long at all.
Are you willing to try strace'ing the postmaster?  Modify the script
like
$SU -l postgres -c "strace -tt -o /tmp/strace.out $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
                    ^^^^^^^^^^^^^^ add this ^^^^^^

and reboot.  (After you've gotten a trace of a failing case, change it
back and reboot again.)

This is kind of invasive and may change the behavior enough that we
don't see the problem :-( --- but if you're willing to reboot a few
times in hopes of capturing a trace of a failed case, it'd be worth
trying.
        regards, tom lane

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tony Caduto

Date:

20 October 2005, 18:36:42

Tom Lane wrote:

>Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
>
>>I tried changing the sleep command in the script to 2, but at boot it
>>still says [FAILED].
>>even though the script reports it failed, the db is up and running.
>
>This seems to happen for some people and not others.  I've been wanting
>to find out how the heck it can take multiple seconds for the postmaster
>to start and create its pid-file ... that shouldn't take long at all.
>Are you willing to try strace'ing the postmaster?  Modify the script
>like
>
>    $SU -l postgres -c "strace -tt -o /tmp/strace.out $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
>                        ^^^^^^^^^^^^^^ add this ^^^^^^
>
>and reboot.  (After you've gotten a trace of a failing case, change it
>back and reboot again.)
>
>This is kind of invasive and may change the behavior enough that we
>don't see the problem :-( --- but if you're willing to reboot a few
>times in hopes of capturing a trace of a failed case, it'd be worth
>trying.
>
>            regards, tom lane
>
>  
>
Hi Tom,
I added the strace line like you said and rebooted, it did display the 
[FAILED] after the reboot.
I put the resulting strace.out file on my web server; here is the
link (warning: it's pretty big):
http://www.amsoftwaredesign.com/downloads/strace.out

After the second reboot I changed the sleep from 2 to 5 and then it
worked correctly. Of course, this really slowed the boot process.

Thanks,

Tony
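
A fixed sleep has to be long enough for the slowest boot, which is why
raising it to 5 slowed every start. A minimal sketch of an alternative,
assuming the init script's success check depends on postmaster.pid
appearing under $PGDATA: poll for the file and stop waiting as soon as
it exists. The function name wait_for_pidfile and the 30-second default
are hypothetical and are not taken from the RPM init script.

```shell
# Hypothetical helper (not part of the RPM init script).
# Usage: wait_for_pidfile PIDFILE [TIMEOUT_SECONDS]
# Returns 0 as soon as PIDFILE exists and is non-empty, 1 on timeout.
wait_for_pidfile() {
    pidfile=$1
    timeout=${2:-30}      # assumed default; tune for the slowest expected boot
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$pidfile" ]; then
            return 0      # file is there: stop waiting immediately
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Final check after the last sleep; its status is the return value.
    [ -s "$pidfile" ]
}
```

In the script, a call such as wait_for_pidfile "$PGDATA/postmaster.pid" 30
would stand in for the sleep line. It checks once per second and returns
as soon as the file shows up, so a fast start is not penalised.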

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tom Lane

Date:

20 October 2005, 19:06:33

Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
> Tom Lane wrote:
>> Are you willing to try strace'ing the postmaster?

> I added the strace line like you said and rebooted, it did display the 
> [FAILED] after the reboot.

Thanks for collecting the raw data.  The salient events seem to be these:

12:57:52.400888 exec() call
12:57:52.619268 completion(?) of opening shared libraries
12:57:52.657465 first call coming from our own code instead of libraries
12:57:52.902476 begin reading postgresql.conf
12:57:52.915949 done reading postgresql.conf
12:57:52.916191 begin trying to identify system timezone
12:58:01.117869 done identifying system timezone
12:58:01.131798 postmaster.pid created

In short: pg_timezone_initialize() took about 8.2 seconds out of the
total time of 8.73 seconds.

Since pg_timezone_initialize() needs to scan all of the 500-odd files
under postgresql/share/timezone/, it isn't so surprising that it would
take a little bit of time.  But 8 seconds seems like a lot.  The trace
makes it look like localtime() performs stat("/etc/localtime") on each
call, which is pretty ugly --- I wonder if there isn't some way around
that?

Anyway, the short answer is that pg_timezone_initialize ought to wait
till after we've created postmaster.pid.  There's no urgent reason to
do it earlier AFAICS.  This also explains why we didn't see a startup
problem in earlier releases --- pg_timezone_initialize didn't exist
before 8.0.
        regards, tom lane
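
The two intervals quoted above follow directly from the strace -tt
timestamps. A small sketch that reproduces the arithmetic; the helper
name elapsed is hypothetical, and it assumes both timestamps fall on
the same day (no handling of a wrap past midnight).

```shell
# Hypothetical helper: seconds between two HH:MM:SS.ffffff timestamps
# taken on the same day.
elapsed() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        t0 = s[1] * 3600 + s[2] * 60 + s[3]   # start, seconds since midnight
        t1 = e[1] * 3600 + e[2] * 60 + e[3]   # end, seconds since midnight
        printf "%.2f\n", t1 - t0
    }'
}

elapsed 12:57:52.916191 12:58:01.117869   # timezone identification: 8.20
elapsed 12:57:52.400888 12:58:01.131798   # exec() to postmaster.pid: 8.73
```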

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Andrew Dunstan

Date:

20 October 2005, 19:40:45

Tom Lane wrote:

>
>In short: pg_timezone_initialize() took about 8.2 seconds out of the
>total time of 8.73 seconds.
>
>Since pg_timezone_initialize() needs to scan all of the 500-odd files
>under postgresql/share/timezone/, it isn't so surprising that it would
>take a little bit of time.  But 8 seconds seems like a lot.  The trace
>makes it look like localtime() performs stat("/etc/localtime") on each
>call, which is pretty ugly --- I wonder if there isn't some way around
>that?
>
>
>  
>

Further data points:

I just observed this taking over 20 seconds on my clunky old pII 266. 
That's really horrible. But  pg_ctl -w start was able to complete in 
about 2 seconds.

Even on my much faster laptop the timezone lib startup took 3 or 4 
seconds (and pg_ctl -w start came back in about 1 second).

cheers

andrew

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tom Lane

Date:

20 October 2005, 19:57:28

Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> In short: pg_timezone_initialize() took about 8.2 seconds out of the
>> total time of 8.73 seconds.

> Further data points:

> I just observed this taking over 20 seconds on my clunky old pII 266. 
> That's really horrible. But  pg_ctl -w start was able to complete in 
> about 2 seconds.

Yeah.  I've been experimenting here, and it's clear that strace itself
adds huge overhead --- on my machine, postmaster start is normally well
under a second, but strace'ing it brings it to about 8 seconds.  No
doubt that's because of all the stat("/etc/localtime") calls it has to
trace.

So there's some Heisenberg effect here.  However, I don't think there
can be much doubt that on a machine that is just booting (and has
surely got none of these files in cache) the search through
share/postgresql/timezone could take a few seconds.  Hindsight is
always 20/20 ;-)
        regards, tom lane

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Andrew Dunstan

Date:

20 October 2005, 20:50:46


Tom Lane wrote:

>So there's some Heisenberg effect here.  However, I don't think there
>can be much doubt that on a machine that is just booting (and has
>surely got none of these files in cache) the search through
>share/postgresql/timezone could take a few seconds.  Hindsight is
>always 20/20 ;-)
>  
>

Something is surely wrong in the timezone lib, though:

[andrew@alphonso inst]$ grep /etc/localtime strace.out | wc -l
38073


cheers

andrew
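
Beyond counting one path with grep, it can help to see which system
calls dominate the whole trace. A rough sketch, assuming the file was
written by strace -tt -o so that every line starts with a timestamp
followed by the call name; syscall_counts is a hypothetical helper name.

```shell
# Hypothetical helper: tally system calls by name in a strace -tt output file.
# Assumes each line looks like "HH:MM:SS.ffffff name(args...) = result".
syscall_counts() {
    awk '{
        name = $2
        sub(/\(.*/, "", name)   # keep only the text before the first "("
        count[name]++
    }
    END {
        for (n in count) printf "%d %s\n", count[n], n
    }' "$1" | sort -rn          # most frequent call first
}
```

For example, syscall_counts /tmp/strace.out | head lists the most
frequent calls at the top.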

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tom Lane

Date:

20 October 2005, 21:34:53

Andrew Dunstan <andrew@dunslane.net> writes:
> Something is surely wrong in the timezone lib, though:

[ digs in glibc sources for awhile... ]

The test loop in score_timezone() calls both localtime() and strftime()
for each probe point, and in glibc strftime() calls tzset(), which the
source code claims is required by POSIX.  The explicit tzset() call is
what's forcing the recheck of /etc/localtime.

Possibly the glibc boys would listen to a suggestion that strftime()
need not force the file recheck, but my experience with them is that
they're relatively impervious to suggestions :-(

I'm not actually particularly worried about the startup time.  What's
bothering me right at the moment, given the new-found knowledge that
strftime() is slow on Linux, is that we're using it in elog().  At the
time that code was written, we did it deliberately to ensure that all
the backends would write log timestamps in the same timezone regardless
of local SET TimeZone commands.  That's still an important
consideration, but I wonder whether we don't now have enough timezone
infrastructure that we could get the same results using pg_strftime.
        regards, tom lane

Re: 8.04 and RedHat/CentOS init script issue and sleep

From

Tom Lane

Date:

20 October 2005, 22:57:33

I wrote:
> Possibly the glibc boys would listen to a suggestion that strftime()
> need not force the file recheck, but my experience with them is that
> they're relatively impervious to suggestions :-(

I've filed a bug for this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171351
so no need for everyone else to do it too ...

> I'm not actually particularly worried about the startup time.  What's
> bothering me right at the moment, given the new-found knowledge that
> strftime() is slow on Linux, is that we're using it in elog().  At the
> time that code was written, we did it deliberately to ensure that all
> the backends would write log timestamps in the same timezone regardless
> of local SET TimeZone commands.  That's still an important
> consideration, but I wonder whether we don't now have enough timezone
> infrastructure that we could get the same results using pg_strftime.

If glibc fixes the problem upstream then we can leave well enough alone,
but if they indicate they won't then we should think about doing this
someday.  The major problem with it probably is "what do you do when
messages need to be emitted before pgtz has been initialized?"
        regards, tom lane