Thread: Time to run initdb is mostly figure-out-the-timezone work

Time to run initdb is mostly figure-out-the-timezone work

From
Tom Lane
Date:
On current Fedora 11, there is a huge difference in initdb time if you
have TZ set versus if you don't: I get about 18 seconds versus less than
four.

$ time initdb
... blah blah blah ...

real    0m17.953s
user    0m6.490s
sys     0m10.935s
$ rm -rf $PGDATA
$ export TZ=GMT
$ time initdb
... blah blah blah ...

real    0m3.767s
user    0m2.997s
sys     0m0.784s
$ 

The reason for this is that initdb launches the postmaster many times
(at least 14) and each one of those launches results in a search of
every file in the timezone database, if we don't have a TZ value to
let us identify the timezone immediately.
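To get a feel for how much work that search involves, you can count the files in the timezone tree it walks. The helper below is a hypothetical sketch, not something PostgreSQL ships. The paths in it are assumptions: system tzdata commonly lives under /usr/share/zoneinfo, while a default PG build typically installs its own tree under the "timezone" directory of the share directory reported by pg_config --sharedir.

```shell
#!/bin/sh
# count_tz_files: hypothetical helper, not part of PostgreSQL.
# Counts the regular files under a timezone directory, i.e. roughly how
# many files a timezone search over that tree has to consider.
count_tz_files() {
    tzdir=${1:-/usr/share/zoneinfo}   # assumed default location of system tzdata
    # wc -l pads its output with spaces on some platforms (e.g. BSD);
    # tr strips them so the result is a bare number.
    find "$tzdir" -type f | wc -l | tr -d ' '
}

# Example (paths are assumptions; adjust for your install):
#   count_tz_files /usr/share/zoneinfo                  # system tzdata
#   count_tz_files "$(pg_config --sharedir)/timezone"   # PG's own tree
```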

Now this hardly matters to end users who seldom do initdb, but from a
developer's perspective it would be awfully nice if initdb took less
time.  If other people can reproduce similar behavior, I think it
would be worth the trouble to have initdb forcibly set the TZ or PGTZ
variable while it runs.  AFAIK it does not matter what timezone
environment postgres sees during initdb; we don't put that into the
config file.  It'd be about a one-line addition ...

Comments?
        regards, tom lane


Re: Time to run initdb is mostly figure-out-the-timezone work

From
Guillaume Lelarge
Date:
On 18/12/2009 18:07, Tom Lane wrote:
> On current Fedora 11, there is a huge difference in initdb time if you
> have TZ set versus if you don't: I get about 18 seconds versus less than
> four.
> 
> $ time initdb
> ... blah blah blah ...
> 
> real    0m17.953s
> user    0m6.490s
> sys     0m10.935s
> $ rm -rf $PGDATA
> $ export TZ=GMT
> $ time initdb
> ... blah blah blah ...
> 
> real    0m3.767s
> user    0m2.997s
> sys     0m0.784s
> $ 
> 
> The reason for this is that initdb launches the postmaster many times
> (at least 14) and each one of those launches results in a search of
> every file in the timezone database, if we don't have a TZ value to
> let us identify the timezone immediately.
> 
> Now this hardly matters to end users who seldom do initdb, but from a
> developer's perspective it would be awfully nice if initdb took less
> time.  If other people can reproduce similar behavior, I think it
> would be worth the trouble to have initdb forcibly set the TZ or PGTZ
> variable while it runs.

I have the exact same issue:

guillaume@laptop:~$ time initdb
The files belonging to this database system will be owned by user "guillaume".
[...]
real    0m7.972s
user    0m3.588s
sys    0m3.444s
guillaume@laptop:~$ export TZ=GMT
guillaume@laptop:~$ rm -rf t1
guillaume@laptop:~$ time initdb
[...]
real    0m1.828s
user    0m1.436s
sys    0m0.368s


This is on Ubuntu 9.10.

Quite impressive. I think I'll add an alias (alias initdb="TZ=GMT initdb").
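A shell function does the same job as the alias and also works in scripts, where bash does not expand aliases unless told to (shopt -s expand_aliases). The sketch below uses the hypothetical name initdb_fast and defaults TZ to GMT only when the caller has not already set it. Per Tom's note, initdb does not care which zone it sees, so GMT is an arbitrary choice.

```shell
#!/bin/sh
# initdb_fast is a hypothetical name, not a PostgreSQL command.
# Runs initdb with TZ defaulting to GMT so the postmasters it launches
# need not search the timezone database; a non-empty TZ the caller
# already set is kept.
initdb_fast() {
    # The prefix assignment puts TZ into initdb's environment only,
    # leaving the calling shell's TZ untouched.
    TZ=${TZ:-GMT} initdb "$@"
}

# Example (assumes initdb is on PATH and PGDATA is set):
#   initdb_fast
#   initdb_fast -D /tmp/t1
```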


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com


Re: Time to run initdb is mostly figure-out-the-timezone work

From
Joshua Tolley
Date:
On Fri, Dec 18, 2009 at 06:20:39PM +0100, Guillaume Lelarge wrote:
> On 18/12/2009 18:07, Tom Lane wrote:
> > On current Fedora 11, there is a huge difference in initdb time if you
> > have TZ set versus if you don't: I get about 18 seconds versus less than
> > four.
> I have the exact same issue:

For whatever it's worth, I get it too, on Ubuntu 9.04... ~4s without TZ vs.
~1.8s with TZ.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: Time to run initdb is mostly figure-out-the-timezone work

From
Alvaro Herrera
Date:
Tom Lane wrote:
> On current Fedora 11, there is a huge difference in initdb time if you
> have TZ set versus if you don't: I get about 18 seconds versus less than
> four.

Wow, I can reproduce this (11-12 secs when no TZ versus 5 when TZ is
defined).  I'd never noticed because I normally have TZ set; but yes I
agree that this is worthwhile.

I notice that most of the difference is system time ... I imagine we do
a lot of syscalls to guess the timezone.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Time to run initdb is mostly figure-out-the-timezone work

From
Tom Lane
Date:
Joshua Tolley <eggyknap@gmail.com> writes:
> On Fri, Dec 18, 2009 at 06:20:39PM +0100, Guillaume Lelarge wrote:
>> On 18/12/2009 18:07, Tom Lane wrote:
>>> On current Fedora 11, there is a huge difference in initdb time if you
>>> have TZ set versus if you don't: I get about 18 seconds versus less than
>>> four.

>> I have the exact same issue:

> For whatever it's worth, I get it too, on Ubuntu 9.04... ~4s without TZ vs.
> ~1.8s with TZ.

BTW, I just realized that it makes a difference that I customarily use
the configure option --with-system-tzdata=/usr/share/zoneinfo on that
machine.  I do it mainly because it saves a few seconds during "make
install", but also because Red Hat's PG packages use that option so I
want to test it regularly.  The impact of this is that the TZ search
also has to scan through a bunch of leap-second-aware timezone files,
which are not present in a default PG build's timezone tree.  So that
probably explains why I see a 4x slowdown while you get more like 2x.
Still, it seems worth doing something about, if it's as easy as a
one-line addition.
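One rough way to see how much of a system tzdata tree is leap-second data is to count its files with and without the right/ subdirectory. It is an assumption about the local tzdata package that the leap-second-aware zones live there, as they commonly do. tz_file_counts is a hypothetical helper, not part of PostgreSQL.

```shell
#!/bin/sh
# tz_file_counts: hypothetical helper, not part of PostgreSQL.
# Prints the number of regular files in a zoneinfo tree, both in full and
# with the right/ subtree (assumed to hold leap-second zones) skipped.
# Pass the directory without a trailing slash so the -path test matches.
tz_file_counts() {
    tzdir=${1:-/usr/share/zoneinfo}
    all=$(find "$tzdir" -type f | wc -l | tr -d ' ')
    # -prune stops find from descending into right/; only the other
    # branch of -o prints file names.
    no_right=$(find "$tzdir" -path "$tzdir/right" -prune -o -type f -print \
        | wc -l | tr -d ' ')
    echo "all=$all without_right=$no_right"
}

# Example (path is an assumption):
#   tz_file_counts /usr/share/zoneinfo
```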
        regards, tom lane


Re: Time to run initdb is mostly figure-out-the-timezone work

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I notice that most of the difference is system time ... I imagine we do
> a lot of syscalls to guess the timezone.

Yeah, it seems to be mostly the cost of searching the timezone directory
tree and reading all those small files.  I was led to notice this
because Red Hat's latest devel kernels seem to have a bit of a
performance regression in this area:
https://bugzilla.redhat.com/show_bug.cgi?id=548403
Obviously there's something there for the kernel guys to fix, but
even with a non-borked kernel it's an expensive thing to do.
        regards, tom lane


Re: Time to run initdb is mostly figure-out-the-timezone work

From
Alex Hunsaker
Date:
On Fri, Dec 18, 2009 at 10:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Obviously there's something there for the kernel guys to fix, but
> even with a non-borked kernel it's an expensive thing to do.

Any thoughts on back-patching this? While it's not a bug per se, it
seems reasonably low-risk.  I for one would love a 2-4x initdb speedup
in the back branches :)  Granted, now I know I can just set TZ...