Thread: Time to run initdb is mostly figure-out-the-timezone work
On current Fedora 11, there is a huge difference in initdb time if you
have TZ set versus if you don't: I get about 18 seconds versus less than
four.

$ time initdb
... blah blah blah ...

real    0m17.953s
user    0m6.490s
sys     0m10.935s
$ rm -rf $PGDATA
$ export TZ=GMT
$ time initdb
... blah blah blah ...

real    0m3.767s
user    0m2.997s
sys     0m0.784s
$

The reason for this is that initdb launches the postmaster many times
(at least 14) and each one of those launches results in a search of
every file in the timezone database, if we don't have a TZ value to
let us identify the timezone immediately.

Now this hardly matters to end users who seldom do initdb, but from a
developer's perspective it would be awfully nice if initdb took less
time. If other people can reproduce similar behavior, I think it
would be worth the trouble to have initdb forcibly set the TZ or PGTZ
variable while it runs. AFAIK it does not matter what timezone
environment postgres sees during initdb; we don't put that into the
config file. It'd be about a one-line addition ...

Comments?

regards, tom lane
On 18/12/2009 18:07, Tom Lane wrote:
> On current Fedora 11, there is a huge difference in initdb time if you
> have TZ set versus if you don't: I get about 18 seconds versus less than
> four.
>
> $ time initdb
> ... blah blah blah ...
>
> real    0m17.953s
> user    0m6.490s
> sys     0m10.935s
> $ rm -rf $PGDATA
> $ export TZ=GMT
> $ time initdb
> ... blah blah blah ...
>
> real    0m3.767s
> user    0m2.997s
> sys     0m0.784s
> $
>
> The reason for this is that initdb launches the postmaster many times
> (at least 14) and each one of those launches results in a search of
> every file in the timezone database, if we don't have a TZ value to
> let us identify the timezone immediately.
>
> Now this hardly matters to end users who seldom do initdb, but from a
> developer's perspective it would be awfully nice if initdb took less
> time. If other people can reproduce similar behavior, I think it
> would be worth the trouble to have initdb forcibly set the TZ or PGTZ
> variable while it runs.

I have the exact same issue:

guillaume@laptop:~$ time initdb
The files belonging to this database system will be owned by user "guillaume".
[...]

real    0m7.972s
user    0m3.588s
sys     0m3.444s
guillaume@laptop:~$ export TZ=GMT
guillaume@laptop:~$ rm -rf t1
guillaume@laptop:~$ time initdb
[...]

real    0m1.828s
user    0m1.436s
sys     0m0.368s

This is on Ubuntu 9.10.

Quite impressive. I think I'll add an alias (alias initdb="TZ=GMT initdb").

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com
On Fri, Dec 18, 2009 at 06:20:39PM +0100, Guillaume Lelarge wrote:
> On 18/12/2009 18:07, Tom Lane wrote:
> > On current Fedora 11, there is a huge difference in initdb time if you
> > have TZ set versus if you don't: I get about 18 seconds versus less than
> > four.
>
> I have the exact same issue:

For whatever it's worth, I get it too, on Ubuntu 9.04... ~4s without TZ vs.
~1.8s with TZ.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
Tom Lane wrote:
> On current Fedora 11, there is a huge difference in initdb time if you
> have TZ set versus if you don't: I get about 18 seconds versus less than
> four.

Wow, I can reproduce this (11-12 secs when no TZ versus 5 when TZ is
defined). I'd never noticed because I normally have TZ set; but yes I
agree that this is worthwhile.

I notice that most of the difference is system time ... I imagine we do
a lot of syscalls to guess the timezone.

--
Alvaro Herrera                    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Joshua Tolley <eggyknap@gmail.com> writes:
> On Fri, Dec 18, 2009 at 06:20:39PM +0100, Guillaume Lelarge wrote:
>> On 18/12/2009 18:07, Tom Lane wrote:
>>> On current Fedora 11, there is a huge difference in initdb time if you
>>> have TZ set versus if you don't: I get about 18 seconds versus less than
>>> four.

>> I have the exact same issue:

> For whatever it's worth, I get it too, on Ubuntu 9.04... ~4s without TZ vs.
> ~1.8s with TZ.

BTW, I just realized that it makes a difference that I customarily
use the configure option --with-system-tzdata=/usr/share/zoneinfo
on that machine. I do it mainly because it saves a few seconds during
"make install", but also because Red Hat's PG packages use that option
so I want to test it regularly. The impact of this is that the TZ
search also has to scan through a bunch of leap-second-aware timezone
files, which are not present in a default PG build's timezone tree.
So that probably explains why I see a 4x slowdown while you get more
like 2x.

Still, it seems worth doing something about, if it's as easy as a
one-line addition.

regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I notice that most of the difference is system time ... I imagine we do
> a lot of syscalls to guess the timezone.

Yeah, it seems to be mostly the cost of searching the timezone directory
tree and reading all those small files. I was led to notice this
because Red Hat's latest devel kernels seem to have a bit of a
performance regression in this area:
https://bugzilla.redhat.com/show_bug.cgi?id=548403
Obviously there's something there for the kernel guys to fix, but
even with a non-borked kernel it's an expensive thing to do.

regards, tom lane
On Fri, Dec 18, 2009 at 10:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Obviously there's something there for the kernel guys to fix, but
> even with a non-borked kernel it's an expensive thing to do.

Any thoughts on back-patching this? While it's not a bug per se, it
seems reasonably low-risk. I for one would love a 2-4x initdb speedup
in the back branches :)

Granted, now I know I can just set TZ...
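
For anyone who wants the "just set TZ" workaround without touching their login environment, here is a minimal sketch of a shell wrapper. It is not the proposed initdb patch. The function name initdb_tz is made up for illustration, and the sketch assumes a POSIX-style shell. Unlike a plain TZ=GMT alias, it keeps a TZ value that is already set, and it only falls back to GMT when TZ is unset or empty.

```shell
# Hypothetical wrapper; the name initdb_tz does not come from the thread.
# It runs initdb with TZ defined so that the timezone search described
# upthread can be skipped. The subshell keeps the assignment from
# leaking into the calling shell.
initdb_tz() {
    (
        # Keep an existing TZ; otherwise fall back to GMT.
        TZ="${TZ:-GMT}"
        export TZ
        initdb "$@"
    )
}

# Usage (takes the same arguments as initdb):
#   initdb_tz -D "$PGDATA"
```

Since, per the original message, the timezone seen during initdb is not written to the config file, forcing GMT here should not affect the resulting cluster.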