Re: [HACKERS] Cutting initdb's runtime (Perl question embedded) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Msg-id 7408.1525812528@sss.pgh.pa.us
In response to Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)  (ilmari@ilmari.org (Dagfinn Ilmari Mannsåker))
Responses Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Resurrecting this old thread ...

I decided it'd be interesting to re-examine where initdb's runtime
is going, seeing that we just got done with a lot of bootstrap data
restructuring.  I stuck some timing code into initdb, and got
results like this:

creating directory /home/postgres/testversion/data ... ok
elapsed = 0.256 msec
creating subdirectories ... ok
elapsed = 2.385 msec
selecting default max_connections ... 100
elapsed = 13.528 msec
selecting default shared_buffers ... 128MB
elapsed = 13.699 msec
selecting dynamic shared memory implementation ... posix
elapsed = 0.129 msec
elapsed = 281.335 msec in select_default_timezone
creating configuration files ... ok
elapsed = 1.319 msec
running bootstrap script ... ok
elapsed = 162.143 msec
performing post-bootstrap initialization ... ok
elapsed = 832.569 msec

Sync to disk skipped.

real    0m1.316s
user    0m0.941s
sys     0m0.395s

(I'm using "initdb -N" because the cost of the sync step is so
platform-dependent, and it's not interesting anyway for buildfarm
or make check-world testing.  Also, I rearranged the code slightly
so that select_default_timezone could be timed separately from the
rest of the "creating configuration files" step.)
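The per-step timing idea can be sketched as follows.  initdb itself is C,
and the timing code used for the numbers above is not shown in this
message, so this is only an illustration in Python; the helper name
timed_step and its parameters are made up for the example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(clock=time.perf_counter, out=print):
    """Time the enclosed block and report it in the style used above.

    `clock` and `out` are injectable so the helper can be tested
    without depending on real wall-clock time.
    """
    start = clock()
    try:
        yield
    finally:
        elapsed_ms = (clock() - start) * 1000.0
        out(f"elapsed = {elapsed_ms:.3f} msec")


if __name__ == "__main__":
    # Stand-in for one initdb step; the real steps are C functions.
    with timed_step():
        sum(range(100_000))
```

Wrapping each step separately is what makes it possible to report
select_default_timezone apart from the rest of its enclosing step.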

In trying to break down the "post-bootstrap initialization" step
a bit further, I soon realized that trying to time the sub-steps from
initdb is useless.  initdb is just shoving bytes down the pipe as
fast as the kernel will let it; it has no idea how long the backend takes
to run any one query or group of queries.  So I ended up adding
"-c log_statement=all -c log_min_duration_statement=0" to the
backend_options, and digging query durations out of the log output.
I got these totals for the major steps in the post-boot run:

pg_authid setup: 0.909 ms
pg_depend setup: 64.980 ms
system views: 106.221 ms
pg_description: 39.665 ms
pg_collation: 65.162 ms
conversions: 72.024 ms
text search: 29.454 ms
init-acl hacking: 14.339 ms
information schema: 188.497 ms
plpgsql: 2.531 ms
analyze/vacuum/additional db creation: 171.762 ms
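A minimal sketch of how such totals could be dug out of the log, assuming
each timed query produces a line containing "duration: N ms" (the text
PostgreSQL emits for log_min_duration_statement).  The sample lines and
the function name are hypothetical, and how the log was actually split
into the steps listed above is not stated here.

```python
import re

# Matches the duration portion of a backend log line, e.g.
# "LOG:  duration: 0.250 ms".
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms")


def total_duration_ms(log_lines):
    """Sum the durations of all timed statements in an iterable of log lines."""
    total = 0.0
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match:
            total += float(match.group(1))
    return total


# Hypothetical log excerpt for one step; not taken from a real run.
sample_step_log = [
    "LOG:  statement: REVOKE ALL ON pg_authid FROM public;",
    "LOG:  duration: 0.250 ms",
    "LOG:  statement: SELECT 1;",
    "LOG:  duration: 1.500 ms",
    "LOG:  duration: 0.125 ms",
]

if __name__ == "__main__":
    print(f"step total: {total_duration_ms(sample_step_log):.3f} ms")
```

Feeding it the log lines belonging to one step gives that step's total;
lines without a duration are simply ignored.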

So the conversions don't look nearly as interesting as Andreas
suggested upthread.  Pushing them into .bki format would at best
save ~70 ms out of roughly 1300 ms.  Which is not nothing, but it's not
going to change the world either.
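For reference, the arithmetic behind that estimate can be checked
directly from the figures quoted in this message.  The variable names
are just for the example.  The listed steps sum to about 755.5 ms, so
roughly 77 ms of the 832.569 ms post-bootstrap step is not attributed to
any of the major steps listed.

```python
# Per-step totals as quoted in this message, in milliseconds.
post_boot_steps_ms = {
    "pg_authid setup": 0.909,
    "pg_depend setup": 64.980,
    "system views": 106.221,
    "pg_description": 39.665,
    "pg_collation": 65.162,
    "conversions": 72.024,
    "text search": 29.454,
    "init-acl hacking": 14.339,
    "information schema": 188.497,
    "plpgsql": 2.531,
    "analyze/vacuum/additional db creation": 171.762,
}

INITDB_WALL_MS = 1316.0      # "real 0m1.316s"
POST_BOOTSTRAP_MS = 832.569  # initdb-side timing of the whole step

logged_ms = sum(post_boot_steps_ms.values())
unattributed_ms = POST_BOOTSTRAP_MS - logged_ms
conversions_share = post_boot_steps_ms["conversions"] / INITDB_WALL_MS

if __name__ == "__main__":
    print(f"sum of listed steps:  {logged_ms:.3f} ms")
    print(f"not in listed steps:  {unattributed_ms:.3f} ms")
    print(f"conversions share:    {conversions_share:.1%} of total runtime")
```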

Really the only thing here that jumps out as being unduly expensive for
what it's doing is select_default_timezone.  That is, and always has been,
a brute-force algorithm; I wonder if there's a way to do better?  We can
probably guess that every non-Windows platform is using the IANA timezone
data these days.  If there were some way to extract the name of the active
timezone setting directly, we wouldn't have to try to reverse-engineer it.
But I don't know of any portable way :-(
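One non-portable heuristic, offered only as a sketch: on systems where
/etc/localtime happens to be a symlink into a zoneinfo directory, the
zone name can be read off the link target.  That layout is an assumption
and is not guaranteed anywhere, so a fallback to the brute-force search
would still be needed.  The function names below are hypothetical.

```python
import os
from typing import Optional


def zone_name_from_target(target: str, marker: str = "zoneinfo/") -> Optional[str]:
    """Return whatever follows the last 'zoneinfo/' in a link target, or None."""
    idx = target.rfind(marker)
    if idx < 0:
        return None
    name = target[idx + len(marker):]
    return name or None


def guess_system_zone(link_path: str = "/etc/localtime") -> Optional[str]:
    """Guess the zone name from a symlink; None if the path is not a usable link."""
    try:
        target = os.readlink(link_path)
    except OSError:
        # Missing file, regular file rather than symlink, or unsupported platform.
        return None
    return zone_name_from_target(target)


if __name__ == "__main__":
    print(guess_system_zone())
```

A None result means the shortcut did not apply, for example when
/etc/localtime is a plain copy of a zone file rather than a link.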

            regards, tom lane

