Hi,
On 2022-02-19 20:46:26 -0500, Tom Lane wrote:
> I tried it like that (full patch attached) and the results are intensely
> disappointing. On my Mac laptop, the time needed for 50 iterations of
> initdb drops from 16.8 sec to 16.75 sec.
Hm. I'd hoped for at least a little bit bigger win. But I think it enables
more, see below:
> Not sure that this is worth pursuing any further.
I experimented with moving all the bootstrapping into --boot mode and got it
working. Albeit definitely with a few hacks (more below).
While I had hoped for a bit more of a win, it's IMO a nice improvement.
Executing 10 initdb -N --wal-segsize 1 in a loop:
HEAD:
assert:
8.06user 1.17system 0:09.25elapsed 99%CPU (0avgtext+0avgdata 91724maxresident)k
0inputs+549280outputs (40major+99824minor)pagefaults 0swaps
opt:
2.89user 0.99system 0:04.81elapsed 80%CPU (0avgtext+0avgdata 88864maxresident)k
0inputs+549280outputs (40major+99792minor)pagefaults 0swaps
default to lz4:
assert:
7.61user 1.03system 0:08.69elapsed 99%CPU (0avgtext+0avgdata 91508maxresident)k
0inputs+546400outputs (42major+99551minor)pagefaults 0swaps
opt:
2.55user 0.94system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 88816maxresident)k
0inputs+546400outputs (40major+99551minor)pagefaults 0swaps
bootstrap replace:
assert:
7.42user 1.00system 0:08.52elapsed 98%CPU (0avgtext+0avgdata 91656maxresident)k
0inputs+546400outputs (40major+97737minor)pagefaults 0swaps
opt:
2.49user 0.98system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 88700maxresident)k
0inputs+546400outputs (40major+97728minor)pagefaults 0swaps
everything in bootstrap:
assert:
6.31user 0.94system 0:07.35elapsed 98%CPU (0avgtext+0avgdata 97812maxresident)k
0inputs+547360outputs (30major+88617minor)pagefaults 0swaps
opt:
2.42user 0.85system 0:03.28elapsed 99%CPU (0avgtext+0avgdata 94572maxresident)k
0inputs+547360outputs (30major+83712minor)pagefaults 0swaps
optimize WAL in bootstrap:
assert:
6.26user 0.96system 0:07.29elapsed 99%CPU (0avgtext+0avgdata 97844maxresident)k
0inputs+547360outputs (30major+88586minor)pagefaults 0swaps
opt:
2.43user 0.80system 0:03.24elapsed 99%CPU (0avgtext+0avgdata 94436maxresident)k
0inputs+547360outputs (30major+83664minor)pagefaults 0swaps
remove isatty in bootstrap:
assert:
6.15user 0.83system 0:06.99elapsed 99%CPU (0avgtext+0avgdata 97832maxresident)k
0inputs+465120outputs (30major+88559minor)pagefaults 0swaps
opt:
2.28user 0.85system 0:03.14elapsed 99%CPU (0avgtext+0avgdata 94604maxresident)k
0inputs+465120outputs (30major+83728minor)pagefaults 0swaps
That's IMO not bad.
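To put the elapsed times above in relative terms, here is a small sketch that computes how much less wall-clock time each step needs compared to HEAD. The numbers are copied from the timings above; the dictionary keys and helper names are just labels made up for this illustration. Note that the HEAD opt run only reached 80%CPU, so its elapsed time includes some waiting and the opt comparison may be somewhat flattering.

```python
# Elapsed wall-clock seconds for 10 initdb runs, copied from the timings above.
# The step labels are illustrative names, not identifiers from the patch set.
elapsed = {
    "HEAD": {"assert": 9.25, "opt": 4.81},
    "default to lz4": {"assert": 8.69, "opt": 3.49},
    "bootstrap replace": {"assert": 8.52, "opt": 3.49},
    "everything in bootstrap": {"assert": 7.35, "opt": 3.28},
    "optimize WAL in bootstrap": {"assert": 7.29, "opt": 3.24},
    "isatty change in bootstrap": {"assert": 6.99, "opt": 3.14},
}


def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def summarize(build):
    """Map each step to its elapsed-time reduction relative to HEAD."""
    base = elapsed["HEAD"][build]
    return {
        step: round(reduction_pct(base, times[build]), 1)
        for step, times in elapsed.items()
    }


for build in ("assert", "opt"):
    for step, pct in summarize(build).items():
        print(f"{build:6s} {step:28s} {pct:5.1f}% less elapsed time than HEAD")
```

For the last step this works out to roughly 24% (assert) and 35% (opt) less elapsed time than HEAD.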
On Windows I see higher gains, which makes sense, because filesystem IO is
slower. FreeBSD as well, but the variance is oddly high, so I might be doing
something wrong.
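One way to check whether the variance is real would be to time each run individually instead of timing the whole loop. A minimal sketch, assuming initdb is on PATH and that a fresh data directory per run is acceptable; the helper names (time_runs, spread, initdb_cmd) are made up for this example and are not part of any existing tooling:

```python
import statistics
import subprocess
import time


def time_runs(make_cmd, runs=10):
    """Run make_cmd(i) `runs` times; return per-run wall-clock seconds."""
    samples = []
    for i in range(runs):
        cmd = make_cmd(i)
        start = time.perf_counter()
        # check=True aborts the measurement if a run fails.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def spread(samples):
    """Return (mean, sample standard deviation) of the timings."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


def initdb_cmd(base_dir):
    """Build a command factory; each run gets its own data directory."""
    def make_cmd(i):
        return ["initdb", "-N", "--wal-segsize", "1", "-D", f"{base_dir}/run{i}"]
    return make_cmd
```

Usage would be something like spread(time_runs(initdb_cmd(tempfile.mkdtemp()))). A standard deviation that is large relative to the mean would suggest the noise comes from the environment rather than from the patches.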
The main reason I like this, however, isn't the speedup itself, but that after
this, initdb doesn't depend on single user mode at all anymore.
About the prototype:
- Most of the bootstrap SQL is executed from bootstrap.c itself. But some
still comes from the client. E.g. password, a few information_schema
details and the database / authid changes.
- To execute the SQL I mostly used extension.c's
read_whole_file()/execute_sql_string(). But VACUUM, CREATE DATABASE require
all the transactional hacks in portal.c etc. So I wrapped
exec_simple_query() for that phase.
Might be better to just call vacuum.c / database.c directly.
- For indexed relcache access to work, the phase of
RelationCacheInitializePhase3() that's initially skipped needs to be
executed. I hacked that up by adding a RelationCacheInitializePhase3b() that
bootstrap.c can call, but that's obviously too ugly to live.
- InvalidateSystemCaches() is needed after bki processing. Otherwise I see a
"row is too big:" error. Didn't investigate yet.
- I definitely removed some validation that we'd probably want. But that seems
like something to care about later...
- 0004 prevents a fair bit of WAL from being written. While XLogInsert did
some of that, it didn't block FPIs, which obviously are bulky. This reduces
WAL from ~5MB to ~100kB.
There's quite a bit of further speedup potential:
- One bottleneck, particularly in optimized mode, is the handling of huge node
trees for views. stringToNode() and nodeRead() are > 10% alone.
- Enabling index access sometime during the postgres.bki processing would make
invalidation handling for subsequent indexes faster. Or maybe we can disable
a few more invalidations. Inval processing is > 10%.
- More than 10% (assert) / 7% (optimized) is spent in
compute_scalar_stats()->qsort_arg(). Something seems off with that to me.
Completely crazy?
Greetings,
Andres Freund