Re: Oh, this is embarrassing: init file logic is still broken - Mailing list pgsql-hackers

From: Josh Berkus
Subject: Re: Oh, this is embarrassing: init file logic is still broken
Date:
Msg-id: 558B26B0.1070704@agliodbs.com
In response to: Oh, this is embarrassing: init file logic is still broken (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:
  Re: Oh, this is embarrassing: init file logic is still broken (Tatsuo Ishii <ishii@postgresql.org>)
  Re: Oh, this is embarrassing: init file logic is still broken (Peter Geoghegan <pg@heroku.com>)
  Re: Oh, this is embarrassing: init file logic is still broken (Tatsuo Ishii <ishii@postgresql.org>)
List: pgsql-hackers
On 06/23/2015 04:44 PM, Tom Lane wrote:
> Chasing a problem identified by my Salesforce colleagues led me to the
> conclusion that my commit f3b5565dd ("Use a safer method for determining
> whether relcache init file is stale") is rather borked. It causes
> pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
> because that index is not used by any syscache. I had been aware of that
> actually, but considered it a minor issue. It's not so minor though,
> because RelationCacheInitializePhase3 marks that index as nailed for
> performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
> That means that load_relcache_init_file *always* decides that the init
> file is busted and silently(!) ignores it. So we're taking a nontrivial
> hit in backend startup speed as of the last set of minor releases.

OK, this is pretty bad in its real performance effects. On a workload which
is dominated by new connection creation, we've lost about 17% throughput.

To test it, I ran pgbench -s 100 -j 2 -c 6 -r -C -S -T 1200 against a
database which fits in shared_buffers, on two different m3.large instances
on AWS (across the network, not on unix sockets).
A typical run on 9.3.6 looks like this:

    scaling factor: 100
    query mode: simple
    number of clients: 6
    number of threads: 2
    duration: 1200 s
    number of transactions actually processed: 252322
    tps = 210.267219 (including connections establishing)
    tps = 31958.233736 (excluding connections establishing)
    statement latencies in milliseconds:
        0.002515   \set naccounts 100000 * :scale
        0.000963   \setrandom aid 1 :naccounts
        19.042859  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Whereas a typical run on 9.3.9 looks like this:

    scaling factor: 100
    query mode: simple
    number of clients: 6
    number of threads: 2
    duration: 1200 s
    number of transactions actually processed: 208180
    tps = 173.482259 (including connections establishing)
    tps = 31092.866153 (excluding connections establishing)
    statement latencies in milliseconds:
        0.002518   \set naccounts 100000 * :scale
        0.000988   \setrandom aid 1 :naccounts
        23.076961  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Numbers are pretty consistent on four runs each on two different instances
(+/- 4%), so I don't think this is Amazon variability we're seeing. I think
the syscache invalidation is really costing us 17%. :-(

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
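(For reference, the ~17% figure follows directly from the two connection-inclusive tps numbers above. A quick check in any POSIX shell with awk; the variable names here are my own, not part of pgbench:)

```shell
# Throughput regression between 9.3.6 and 9.3.9, using the
# "including connections establishing" tps figures quoted above.
tps_936=210.267219   # 9.3.6 run
tps_939=173.482259   # 9.3.9 run

awk -v old="$tps_936" -v new="$tps_939" \
    'BEGIN { printf "regression: %.1f%%\n", (old - new) / old * 100 }'
# prints: regression: 17.5%
```

Note that the connection-excluding tps numbers differ by under 3% between the two runs, which is what points the cost at connection establishment rather than query execution.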