Re: Index tuple killing code committed - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Index tuple killing code committed
Date:
Msg-id: 21703.1062544602@sss.pgh.pa.us
In response to: Re: Index tuple killing code committed (Tatsuo Ishii <t-ishii@sra.co.jp>)
List: pgsql-hackers
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I found following mail in my mail archive and tried the same test with
> 7.4 current. Contrary to my expectation 7.4 showed some performance
> degration with continuous pgbench runs:

I did some profiling and found that with repeated runs, an increasing
amount of time was being spent in _bt_check_unique, verifying that
updates aren't inserting duplicate keys.  This is not real surprising
when running without vacuums between runs --- the number of dead tuples
will increase from run to run.  I am not sure why the effect was not
visible in my old post that you quote.  Perhaps it was masked by some
other inefficiency and is now visible in 7.4.  I was able to reduce the
cost somewhat by changing the order of tests in _bt_check_unique.

With CVS tip, using default postmaster settings (except fsync off) and
"pgbench -i -s 10 bench" followed by a manual CHECKPOINT command, I get
this for repeated runs of

	PGOPTIONS=-fs time pgbench -c 5 -t 1000 -n bench

tps = 113.387182 (including connections establishing)
tps = 113.433897 (excluding connections establishing)
0.95user 0.86system 0:44.51elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k

tps = 72.989851 (including connections establishing)
tps = 73.010209 (excluding connections establishing)
0.86user 0.89system 1:09.17elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 88.848571 (including connections establishing)
tps = 88.878369 (excluding connections establishing)
0.82user 0.92system 0:57.19elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 89.156320 (including connections establishing)
tps = 89.186299 (excluding connections establishing)
0.88user 0.92system 0:57.08elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 92.708733 (including connections establishing)
tps = 92.742335 (excluding connections establishing)
0.85user 0.83system 0:54.76elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 93.533412 (including connections establishing)
tps = 93.566385 (excluding connections establishing)
0.92user 0.87system 0:54.33elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 94.987538 (including connections establishing)
tps = 95.021715 (excluding connections establishing)
0.91user 0.88system 0:53.44elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 98.119179 (including connections establishing)
tps = 98.157492 (excluding connections establishing)
0.93user 0.94system 0:51.58elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 108.404186 (including connections establishing)
tps = 108.448242 (excluding connections establishing)
0.85user 0.90system 0:47.30elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 102.725845 (including connections establishing)
tps = 102.767392 (excluding connections establishing)
0.91user 0.90system 0:49.93elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

Not sure why the apparent up-trend in performance after the second run.
Maybe we've done about as much btree index splitting as we need, after a
few runs?  Odd.

Same conditions, but with a vacuum between runs, ie no "-n":

tps = 112.266969 (including connections establishing)
tps = 112.314017 (excluding connections establishing)
0.91user 0.89system 0:44.62elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k

tps = 77.936183 (including connections establishing)
tps = 77.959810 (excluding connections establishing)
0.90user 0.92system 1:07.20elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 81.669010 (including connections establishing)
tps = 81.694586 (excluding connections establishing)
0.88user 0.87system 1:02.23elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 86.493578 (including connections establishing)
tps = 86.522430 (excluding connections establishing)
0.87user 0.83system 0:59.61elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 100.759980 (including connections establishing)
tps = 100.797922 (excluding connections establishing)
0.90user 0.89system 0:50.10elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 77.215387 (including connections establishing)
tps = 77.238131 (excluding connections establishing)
0.95user 0.88system 1:06.35elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 83.298182 (including connections establishing)
tps = 83.324866 (excluding connections establishing)
0.88user 0.89system 1:01.23elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 90.534451 (including connections establishing)
tps = 90.566022 (excluding connections establishing)
0.95user 0.86system 0:56.40elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 84.085577 (including connections establishing)
tps = 84.113060 (excluding connections establishing)
0.93user 0.89system 1:05.35elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 95.637302 (including connections establishing)
tps = 95.672293 (excluding connections establishing)
0.92user 0.85system 0:52.80elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

Same with vacuum and checkpoint between runs:

tps = 107.715226 (including connections establishing)
tps = 107.758258 (excluding connections establishing)
0.86user 0.88system 0:46.51elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 78.816549 (including connections establishing)
tps = 78.839575 (excluding connections establishing)
0.87user 0.87system 1:03.68elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 90.150489 (including connections establishing)
tps = 90.184264 (excluding connections establishing)
0.90user 0.89system 0:55.74elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 81.648951 (including connections establishing)
tps = 81.674276 (excluding connections establishing)
0.93user 0.94system 1:01.54elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 86.850517 (including connections establishing)
tps = 86.879127 (excluding connections establishing)
0.84user 0.88system 0:57.86elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 85.776988 (including connections establishing)
tps = 85.809491 (excluding connections establishing)
0.83user 0.94system 0:58.53elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 80.293178 (including connections establishing)
tps = 80.317318 (excluding connections establishing)
0.90user 0.86system 1:02.49elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 80.691960 (including connections establishing)
tps = 80.716409 (excluding connections establishing)
0.84user 0.93system 1:02.19elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

tps = 85.067751 (including connections establishing)
tps = 85.095110 (excluding connections establishing)
0.92user 0.91system 0:59.02elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

tps = 80.134553 (including connections establishing)
tps = 80.158595 (excluding connections establishing)
0.87user 0.97system 1:02.61elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

It appears that much of the run-to-run variability is accounted for by
background checkpoints; forcing a checkpoint between runs helps but
doesn't eliminate the noise.  (Probably if I'd changed the checkpoint
frequency settings to prevent intra-run checkpoints from happening, I
could get stable numbers.  I'm out of time to play with it for now
though.)

			regards, tom lane
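[Editor's note: the test procedure described above can be sketched as a small shell script. This is a hedged reconstruction from the commands quoted in the post, not part of the original message: the database name "bench", the run count of ten, and the dry-run wrapper are assumptions; by default it only echoes the commands so it can be inspected without a running server.]

```shell
#!/bin/sh
# Sketch of the benchmark loop from the post (hypothetical wrapper).
# DRY_RUN=1 (default) only records/echoes each command; DRY_RUN=0 executes.
DRY_RUN=${DRY_RUN:-1}
CMDS=""

run() {
    CMDS="$CMDS$*
"                                       # keep a record of every command
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run pgbench -i -s 10 bench              # initialize at scale factor 10
run psql -c "CHECKPOINT" bench          # manual checkpoint before timing

for i in 1 2 3 4 5 6 7 8 9 10; do
    # PGOPTIONS=-fs is the backend setting used in the post;
    # pgbench -n skips its built-in per-run vacuum.
    run env PGOPTIONS=-fs time pgbench -c 5 -t 1000 -n bench
    # For the second series, drop -n so pgbench vacuums between runs;
    # for the third, additionally uncomment the next line:
    # run psql -c "CHECKPOINT" bench
done
```

Dropping `-n` and uncommenting the CHECKPOINT line correspond to the second and third series of numbers in the post.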