synchronous commit vs. hint bits - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | synchronous commit vs. hint bits |
Date | |
Msg-id | CA+TgmoaCr3kDPafK5ygYDA9mF9zhObGp_13q0XwkEWsScw6h=w@mail.gmail.com |
Responses | Re: synchronous commit vs. hint bits |
List | pgsql-hackers |
I've long considered synchronous_commit=off to be one of our best performance features. Certainly, it's not applicable in every situation, but there are many applications where losing a second or so worth of transactions is an acceptable price to pay for not needing to wait for the disk to spin around for every commit. However, recent experimentation has convinced me that it's got a serious downside: SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED hints until the commit record has been durably flushed to disk. It turns out that can cause a major performance regression on systems with many CPU cores. I fixed this for temporary and unlogged tables in commit 53f1ca59b5875f1d3e95ee709ecaddcbdfdbd175, but the same issue exists (without any clear fix) for permanent tables.

Here are some benchmark results on Nate Boley's 32-core AMD system. These are pgbench -T 300 -c 32 -j 32 runs with scale factor 100, shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit = off, checkpoint_segments = 300, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9:

tps = 8360.657049 (including connections establishing)
tps = 7818.766335 (including connections establishing)
tps = 8344.653290 (including connections establishing)

And here are the same results after lobotomizing SetHintBits() to always set the hint bits immediately (#if 0 around the TransactionIdIsValid(xid) test):

tps = 9548.943930 (including connections establishing)
tps = 9579.485767 (including connections establishing)
tps = 9590.350954 (including connections establishing)

That's pretty significant - about a 15% improvement. That's quite remarkable when you think about the fact that we're talking about refraining from setting hint bits for just a fraction of a second. The failure to set those hint bits even for that very brief period of time has to cause enough additional work (or lock contention) to degrade performance quite noticeably.

So, what could we do about this? Ideas:

1. Set the hint bits right away, and avoid letting the page be flushed to disk until the commit record is durably on disk (by bumping the page LSN?).

2. Improve CLOG concurrency or performance in some way so that consulting it repeatedly doesn't slow us down so much.

3. Do more backend-private XID status caching - in particular, for commits, since this isn't a problem for aborts.

4. (Crazy idea) Have something that's like a hint bit, but stored in the buffer header rather than the data block itself. We allocate an array large enough to hold 2 bits per tuple (for the maximum number of tuples that can exist on a page), with one bit indicating that xmin is async-committed and the other indicating that xmax is async-committed.

There are probably other options as well. Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company