synchronous commit vs. hint bits - Mailing list pgsql-hackers

From Robert Haas
Subject synchronous commit vs. hint bits
Msg-id CA+TgmoaCr3kDPafK5ygYDA9mF9zhObGp_13q0XwkEWsScw6h=w@mail.gmail.com
List pgsql-hackers
I've long considered synchronous_commit=off to be one of our best
performance features.  Certainly, it's not applicable in every
situation, but there are many applications where losing a second or so
worth of transactions is an acceptable price to pay for not needing to
wait for the disk to spin around for every commit.  However, recent
experimentation has convinced me that it's got a serious downside:
SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED
hints until the commit record has been durably flushed to disk.  It
turns out that can cause a major performance regression on systems
with many CPU cores.  I fixed this for temporary and unlogged tables
in commit 53f1ca59b5875f1d3e95ee709ecaddcbdfdbd175, but the same issue
exists (without any clear fix) for permanent tables.
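
To make the rule concrete, here is a simplified, self-contained model of
that check. It is a sketch, not the PostgreSQL source: the names ToyTuple,
ToyBuffer, ToyLsn and toy_set_hint_bits() are invented for illustration,
and the flag values are placeholders standing in for HEAP_XMIN_COMMITTED
and HEAP_XMAX_COMMITTED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values; the real definitions live in PostgreSQL's headers. */
#define TOY_XMIN_COMMITTED 0x0100
#define TOY_XMAX_COMMITTED 0x0400

typedef uint64_t ToyLsn;        /* hypothetical stand-in for a WAL position */

typedef struct ToyTuple
{
    uint16_t infomask;          /* hint bits are stored here */
} ToyTuple;

typedef struct ToyBuffer
{
    bool permanent;             /* false for temporary or unlogged relations */
    bool hint_dirty;            /* page has hint-bit changes worth saving */
} ToyBuffer;

/*
 * Try to set a hint bit for a transaction already known to be committed.
 * commit_lsn is where its commit record ends; flushed_lsn is how far WAL
 * is durably on disk.  Returns true if the hint was set.
 */
bool
toy_set_hint_bits(ToyTuple *tup, ToyBuffer *buf, uint16_t mask,
                  ToyLsn commit_lsn, ToyLsn flushed_lsn)
{
    assert(tup != NULL && buf != NULL);

    /* Commit record not durable yet: a permanent page must not get the hint. */
    if (buf->permanent && commit_lsn > flushed_lsn)
        return false;

    tup->infomask |= mask;
    buf->hint_dirty = true;
    return true;
}
```

In this model, every call that returns false leaves the next reader of the
tuple to look up the transaction's status again, which is the repeated
work that synchronous_commit=off exposes on permanent tables.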

Here are some benchmark results on Nate Boley's 32-core AMD system.
These are pgbench -T 300 -c 32 -j 32 runs with scale factor 100,
shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit =
off, checkpoint_segments = 300, checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9:

tps = 8360.657049 (including connections establishing)
tps = 7818.766335 (including connections establishing)
tps = 8344.653290 (including connections establishing)

And here are the same results after lobotomizing SetHintBits() to
always set the hint bits immediately (#if 0 around the
TransactionIdIsValid(xid) test):

tps = 9548.943930 (including connections establishing)
tps = 9579.485767 (including connections establishing)
tps = 9590.350954 (including connections establishing)

That's pretty significant - about a 15% improvement.  That's quite
remarkable when you think about the fact that we're talking about
refraining from setting hint bits for just a fraction of a second.
The failure to set those hint bits even for that very brief period of
time has to cause enough additional work (or lock contention) to
degrade performance quite noticeably.
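
For anyone who wants to check the arithmetic, here is a small sketch that
recomputes the gain from the six tps figures quoted above. The helper
names tps_mean() and pct_gain() are made up for this example.

```c
#include <stddef.h>

/* tps figures copied from the two sets of runs quoted above */
static const double tps_before[3] = {8360.657049, 7818.766335, 8344.653290};
static const double tps_after[3]  = {9548.943930, 9579.485767, 9590.350954};

/* Arithmetic mean of n samples. */
double
tps_mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}

/* Percentage improvement of "after" relative to "before". */
double
pct_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Comparing the means with pct_gain(tps_mean(tps_before, 3),
tps_mean(tps_after, 3)) gives roughly 17%, because the one slow baseline
run pulls the "before" average down. Measured against the two faster
baseline runs, the gain is close to the 15% quoted above.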

So, what could we do about this?  Ideas:

1. Set the hint bits right away, and avoid letting the page be flushed
to disk until the commit record is durably on disk (by bumping the
page LSN?).
2. Improve CLOG concurrency or performance in some way so that
consulting it repeatedly doesn't slow us down so much.
3. Do more backend-private XID status caching - in particular, for
commits, since this isn't a problem for aborts.
4. (Crazy idea) Have something that's like a hint bit, but stored in
the buffer header rather than the data block itself.  We allocate an
array large enough to hold 2 bits per tuple (for the maximum number of
tuples that can exist on a page), with one bit indicating that xmin is
async-committed and the other indicating that xmax is async-committed.
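
As a rough illustration of idea 4 only, the side array could be a small
bitmap kept next to the buffer header. Everything in this sketch is
hypothetical: the ToyBufferHints type, the function names, and the
per-page tuple limit of 291 (an assumption for 8kB pages; the real limit
depends on the block size). It does not address when the bits would be
cleared or how access to them would be synchronized.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_TUPLES      291     /* assumed per-page tuple limit */
#define TOY_BITS_PER_TUPLE  2
#define TOY_HINT_BYTES      ((TOY_MAX_TUPLES * TOY_BITS_PER_TUPLE + 7) / 8)

typedef enum ToyHintKind
{
    TOY_XMIN_ASYNC = 0,             /* xmin is async-committed */
    TOY_XMAX_ASYNC = 1              /* xmax is async-committed */
} ToyHintKind;

/* Would live alongside the buffer header, never in the data block. */
typedef struct ToyBufferHints
{
    uint8_t bits[TOY_HINT_BYTES];
} ToyBufferHints;

/* Reset all bits, e.g. when the buffer is reused for another page. */
void
toy_hints_clear(ToyBufferHints *h)
{
    memset(h->bits, 0, sizeof h->bits);
}

/* Record that the given tuple's xmin or xmax is async-committed. */
void
toy_hints_set(ToyBufferHints *h, unsigned tuple_idx, ToyHintKind kind)
{
    unsigned bit;

    if (tuple_idx >= TOY_MAX_TUPLES)
        return;                     /* out of range: ignore */
    bit = tuple_idx * TOY_BITS_PER_TUPLE + (unsigned) kind;
    h->bits[bit / 8] |= (uint8_t) (1u << (bit % 8));
}

/* Test whether the bit for the given tuple and kind is set. */
bool
toy_hints_get(const ToyBufferHints *h, unsigned tuple_idx, ToyHintKind kind)
{
    unsigned bit;

    if (tuple_idx >= TOY_MAX_TUPLES)
        return false;
    bit = tuple_idx * TOY_BITS_PER_TUPLE + (unsigned) kind;
    return ((h->bits[bit / 8] >> (bit % 8)) & 1u) != 0;
}
```

Under these assumptions the array costs 73 bytes per buffer, and because
it is never written out with the page, setting a bit would not have to
wait for the commit record to be flushed.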

There are probably other options as well.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company