Early hint bit setting - Mailing list pgsql-hackers

From Ants Aasma
Subject Early hint bit setting
Date
Msg-id CA+CSw_twkPMHv3KoC3Kc_1e+Wt7Vcdix8bBDUnyMft+QxDPimw@mail.gmail.com
List pgsql-hackers
I was thinking about the earliest point at which we could set hint
bits. That would be just after the commit has been made visible. When
the transaction completes and the commit confirmation is sent to the
client, the backend will usually go to sleep on the network socket,
waiting for further commands. Because most clients wait for the commit
confirmation before proceeding, we have at least one network RTT
before this backend is expected to respond again.

The idea is to keep a small backend local ring buffer of pages that
have been modified. When a transaction has just committed, we do a
non-blocking read on the socket. When nothing is available we take the
opportunity to go and set hint bits in the recently modified buffers.
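
To make the idea concrete, here is a minimal standalone sketch of the
ring buffer plus the "is the socket still quiet?" check. It is not
PostgreSQL code: PageRef, RecentPages, RECENT_PAGES and the three
functions are hypothetical names, and a zero-timeout poll() stands in
for whatever non-blocking read the backend would really use.

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RECENT_PAGES 16         /* hypothetical ring size */

/* Hypothetical stand-in for a buffer tag: which page was modified. */
typedef struct { uint32_t rel; uint32_t block; } PageRef;

typedef struct {
    PageRef slots[RECENT_PAGES];
    size_t  next;               /* slot the next entry is written to */
    size_t  count;              /* valid entries, at most RECENT_PAGES */
} RecentPages;

/* Remember a modified page, overwriting the oldest entry when full. */
static void recent_pages_add(RecentPages *r, PageRef p)
{
    r->slots[r->next] = p;
    r->next = (r->next + 1) % RECENT_PAGES;
    if (r->count < RECENT_PAGES)
        r->count++;
}

/* Zero-timeout poll: true if the socket is readable or has a pending
 * error/hangup, i.e. whenever we should stop doing background work. */
static bool client_input_pending(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&pfd, 1, 0) > 0;
}

/* Pop the oldest remembered page, but only while the client is quiet.
 * Returns false when the ring is empty or a command is waiting. */
static bool next_page_to_hint(RecentPages *r, int client_fd, PageRef *out)
{
    if (r->count == 0 || client_input_pending(client_fd))
        return false;
    *out = r->slots[(r->next + RECENT_PAGES - r->count) % RECENT_PAGES];
    r->count--;
    return true;
}
```

After sending the commit confirmation, the backend would loop on
next_page_to_hint() and hint each returned page. Re-checking the
socket before every page bounds the added latency to the cost of
hinting a single page.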

Hurting latency for single-threaded workloads using lots of
transactions is bad. It follows that it would be a bad idea to do
anything that could take a long time while waiting for the next
command. Because early hinting is a performance optimisation we can
safely skip it if it becomes bothersome. Anything that causes IO can
take too long. So we only set the hint bits when the page is still in
shared buffers to avoid reading in the page. Furthermore, we only hint
the tuples that the recently completed transaction modified to avoid
IO from CLOG (we could hint other tuples if their xid happens to be in
the SLRU, but it probably won't be very useful).
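
The two restrictions above can be sketched as follows. This is a toy
model, not the real heap page layout: ToyTuple, ToyPage, the HINT_*
flags and hint_own_tuples() are hypothetical, and the "resident" field
stands in for a lookup that finds the page in shared buffers without
reading it in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Toy hint flags; they only model the idea of "known committed". */
enum { HINT_XMIN_COMMITTED = 1 << 0, HINT_XMAX_COMMITTED = 1 << 1 };

typedef struct { Xid xmin; Xid xmax; uint16_t hints; } ToyTuple;

typedef struct {
    ToyTuple *tuples;
    size_t    ntuples;
    bool      resident;         /* still in shared buffers? */
} ToyPage;

/* Hint only tuples touched by the transaction that just committed.
 * No CLOG lookup is needed because we already know its outcome.
 * Returns the number of tuples whose hint flags changed. */
static size_t hint_own_tuples(ToyPage *page, Xid committed)
{
    size_t changed = 0;

    if (!page->resident)
        return 0;               /* never read the page back in */

    for (size_t i = 0; i < page->ntuples; i++) {
        ToyTuple *t = &page->tuples[i];
        uint16_t  before = t->hints;

        if (t->xmin == committed)
            t->hints |= HINT_XMIN_COMMITTED;
        if (t->xmax == committed)
            t->hints |= HINT_XMAX_COMMITTED;
        if (t->hints != before)
            changed++;
    }
    return changed;
}
```

Tuples written by other transactions are left alone, so the loop
touches nothing but memory that is already in shared buffers.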

Hint bits get set sooner or later anyway. Setting them earlier is a
throughput win for any workload because we avoid generating extra
load. We avoid doing any IO, and we might even save some, so for IO
this is a pure win. The CPU work of hinting has to be done at some
point regardless, so that's a tie, except for extremely bursty
write-heavy loads with lots of transactions. The memory traffic could
in principle hurt other backends. Refilling the whole last-level cache
of a modern processor takes a few hundred microseconds at peak speed.
If the WAL is on fast storage (BBWC, SSD), there's a pretty good
chance that the page being hinted is still in the CPU cache, which
avoids the memory bandwidth overhead.

Abstraction-wise, I think we need to set up a mechanism to run very
short maintenance jobs from backends that are waiting for new
commands. SocketBackend could check whether any such job is pending
and, if so, call pq_getbyte_if_available to confirm that no command
has arrived before proceeding to run it.

Setting hint bits early would help workloads with small, synchronously
committing transactions. Async commits could also benefit from
proactive hint bit setting, but this would require some global
cooperation and isn't as clear a win. One idea would be to copy the
local ring buffer entries to a global one, tagged with the LSN at
which the transaction was made visible. Whoever flushes xlog would
also check whether the flush enables some background hinting and, if
so, set a corresponding flag for any backend with spare cycles to pick
up.
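
A rough single-process sketch of that global ring, to show the
bookkeeping only. All names here (GlobalRing, GlobalEntry,
GLOBAL_SLOTS and the global_ring_* functions) are hypothetical, the
entries are assumed to be published in increasing LSN order, and the
locking that shared memory would need is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GLOBAL_SLOTS 64         /* hypothetical ring size */

typedef uint64_t Lsn;

typedef struct {
    uint32_t rel, block;        /* page to hint */
    uint32_t xid;               /* transaction that modified it */
    Lsn      commit_lsn;        /* hintable once xlog is flushed here */
} GlobalEntry;

typedef struct {
    GlobalEntry slots[GLOBAL_SLOTS];
    size_t head;                /* oldest entry */
    size_t count;
    Lsn    flushed_upto;        /* highest flush position seen so far */
    bool   work_available;      /* flag for backends with spare cycles */
} GlobalRing;

/* Copy one local entry into the global ring; when the ring is full
 * the oldest entry is dropped, since hinting is only an optimisation. */
static void global_ring_publish(GlobalRing *g, GlobalEntry e)
{
    g->slots[(g->head + g->count) % GLOBAL_SLOTS] = e;
    if (g->count < GLOBAL_SLOTS)
        g->count++;
    else
        g->head = (g->head + 1) % GLOBAL_SLOTS;
}

/* Called by whoever flushed xlog up to 'flushed'. */
static void global_ring_note_flush(GlobalRing *g, Lsn flushed)
{
    if (flushed > g->flushed_upto)
        g->flushed_upto = flushed;
    g->work_available = g->count > 0 &&
        g->slots[g->head].commit_lsn <= g->flushed_upto;
}

/* Called by an idle backend: take the oldest entry that is already
 * covered by a flush, or clear the flag if there is none. */
static bool global_ring_take(GlobalRing *g, GlobalEntry *out)
{
    if (g->count == 0 ||
        g->slots[g->head].commit_lsn > g->flushed_upto) {
        g->work_available = false;
        return false;
    }
    *out = g->slots[g->head];
    g->head = (g->head + 1) % GLOBAL_SLOTS;
    g->count--;
    return true;
}
```

An idle backend would test work_available first and call
global_ring_take() only when it is set, so the common case costs a
single read of the flag.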

Comments?

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

