Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write
Date
Msg-id CA+TgmoZYEe3apby_nYMe1eTBY12VVud7ZA-ApJodksUpKeZ5Kw@mail.gmail.com
Whole thread Raw
In response to Re: 答复:[HACKERS] 答复:[HACKERS] about fsync inCLOG buffer write  (Andres Freund <andres@anarazel.de>)
Responses Re: Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Sep 8, 2015 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-08 15:58:26 +0800, 周正中(德歌) wrote:
>> postgres@digoal-> cat 7.sql
>> select txid_current();
>>
>> postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./7.sql -c 1 -j 1 -T 100000
>> About 32K tps.
>> progress: 240.0 s, 31164.4 tps, lat 0.031 ms stddev 0.183
>> progress: 241.0 s, 33243.3 tps, lat 0.029 ms stddev 0.127
>
> So you're benchmarking how fast you can assign txids. Is that actually
> something meaningful? If you have other writes interleaved you'll write
> much more WAL, so there'll be checkpoints and such.
>
> FWIW, if you measure something realistic and there's checkpoints,
> there'll be fewer fsyncs if you increase the slru buffer size - as
> there'll often be clean buffers due to the checkpoint having written
> them out.

But I think it's not very hard to come up with a workload where 32
clog buffers isn't enough, and you end up waiting for backends to
fsync().  Suppose you have a pgbench workload with N tuples.  We
update tuples at random, so sometimes we hit one that's just recently
been updated, and other times we hit one that hasn't been updated for
many transactions.  At 32k transactions per page, each transaction's
chance of updating a tuple whose existing xmin is on the most recent
clog page is 32k/N.  The chance of hitting a tuple whose existing xmin
is on the next page back is (1 - 32k/N) * 32k/N.  The chance of
hitting a page whose current xmin is at least X pages prior to the
most recent one is (1 - 32k/N)^X.  So, how many tuples do we need in
order for each update to have a 50% chance of hitting a tuple that is
at least 32 pages back?

(1 - 32k/N)^32 = .5
1 - 32k/N = 0.9785720620877
32k/N = 0.0214279379122999
N = 32k/0.0214279379122999
N = 1529218.54329206

...or in other words, scale factor 16.  At scale factors >= 1044, the
chance that the next update hits an xmin more than 32 clog buffers
back is > 99%.  So any test of this sort causes extremely rapid clog
page eviction - basically every transaction is going to request a
buffer that's probably not cached, and as soon as it's done with it,
some other transaction will evict it to bring in a different buffer
that's not cached.

How often such a workload actually has to replace a *dirty* clog
buffer obviously depends on how often you checkpoint, but if you're
getting ~28k TPS you can completely fill 32 clog buffers (1 million
transactions) in less than 40 seconds, and you're probably not
checkpointing nearly that often.

I'm not entirely sure how much fsync absorption for SLRU buffers will
help, but I think it would be worth trying.  Back when I was spending
more time on this area, I saw some evidence that those fsyncs did
cause at least some latency spikes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [patch] Proposal for \rotate in psql
Next
From: Tom Lane
Date:
Subject: Re: Counting lines correctly in psql help displays