Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write |
Date | |
Msg-id | CA+TgmoZYEe3apby_nYMe1eTBY12VVud7ZA-ApJodksUpKeZ5Kw@mail.gmail.com Whole thread Raw |
In response to | Re: 答复:[HACKERS] 答复:[HACKERS] about fsync inCLOG buffer write (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write
|
List | pgsql-hackers |
On Tue, Sep 8, 2015 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-09-08 15:58:26 +0800, 周正中(德歌) wrote: >> postgres@digoal-> cat 7.sql >> select txid_current(); >> >> postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./7.sql -c 1 -j 1 -T 100000 >> About 32K tps. >> progress: 240.0 s, 31164.4 tps, lat 0.031 ms stddev 0.183 >> progress: 241.0 s, 33243.3 tps, lat 0.029 ms stddev 0.127 > > So you're benchmarking how fast you can assign txids. Is that actually > something meaningful? If you have other writes interleaved you'll write > much more WAL, so there'll be checkpoints and such. > > FWIW, if you measure something realistic and there's checkpoints, > there'll be fewer fsyncs if you increase the slru buffer size - as > there'll often be clean buffers due to the checkpoint having written > them out. But I think it's not very hard to come up with a workload where 32 clog buffers isn't enough, and you end up waiting for backends to fsync(). Suppose you have a pgbench workload with N tuples. We update tuples at random, so sometimes we hit one that's just recently been updated, and other times we hit one that hasn't been updated for many transactions. At 32k transactions per page, each transaction's chance of updating a tuple whose existing xmin is on the most recent clog page is 32k/N. The chance of hitting a tuple whose existing xmin is on the next page back is (1 - 32k/N) * 32k/N. The chance of hitting a page whose current xmin is at least X pages prior to the most recent one is (1 - 32k/N)^X. So, how many tuples do we need in order for each update to have a 50% chance of hitting a tuple that is at least 32 pages back? (1 - 32k/N)^32 = .5 1 - 32k/N = 0.9785720620877 32k/N = 0.0214279379122999 N = 32k/0.0214279379122999 N = 1529218.54329206 ...or in other words, scale factor 16. At scale factors >= 1044, the chance that the next update hits an xmin more than 32 clog buffers back is > 99%. So any test of this sort causes extremely rapid clog page eviction - basically every transaction is going to request a buffer that's probably not cached, and as soon as it's done with it, some other transaction will evict it to bring in a different buffer that's not cached. How often such a workload actually has to replace a *dirty* clog buffer obviously depends on how often you checkpoint, but if you're getting ~28k TPS you can completely fill 32 clog buffers (1 million transactions) in less than 40 seconds, and you're probably not checkpointing nearly that often. I'm not entirely sure how much fsync absorption for SLRU buffers will help, but I think it would be worth trying. Back when I was spending more time on this area, I saw some evidence that those fsyncs did cause at least some latency spikes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: