Re: Group Commit - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Group Commit
Msg-id 200705171721.l4HHLJY25455@momjian.us
In response to Group Commit  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
This is not ready for 8.3.

This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> It's been known for years that commit_delay isn't very good at giving us 
> group commit behavior. I did some experiments with this simple test 
> case: "BEGIN; INSERT INTO test VALUES (1); COMMIT;", with different 
> numbers of concurrent clients and with and without commit_delay.
> 
> Summary for the impatient:
> 1. Current behavior sucks.
> 2. commit_delay doesn't help with # of clients < ~10. It does help with 
> higher numbers, but it still sucks.
> 3. I'm working on a patch.
> 
> 
> I added logging to show how many commit records are flushed on each 
> fsync. The output with otherwise unpatched PG head looks like this, with 
> 5 clients:
> 
> LOG:  Flushed 4 out of 5 commits
> LOG:  Flushed 1 out of 5 commits
> LOG:  Flushed 4 out of 5 commits
> LOG:  Flushed 1 out of 5 commits
> LOG:  Flushed 4 out of 5 commits
> LOG:  Flushed 1 out of 5 commits
> LOG:  Flushed 4 out of 5 commits
> LOG:  Flushed 1 out of 5 commits
> LOG:  Flushed 3 out of 5 commits
> LOG:  Flushed 2 out of 5 commits
> LOG:  Flushed 3 out of 5 commits
> LOG:  Flushed 2 out of 5 commits
> LOG:  Flushed 3 out of 5 commits
> LOG:  Flushed 2 out of 5 commits
> LOG:  Flushed 3 out of 5 commits
> ...
> 
> Here's what's happening:
> 
> 1. Client 1 issues fsync (A)
> 2. Clients 2-5 write their commit record, and try to fsync, but they 
> have to wait for fsync (A) to finish.
> 3. fsync (A) finishes, freeing client 1.
> 4. One of clients 2-5 starts the next fsync (B), which will flush 
> commits of clients 2-5 to disk
> 5. Client 1 begins new transaction, inserts commit record and tries to 
> fsync. Needs to wait for previous fsync (B) to finish
> 6. fsync B finishes, freeing clients 2-5
> 7. Client 1 issues fsync (C)
> 8. ...
> 
> The 2-3-2-3 pattern can be explained by a similar unfortunate
> "resonance", but with two clients in the role of client 1 above,
> possibly running on separate cores (the test was run on a dual-core laptop).
> 
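[Editorial illustration, not part of the original message.] The alternating pattern described above can be reproduced with a toy model. This is only a sketch under the assumption that a transaction takes far less time than an fsync, so every client not covered by the fsync in progress has a fresh commit record waiting by the time it finishes. The function name and its parameters are hypothetical, not anything in PostgreSQL.

```python
def simulate_flushes(n_clients, first_group, n_fsyncs):
    """Return how many commit records each successive fsync flushes.

    Toy model: an fsync covers exactly the clients whose commit records
    were written before it started. Everyone else writes a new commit
    record while it runs and must wait for the next fsync.
    """
    clients = set(range(1, n_clients + 1))
    in_flight = set(first_group)   # clients covered by the current fsync
    batches = []
    for _ in range(n_fsyncs):
        batches.append(len(in_flight))
        # Clients left out of this fsync form the batch of the next one,
        # while the clients just freed start over and miss that next one.
        in_flight = clients - in_flight
    return batches


print(simulate_flushes(5, {1}, 6))     # [1, 4, 1, 4, 1, 4]
print(simulate_flushes(5, {1, 2}, 4))  # [2, 3, 2, 3]
```

The model never settles into flushing all 5 commits at once, which is the "resonance": whoever is freed by one fsync always arrives just too late for the next.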
> I also drew a diagram illustrating the above, attached.
> 
> I wrote a quick & dirty patch for this that I'm going to refine further, 
> but wanted to get the results out for others to look at first. I'm not 
> posting the patch yet, but it basically adds some synchronization to the 
> WAL flushes. It introduces a counter of inserted but not yet flushed 
> commit records. Instead of the commit_delay, the counter is checked. If 
> it's smaller than NBackends, the process waits until count reaches 
> NBackends, or a timeout expires. There are two significant differences
> from commit_delay here:
> 1. Instead of waiting for commit_delay to expire, processes are woken
> and fsync is started immediately when we know there are no more commit
> records coming that we should wait for. Even though commit_delay is
> given in microseconds, the real granularity of the wait can be as high
> as 10 ms, which is in the same ballpark as the fsync itself.
> 2. commit_delay is not used when there are fewer than commit_siblings
> non-idle backends in the system. With very short transactions, it's
> worthwhile to wait even if that's the case, because a client can begin
> and finish a transaction in much shorter time than it takes to fsync.
> This is why commit_delay does not work at all in my test case
> with 2 clients.
> 
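[Editorial illustration, not part of the original message.] The patch itself was not posted, so the following is only a guess at the shape of the idea: a counter of inserted-but-unflushed commit records, with each committer waiting until the counter reaches the number of backends or a timeout expires. The class, its attributes, and the `fsync` callback are all hypothetical, and for simplicity the sketch holds the lock during the flush, which a real implementation would not want to do.

```python
import threading


class GroupCommitGate:
    """Toy model of the counter-based wait (all names are hypothetical)."""

    def __init__(self, n_backends, timeout, fsync):
        self._cond = threading.Condition()
        self._n_backends = n_backends
        self._timeout = timeout      # seconds to wait for more commits
        self._fsync = fsync          # callable that performs the flush
        self._pending = 0            # commit records inserted, not yet flushed
        self._generation = 0         # incremented after every flush
        self.batch_sizes = []        # commits covered by each flush

    def commit(self):
        with self._cond:
            self._pending += 1
            my_gen = self._generation
            if self._pending < self._n_backends:
                # Wait until another backend flushes our record,
                # or until the timeout expires.
                self._cond.wait_for(
                    lambda: self._generation != my_gen,
                    timeout=self._timeout,
                )
            if self._generation != my_gen:
                return  # our record was flushed by someone else
            # Either the counter reached n_backends or we timed out:
            # flush everything pending and wake the other waiters.
            batch = self._pending
            self._pending = 0
            self._fsync()  # simplification: done while holding the lock
            self._generation += 1
            self.batch_sizes.append(batch)
            self._cond.notify_all()


gate = GroupCommitGate(n_backends=4, timeout=5.0, fsync=lambda: None)
threads = [threading.Thread(target=gate.commit) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(gate.batch_sizes)  # [4]: one flush covers all four commits
```

The fourth committer does not sleep at all: it sees the counter reach `n_backends` and flushes immediately, which is the first difference from commit_delay listed above.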
> Here's a spreadsheet with the results of the tests I ran:
> http://community.enterprisedb.com/groupcommit-comparison.ods
> 
> It contains a graph that shows that the patch works very well for this 
> test case. It's not very good for real life as it is, though. An obvious
> flaw is that if you have a longer-running transaction, the benefit of
> difference 1 above goes away. Instead of waiting for NBackends commit
> records, we should try to
> guess the number of transactions that are likely to finish in a 
> reasonably short time. I'm thinking of keeping a running average of 
> commits per second, or # of transactions that finish while an fsync is 
> taking place.
> 
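[Editorial illustration, not part of the original message.] One way to read the "running average" idea is an exponentially weighted average of how many transactions finished during each fsync, used as the batch target in place of NBackends. The smoothing factor, the class, and its method names are assumptions made for this sketch; the message does not say which kind of average was intended.

```python
class CommitRateEstimator:
    """Exponentially weighted average of commits finished per fsync.

    Hypothetical helper: alpha and the rounding rule are illustrative
    choices, not taken from the message or from PostgreSQL.
    """

    def __init__(self, alpha=0.25, initial=1.0):
        self.alpha = alpha        # weight given to the newest observation
        self.average = initial

    def record_fsync(self, commits_finished):
        # Move the average a fraction alpha toward the new observation.
        self.average += self.alpha * (commits_finished - self.average)

    def target_batch(self):
        # Number of commit records worth waiting for, never below 1.
        return max(1, round(self.average))


est = CommitRateEstimator(alpha=0.5, initial=1.0)
for finished in (5, 5):
    est.record_fsync(finished)
print(est.average, est.target_batch())  # 4.0 4
```

With a target like this, one long-running transaction no longer stalls everyone until the timeout, because the target tracks how many commits actually tend to arrive during an fsync.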
> Any thoughts?
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com



-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

