checkpoint patches - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | checkpoint patches |
Date | |
Msg-id | CA+Tgmobv6gm6SzHx8e2w-0180+jHbCNYbAot9KyzG_3DxRYxaw@mail.gmail.com |
Responses | Re: checkpoint patches |
List | pgsql-hackers |
There are two checkpoint-related patches in this CommitFest that haven't gotten much love, one from me and the other from Greg Smith:

https://commitfest.postgresql.org/action/patch_view?id=752
https://commitfest.postgresql.org/action/patch_view?id=795

Mine uses sync_file_range() when available (i.e. on Linux) to add the already-dirty data to the kernel writeback queue at the beginning of each checkpoint, in the hopes of reducing the tendency of checkpoints to disrupt other activity on the system. Greg's adds an optional pause after each fsync() call for similar purposes. What we're lacking is any testimony to the effectiveness or ineffectiveness of either approach.

I took a shot at trying to figure this out by throwing pgbench at it, but didn't get too far. Here's scale factor 300, which fits in shared_buffers, on the IBM POWER7 machine:

resultsckpt.checkpoint-sync-pause-v1.1:tps = 14274.784431 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.2:tps = 12114.861879 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.3:tps = 14117.602996 (including connections establishing)
resultsckpt.master.1:tps = 14485.394298 (including connections establishing)
resultsckpt.master.2:tps = 14162.000100 (including connections establishing)
resultsckpt.master.3:tps = 14307.221824 (including connections establishing)
resultsckpt.writeback-v1.1:tps = 14264.851218 (including connections establishing)
resultsckpt.writeback-v1.2:tps = 14314.773839 (including connections establishing)
resultsckpt.writeback-v1.3:tps = 14230.219900 (including connections establishing)

Looks like a whole lot of "that didn't matter". Of course, then I realized that it was a stupid test: if the whole database fits in shared_buffers, there won't be any dirty data in the OS cache at checkpoint start time. So I ran some more tests with scale factor 1000, which doesn't fit in shared_buffers.
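Before the scale factor 1000 numbers, here is a minimal sketch of the two mechanisms described above, written in Python for brevity (the patches themselves are C inside the checkpointer, and this is not code from either one). It assumes a checkpoint boils down to syncing a list of already-written data files; the function names checkpoint, hint_writeback and sync_with_pauses are hypothetical. sync_file_range() is reached through ctypes because Python's os module has no wrapper for it, and the flag value 2 is SYNC_FILE_RANGE_WRITE from Linux's <fcntl.h>.

```python
import ctypes
import os
import sys
import time

# SYNC_FILE_RANGE_WRITE from Linux <fcntl.h>: start writeback of dirty
# pages in the range, without waiting for the writes to complete.
SYNC_FILE_RANGE_WRITE = 2


def _find_sync_file_range():
    """Return libc's sync_file_range() via ctypes, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(None, use_errno=True)
    except OSError:
        return None
    fn = getattr(libc, "sync_file_range", None)
    if fn is None:
        return None
    fn.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    fn.restype = ctypes.c_int
    return fn


def hint_writeback(paths):
    """Writeback idea (hypothetical sketch): at checkpoint start, ask the
    kernel to begin writing back already-dirty pages, without blocking.
    Returns the number of files successfully hinted (0 where unsupported)."""
    sync_file_range = _find_sync_file_range()
    if sync_file_range is None:
        return 0
    hinted = 0
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            # offset=0, nbytes=0 means "from offset 0 through end of file".
            if sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) == 0:
                hinted += 1
        finally:
            os.close(fd)
    return hinted


def sync_with_pauses(paths, pause_seconds=0.0):
    """Sync-pause idea (hypothetical sketch): fsync() each file, optionally
    sleeping after each call so other I/O can be serviced in between."""
    synced = 0
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
            synced += 1
        finally:
            os.close(fd)
        if pause_seconds > 0:
            time.sleep(pause_seconds)
    return synced


def checkpoint(paths, use_writeback_hint=True, pause_seconds=0.0):
    """Hypothetical checkpoint skeleton combining both ideas."""
    if use_writeback_hint:
        hint_writeback(paths)
    # Writing out dirty shared buffers would happen here (omitted).
    return sync_with_pauses(paths, pause_seconds)
```

With use_writeback_hint=False and pause_seconds=0 the skeleton reduces to a plain fsync() loop. Any nonzero pause directly lengthens the sync phase, which is the "checkpoints take longer" cost discussed for the scale factor 1000 runs.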
Unfortunately, an operating system crash intervened before the test finished, but it still looks like a whole lot of nothing:

resultsckpt.checkpoint-sync-pause-v1.4:tps = 1899.745078 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.5:tps = 1925.848571 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.6:tps = 1920.624753 (including connections establishing)
resultsckpt.master.4:tps = 1855.866476 (including connections establishing)
resultsckpt.master.5:tps = 1862.413311 (including connections establishing)
resultsckpt.writeback-v1.4:tps = 1869.536435 (including connections establishing)
resultsckpt.writeback-v1.5:tps = 1912.669580 (including connections establishing)

There might be a bit of improvement there with the patches, but it doesn't look like very much. You also have to consider that they work by making checkpoints take longer, and therefore potentially less frequent, especially in the case of the checkpoint-sync-pause patch.

Of course, this is perhaps not very surprising, since Greg already spent some time describing the sorts of conditions he thought were needed to replicate his test, and they're more complicated than throwing transactions at the database at top speed. I don't know how to replicate those conditions, though, and there's certainly plenty of checkpoint-related latency to be quashed even on this test, a problem which these patches apparently do little if anything to address. So my feeling is that it's premature to change anything here, and we should punt any changes in this area to 9.3.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
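To put a rough number on "a bit of improvement", here is a small illustrative script that averages the pgbench runs quoted in this message per branch and compares each patch against master. The mapping of run suffixes to scale factors (1 to 3 at scale factor 300, 4 to 6 at scale factor 1000) is an assumption inferred from how the results are grouped above; the function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# pgbench result lines as posted. Run suffixes 1-3 are assumed to be
# scale factor 300 and 4-6 scale factor 1000, per the message's grouping.
RESULTS = """\
resultsckpt.checkpoint-sync-pause-v1.1:tps = 14274.784431 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.2:tps = 12114.861879 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.3:tps = 14117.602996 (including connections establishing)
resultsckpt.master.1:tps = 14485.394298 (including connections establishing)
resultsckpt.master.2:tps = 14162.000100 (including connections establishing)
resultsckpt.master.3:tps = 14307.221824 (including connections establishing)
resultsckpt.writeback-v1.1:tps = 14264.851218 (including connections establishing)
resultsckpt.writeback-v1.2:tps = 14314.773839 (including connections establishing)
resultsckpt.writeback-v1.3:tps = 14230.219900 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.4:tps = 1899.745078 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.5:tps = 1925.848571 (including connections establishing)
resultsckpt.checkpoint-sync-pause-v1.6:tps = 1920.624753 (including connections establishing)
resultsckpt.master.4:tps = 1855.866476 (including connections establishing)
resultsckpt.master.5:tps = 1862.413311 (including connections establishing)
resultsckpt.writeback-v1.4:tps = 1869.536435 (including connections establishing)
resultsckpt.writeback-v1.5:tps = 1912.669580 (including connections establishing)
"""

# Branch name, run number, and tps value from one result line.
LINE = re.compile(r"resultsckpt\.(?P<branch>.+)\.(?P<run>\d+):tps = (?P<tps>[\d.]+)")


def mean_tps(text):
    """Map (scale, branch) -> mean tps across that branch's runs."""
    runs = defaultdict(list)
    for m in LINE.finditer(text):
        scale = 300 if int(m["run"]) <= 3 else 1000
        runs[(scale, m["branch"])].append(float(m["tps"]))
    return {key: mean(vals) for key, vals in runs.items()}


def pct_vs_master(means):
    """Percent difference of each patch's mean tps relative to master."""
    out = {}
    for (scale, branch), tps in means.items():
        if branch != "master":
            base = means[(scale, "master")]
            out[(scale, branch)] = 100.0 * (tps - base) / base
    return out


if __name__ == "__main__":
    means = mean_tps(RESULTS)
    for (scale, branch), pct in sorted(pct_vs_master(means).items()):
        print(f"scale {scale:>4}  {branch:<26} {means[(scale, branch)]:10.1f} tps  {pct:+.1f}% vs master")
```

With these numbers the patches come out roughly 3.0% (checkpoint-sync-pause-v1) and 1.7% (writeback-v1) ahead of master at scale factor 1000, based on only two or three runs per branch.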