Improvement of checkpoint IO scheduler for stable transaction responses - Mailing list pgsql-hackers
From | KONDO Mitsumasa |
---|---|
Subject | Improvement of checkpoint IO scheduler for stable transaction responses |
Date | |
Msg-id | 51B5AFB1.5030404@lab.ntt.co.jp |
Responses | Re: Improvement of checkpoint IO scheduler for stable transaction responses (2 replies) |
List | pgsql-hackers |
Hi,

I have created a patch that improves the checkpoint IO scheduler to give more stable transaction response times.

* Problems in the checkpoint IO schedule under heavy transaction load

Under heavy transaction load, I think the PostgreSQL checkpoint scheduler has two problems, one at the start and one at the end of a checkpoint.

The first problem is heavy IO at the start of each checkpoint. It is caused by full-page writes: the first modification of a page after a checkpoint begins writes the whole page to WAL, so WAL IO surges right after the checkpoint starts. As a result, the WAL-based checkpoint scheduler wrongly judges that the checkpoint is behind schedule because of the full-page-write WAL, even though it is not behind, and it writes pages faster than necessary. This causes bad transaction response. I think the WAL-based checkpoint scheduler is not appropriate at the start of a checkpoint.

The second problem is the fsync freeze at the end of a checkpoint. Normally, checkpoint writes are flushed in the background by the OS's IO scheduler. But when that does not work well, the fsync at the end of the checkpoint causes an IO freeze and slows transactions. Unexpectedly slow transactions can trigger monitoring errors in an HA cluster and degrade the user experience of an application service. This is an especially serious problem for database systems on cloud and virtual servers, which do not have much IO performance.

However, postgresql.conf offers few parameters that address this. We care more about fast transaction response than about checkpoint time. In practice checkpoint time is short, and making it a little longer is not a problem. You may think that checkpoint_segments and checkpoint_timeout could simply be set to larger values. However, a large checkpoint_segments fills the file cache with WAL that is never read, which is wasted, and a large checkpoint_timeout leads to long crash recovery.

* Improvement method of checkpoint IO scheduler

1. Improving the heavy full-page-write IO at the start of a checkpoint

My idea is very simple: at the start of a checkpoint, checkpoint_completion_target is made looser.
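To make the first problem concrete, here is a rough Python model of the checkpointer's on-schedule test, in which progress scaled by checkpoint_completion_target is compared against the fraction of WAL consumed and the fraction of time elapsed. It is only a sketch, not the patch (which is C code). The function names and the `early_phase_end` and `margin` arguments are hypothetical, and the loosening rule is an assumed stand-in for the general idea, not the formula the patch uses.

```python
def is_on_schedule(progress, completion_target, wal_fraction, time_fraction):
    """Return True if the checkpoint is on schedule and may sleep.

    progress      -- fraction of dirty buffers already written (0.0 .. 1.0)
    wal_fraction  -- WAL consumed since checkpoint start / checkpoint_segments
    time_fraction -- time since checkpoint start / checkpoint_timeout
    """
    scaled = progress * completion_target
    return scaled >= wal_fraction and scaled >= time_fraction


def is_on_schedule_loosened(progress, completion_target, wal_fraction,
                            time_fraction, early_phase_end, margin):
    """Same test, but with extra slack early in the checkpoint.

    ASSUMPTION: adding `margin` while progress < `early_phase_end` is a
    made-up rule for illustration, not the patch's actual logic.
    """
    scaled = progress * completion_target
    if progress < early_phase_end:
        scaled += margin  # tolerate the burst of full-page-write WAL
    return scaled >= wal_fraction and scaled >= time_fraction


# Early in a checkpoint: 10% of buffers written, 2% of the timeout elapsed,
# but full-page writes have already consumed 15% of checkpoint_segments.
print(is_on_schedule(0.10, 0.5, wal_fraction=0.15, time_fraction=0.02))
# prints False: judged "late" by WAL alone, so the checkpointer stops sleeping
print(is_on_schedule_loosened(0.10, 0.5, 0.15, 0.02,
                              early_phase_end=0.3, margin=0.2))
# prints True: with slack, the early WAL burst no longer forces faster writes
```

In this model the time-based check alone would have said the checkpoint was on schedule; only the WAL-based check trips, which is the misjudgment described above.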
I added three parameters for this issue: 'checkpoint_smooth_target', 'checkpoint_smooth_margin' and 'checkpointer_write_delay'. 'checkpoint_smooth_target' is the point in checkpoint progress up to which the checkpoint IO schedule is smoothed. 'checkpoint_smooth_margin' makes the checkpoint schedule smoother still. It is a heuristic parameter, but it solves this problem effectively. 'checkpointer_write_delay' is the sleep time used by the checkpoint schedule. It is nearly the same as 'bgwriter_delay' in PG 9.1 and older. For more detail, please see the attached patch.

2. Improving the fsync freeze at the end of a checkpoint

When the fsync freeze occurs, issuing more file fsyncs back to back is meaningless and stalls transactions. So I think that if an fsync took a long time, the IO queue is flooded, and IO priority should be given to transactions so that they respond quickly. The patch does this by inserting a sleep between fsyncs when an fsync took a long time. This may seem to make the checkpoint take a long time, but it does not become much longer. In fact, when fsync time is long, the IO queue is already packed with other IO, including checkpoint writes, so the sleep only yields IO priority to the transactions that are running.

I tested my patch with the DBT-2 benchmark. Please see the test results. My patch achieves higher throughput and faster response than plain PG. Checkpoint time is a little longer than with plain PG, but not seriously so.

* Result of DBT-2 with this patch (compared with original PG 9.2.4)

I used the DBT-2 benchmark software by OSDL. I also used pg_statsinfo and pg_stats_reporter in this benchmark.
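Before the numbers, a rough sketch of the fsync pacing from item 2 above may help. It is a Python illustration, not the patch code: the function name, the `slow_threshold_s` and `pause_s` values, and the injectable `fsync`, `sleep` and `clock` arguments are all hypothetical and chosen only to make the idea easy to try out.

```python
import os
import time


def fsync_with_pauses(fds, slow_threshold_s=1.0, pause_s=0.2,
                      fsync=os.fsync, sleep=time.sleep, clock=time.monotonic):
    """fsync each file descriptor, pausing after any fsync that ran long.

    A slow fsync is taken as a sign that the IO queue is flooded, so the
    loop sleeps before the next fsync to let other IO through.
    Returns the number of pauses taken. Threshold and pause length are
    ASSUMED values, not taken from the patch.
    """
    pauses = 0
    for fd in fds:
        start = clock()
        fsync(fd)
        elapsed = clock() - start
        if elapsed > slow_threshold_s:
            sleep(pause_s)  # yield IO bandwidth to running transactions
            pauses += 1
    return pauses
```

Called as `fsync_with_pauses([fd1, fd2])` with real descriptors it uses `os.fsync` directly; the extra arguments exist only so the timing can be faked in a test. When no fsync is slow, the loop behaves like a plain sequence of fsyncs.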
- Patched PG (patched 9.2.4)
  DBT-2 result: http://goo.gl/1PD3l
  statsinfo report: http://goo.gl/UlGAO
  settings: http://goo.gl/X4Whu

- Original PG (9.2.4)
  DBT-2 result: http://goo.gl/XVxtj
  statsinfo report: http://goo.gl/UT1Li
  settings: http://goo.gl/eofmb

The measurement value improved by 4%, 'new-order 90%tile' improved by 20%, 'new-order average' improved by 18%, 'new-order deviation' improved by 24%, and 'new-order maximum' improved by 27%. I confirmed high throughput and WAL IO while the checkpoint was executing in pg_stats_reporter's report. My patch achieves fast transaction response without blocking running transactions.

The bad point of my patch is a longer checkpoint. Checkpoint time increased by about 10% - 20%. But the checkpoint still completes correctly on schedule within checkpoint_timeout. Please see the checkpoint results (http://goo.gl/NsbC6).

* Test server
  Server: HP Proliant DL360 G7
  CPU: Xeon E5640 2.66GHz (1P/4C)
  Memory: 18GB (PC3-10600R-9)
  Disk: 146GB (15k) * 4, RAID1+0
  RAID controller: P410i/256MB

This is not an advertisement for pg_statsinfo and pg_stats_reporter :-) They are free software. If you have comments or other ideas about my patch, please send them to me.

Best Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center