Re: [PATCHES] Full page writes improvement, code update - Mailing list pgsql-hackers
From | Koichi Suzuki |
---|---|
Subject | Re: [PATCHES] Full page writes improvement, code update |
Date | |
Msg-id | 4609CAC3.7060008@oss.ntt.co.jp Whole thread Raw |
In response to | Re: Full page writes improvement, code update ("Simon Riggs" <simon@2ndquadrant.com>) |
Responses |
Re: [PATCHES] Full page writes improvement, code update
|
List | pgsql-hackers |
Simon; Thanks a lot for your comments/advices. I'd like to write some feedback. Simon Riggs wrote: > On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote: > >> Here's an update of a code to improve full page writes as proposed in >> >> http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php >> and >> http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php >> >> Update includes some modification for error handling in archiver and >> restoration command. >> >> In the previous threads, I posted several evaluation and shown that we >> can keep all the full page writes needed for full XLOG crash recovery, >> while compressing archive log size considerably better than gzip, with >> less CPU consumption. I've found no further objection for this proposal >> but still would like to hear comments/opinions/advices. > > Koichi-san, > > Looks interesting. I like the small amount of code to do this. > > A few thoughts: > > - Not sure why we need "full_page_compress", why not just mark them > always? That harms noone. (Did someone else ask for that? If so, keep > it) No, no one asked to have a separate option. There'll be no bad influence to do so. As written below, full page write can be categolized as follows: 1) Needed for crash recovery: first page update after each checkpoint. This has to be kept in WAL. 2) Needed for archive recovery: page update between pg_start_backup and pg_stop_backup. This has to be kept in archive log. 3) For log-shipping slave such as pg_standby: no full page writes will be needed for this purpose. My proposal deals with 2). So, if we mark each "full_page_write", I'd rather mark when this is needed. Still need only one bit because the case 3) does not need any mark. > - OTOH I'd like to see an explicit parameter set during recovery since > you're asking the main recovery path to act differently in case a single > bit is set/unset. If you are using that form of recovery, we should say > so explicitly, to keep everybody else safe. Only one thing I had to do is to create "dummy" full page write to maintain LSNs. Full page writes are omitted in archive log. We have to LSNs same as those in the original WAL. In this case, recovery has to read logical log, not "dummy" full page writes. On the other hand, if both logical log and "real" full page writes are found in a log record, the recovery has to use "real" full page writes. > - I'd rather mark just the nonremovable blocks. But no real difference It sound nicer. According to the full page write categories above, we can mark full page writes as needed in "crash recovery" or "archive recovery". Please give some feedback to the above full page write categories. > > - We definitely don't want an normal elog in a XLogInsert critical > section, especially at DEBUG1 level I agree. I'll remove elog calls from critical sections. > - diff -c format is easier and the standard I'll change the diff option. > - pg_logarchive and pg_logrestore should be called by a name that > reflects what they actually do. Possibly pg_compresslog and > pg_decompresslog etc.. I've not reviewed those programs, but: I wasn't careful to have command names more based on what to be done. I'll change the command name. > > - Not sure why we have to compress away page headers. Touch as little as > you can has always been my thinking with recovery code. Eliminating page headers gives compression rate slightly better, a couple of percents. To make code simpler, I'll drop this compression. > > - I'm very uncomfortable with touching the LSN. Maybe I misunderstand? The reason why we need pg_logarchive (or pg_decompresslog) is to maintain LSN the same as those in the original WAL. For this purpose, we need "dummy" full page write. I don't like to touch LSN either and this is the reason why pg_decompresslog need some extra work, eliminating many other changes in the recovery. > > - Have you thought about how pg_standby would integrate with this > option? Can you please? Yes I believe so. As pg_standby does not include any chance to meet partial writes of pages, I believe you can omit all the full page writes. Of course, as Tom Lange suggested in http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php removing full page writes can lose a chance to recover from partial/inconsisitent writes in the crash of pg_standby. In this case, we have to import a backup and archive logs (with full page writes during the backup) to recover. (We have to import them when the file system crashes anyway). If it's okay, I believe pg_compresslog/pg_decompresslog can be integrated with pg_standby. Maybe we can work together to include pg_compresslog/pg_decompresslog in pg_standby. > > - I'll do some docs for this after Freeze, if you'd like. I have some > other changes to make there, so I can do this at the same time. I'll be looking forward to your writings. Best regards; -- Koichi Suzuki
pgsql-hackers by date: