Thread: Full page writes improvement
Here's an idea and a patch for improving full page writes.

Idea:
(1) Keep full page writes in the ordinary WAL so that they are available
    during crash recovery
    -> recovery from inconsistent pages that a crash can leave behind.
(2) Remove them from the archive log, except for those written during
    online backup (between pg_start_backup and pg_stop_backup)
    -> smaller archive log.

Implementation:
(1) Mark the WAL records whose full page writes can be removed.
(2) Remove the full page writes from the marked WAL records in the
    archive command.
(3) Restore the removed full page writes so that LSNs stay consistent.

Included is a patch for this, as well as the source for the archive and
restore commands. The patch is very small, and I hope it can be included
in 8.3.

--
Koichi Suzuki
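The three implementation steps above can be illustrated with a toy model. This is only a sketch, not the attached patch: `WalRecord`, `archive`, `restore`, and `offsets` are hypothetical names, the record layout is invented, and padding with zero bytes is an assumed way of keeping record offsets (a stand-in for LSNs) unchanged after restore.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class WalRecord:
    """Toy WAL record (hypothetical layout, not PostgreSQL's real format)."""
    tuple_data: bytes        # per-tuple change data
    full_page: bytes = b""   # full-page image, if the record carries one
    removable: bool = False  # step (1): image may be dropped from the archive
    stripped_len: int = 0    # bytes removed by archive(), remembered for restore()

    def size(self) -> int:
        return len(self.tuple_data) + len(self.full_page)


def archive(records: List[WalRecord]) -> List[WalRecord]:
    """Step (2): drop full-page images, but only from marked records."""
    out = []
    for r in records:
        if r.removable and r.full_page:
            out.append(replace(r, full_page=b"", stripped_len=len(r.full_page)))
        else:
            out.append(r)
    return out


def restore(records: List[WalRecord]) -> List[WalRecord]:
    """Step (3): pad stripped records back to their original length
    (assumption: filler bytes are enough to keep offsets consistent)."""
    out = []
    for r in records:
        if r.stripped_len:
            out.append(replace(r, full_page=b"\0" * r.stripped_len, stripped_len=0))
        else:
            out.append(r)
    return out


def offsets(records: List[WalRecord]) -> List[int]:
    """Start offset of each record, used here as a stand-in for its LSN."""
    pos, result = 0, []
    for r in records:
        result.append(pos)
        pos += r.size()
    return result


wal = [
    WalRecord(b"a" * 10, b"P" * 100, removable=True),   # ordinary record
    WalRecord(b"b" * 5, b"Q" * 100, removable=False),   # e.g. written during online backup
    WalRecord(b"c" * 7),                                # no full-page image
]
archived = archive(wal)
restored = restore(archived)
```

In this model the archived stream is smaller than the original, while the restored stream has the same record offsets as the original, which is the property step (3) is after.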
Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:
> Here's an idea and a patch for full page writes improvement.
> Idea:
> (1) keep full page writes for ordinary WAL, make them available during
> the crash recovery, -> recovery from inconsistent pages which can be
> made at the crash,
> (2) Remove them from the archive log except for those written during
> online backup (between pg_start_backup and pg_stop_backup) -> small size
> archive log.

Doesn't this break crash recovery on PITR slaves?

regards, tom lane
Tom Lane wrote:
> Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:
>> Here's an idea and a patch for full page writes improvement.
>
>> Idea:
>> (1) keep full page writes for ordinary WAL, make them available during
>> the crash recovery, -> recovery from inconsistent pages which can be
>> made at the crash,
>> (2) Remove them from the archive log except for those written during
>> online backup (between pg_start_backup and pg_stop_backup) -> small size
>> archive log.
>
> Doesn't this break crash recovery on PITR slaves?

The compressed archive log contains the same data as in the
full_page_writes off case. So the effect on PITR slaves is the same as
with full_page_writes off.

K.Suzuki

> regards, tom lane

--
Koichi Suzuki
Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:
> Tom Lane wrote:
>> Doesn't this break crash recovery on PITR slaves?

> Compressed archive log contains the same data as full_page_writes off
> case. So the influence to PITR slaves is the same as full_page_writes off.

Right. So what is the use-case for running your primary database with
full_page_writes on and the slaves with it off? It doesn't seem like
a very sensible combination to me.

Also, it seems to me that some significant performance hit would be
taken by having to grovel through the log files to remove and re-add the
full-page data. Plus you are actually writing *more* WAL data out of
the primary, not less, because you have to save both the full-page
images and the per-tuple data they normally replace. Do you have
numbers showing that there's actually any meaningful savings overall?

regards, tom lane
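The volume trade-off raised above can be made concrete with a rough calculation. This is a sketch with made-up inputs, not a measurement: the record count `N` and the per-tuple size `T` are hypothetical, record headers are ignored, and only records that carry a full-page image are counted. The 8192-byte page is PostgreSQL's default block size.

```python
BLCKSZ = 8192  # PostgreSQL's default page size, in bytes


def wal_volume(n_records: int, tuple_bytes: int,
               with_page: bool, with_tuple: bool) -> int:
    """Bytes written for n_records, ignoring record headers (simplification)."""
    per_record = (BLCKSZ if with_page else 0) + (tuple_bytes if with_tuple else 0)
    return n_records * per_record


N, T = 1000, 100  # hypothetical: 1000 records, 100 bytes of per-tuple data each

# Unpatched primary: the full-page image replaces the per-tuple data.
stock_primary = wal_volume(N, T, with_page=True, with_tuple=False)

# Proposal, primary side: both the image and the per-tuple data are written.
patched_primary = wal_volume(N, T, with_page=True, with_tuple=True)

# Proposal, archive side: images stripped, per-tuple data kept.
patched_archive = wal_volume(N, T, with_page=False, with_tuple=True)

print(stock_primary, patched_primary, patched_archive)
```

Under these assumptions the primary writes slightly more (the extra per-tuple bytes), while the archive shrinks by the size of the stripped images. Whether that balance holds in practice depends on the real workload, which is what the requested numbers would have to show.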
Tom Lane wrote:
> Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:
>> Tom Lane wrote:
>>> Doesn't this break crash recovery on PITR slaves?
>
>> Compressed archive log contains the same data as full_page_writes off
>> case. So the influence to PITR slaves is the same as full_page_writes off.
>
> Right. So what is the use-case for running your primary database with
> full_page_writes on and the slaves with it off? It doesn't seem like
> a very sensible combination to me.
>
> Also, it seems to me that some significant performance hit would be
> taken by having to grovel through the log files to remove and re-add the
> full-page data. Plus you are actually writing *more* WAL data out of
> the primary, not less, because you have to save both the full-page
> images and the per-tuple data they normally replace. Do you have
> numbers showing that there's actually any meaningful savings overall?

Yes, I have some evaluations showing that we write less and use fewer
resources overall. Please give me a couple of days to translate them.

In the PITR slave case, archive logs are read within a short period, so
the amount of archive log may not be an issue. Where the online backup
and archive logs must be kept for a (relatively) long period, archive
log size is a major issue.

K.Suzuki

> regards, tom lane

--
Koichi Suzuki
Full_page_compress is not intended for use with a PITR slave, but for
the case where both the online backup and the archive log are kept for
archive recovery, which is a very popular way to operate PostgreSQL now.

I've just posted my evaluation of the patch as a reply in another thread
on the same proposal (sorry, I created a new thread because the old one
did not seem suitable). It compares the log compression with the gzip
case. Our proposal can also be combined with gzip. Its overall overhead
is slightly less than just copying WAL using cat. As a result, my
proposal does not add serious overhead. Please refer to the thread
"Archive log compression keeping physical log available in the crash
recovery".

I'd appreciate further opinions and comments on this, and suggestions
about which evaluations would be useful.

I've posted two commands (archive and restore) and a small patch. The
two commands can be treated as contrib, and the patch itself still works
if WAL is simply copied to the archive directory.

Regards;

Koichi Suzuki

Tom Lane wrote:
> Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:
>> Tom Lane wrote:
>>> Doesn't this break crash recovery on PITR slaves?
>
>> Compressed archive log contains the same data as full_page_writes off
>> case. So the influence to PITR slaves is the same as full_page_writes off.
>
> Right. So what is the use-case for running your primary database with
> full_page_writes on and the slaves with it off? It doesn't seem like
> a very sensible combination to me.
>
> Also, it seems to me that some significant performance hit would be
> taken by having to grovel through the log files to remove and re-add the
> full-page data. Plus you are actually writing *more* WAL data out of
> the primary, not less, because you have to save both the full-page
> images and the per-tuple data they normally replace. Do you have
> numbers showing that there's actually any meaningful savings overall?
>
> regards, tom lane

--
Koichi Suzuki