Re: [HACKERS] Full page writes improvement, code update - Mailing list pgsql-patches
From | Koichi Suzuki |
---|---|
Subject | Re: [HACKERS] Full page writes improvement, code update |
Date | |
Msg-id | 46288450.1050703@oss.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] Full page writes improvement, code update ("Simon Riggs" <simon@2ndquadrant.com>) |
List | pgsql-patches |
Hi, I agree that pg_compresslog should be aware of all the WAL records' details so that it can optimize archive log safely. In my patch, I've examined 8.2's WAL records to make pg_compresslog/pg_decompresslog safe. Also I agree further pg_compresslog maintenance needs to examine changes in WAL record format. Because the number of such format will be limited, I think the amount of work will be reasonable enough. Regards; Simon Riggs wrote: > On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote: >> "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes: >>> But you also turn off the optimization that avoids writing regular >>> WAL records when the info is already contained in a full-page image >>> (increasing the uncompressed size of WAL). >>> It was that part I questioned. > > I think its right to question it, certainly. > >> That's what bothers me about this patch, too. It will be increasing >> the cost of writing WAL (more data -> more CRC computation and more >> I/O, not to mention more contention for the WAL locks) which translates >> directly to a server slowdown. > > I don't really understand this concern. Koichi-san has included a > parameter setting that would prevent any change at all in the way WAL is > written. If you don't want this slight increase in WAL, don't enable it. > If you do enable it, you'll also presumably be compressing the xlog too, > which works much better than gzip using less CPU. So overall it saves > more than it costs, ISTM, and nothing at all if you choose not to use > it. > >> The main arguments that I could see against Andreas' alternative are: >> >> 1. Some WAL record types are arranged in a way that actually would not >> permit the reconstruction of the short form from the long form, because >> they throw away too much data when the full-page image is substituted. >> An example that's fresh in my mind is that the current format of the >> btree page split WAL record discards newitemoff in that case, so you >> couldn't identify the inserted item in the page image. Now this is only >> saving two bytes in what's usually going to be a darn large record >> anyway, and it complicates the code to do it, so I wouldn't cry if we >> changed btree split to include newitemoff always. But there might be >> some other cases where more data is involved. In any case, someone >> would have to look through every single WAL record type to determine >> whether reconstruction is possible and fix it if not. >> >> 2. The compresslog utility would have to have specific knowledge about >> every compressible WAL record type, to know how to convert it to the >> short format. That means an ongoing maintenance commitment there. >> I don't think this is unacceptable, simply because we need only teach >> it about a few common record types, not everything under the sun --- >> anything it doesn't know how to fix, just leave alone, and if it's an >> uncommon record type it really doesn't matter. (I guess that means >> that we don't really have to do #1 for every last record type, either.) >> >> So I don't think either of these is a showstopper. Doing it this way >> would certainly make the patch more acceptable, since the argument that >> it might hurt rather than help performance in some cases would go away. > > Yeh, its additional code paths, but it sounds like Koichi-san and > colleagues are going to be trail blazing any bugs there and will be > around to fix any more that emerge. > >>> What about disconnecting WAL LSN from physical WAL record position >>> during replay ? >>> Add simple short WAL records in pg_compresslog like: advance LSN by 8192 >>> bytes. >> I don't care for that, as it pretty much destroys some of the more >> important sanity checks that xlog replay does. The page boundaries >> need to match the records contained in them. So I think we do need >> to have pg_decompresslog insert dummy WAL entries to fill up the >> space saved by omitting full pages. > > Agreed. I don't want to start touching something that works so well. > > > We've been thinking about doing this for at least 3 years now, so I > don't see any reason to baulk at it now. I'm happy with Koichi-san's > patch as-is, assuming further extensive testing will be carried out on > it during beta. > -- ------------- Koichi Suzuki
pgsql-patches by date: