Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: WAL format and API changes (9.5) |
Date | |
Msg-id | 5469F68E.3080300@vmware.com Whole thread Raw |
In response to | Re: WAL format and API changes (9.5) (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: WAL format and API changes (9.5)
(Heikki Linnakangas <hlinnakangas@vmware.com>)
|
List | pgsql-hackers |
On 11/13/2014 05:04 PM, Andres Freund wrote: > On 2014-11-13 15:33:44 +0200, Heikki Linnakangas wrote: >> In quick testing, this new WAL format is somewhat more compact than the 9.4 >> format. > > Well, that's mixing different kinds of changes together to some > degree. Most of this could have been done independently as well. I'm not > generally objecting to doing that, but the space/performance comparison > isn't entirely fair. One could (I'm not!) argue that we should just do a > refactoring like you suggest above, without the additional block > information. Yep. It would be interesting to just do the minimal changes to the current record format, to get a similar reduction in WAL volume, and then compare the performance and WAL volume of that vs. the big patch. But I don't think it's really worth it; we do want the bigger changes, anyway, even if it costs a little. >> That also seems to have more than bought back the performance >> regression I saw earlier. > > I think we really need to do the performance test with a different CRC > implementation. So far all the tests in this thread seem to be > correlating pretty linearly to the size of the record. Will do. >> XLogReader now has >> a new function, DecodeXLogRecord, which parses the block headers etc. from >> the WAL record, and copies the data chunks to aligned buffers. The redo >> routines are passed a pointer to the XLogReaderState, instead of the plain >> XLogRecord, and the redo routines can use macros and functions defined >> xlogreader.h to access the already-decoded WAL record. The new WAL record >> format is difficult to parse in a piece-meal fashion, so it really needs >> this separate decoding pass to be efficient. > > Hm. I'm not sure if there's really a point in exposing > DecodeXLogRecord() outside of xlogreader.c. In which case would a > xlogreader user not want to decode the record? I.e. couldn't we just > always do that and not bother exposing the function? > > Sure, you could skip decoding if the record is of a "uninteresting" > rmgr, but since all validation but CRC now happens only during decoding > I'm not sure that's a good idea. Well, extracting all the backup block data etc. is not free, so it would be nice to be able to filter first. I'm thinking of logical decoding in particular; it could avoid parsing index AM records. Then again, the parsing isn't very expensive anyway. I'll change the patch to decode as part of the XLogReadRecord call for now; it does things a little bit simpler. We can change it back later if it seems worthwhile. I nevertheless kept the DecodeXLogRecord function. It can be used to decode an XLogRecord that's not read from disk, using the xlogreader, but already in memory. That was needed to support WAL_DEBUG at insertion; it needs to rm_desc() a record that's not previously read from disk with an xlogreader. >> Thoughts on this new WAL record format? I've attached the xlogrecord.h file >> here separately for easy reading, if you want to take a quick look at just >> that without applying the whole patch. > > After a quick skim I like it in general. There's some details I'd rather > change, but I think it's quite the step forward. Glad to hear that. Here is an updated version of the patch. It's rebased over current master, and fixes a bunch of issues you and Michael pointed out, and a ton of other cleanup. I'll do more comprehensive performance testing with this, and post results. BTW, I'm planning to replace the malloc() calls in xlogreader.c with palloc(). It currently uses malloc because xlogreader.c needs to be usable in frontend code, but nowadays we map palloc automatically to malloc when building frontend code. A consequence of that is that you get a straight exit() on out-of-memory, but I think that's acceptable. - Heikki
Attachment
pgsql-hackers by date: