On 06.10.2011 15:11, Robert Haas wrote:
> On Thu, Oct 6, 2011 at 7:33 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> A regular heap_insert record leaves out a lot of information that can be
>> deduced at replay time. It can leave out all the headers, including just the
>> null bitmap + data. In addition to that, there's just the location of the
>> tuple (RelFileNode+ItemPointer). At replay, xmin is taken from the WAL
>> record header.
>>
>> For a multi-insert record, you don't even need to store the RelFileNode and
>> the block number for every tuple, just the offsets.
>>
>> In comparison, a full-page image will include the full tuple header, and
>> also the line pointers. If I'm doing my math right, a full-page image takes
>> 25 bytes more data per tuple, than the special-purpose multi-insert record.
>
> Interesting. It's always seemed to me fairly inefficient in general
> to store the whole RelFileNode. For many people, the database and
> tablespace OID will be constants, and even if they aren't, there
> certainly aren't going to be 96 bits of entropy in the relfilenode. I
> thought about whether we could create some sort of mapping layer,
> where say once per checkpoint we'd allocate a 4-byte integer to denote
> a relfilenode, and WAL-log that mapping. Then after that everyone
> could just refer to the 4-byte integer instead of the whole
> relfilenode. But it seems like a lot of work for 8 bytes per record.
> Then again, if you're getting that much benefit from shaving off 25
> bytes per tuple, maybe it is, although I feel like FPW is the elephant
> in the room.
A very simple optimization would be to leave out tablespace OID
altogether if it's DEFAULTTABLESPACE_OID, and just set a flag somewhere.
Then again, we could also just compress the WAL wholesale.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com