Re: Replication identifiers, take 4 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Replication identifiers, take 4
Msg-id 55313F00.7010609@iki.fi
In response to Re: Replication identifiers, take 4  (Simon Riggs <simon.riggs@2ndquadrant.com>)
Responses Re: Replication identifiers, take 4
List pgsql-hackers
On 04/17/2015 12:04 PM, Simon Riggs wrote:
> On 17 April 2015 at 09:54, Andres Freund <andres@anarazel.de> wrote:
>
>> Hrmpf. Says the person that used a lot of padding, without much
>> discussion, for the WAL level infrastructure making pg_rewind more
>> maintainable.
>
> Sounds bad. What padding are we talking about?

In the new WAL format, the data chunks are stored unaligned, without 
padding, to save space. The new format is quite different to the old 
one, so it's not straightforward to measure how much space that saved. The 
fixed-size XLogRecord header is 8 bytes shorter in the new format, 
because it doesn't have the xl_len field anymore. But the same 
information is stored elsewhere in the record, where it takes 2 or 5 
bytes (XLogRecordDataHeaderShort/Long).
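
To make the "2 or 5 bytes" concrete, here is a minimal sketch of how such a length header could be packed. It is an illustration only, not the server code: the block id values are what I recall from xlogrecord.h, the little-endian byte order and the function name encode_main_data_header are assumptions, and records with no main data are not modelled.

```python
import struct

# Block ids as recalled from xlogrecord.h; treat them as assumptions.
XLR_BLOCK_ID_DATA_SHORT = 255
XLR_BLOCK_ID_DATA_LONG = 254


def encode_main_data_header(data_length: int) -> bytes:
    """Pack the header that precedes a record's main data (sketch).

    Lengths up to 255 use a 1-byte id plus a 1-byte length (2 bytes).
    Longer data uses a 1-byte id plus an unaligned 4-byte length (5 bytes).
    """
    if not 1 <= data_length <= 0xFFFFFFFF:
        raise ValueError("sketch only covers lengths from 1 to 2**32 - 1")
    if data_length <= 255:
        return struct.pack("<BB", XLR_BLOCK_ID_DATA_SHORT, data_length)
    # "<" disables alignment padding, so this is exactly 1 + 4 bytes.
    return struct.pack("<BI", XLR_BLOCK_ID_DATA_LONG, data_length)


print(len(encode_main_data_header(100)))   # short form
print(len(encode_main_data_header(4096)))  # long form
```

So the 4-byte xl_len is replaced by a 2-byte header for small records, and only records with more than 255 bytes of main data pay the 5-byte form.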

But it's a fair point that we could've just made small adjustments to 
the old format, without revamping every record type and the way the 
block information is stored, and that the space saving of the new format 
should be compared with that instead, for a fair comparison.

As an example, one simple thing we could've done with the old format: 
remove xl_len, and store the length in place of the two unused padding 
bytes instead, as long as it fits in 16 bits. For longer records, set a 
flag and store it right after the XLogRecord header. For practically all WAL 
records, that would've shrunk XLogRecord from 32 to 24 bytes, and made 
each record 8 bytes shorter.
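
A rough sketch of the per-record effect of that hypothetical tweak follows. The 32 and 24 byte header sizes come from the paragraph above; the 4-byte width of the overflow length field and all names are assumptions, and alignment padding is ignored.

```python
OLD_HEADER_SIZE = 32    # XLogRecord size in 9.4, as stated above
SLIM_HEADER_SIZE = 24   # size after dropping xl_len, as stated above
OVERFLOW_LENGTH_SIZE = 4  # assumed width of the length stored after the header


def slim_header_bytes(data_length: int) -> int:
    """Header bytes needed under the hypothetical 'length in padding' scheme."""
    if data_length <= 0xFFFF:
        # Length fits in the two former padding bytes.
        return SLIM_HEADER_SIZE
    # Flag set; full length follows the fixed-size header.
    return SLIM_HEADER_SIZE + OVERFLOW_LENGTH_SIZE


def bytes_saved(data_length: int) -> int:
    """Bytes saved per record compared with the 9.4 header."""
    return OLD_HEADER_SIZE - slim_header_bytes(data_length)


for length in (64, 0xFFFF, 0x10000):
    print(length, bytes_saved(length))
```

Under these assumptions only records longer than 65535 bytes would save less than the full 8 bytes.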

I ran the same pgbench test Andres used, with scale 10, and 50000 
transactions, and compared the WAL size between master and 9.4:

master: 20738352 bytes
9.4:    23915800 bytes

According to pg_xlogdump, there were 301153 WAL records. If you take the 
9.4 figure, and imagine that we had saved those 8 bytes on each WAL 
record, 9.4 would've been 21506576 bytes instead. So yeah, we could've 
achieved much of the WAL savings with that much smaller change. That's a 
useful thing to compare with.
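
The arithmetic behind that estimate, spelled out as a small sketch. The inputs are the figures quoted above; the variable names are just for illustration.

```python
WAL_BYTES_94 = 23_915_800      # 9.4, wal_level=minimal
WAL_BYTES_MASTER = 20_738_352  # master, wal_level=minimal
WAL_RECORDS = 301_153          # record count reported by pg_xlogdump
SAVED_PER_RECORD = 8           # bytes saved by the hypothetical smaller header

# What 9.4 would have produced with 8 bytes shaved off every record.
hypothetical_94 = WAL_BYTES_94 - SAVED_PER_RECORD * WAL_RECORDS

# How much of the real 9.4 -> master saving that simple change would cover.
actual_saving = WAL_BYTES_94 - WAL_BYTES_MASTER
simple_saving = WAL_BYTES_94 - hypothetical_94
share = simple_saving / actual_saving

print(hypothetical_94)
print(f"{share:.1%} of the actual saving")
```

By this estimate the small change alone would account for roughly three quarters of the reduction seen on master.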

BTW, those numbers are with wal_level=minimal. With wal_level=logical, 
the WAL size from the same test on master was 26503520 bytes. That's 
quite a bump. Looking at pg_xlogdump output, it seems that it's all 
because the commit records are wider.
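
To put a number on that bump, here is a back-of-the-envelope sketch. It assumes the whole increase lands in commit records, as the pg_xlogdump output suggests, and that each of the 50000 pgbench transactions writes exactly one commit record; both are assumptions, not measurements.

```python
MASTER_MINIMAL = 20_738_352  # master, wal_level=minimal
MASTER_LOGICAL = 26_503_520  # master, wal_level=logical
TRANSACTIONS = 50_000        # pgbench transactions in the test

extra_bytes = MASTER_LOGICAL - MASTER_MINIMAL
relative_growth = extra_bytes / MASTER_MINIMAL

# Assumes one commit record per transaction and that all growth is in commits.
extra_per_commit = extra_bytes / TRANSACTIONS

print(extra_bytes)
print(f"{relative_growth:.1%} more WAL")
print(f"about {extra_per_commit:.0f} extra bytes per commit")
```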

- Heikki



