
Re: Replication identifiers, take 4 - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Replication identifiers, take 4
Date	April 17, 2015 17:12:42
Msg-id	55313F00.7010609@iki.fi
In response to	Re: Replication identifiers, take 4 (Simon Riggs <simon.riggs@2ndquadrant.com>)
Responses	Re: Replication identifiers, take 4
List	pgsql-hackers


On 04/17/2015 12:04 PM, Simon Riggs wrote:
> On 17 April 2015 at 09:54, Andres Freund <andres@anarazel.de> wrote:
>
>> Hrmpf. Says the person that used a lot of padding, without much
>> discussion, for the WAL level infrastructure making pg_rewind more
>> maintainable.
>
> Sounds bad. What padding are we talking about?

In the new WAL format, the data chunks are stored unaligned, without 
padding, to save space. The new format is quite different from the old 
one, so it's not straightforward to measure how much that saved. The 
fixed-size XLogRecord header is 8 bytes shorter in the new format, 
because it no longer has the xl_len field. But the same information is 
stored elsewhere in the record, where it takes 2 bytes 
(XLogRecordDataHeaderShort) or 5 bytes (XLogRecordDataHeaderLong).
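A minimal Python sketch of the two unaligned data-header sizes mentioned above, using `struct` with the `<` prefix so that no alignment padding is inserted. It assumes the short form is chosen whenever the length fits in one byte; the marker byte values and function names are hypothetical placeholders, not PostgreSQL's actual block IDs or code.

```python
import struct

# Hypothetical marker bytes; PostgreSQL uses its own block-ID values here.
DATA_SHORT_ID = 0xFF
DATA_LONG_ID = 0xFE


def encode_data_header(data_len: int) -> bytes:
    """Encode a main-data header with no alignment padding.

    Short form: 1-byte ID + 1-byte length (2 bytes), assumed to be used
    whenever the length fits in one byte.
    Long form:  1-byte ID + 4-byte length (5 bytes) otherwise.
    """
    if not 0 <= data_len <= 0xFFFFFFFF:
        raise ValueError("data length must fit in 32 bits")
    if data_len <= 0xFF:
        return struct.pack("<BB", DATA_SHORT_ID, data_len)
    return struct.pack("<BI", DATA_LONG_ID, data_len)


def decode_data_header(buf: bytes) -> tuple[int, int]:
    """Return (data_len, header_size) read from the start of buf."""
    if buf[0] == DATA_SHORT_ID:
        return buf[1], 2
    if buf[0] == DATA_LONG_ID:
        (data_len,) = struct.unpack_from("<I", buf, 1)
        return data_len, 5
    raise ValueError("unknown data header ID")


print(len(encode_data_header(100)), len(encode_data_header(1000)))  # prints "2 5"
```

With native alignment (`struct.calcsize("@BI")`), the long form would typically occupy 8 bytes on common platforms rather than 5, which is the kind of padding the unaligned layout avoids.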

But it's a fair point that we could've just made small adjustments to 
the old format instead, without revamping every record type and the way 
block information is stored, and that the new format's space savings 
should be measured against that alternative rather than against the old 
format as-is.

For example, one simple thing we could've done with the old format: 
remove xl_len, and store the length in the two unused padding bytes 
instead, as long as it fits in 16 bits. For longer records, set a flag 
and store the length right after the XLogRecord header. For practically 
all WAL records, that would've shrunk XLogRecord from 32 to 24 bytes, 
making each record 8 bytes shorter.
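The proposed tweak can be sketched in Python as follows. Only the length handling is modeled: the 32- and 24-byte header sizes come from the text, while the flag bit and all function names are hypothetical.

```python
import struct

OLD_HEADER_SIZE = 32       # 9.4 XLogRecord header, with a separate xl_len field
PROPOSED_HEADER_SIZE = 24  # xl_len dropped, length kept in the 2 padding bytes
LONG_LENGTH_FLAG = 0x01    # hypothetical flag bit marking an out-of-line length


def encode_length(rec_len: int) -> tuple[int, bytes, bytes]:
    """Return (flags, 2-byte in-header length, trailing bytes after the header)."""
    if not 0 <= rec_len <= 0xFFFFFFFF:
        raise ValueError("record length must fit in 32 bits")
    if rec_len <= 0xFFFF:
        # Common case: the length fits in the former padding bytes.
        return 0, struct.pack("<H", rec_len), b""
    # Rare case: set the flag and append a 4-byte length after the header.
    return LONG_LENGTH_FLAG, struct.pack("<H", 0), struct.pack("<I", rec_len)


def decode_length(flags: int, in_header: bytes, trailing: bytes) -> int:
    """Recover the length from either the padding bytes or the trailing field."""
    if flags & LONG_LENGTH_FLAG:
        return struct.unpack("<I", trailing[:4])[0]
    return struct.unpack("<H", in_header)[0]


def header_size(rec_len: int) -> int:
    """Bytes spent on the header under the proposed scheme."""
    _, _, trailing = encode_length(rec_len)
    return PROPOSED_HEADER_SIZE + len(trailing)
```

For any length that fits in 16 bits, `header_size` returns 24, i.e. `OLD_HEADER_SIZE - 8`; only lengths above 65535 pay 4 extra bytes for the trailing field.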

I ran the same pgbench test Andres used (scale 10, 50000 transactions) 
and compared the WAL size between master and 9.4:

master: 20738352 bytes
9.4:    23915800 bytes

According to pg_xlogdump, there were 301153 WAL records. If you take the 
9.4 figure and subtract 8 bytes for each of those records 
(301153 * 8 = 2409224 bytes), 9.4 would've been 21506576 bytes instead. 
So yeah, that much smaller change would've achieved about three quarters 
of the WAL savings (2409224 of the 3177448 bytes saved by master). That's 
a useful baseline to compare with.
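The arithmetic above as a quick Python check. It uses only figures from this message and assumes, as the text does, that the simpler change would save exactly 8 bytes on every record.

```python
# Figures from the pgbench run above (scale 10, 50000 transactions, wal_level=minimal).
master_bytes = 20738352
v94_bytes = 23915800
records = 301153       # WAL record count reported by pg_xlogdump
saved_per_record = 8   # XLogRecord shrinking from 32 to 24 bytes

hypothetical_94 = v94_bytes - records * saved_per_record
actual_saving = v94_bytes - master_bytes        # what the new format saved
simple_saving = v94_bytes - hypothetical_94     # what the small tweak would save

print(hypothetical_94)                          # 21506576
print(f"{simple_saving / actual_saving:.1%}")   # 75.8%
```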

BTW, those numbers are with wal_level=minimal. With wal_level=logical, 
the WAL size from the same test on master was 26503520 bytes. That's 
quite a bump: about 28% more than with minimal. Looking at the 
pg_xlogdump output, it seems that all of the increase comes from the 
commit records being wider.
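To put a number on the bump, a back-of-the-envelope Python calculation from the figures above. Attributing the whole difference to commit records follows the pg_xlogdump reading; treating it as one commit record per pgbench transaction is an added assumption, so the per-commit figure is only a rough estimate.

```python
minimal_bytes = 20738352   # master, wal_level=minimal
logical_bytes = 26503520   # master, wal_level=logical
transactions = 50000

extra = logical_bytes - minimal_bytes
print(extra)                                  # 5765168
print(f"{extra / minimal_bytes:.1%}")         # 27.8%

# Assumption: roughly one commit record per pgbench transaction, and all of
# the growth is in commit records.
print(f"{extra / transactions:.1f} bytes per commit")  # 115.3 bytes per commit
```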

- Heikki
