Re: Missing FIN_CRC32 calls in logical replication code - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Missing FIN_CRC32 calls in logical replication code
Date
Msg-id 5457DE7B.6060700@vmware.com
Whole thread Raw
In response to Re: Missing FIN_CRC32 calls in logical replication code  (Andres Freund <andres@2ndQuadrant.com>)
Responses Re: Missing FIN_CRC32 calls in logical replication code  (Andres Freund <andres@2ndQuadrant.com>)
List pgsql-hackers
On 10/31/2014 03:46 PM, Andres Freund wrote:
> On 2014-10-27 09:30:33 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndQuadrant.com> writes:
>>> On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:
>>>> replication/slot.c and replication/logical/snapbuild.c use a CRC on the
>>>> physical slot and snapshot files. It uses the same algorithm as used e.g.
>>>> for the WAL. However, they are not doing the finalization step, FIN_CRC32()
>>>> on the calculated checksums. Not that it matters much, but it's a bit weird
>>>> and inconsistent, and was probably an oversight.
>>
>>> Hm. Good catch - that's stupid. I wonder what to do about it. I'm
>>> tempted to just add a comment about it to 9.4 and fix it on master as
>>> changing it is essentially a catversion bump. Any objections to that?
>>
>> Yeah, I think you should get it right the first time.  It hardly seems
>> likely that any 9.4 beta testers are depending on those files to be stable
>> yet.
>
> Since both state files have the version embedded it'd be trivial to just
> do the FIN_CRC32() when loading a version 2 file. Does anybody object to
> the relevant two lines of code + docs?

No objection, if you feel the backwards-compatibility with beta3 is 
worth it.

Looking at slot.c again, a comment in ReplicationSlotOnDisk contradicts 
the code:

> /*
>  * Replication slot on-disk data structure.
>  */
> typedef struct ReplicationSlotOnDisk
> {
>     /* first part of this struct needs to be version independent */
>
>     /* data not covered by checksum */
>     uint32        magic;
>     pg_crc32    checksum;
>
>     /* data covered by checksum */
>     uint32        version;
>     uint32        length;
>
>     ReplicationSlotPersistentData slotdata;
> } ReplicationSlotOnDisk;
>
> /* size of the part of the slot that is version independent */
> #define ReplicationSlotOnDiskConstantSize \
>     offsetof(ReplicationSlotOnDisk, slotdata)
> /* size of the slots that is not version indepenent */
> #define ReplicationSlotOnDiskDynamicSize \
>     sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize

The code that calculates the checksum skips over the constant size, i.e. 
upto ReplicationSlotOnDiskConstantSize, or slotdata. That means that the 
version and length are in fact not covered by the checksum, contrary to 
the comment.

Looking at the similar code in snapbuild.c, I think the code was 
supposed to do what the comment says, and include 'version' and 'length' 
in the checksum. It would be nice to fix that too in slot.c, to be 
consistent with snapbuild.c.

PS. I find the name "ReplicationSlotOnDiskDynamicSize" confusing, as it 
is in fact a fixed size struct. I gather it's expected that the size of 
that part might change across versions, but by that definition nothing 
is constant.

- Heikki



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Let's drop two obsolete features which are bear-traps for novices
Next
From: Tom Lane
Date:
Subject: Re: how to handle missing "prove"