On 05/06/18 06:28, Michael Paquier wrote:
> On Mon, Jun 04, 2018 at 11:51:35AM +0200, Petr Jelinek wrote:
>> On 01/06/18 21:13, Michael Paquier wrote:
>>> - startlsn =3D MyReplicationSlot->data.confirmed_flush;
>>> + if (OidIsValid(MyReplicationSlot->data.database))
>>> + startlsn =3D MyReplicationSlot->data.confirmed_flush;
>>> + else
>>> + startlsn =3D MyReplicationSlot->data.restart_lsn;
>>> +
>>> if (moveto < startlsn)
>>> {
>>> ReplicationSlotRelease();
>>
>> This part looks correct for the checking that we are not moving
>> backwards. However, there is another existing issue with this code
>> which
>> is that we are later using the confirmed_flush (via startlsn) as start
>> point of logical decoding (XLogReadRecord parameter in
>> pg_logical_replication_slot_advance) which is not correct. The
>> restart_lsn should be used for that. I think it would make sense to
>> fix
>> that as part of this patch as well.
>
> I am not sure I understand what you are coming at here. Could you
> explain your point a bit more please as 9c7d06d is yours?
As I said, it's an existing issue. I just had chance to reread the code
when looking and your patch.
> When creating
> the decoding context, all other code paths use the confirmed LSN as a
> start point if not explicitely defined by the caller of
> CreateDecodingContext, as it points to the last LSN where a transaction
> has been committed and decoded.
I didn't say anything about CreateDecodingContext though. I am talking
about the fact that we then use the same variable as input to
XLogReadRecord later in the logical slot code path. The whole point of
having restart_lsn for logical slot is to have correct start point for
XLogReadRecord so using the confirmed_flush there is wrong (and has been
wrong since the original commit).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services