On 06/06/18 04:04, Michael Paquier wrote:
> On Tue, Jun 05, 2018 at 01:00:30PM +0200, Petr Jelinek wrote:
>> I didn't say anything about CreateDecodingContext though. I am talking
>> about the fact that we then use the same variable as input to
>> XLogReadRecord later in the logical slot code path. The whole point of
>> having restart_lsn for logical slot is to have correct start point for
>> XLogReadRecord so using the confirmed_flush there is wrong (and has been
>> wrong since the original commit).
>
> OK, I finally see the point you are coming at after a couple of hours
> brainstorming about it, and indeed using confirmed_lsn is logically
> incorrect so the current code is inconsistent with what
> DecodingContextFindStartpoint() does, so we rely on restart_lsn to scan
> for all the records and the decoder state is here to make sure that the
> slot's confirmed_lsn is updated to a consistent position. I have added
> your feedback in the attached (indented and such), which results in some
> simplifications with a couple of routines.
>
> I am attaching as well the patch I sent yesterday. 0001 is candidate
> for a back-patch, 0002 is for HEAD to fix the slot advance stuff.
>
> There is another open item related to slot advancing here:
> https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru
> And per my tests the patch is solving this item as well. I have just
> used the test mentioned in the thread which:
> 1) creates a slot
> 2) does some activity to generate a couple of WAL pages
> 3) advances the slot at page boundary
> 4) Moves again the slot.
> This test crashes on HEAD at step 4, and not with the attached.
>
> What do you think?
Seems reasonable to me.
I think the only thing to note about the patches from my side is that we
probably don't want to default to restart_lsn for the
pg_logical_replication_slot_advance() return value (when nothing was
done) but rather the confirmed_lsn. As it is in current patch if we call
the function repeatedly and one call moved slot forward but the next one
didn't the return value will go backwards as restart_lsn tends to be
behind the confirmed one.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services