Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing - Mailing list pgsql-bugs

From
Subject Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
Date
Msg-id CAPL_MpMAEVqx-v8qfS-dzU7qJx_Vbhb+X8gywLrKcQ70JLADYg@mail.gmail.com
In response to Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-bugs
Regarding my expectations...
1. I have observed that pg_logical_slot_get_changes always returns changes
up to a transaction boundary.
For example, asking for 1 change (the third parameter, replacing NULL with a
number) can still return millions of changes (if the next transaction is big),
as the check is likely done at the end of each transaction.
I'm actually OK with that (weird, but it doesn't hurt me).
2. I have validated that a single call to pg_logical_slot_get_changes
returns a result set with duplicates, going back to the start (I've seen it
with a Java debugger, looping over the forward-only cursor of the SELECT
from the replication slot). That is the bug I'm reporting - not across
calls but within a single call.
3. I also don't understand the "end of WAL" thing, since the call must wait
to see if the transaction commits or rolls back.
Maybe if it reaches "end of WAL", it starts over internally, adding the
same changes again to some internal buffer that later gets sent?
I can also add that it starts over after a "round" number of rows - I saw
8K, 52K, 64K etc. (run the test case). Maybe that is a hint (or just an
artifact of how INSERT ... SELECT works?)
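
A minimal sketch of how the within-call duplicates from point 2 could be detected on the client side. It assumes rows shaped like the (location, xid, data) columns returned by pg_logical_slot_get_changes, and it assumes that within one xid the location never decreases in a correct result. The function names and the sample rows are made up for illustration; this is not the original Java code.

```python
from typing import Dict, Iterable, List, Tuple

# One row of the result set: (location, xid, data).
Row = Tuple[str, int, str]


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/16B2D80' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def find_restarts(rows: Iterable[Row]) -> List[int]:
    """Return the row indexes where a transaction's LSN moves backwards."""
    last_lsn_by_xid: Dict[int, int] = {}
    restarts: List[int] = []
    for index, (location, xid, _data) in enumerate(rows):
        lsn = parse_lsn(location)
        previous = last_lsn_by_xid.get(xid)
        # Assumption: for a single xid, locations never decrease in a
        # correct result, so a smaller LSN means decoding started over.
        if previous is not None and lsn < previous:
            restarts.append(index)
        last_lsn_by_xid[xid] = lsn
    return restarts


# Hypothetical rows: the two inserts are emitted twice within one call.
sample = [
    ("0/100", 7, "BEGIN 7"),
    ("0/110", 7, "INSERT 1"),
    ("0/120", 7, "INSERT 2"),
    ("0/110", 7, "INSERT 1"),
    ("0/120", 7, "INSERT 2"),
    ("0/130", 7, "COMMIT 7"),
]
restart_indexes = find_restarts(sample)
```

Tracking the last location per xid rather than globally avoids false alarms when several transactions appear in the same result set, since changes are emitted in commit order and their locations can interleave across transactions.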

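The behaviour in point 1 can be illustrated with a toy model, assuming (as observed above) that the upto_nchanges limit is only consulted after a whole transaction has been added to the result. The function and the transaction sizes are hypothetical; this models the observed behaviour and is not the server code.

```python
from typing import List


def rows_returned(txn_sizes: List[int], upto_nchanges: int) -> int:
    """Count rows emitted when the limit is checked only after each transaction."""
    emitted = 0
    for size in txn_sizes:
        # A decoded transaction is always emitted in full.
        emitted += size
        # The limit is consulted only here, at the commit boundary.
        if emitted >= upto_nchanges:
            break
    return emitted


# Asking for 1 change still yields the whole first (large) transaction.
big_first = rows_returned([1_000_000, 5], 1)
```

Under this model the limit acts as a lower bound on where decoding stops rather than a cap on the number of rows, which matches getting far more rows than requested when the next transaction is big.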

On Mon, Oct 26, 2015 at 1:51 PM, Shulgin, Oleksandr <
oleksandr.shulgin@zalando.de> wrote:

> On Mon, Oct 26, 2015 at 12:38 PM, Andres Freund <andres@anarazel.de>
> wrote:
>
>> On 2015-10-26 12:34:48 +0100, Shulgin, Oleksandr wrote:
>> > On Mon, Oct 26, 2015 at 12:30 PM, Andres Freund <andres@anarazel.de> wrote:
>> >
>> > > On 2015-10-26 13:21:44 +0200, ofir.manor@gmail.com wrote:
>> > > > Yes, this is a small script to reproduce, the real code is Java, we saw
>> > > > sporadic wrong results.
>> > > > However, I'm interested in CDC (get change notifications per row to my
>> > > > app), not PG-to-PG replication.
>> > >
>> > > The streaming interface exists for row-by-row notification as well, and
>> > > is a *LOT* more efficient.
>> > >
>> >
>> > Yeah, but I don't think there's a workable Java implementation available
>> > yet?
>>
>> No idea, but it's not that hard to write one.
>>
>> > > If there's a bug here, we obviously need to fix it nonetheless.
>> > I would assume re-calculating end_of_wal in the while loop condition would
>> > fix this?
>>
>> Why? That'd just lead to outputting more rows in one invocation, and
>> that's it? I think I'm not following what you see as the problem?
>>
>
> I think there are just some false expectations involved about how this
> interface should work.  The OP likely expects that after the partial
> results were returned by the first call to pg_logical_slot_get_changes(),
> the next call will continue from the point where the first call left off.
>
> This doesn't happen because in the first call we never cross a transaction
> boundary?  Hm, but why do we see the partial changes anyway?  I would
> assume if we started decoding this at all, the transaction was already
> committed and end_of_wal will be past its end...
>
> I'm lost.
>
> --
> Alex
>
>


--

Ofir Manor

   Blog: http://ofirm.wordpress.com  <http://ofirm.wordpress.com>
LinkedIn: http://il.linkedin.com/in/ofirmanor

   Twitter: @ofirm   Mobile:   +972-54-7801286
