Re: ERROR: subtransaction logged without previous top-level txn record - Mailing list pgsql-bugs

From Arseny Sher
Subject Re: ERROR: subtransaction logged without previous top-level txn record
Date
Msg-id 87lfog6yev.fsf@ars-thinkpad
Whole thread Raw
In response to Re: ERROR: subtransaction logged without previous top-level txn record  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
Amit Kapila <amit.kapila16@gmail.com> writes:

>> Because when we create the slot we don't demand to stream from some
>> specific point. In fact we just can't, because we don't know since which
>> LSN it is actually possible to stream, i.e. when we'd have good snapshot
>> and no old (which we haven't seen in full) xacts running. It is up to
>> snapbuild.c to define this point. The previous coding was meaningless:
>> we asked for some random restart_lsn and snapbuild.c would silently
>> advance it to earliest suitable LSN.
>>
>
> Hmm, if this is the case then it should be true even without solving
> this particular problem and we should be able to make this change.

Right.

> Leaving that aside, I think this change can make copy replication slot
> functionality to also skip using serialized snapshots with this patch
> which is not our intention.

As I say at [1] logical slot copying facility is currently anyway broken
in this regard: restart_lsn is copied, but confirmed_flush isn't, and
the right fix, in my view, is to avoid DecodingContextFindStartpoint
there altogether (by checking donor's confirmed_flush is valid and
copying it) which would render this irrelevant. To speculate, even if
wanted to go through DecodingContextFindStartpoint for slot copying and
establish confirmed_flush on our own, surely we'd need to handle
serialized snapshots exactly as new slot creation does because dangers
of getting SNAPBUILD_CONSISTENT too early are the same in both cases.

> Also, it doesn't seem like a good idea to ignore setting
> start_decoding_at when we already set slot->data.restart_lsn with this
> value.

Well, these two fields have absolutely different values. BTW I find the
naming here somewhat unfortunate, and this phrase suggests that it
indeed leads to confusion.

Slot's restart_lsn is the LSN since which we start reading WAL and by
setting data.restart_lsn we prevent WAL we need from
recycling. start_decoding_at is the LSN since which we start
*streaming*, i.e. actually replaying COMMITs. So setting the first one
(as we must hold WAL) and not the second one (as we don't know the
streaming point yet when we start slot creation) is just fine.


[1]
https://www.postgresql.org/message-id/flat/CA%2Bfd4k70BXLTm-N6q18LrL%3DGbKtwY3-2%2B%2BUVFw05SvFTkZgTyQ%40mail.gmail.com


-- cheers, arseny



pgsql-bugs by date:

Previous
From: Przemysław Szustak
Date:
Subject: Re: BUG #16283: crash on create index segmentation fault
Next
From: Christoph Berg
Date:
Subject: Wrong de translation for commands/tablecmds.c:5028