Re: ERROR: subtransaction logged without previous top-level txn record - Mailing list pgsql-bugs

From Arseny Sher
Subject Re: ERROR: subtransaction logged without previous top-level txn record
Msg-id 871rrb942q.fsf@ars-thinkpad
In response to Re: ERROR: subtransaction logged without previous top-level txn record  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: ERROR: subtransaction logged without previous top-level txn record
List pgsql-bugs
Amit Kapila <amit.kapila16@gmail.com> writes:

> So, doesn't this mean that it started occurring after the fix done in
> commit 96b5033e11 [1]?  Because before that fix we wouldn't have
> allowed processing XLOG_XACT_ASSIGNMENT records unless we are in
> SNAPBUILD_FULL_SNAPSHOT state.  I am not telling the fix in that
> commit is wrong, but just trying to understand the situation here.

Nope. Consider again the example WAL sequence from above that triggers
the error:

[ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_2> <commit> <confirmed_flush_lsn> ]

The decoder starts reading WAL at <restart_lsn>, where it immediately
loads from disk a snapshot serialized earlier, which makes it jump to
SNAPBUILD_CONSISTENT right away. It never reads xl_xact_assignment_1,
but it reads xl_xact_assignment_2 while already in the
SNAPBUILD_CONSISTENT state, so it hits the error regardless of that
commit.
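
To make the sequence concrete, here is a rough toy model of it in Python.
It is only a sketch: the class, the xid values and the gate_assignments
flag are made up for illustration, and I assume that xl_xact_assignment_1
belongs to the same top-level xact as the later records. It is not the
actual reorderbuffer/snapbuild code; it only mimics the rule "a known
subxact must not be assigned to a top-level xact we have never seen".

```python
# Toy model only: class and field names are illustrative, not PostgreSQL code.
STATES = ["SNAPBUILD_START", "SNAPBUILD_BUILDING_SNAPSHOT",
          "SNAPBUILD_FULL_SNAPSHOT", "SNAPBUILD_CONSISTENT"]

ERROR_MSG = "subtransaction logged without previous top-level txn record"


class ToyDecoder:
    def __init__(self, gate_assignments):
        self.state = "SNAPBUILD_START"
        self.known_xids = set()  # xids the decoder already tracks
        # Assumption: True models the behaviour described for the time before
        # commit 96b5033e11 (assignment records ignored below FULL_SNAPSHOT).
        self.gate_assignments = gate_assignments

    def restore_snapshot(self):
        # Fast path: a serialized snapshot makes the decoder consistent at once.
        self.state = "SNAPBUILD_CONSISTENT"

    def process(self, record):
        kind = record[0]
        if kind == "change":
            # A change for an unknown xid is tracked as if it were top-level.
            self.known_xids.add(record[1])
        elif kind == "assignment":
            full = STATES.index("SNAPBUILD_FULL_SNAPSHOT")
            if self.gate_assignments and STATES.index(self.state) < full:
                return
            _, top_xid, subxids = record
            top_is_new = top_xid not in self.known_xids
            for subxid in subxids:
                # Known subxact, but its top-level xact was never seen.
                if top_is_new and subxid in self.known_xids:
                    raise RuntimeError(ERROR_MSG)
            self.known_xids.add(top_xid)
            self.known_xids.update(subxids)
        # "commit" records need no handling in this sketch.


def decode(wal, start, gate_assignments):
    """Decode wal[start:] after restoring a serialized snapshot at `start`."""
    decoder = ToyDecoder(gate_assignments)
    decoder.restore_snapshot()
    try:
        for record in wal[start:]:
            decoder.process(record)
    except RuntimeError as exc:
        return str(exc)
    return "ok"


TOP, SUB_A, SUB_B = 100, 101, 102  # hypothetical xids
WAL = [
    ("assignment", TOP, [SUB_A]),  # xl_xact_assignment_1
    ("change", SUB_B),             # subxact_change (first record after restart_lsn)
    ("assignment", TOP, [SUB_B]),  # xl_xact_assignment_2
    ("commit", TOP),
]
RESTART = 1  # index of the first record at or after restart_lsn

for gate in (True, False):
    print(gate, decode(WAL, RESTART, gate))
```

In this model both settings of gate_assignments end with the same error
when decoding starts at RESTART, because the restored snapshot already
puts the decoder in SNAPBUILD_CONSISTENT. Starting at index 0 instead, so
that xl_xact_assignment_1 is seen, returns "ok".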

>> Well, almost. This is true as long initial snapshot construction process
>> goes the long way of building the snapshot by itself. If it happens to
>> pick up from disk ready snapshot pickled there by another decoding
>> session, it fast path'es to SNAPBUILD_CONSISTENT, which is technically a
>> bug as described in
>> https://www.postgresql.org/message-id/87ftjifoql.fsf%40ars-thinkpad
>>
>
> Can't we deal with this separately?  If so, I think let's not mix the
> discussions for both as the root cause of both seems different.

These issues are related: before removing the check, it would be nice to
ensure that there are no bugs it might protect us from (and it turns out
there actually is one, though the check won't always protect against it,
and the bug has a very small probability). Moreover, they are about more
or less the same subject -- avoiding partially decoded xacts -- and once
you have dived deep enough to deal with one, it is reasonable to deal
with the other as well instead of doing that twice. But as a practical
matter, removing the check is a simple one-liner, and its presence causes
people trouble -- so I'd suggest doing that first and then dealing with
the rest. I don't think starting a new thread is worthwhile here, but if
you think it is, I can create one.


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


