Re: BUG #15808: ERROR: subtransaction logged without previoustop-level txn record (SQLSTATE XX000) - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #15808: ERROR: subtransaction logged without previoustop-level txn record (SQLSTATE XX000)
Date
Msg-id 20190516170434.masck6ehwg2kvbi2@alap3.anarazel.de
Whole thread Raw
In response to BUG #15808: ERROR: subtransaction logged without previous top-level txn record (SQLSTATE XX000)  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15808: ERROR: subtransaction logged without previoustop-level txn record (SQLSTATE XX000)  (Dave Cramer <davecramer@gmail.com>)
List pgsql-bugs
Hi,

On 2019-05-16 04:56:15 +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15808
> Logged by:          Mansour Behabadi
> Email address:      mansour@oxplot.com
> PostgreSQL version: 10.6
> Operating system:   Amazon RDS
> Description:        
> 
> We have some custom logical replication client that makes
> pg_logical_slot_get_changes() calls in SQL. E.g.:

Unrelated to the bug: You really should use the streaming
interface. It's much, much, much more efficient.

https://www.postgresql.org/docs/current/logicaldecoding-walsender.html


> Once every few thousand calls, we get the following error:
> 
> ERROR: subtransaction logged without previous top-level txn record (SQLSTATE
> XX000)
> 
> which will persist on all subsequent calls, essentially forcing us to drop
> the slot and create a new one.

That obviously shouldn't happen.


> We had little success looking for solutions online and the only lead is that
> of a recent commit
> (https://github.com/postgres/postgres/commit/f49a80c481f74fa81407dce8e51dea6956cb64f8)
> whose commit message seem to correlate to the error we're getting. Below is
> the relevant excerpt:
> 
> The second issue concerns SnapBuilder snapshots and subtransactions.
> SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
> transaction that is known to be a subtxn, which is good in the common
> case that the top-level transaction already has one (no point in doing
> so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
> the subtxn to its top-level txn as soon as the kinship gets known.
> test_decoding's snapshot_transfer verifies this.

That seems unrelated to the error message you're getting.


> We're not sure if this is a fix to our problem and whether upgrading to
> Postgres 11 (which has this change in it) will solve the issue.

Note that this change isn't just in 11:

Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Branch: master Release: REL_11_BR [f49a80c48] 2018-06-26 16:48:10 -0400
Branch: REL_10_STABLE Release: REL_10_5 [b767b3f2e] 2018-06-26 16:38:34 -0400
Branch: REL9_6_STABLE Release: REL9_6_10 [da10d6a8a] 2018-06-26 16:38:34 -0400
Branch: REL9_5_STABLE Release: REL9_5_14 [4cb6f7837] 2018-06-26 16:38:34 -0400
Branch: REL9_4_STABLE Release: REL9_4_19 [962313558] 2018-06-26 16:38:34 -0400


> Please let me know if any more info is needed.

The easiest way to progress here would be a recipe to reproduce the
problem. As long as the problem is on RDS, we unfortunately can't really
debug this - neither can we modify the source to emit more debugging
information, nor can we inspect the WAL files ourselves (I think).

It's possible that trying to reproduce this on RDS with the debug level
set to very high (debug5) would allow for a bit more insight. But I'm
somewhat doubtful.

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #15810: Using custom directory on external HDD gives permission error
Next
From: Tom Lane
Date:
Subject: Re: inconsistent results querying table partitioned by date