Error "initial slot snapshot too large" in create replication slot - Mailing list pgsql-hackers

From Dilip Kumar
Subject Error "initial slot snapshot too large" in create replication slot
Msg-id CAFiTN-tqopqpfS6HHug2nnOGieJJ_nm-Nvy0WBZ=Zpo-LqtSJA@mail.gmail.com
List pgsql-hackers
While creating an exported snapshot, I don't see anything that
prevents the number of xids in the snapshot from exceeding
GetMaxSnapshotXidCount().

Basically, while converting the HISTORIC snapshot to an MVCC snapshot
in SnapBuildInitialSnapshot(), we add every xid between snap->xmin and
snap->xmax for which no commit was recorded to the MVCC snap->xip
array.  The problem is that we add both top-level xids and subxids to
the same array and expect the total xid count not to exceed
GetMaxSnapshotXidCount().  So this looks like a bug, but I am not sure
what the right fix is.  Some options are:

a) Don't limit the xid count in the exported snapshot, and resize the
array dynamically.
b) Increase the limit to GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount().  But with option b) there is still the
problem of how to handle overflowed subtransactions.
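
To make the failure mode concrete, here is a minimal standalone sketch of
the conversion loop as described above.  It is a simplification, not the
code in snapbuild.c: the names Xid, xid_in() and build_xip() are made up
for illustration, xid wraparound is ignored, and max_xip stands in for
GetMaxSnapshotXidCount().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId; wraparound is ignored here. */
typedef uint32_t Xid;

/* Linear membership test over the xids recorded as committed. */
static bool
xid_in(const Xid *xids, size_t n, Xid xid)
{
    for (size_t i = 0; i < n; i++)
        if (xids[i] == xid)
            return true;
    return false;
}

/*
 * Collect every xid in [xmin, xmax) that is not recorded as committed.
 * Top-level xids and subxids are not distinguished, so both count
 * against the same max_xip limit.
 *
 * Returns the number of xids stored in xip, or -1 when the limit would
 * be exceeded (the "initial slot snapshot too large" case).
 */
static int
build_xip(Xid xmin, Xid xmax,
          const Xid *committed, size_t ncommitted,
          Xid *xip, size_t max_xip)
{
    size_t n = 0;

    assert(xmin <= xmax);

    for (Xid xid = xmin; xid < xmax; xid++)
    {
        if (xid_in(committed, ncommitted, xid))
            continue;           /* commit was recorded, not in progress */
        if (n >= max_xip)
            return -1;          /* no room left in the fixed-size array */
        xip[n++] = xid;
    }
    return (int) n;
}
```

With xmin = 100, xmax = 106 and commits recorded for 101 and 103, four
xids (100, 102, 104, 105) have to go into the array.  That fits when
max_xip is 4 and fails when it is 3, regardless of how many of those
four are subtransaction xids.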

I have reproduced the issue locally:

1. Configuration
max_connections = 5
autovacuum = off
max_worker_processes = 0

2. Then, using pgbench, I ran the attached script (test.sql) from 5 clients.
./pgbench -i postgres
./pgbench -c4 -j4 -T 3000 -f test1.sql -P1 postgres

3. Concurrently, create a replication slot:
[dilipkumar@localhost bin]$ ./psql "dbname=postgres replication=database"
postgres[7367]=#
postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding";
ERROR:  40001: initial slot snapshot too large
LOCATION:  SnapBuildInitialSnapshot, snapbuild.c:597
postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding";
ERROR:  XX000: clearing exported snapshot in wrong transaction state
LOCATION:  SnapBuildClearExportedSnapshot, snapbuild.c:690

I could reproduce this issue at least once in every 8-10 attempts at
creating the replication slot.

Note:  After hitting that error, I noticed a second one, "clearing
exported snapshot in wrong transaction state".  It occurs because
"ExportInProgress" is not cleared on transaction abort.  A simple fix
would be to clear this state on transaction abort.  Maybe I should
raise this as a separate issue?
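
The second error can be pictured with a toy model of the flag handling.
This is only a sketch of the behaviour described in the note, not
PostgreSQL code: export_in_progress, in_transaction and the three
functions below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for backend state. */
static bool export_in_progress = false;
static bool in_transaction = false;

/* Begin exporting: open a transaction and mark the export as active. */
static void
start_export(void)
{
    assert(!export_in_progress);    /* any earlier export must be cleared */
    in_transaction = true;
    export_in_progress = true;
}

/*
 * Transaction abort.  With reset_export_flag = true this models the
 * proposed fix of clearing the export state on abort.
 */
static void
abort_transaction(bool reset_export_flag)
{
    in_transaction = false;
    if (reset_export_flag)
        export_in_progress = false;
}

/*
 * Run at the start of the next command.  Returns false for the
 * "wrong transaction state" case: the flag is still set although no
 * transaction is open any more.
 */
static bool
clear_exported_snapshot(void)
{
    if (!export_in_progress)
        return true;                /* nothing to clear */
    if (!in_transaction)
        return false;               /* stale flag left behind by an abort */
    in_transaction = false;         /* normal path: end the export */
    export_in_progress = false;
    return true;
}
```

Calling start_export(), then abort_transaction(false), then
clear_exported_snapshot() returns false, which corresponds to the second
CREATE_REPLICATION_SLOT failing.  With abort_transaction(true) the same
sequence returns true.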

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
