Tom Lane wrote:
> It's still not entirely clear what's happening on okapi, but in the
> meantime I've thought of an easily-reproducible way to cause similar
> failures in any branch. That is to run CREATE INDEX CONCURRENTLY
> with default_transaction_isolation = serializable. Then, snapmgr.c
> will set up a transaction snapshot (actually identical to the
> "reference snapshot" used by DefineIndex), and that will not get
> released, so the process's xmin doesn't get cleared, and we have
> a deadlock hazard.

Hah, ouch.

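For anyone following along, a rough sketch of that reproduction might look
like the following. The table and index names are made up for illustration.
Since the hazard involves backends waiting on each other's snapshots, I'd
assume the per-session part has to run in two or more sessions at about the
same time (as the multiple-cic isolation test does) to actually hit a failure.

```sql
-- One-time setup; cic_repro is a hypothetical scratch table.
CREATE TABLE cic_repro (id int, val text);

-- In each of two (or more) concurrent sessions:
-- make every new transaction in this session default to SERIALIZABLE.
SET default_transaction_isolation = 'serializable';

-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so issue it on its own. Use a different index name in each session
-- (cic_repro_idx_1 is an assumed name).
CREATE INDEX CONCURRENTLY cic_repro_idx_1 ON cic_repro (id);
```
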
> I experimented with running the isolation tests under "alter system set
> default_transaction_isolation to serializable". Oddly, multiple-cic
> tends to not fail that way for me, though if I reduce the
> isolation_schedule file to contain just that one test, it fails nine
> times out of ten. Leftover activity from the previous tests must be
> messing up the timing somehow. Anyway, the problem is definitely real.
> (A couple of the other isolation tests do fail reliably under this
> scenario; is it worth hardening them?)

Yes, I think it's worth making them pass somehow -- see commits
f18795e7b74c and a0eae1a2eeb6.

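The server-wide setup from the quoted experiment can be sketched like this.
The reload and the reset afterwards are my additions, not something stated
above.

```sql
-- Make serializable the default for all new transactions (needs superuser).
ALTER SYSTEM SET default_transaction_isolation TO 'serializable';
SELECT pg_reload_conf();

-- ... run the isolation tests against this server ...

-- Undo the change afterwards so other tests see the stock default.
ALTER SYSTEM RESET default_transaction_isolation;
SELECT pg_reload_conf();
```
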
> I thought for a bit about trying to force C.I.C.'s transactions to
> be run with a lower transaction isolation level, but that seems messy
> and I'm not very sure it wouldn't have bad side-effects. A much simpler
> fix is to just start yet another transaction before waiting, as in the attached
> proposed patch. (With the transaction restart, I feel sufficiently
> confident that there should be no open snapshots that it seems okay
> to put in the Assert I was previously afraid to add.)

Seems like an acceptable fix to me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services