Re: BUG #13660: serializable snapshotting hangs - Mailing list pgsql-bugs

From Kevin Grittner
Subject Re: BUG #13660: serializable snapshotting hangs
Date
Msg-id 1379192640.1267648.1444167495623.JavaMail.yahoo@mail.yahoo.com
Whole thread Raw
In response to BUG #13660: serializable snapshotting hangs  (cpacejo@clearskydata.com)
Responses Re: BUG #13660: serializable snapshotting hangs  (Chris Pacejo <cpacejo@clearskydata.com>)
List pgsql-bugs
On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com" <cpacejo@c=
learskydata.com> wrote:

> PostgreSQL version: 9.4.4

> After running fine for weeks, we now find that serializable snapshots han=
g:
>
> our_db=3D> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEF=
ERRABLE;
> START TRANSACTION
> our_db=3D> SELECT pg_export_snapshot();
> (...hangs indefinitely...)

> Killing all backends (i.e. including those accessing other databases)
> unblocked serializable snapshotting.
>
> Is this expected behavior?

What should be happening is that when the serializable read only
deferrable transaction executes the first statement that needs a
snapshot, it grabs a snapshot and looks for concurrent read write
transactions.=C2=A0 If there are none, the snapshot is safe and it goes
ahead with execution; otherwise it goes into a loop where it delays
(to avoid sucking 100% CPU for a core) by calling
ProcWaitForSignal().=C2=A0 Each time that function returns it checks
whether any of the concurrent read write transactions developed a
read-write conflict out (i.e., it read data which was modified by a
transaction concurrent to *it*) to a transaction which committed
before this read only transaction got its snapshot.=C2=A0 If so, this
snapshot is unsafe and we need to discard it, get a new snapshot,
and go back into the loop.=C2=A0 If all read write transactions
concurrent to the read only transaction complete without developing
such a conflict our snapshot is safe and we can proceed to run
without taking predicate locks or risking a serialization failure.

Now, that's pretty technical, and you definitely don't need to
follow all of the above to safely use deferrable transactions.=C2=A0 The
short version is that your initial statement that needs a snapshot
cannot begin to execute *at least* until any concurrent
transactions which are not READ ONLY complete.=C2=A0 One connection
stuck "idle in transaction" can hold it up indefinitely.=C2=A0 A
prepared transaction also counts as a transaction which has not yet
completed.

So if you had any long-running READ WRITE transactions (including
one stuck "idle in transaction" or prepared but not committed), it
is expected behavior.=C2=A0 If not, we should try to gather more
information to figure out what's going on.

One other possibility is that your workload has changed such that
there are now so many read write transactions reading data modified
by other transactions that it is just not finding a safe snapshot
very quickly.=C2=A0 Even when running benchmarks at saturation levels on
a 16 core machine we rarely saw a delay more than six seconds, but
there is no upper bound on how long it might take.=C2=A0 In such a case
you would have a few options, but let's not get ahead of ourselves
-- the first thing is to check for "idle in transaction" or
prepared transaction issues.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.
Next
From: Chris Pacejo
Date:
Subject: Re: BUG #13660: serializable snapshotting hangs