Re: BUG #13660: serializable snapshotting hangs - Mailing list pgsql-bugs

From Chris Pacejo
Subject Re: BUG #13660: serializable snapshotting hangs
Date
Msg-id CAC8iE5isTnJ_8KHv6HyWAqW=2q37wpHTiTA3EBLhWzgLVTBJGw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #13660: serializable snapshotting hangs  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: BUG #13660: serializable snapshotting hangs  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-bugs
I get that, but I had no other open connections to that database.
There were many connections to *other* databases on that same
*server*, but that shouldn't affect taking a snapshot on an otherwise
unused database, correct?

Also note that once this happened to one database, it happened to
*all* databases on the same server simultaneously.  It's as if the
read-write transaction check is performed server-wide, not
database-wide as I would expect.



On Tue, Oct 6, 2015 at 5:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com"
> <cpacejo@clearskydata.com> wrote:
>
>> PostgreSQL version: 9.4.4
>
>> After running fine for weeks, we now find that serializable snapshots
>> hang:
>>
>> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY,
>> DEFERRABLE;
>> START TRANSACTION
>> our_db=> SELECT pg_export_snapshot();
>> (...hangs indefinitely...)
>
>> Killing all backends (i.e. including those accessing other databases)
>> unblocked serializable snapshotting.
>>
>> Is this expected behavior?
>
> What should be happening is that when the serializable read only
> deferrable transaction executes the first statement that needs a
> snapshot, it grabs a snapshot and looks for concurrent read write
> transactions.  If there are none, the snapshot is safe and it goes
> ahead with execution; otherwise it goes into a loop where it delays
> (to avoid sucking 100% CPU for a core) by calling
> ProcWaitForSignal().  Each time that function returns it checks
> whether any of the concurrent read write transactions developed a
> read-write conflict out (i.e., it read data which was modified by a
> transaction concurrent to *it*) to a transaction which committed
> before this read only transaction got its snapshot.  If so, this
> snapshot is unsafe and we need to discard it, get a new snapshot,
> and go back into the loop.  If all read write transactions
> concurrent to the read only transaction complete without developing
> such a conflict our snapshot is safe and we can proceed to run
> without taking predicate locks or risking a serialization failure.
>
> Now, that's pretty technical, and you definitely don't need to
> follow all of the above to safely use deferrable transactions.  The
> short version is that your initial statement that needs a snapshot
> cannot begin to execute *at least* until any concurrent
> transactions which are not READ ONLY complete.  One connection
> stuck "idle in transaction" can hold it up indefinitely.  A
> prepared transaction also counts as a transaction which has not yet
> completed.
>
> So if you had any long-running READ WRITE transactions (including
> one stuck "idle in transaction" or prepared but not committed), it
> is expected behavior.  If not, we should try to gather more
> information to figure out what's going on.
>
> One other possibility is that your workload has changed such that
> there are now so many read write transactions reading data modified
> by other transactions that it is just not finding a safe snapshot
> very quickly.  Even when running benchmarks at saturation levels on
> a 16 core machine we rarely saw a delay more than six seconds, but
> there is no upper bound on how long it might take.  In such a case
> you would have a few options, but let's not get ahead of ourselves
> -- the first thing is to check for "idle in transaction" or
> prepared transaction issues.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

pgsql-bugs by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: BUG #13660: serializable snapshotting hangs
Next
From: irina.guberman@gmail.com
Date:
Subject: BUG #13665: Foreign Key constraint doesn't work