Thread: BUG #13660: serializable snapshotting hangs

BUG #13660: serializable snapshotting hangs

From
cpacejo@clearskydata.com
Date:
The following bug has been logged on the website:

Bug reference:      13660
Logged by:          Chris Pacejo
Email address:      cpacejo@clearskydata.com
PostgreSQL version: 9.4.4
Operating system:   CentOS 7 (kernel 3.10.0-123.el7.x86_64)
Description:

After running fine for weeks, we now find that serializable snapshots hang:

our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY,
DEFERRABLE;
START TRANSACTION
our_db=> SELECT pg_export_snapshot();
(...hangs indefinitely...)

This occurs on all databases in the cluster.

pg_stat_activity reports for one database:

-[ RECORD 1 ]----+------------------------------
datid            | 16385
datname          | our_db
pid              | 31347
usesysid         | 99450
usename          | our_user
application_name | psql
client_addr      | X.X.X.X
client_hostname  |
client_port      | 55975
backend_start    | 2015-10-01 14:24:12.942063-04
xact_start       | 2015-10-01 14:26:54.437046-04
query_start      | 2015-10-01 14:26:56.245404-04
state_change     | 2015-10-01 14:26:56.245407-04
waiting          | f
state            | active
backend_xid      |
backend_xmin     | 222030266
query            | select pg_export_snapshot();

(i.e., this is the only active transaction on this database)

gdb on this backend reports:

(gdb) bt
#0  0x00007f496a5e6bd7 in semop () from /lib64/libc.so.6
#1  0x000000000061ba47 in PGSemaphoreLock (sema=0x7f495f32f930,
    interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:421
#2  0x000000000066e4c5 in ProcWaitForSignal () at proc.c:1641
#3  0x0000000000673b5d in GetSafeSnapshot (origSnapshot=<optimized out>) at
predicate.c:1534
#4  GetSerializableTransactionSnapshot (snapshot=0xb73aa0
<CurrentSnapshotData>) at predicate.c:1598
#5  0x0000000000782cad in GetTransactionSnapshot () at snapmgr.c:200
#6  0x000000000067df35 in exec_simple_query (query_string=0x24dc0b0 "select
pg_export_snapshot();")
    at postgres.c:986
#7  PostgresMain (argc=<optimized out>, argv=argv@entry=0x244a5f8,
dbname=0x244a4a8 "pod_10003_1",
    username=<optimized out>) at postgres.c:4074
#8  0x0000000000462922 in BackendRun (port=0x247b1d0) at postmaster.c:4164
#9  BackendStartup (port=0x247b1d0) at postmaster.c:3829
#10 ServerLoop () at postmaster.c:1597
#11 0x000000000062d33c in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x2449300)
    at postmaster.c:1244
#12 0x0000000000463569 in main (argc=3, argv=0x2449300) at main.c:228

Killing all backends (i.e. including those accessing other databases)
unblocked serializable snapshotting.

Is this expected behavior?

Re: BUG #13660: serializable snapshotting hangs

From
Kevin Grittner
Date:
On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com" <cpacejo@c=
learskydata.com> wrote:

> PostgreSQL version: 9.4.4

> After running fine for weeks, we now find that serializable snapshots han=
g:
>
> our_db=3D> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEF=
ERRABLE;
> START TRANSACTION
> our_db=3D> SELECT pg_export_snapshot();
> (...hangs indefinitely...)

> Killing all backends (i.e. including those accessing other databases)
> unblocked serializable snapshotting.
>
> Is this expected behavior?

What should be happening is that when the serializable read only
deferrable transaction executes the first statement that needs a
snapshot, it grabs a snapshot and looks for concurrent read write
transactions.=C2=A0 If there are none, the snapshot is safe and it goes
ahead with execution; otherwise it goes into a loop where it delays
(to avoid sucking 100% CPU for a core) by calling
ProcWaitForSignal().=C2=A0 Each time that function returns it checks
whether any of the concurrent read write transactions developed a
read-write conflict out (i.e., it read data which was modified by a
transaction concurrent to *it*) to a transaction which committed
before this read only transaction got its snapshot.=C2=A0 If so, this
snapshot is unsafe and we need to discard it, get a new snapshot,
and go back into the loop.=C2=A0 If all read write transactions
concurrent to the read only transaction complete without developing
such a conflict our snapshot is safe and we can proceed to run
without taking predicate locks or risking a serialization failure.

Now, that's pretty technical, and you definitely don't need to
follow all of the above to safely use deferrable transactions.=C2=A0 The
short version is that your initial statement that needs a snapshot
cannot begin to execute *at least* until any concurrent
transactions which are not READ ONLY complete.=C2=A0 One connection
stuck "idle in transaction" can hold it up indefinitely.=C2=A0 A
prepared transaction also counts as a transaction which has not yet
completed.

So if you had any long-running READ WRITE transactions (including
one stuck "idle in transaction" or prepared but not committed), it
is expected behavior.=C2=A0 If not, we should try to gather more
information to figure out what's going on.

One other possibility is that your workload has changed such that
there are now so many read write transactions reading data modified
by other transactions that it is just not finding a safe snapshot
very quickly.=C2=A0 Even when running benchmarks at saturation levels on
a 16 core machine we rarely saw a delay more than six seconds, but
there is no upper bound on how long it might take.=C2=A0 In such a case
you would have a few options, but let's not get ahead of ourselves
-- the first thing is to check for "idle in transaction" or
prepared transaction issues.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #13660: serializable snapshotting hangs

From
Chris Pacejo
Date:
I get that, but I had no other open connections to that database.
There were many connections to *other* databases on that same
*server*, but that shouldn't affect taking a snapshot on an otherwise
unused database, correct?

Also note that once this happened to one database, it happened to
*all* databases on the same server simultaneously.  It's as if the
read-write transaction check is performed server-wide, not
database-wide as I would expect.



On Tue, Oct 6, 2015 at 5:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com"
> <cpacejo@clearskydata.com> wrote:
>
>> PostgreSQL version: 9.4.4
>
>> After running fine for weeks, we now find that serializable snapshots
>> hang:
>>
>> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY,
>> DEFERRABLE;
>> START TRANSACTION
>> our_db=> SELECT pg_export_snapshot();
>> (...hangs indefinitely...)
>
>> Killing all backends (i.e. including those accessing other databases)
>> unblocked serializable snapshotting.
>>
>> Is this expected behavior?
>
> What should be happening is that when the serializable read only
> deferrable transaction executes the first statement that needs a
> snapshot, it grabs a snapshot and looks for concurrent read write
> transactions.  If there are none, the snapshot is safe and it goes
> ahead with execution; otherwise it goes into a loop where it delays
> (to avoid sucking 100% CPU for a core) by calling
> ProcWaitForSignal().  Each time that function returns it checks
> whether any of the concurrent read write transactions developed a
> read-write conflict out (i.e., it read data which was modified by a
> transaction concurrent to *it*) to a transaction which committed
> before this read only transaction got its snapshot.  If so, this
> snapshot is unsafe and we need to discard it, get a new snapshot,
> and go back into the loop.  If all read write transactions
> concurrent to the read only transaction complete without developing
> such a conflict our snapshot is safe and we can proceed to run
> without taking predicate locks or risking a serialization failure.
>
> Now, that's pretty technical, and you definitely don't need to
> follow all of the above to safely use deferrable transactions.  The
> short version is that your initial statement that needs a snapshot
> cannot begin to execute *at least* until any concurrent
> transactions which are not READ ONLY complete.  One connection
> stuck "idle in transaction" can hold it up indefinitely.  A
> prepared transaction also counts as a transaction which has not yet
> completed.
>
> So if you had any long-running READ WRITE transactions (including
> one stuck "idle in transaction" or prepared but not committed), it
> is expected behavior.  If not, we should try to gather more
> information to figure out what's going on.
>
> One other possibility is that your workload has changed such that
> there are now so many read write transactions reading data modified
> by other transactions that it is just not finding a safe snapshot
> very quickly.  Even when running benchmarks at saturation levels on
> a 16 core machine we rarely saw a delay more than six seconds, but
> there is no upper bound on how long it might take.  In such a case
> you would have a few options, but let's not get ahead of ourselves
> -- the first thing is to check for "idle in transaction" or
> prepared transaction issues.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: BUG #13660: serializable snapshotting hangs

From
Kevin Grittner
Date:
On Tuesday, October 6, 2015 4:47 PM, Chris Pacejo <cpacejo@clearskydata.com> wrote:

> but I had no other open connections to that database.  There were
> many connections to *other* databases on that same *server*, but
> that shouldn't affect taking a snapshot on an otherwise unused
> database, correct?

Incorrect.  Like vacuum and other visibility-related operations,
long-running transactions in other databases in the same "cluster"
(in the sense of a group of backends running under a single
postmaster) will be an issue.  I seem to recall that it was
non-trivial to limit things in general to a single database, but
perhaps the DEFERRABLE feature could be enhanced to do that without
too much pain.

> Also note that once this happened to one database, it happened to
> *all* databases on the same server simultaneously.

Right; that is because the check for pending transactions is not
database-sensitive.

> It's as if the read-write transaction check is performed
> server-wide, not database-wide as I would expect.

Undoubtedly that would be nicer, but it is not how the current
implementation works.  Taking a quick look, since we currently have
no way (that I know of) to get the database associated with a
VirtualTransactionId, we would need to add the OID of the database
to the SERIALIZABLEXACT structure and add testing it to the tests
inside this loop:


http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/predicate.c;h=bad5618341e3995e538edefd17561cf6db33721e;hb=master#l1766

A change like that would need to be considered an enhancement, not
a bug fix, so it would not be something we could back-patch to
stable branches (or even, at this point, version 9.5).  Given that
it is only a few straightforward lines of code, I'll see about
getting a patch into the next CommitFest so it has a shot at
getting into the release *after* 9.5.  If you're comfortable doing
your own builds from source, you might want to be an "early
adopter" of this patch, once it is written and reviewed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #13660: serializable snapshotting hangs

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Incorrect.  Like vacuum and other visibility-related operations,
> long-running transactions in other databases in the same "cluster"
> (in the sense of a group of backends running under a single
> postmaster) will be an issue.  I seem to recall that it was
> non-trivial to limit things in general to a single database, but
> perhaps the DEFERRABLE feature could be enhanced to do that without
> too much pain.

How will that work in the case of serialized access to a shared catalog?

            regards, tom lane

Re: BUG #13660: serializable snapshotting hangs

From
Kevin Grittner
Date:
On Wednesday, October 7, 2015 8:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:

>> Incorrect.  Like vacuum and other visibility-related operations,
>> long-running transactions in other databases in the same "cluster"
>> (in the sense of a group of backends running under a single
>> postmaster) will be an issue.  I seem to recall that it was
>> non-trivial to limit things in general to a single database, but
>> perhaps the DEFERRABLE feature could be enhanced to do that without
>> too much pain.
>
> How will that work in the case of serialized access to a shared catalog?

The Serializable Snapshot Isolation techniques only work on top of
MVCC snapshot isolation.  At the time of the SSI patch, catalogs
were not accessed according to those rules, so we excluded them
from SSI handling -- it just could not have worked in a meaningful
way.  (So catalog access is currently identical under REPEATABLE
READ and SERIALIZABLE transaction isolation levels.)  I am not
clear on whether MVCC snapshot access changes to the catalogs since
then have made it possible to extend SSI into the catalogs, or what
the pros and cons of that would be.  So, until such time as as SSI
is changed to cover catalogs, it would be safe to ignore that issue
in implementing what the OP wants -- nothing would break.

The question is whether providing this feature would paint us into
a corner in terms of the possibility of later extending SSI to
cover shared catalogs.  It's worth taking a close look at that; but
for right now this is basically a few spots below the top of my
TODO list, and I don't have time available right now to do the
analysis that requires.  Thanks for pointing it out, though.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #13660: serializable snapshotting hangs

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> The Serializable Snapshot Isolation techniques only work on top of
> MVCC snapshot isolation.  At the time of the SSI patch, catalogs
> were not accessed according to those rules, so we excluded them
> from SSI handling -- it just could not have worked in a meaningful
> way.  (So catalog access is currently identical under REPEATABLE
> READ and SERIALIZABLE transaction isolation levels.)

Really?  "SELECT * FROM pg_database" will ignore SSI?

I agree that the system's own catalog accesses don't pay attention
to SSI, and probably can't because of concerns about needing to see
fully up-to-date data.  But that's not my question.  My point is
that userland snapshots can be used to examine shared catalogs,
and even update them if you're a superuser.

            regards, tom lane

Re: BUG #13660: serializable snapshotting hangs

From
Kevin Grittner
Date:
On Wednesday, October 7, 2015 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> The Serializable Snapshot Isolation techniques only work on top of
>> MVCC snapshot isolation.  At the time of the SSI patch, catalogs
>> were not accessed according to those rules, so we excluded them
>> from SSI handling -- it just could not have worked in a meaningful
>> way.  (So catalog access is currently identical under REPEATABLE
>> READ and SERIALIZABLE transaction isolation levels.)
>
> Really?  "SELECT * FROM pg_database" will ignore SSI?

Yes.  If that is run in a SERIALIZABLE transaction, the effect will
be the same as if it were run in a REPEATABLE READ transaction --
it will use the transaction snapshot but will not create
SIReadLocks or examine the MVCC state of the rows read to look for
read-write conflicts.

> I agree that the system's own catalog accesses don't pay attention
> to SSI, and probably can't because of concerns about needing to see
> fully up-to-date data.  But that's not my question.  My point is
> that userland snapshots can be used to examine shared catalogs,
> and even update them if you're a superuser.

In such situations a SERIALIZABLE transaction provides no
additional protections against serialization anomalies involving
the catalogs than a REPEATABLE READ transaction.

If we decided to support the distinction you suggest, I'm not sure
how we would distinguish changes made by DDL commands from changes
made by superuser DML to catalog tables when examining the row's
MVCC data during a scan.  I am also skeptical that the addition of
predicate locks (done from inside heap and index AMs) would have a
sensible way to distinguish between the two cases.  Finally, it's
not immediately obvious what the benefit would be of distinguishing
between the two forms of catalog modification.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #13660: serializable snapshotting hangs

From
Chris Pacejo
Date:
On Wed, Oct 7, 2015 at 8:41 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
> A change like that would need to be considered an enhancement, not
> a bug fix, so it would not be something we could back-patch to
> stable branches (or even, at this point, version 9.5).  Given that
> it is only a few straightforward lines of code, I'll see about
> getting a patch into the next CommitFest so it has a shot at
> getting into the release *after* 9.5.  If you're comfortable doing
> your own builds from source, you might want to be an "early
> adopter" of this patch, once it is written and reviewed.

Thanks.  We'd love to be an early adopter; we already build some parts
from source (we use pg_dump --snapshot).  We require that different
databases do not block each other and would prefer to avoid the added
complexity of running multiple servers so we'd love to try it out as
soon as it is ready.

Thanks for all the info and your help with this issue.