Thread: BUG #13660: serializable snapshotting hangs
The following bug has been logged on the website:

Bug reference:      13660
Logged by:          Chris Pacejo
Email address:      cpacejo@clearskydata.com
PostgreSQL version: 9.4.4
Operating system:   CentOS 7 (kernel 3.10.0-123.el7.x86_64)
Description:

After running fine for weeks, we now find that serializable snapshots hang:

our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;
START TRANSACTION
our_db=> SELECT pg_export_snapshot();
(...hangs indefinitely...)

This occurs on all databases in the cluster.  pg_stat_activity reports for one database:

-[ RECORD 1 ]----+------------------------------
datid            | 16385
datname          | our_db
pid              | 31347
usesysid         | 99450
usename          | our_user
application_name | psql
client_addr      | X.X.X.X
client_hostname  |
client_port      | 55975
backend_start    | 2015-10-01 14:24:12.942063-04
xact_start       | 2015-10-01 14:26:54.437046-04
query_start      | 2015-10-01 14:26:56.245404-04
state_change     | 2015-10-01 14:26:56.245407-04
waiting          | f
state            | active
backend_xid      |
backend_xmin     | 222030266
query            | select pg_export_snapshot();

(i.e., this is the only active transaction on this database)

gdb on this backend reports:

(gdb) bt
#0  0x00007f496a5e6bd7 in semop () from /lib64/libc.so.6
#1  0x000000000061ba47 in PGSemaphoreLock (sema=0x7f495f32f930, interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:421
#2  0x000000000066e4c5 in ProcWaitForSignal () at proc.c:1641
#3  0x0000000000673b5d in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1534
#4  GetSerializableTransactionSnapshot (snapshot=0xb73aa0 <CurrentSnapshotData>) at predicate.c:1598
#5  0x0000000000782cad in GetTransactionSnapshot () at snapmgr.c:200
#6  0x000000000067df35 in exec_simple_query (query_string=0x24dc0b0 "select pg_export_snapshot();") at postgres.c:986
#7  PostgresMain (argc=<optimized out>, argv=argv@entry=0x244a5f8, dbname=0x244a4a8 "pod_10003_1", username=<optimized out>) at postgres.c:4074
#8  0x0000000000462922 in BackendRun (port=0x247b1d0) at postmaster.c:4164
#9  BackendStartup (port=0x247b1d0) at postmaster.c:3829
#10 ServerLoop () at postmaster.c:1597
#11 0x000000000062d33c in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2449300) at postmaster.c:1244
#12 0x0000000000463569 in main (argc=3, argv=0x2449300) at main.c:228

Killing all backends (i.e. including those accessing other databases) unblocked serializable snapshotting.

Is this expected behavior?
On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com" <cpacejo@clearskydata.com> wrote:

> PostgreSQL version: 9.4.4

> After running fine for weeks, we now find that serializable snapshots hang:
>
> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;
> START TRANSACTION
> our_db=> SELECT pg_export_snapshot();
> (...hangs indefinitely...)

> Killing all backends (i.e. including those accessing other databases)
> unblocked serializable snapshotting.
>
> Is this expected behavior?

What should be happening is that when the serializable read only deferrable transaction executes the first statement that needs a snapshot, it grabs a snapshot and looks for concurrent read write transactions.  If there are none, the snapshot is safe and it goes ahead with execution; otherwise it goes into a loop where it delays (to avoid sucking 100% CPU for a core) by calling ProcWaitForSignal().  Each time that function returns, it checks whether any of the concurrent read write transactions developed a read-write conflict out (i.e., it read data which was modified by a transaction concurrent to *it*) to a transaction which committed before this read only transaction got its snapshot.  If so, this snapshot is unsafe and we need to discard it, get a new snapshot, and go back into the loop.  If all read write transactions concurrent to the read only transaction complete without developing such a conflict, our snapshot is safe and we can proceed to run without taking predicate locks or risking a serialization failure.
Now, that's pretty technical, and you definitely don't need to follow all of the above to safely use deferrable transactions.  The short version is that your initial statement that needs a snapshot cannot begin to execute *at least* until any concurrent transactions which are not READ ONLY complete.  One connection stuck "idle in transaction" can hold it up indefinitely.  A prepared transaction also counts as a transaction which has not yet completed.

So if you had any long-running READ WRITE transactions (including one stuck "idle in transaction" or prepared but not committed), it is expected behavior.  If not, we should try to gather more information to figure out what's going on.

One other possibility is that your workload has changed such that there are now so many read write transactions reading data modified by other transactions that it is just not finding a safe snapshot very quickly.  Even when running benchmarks at saturation levels on a 16 core machine we rarely saw a delay of more than six seconds, but there is no upper bound on how long it might take.  In such a case you would have a few options, but let's not get ahead of ourselves -- the first thing is to check for "idle in transaction" or prepared transaction issues.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
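[Editor's note: for readers following along, here is a rough Python sketch of the retry loop described above.  All names are hypothetical stand-ins; the real logic is the C code in GetSafeSnapshot() in src/backend/storage/lmgr/predicate.c, which this only approximates.]

```python
# Hypothetical sketch of the DEFERRABLE safe-snapshot retry loop, not the
# actual PostgreSQL C implementation (see GetSafeSnapshot() in predicate.c).
from dataclasses import dataclass

@dataclass
class FakeXact:
    # True once this concurrent read-write transaction has developed a
    # read-write conflict out to a transaction that committed before the
    # read-only transaction's snapshot was taken.
    conflict_out_before_snapshot: bool = False
    finished: bool = False

def get_safe_snapshot(take_snapshot, wait_for_signal):
    """Keep taking snapshots until one is provably safe.

    take_snapshot() returns (snapshot, list of concurrent RW xacts);
    wait_for_signal() blocks until something relevant happens -- the
    stand-in here for ProcWaitForSignal().
    """
    while True:
        snapshot, concurrent = take_snapshot()
        unsafe = False
        while concurrent and not unsafe:
            wait_for_signal()  # delay instead of burning a core
            # Did any concurrent RW xact develop a read-write conflict out
            # to a transaction that committed before our snapshot?
            unsafe = any(x.conflict_out_before_snapshot for x in concurrent)
            concurrent = [x for x in concurrent if not x.finished]
        if not unsafe:
            return snapshot  # every concurrent RW xact completed cleanly
        # Unsafe: discard this snapshot, take a new one, loop again.

# Demo: one concurrent RW transaction that simply commits with no conflict.
xact = FakeXact()

def take_snapshot():
    return ("snap1", [] if xact.finished else [xact])

def wait_for_signal():
    xact.finished = True  # simulate the concurrent xact committing cleanly

print(get_safe_snapshot(take_snapshot, wait_for_signal))  # prints snap1
```

The demo also makes the failure mode in this bug report visible: if wait_for_signal() never observes the concurrent transaction finishing (e.g., a session stuck "idle in transaction"), the inner loop never exits.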
I get that, but I had no other open connections to that database.
There were many connections to *other* databases on that same
*server*, but that shouldn't affect taking a snapshot on an otherwise
unused database, correct?

Also note that once this happened to one database, it happened to
*all* databases on the same server simultaneously.  It's as if the
read-write transaction check is performed server-wide, not
database-wide as I would expect.

On Tue, Oct 6, 2015 at 5:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com"
> <cpacejo@clearskydata.com> wrote:
>
>> PostgreSQL version: 9.4.4
>
>> After running fine for weeks, we now find that serializable snapshots
>> hang:
>>
>> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY,
>> DEFERRABLE;
>> START TRANSACTION
>> our_db=> SELECT pg_export_snapshot();
>> (...hangs indefinitely...)
>
>> Killing all backends (i.e. including those accessing other databases)
>> unblocked serializable snapshotting.
>>
>> Is this expected behavior?
>
> What should be happening is that when the serializable read only
> deferrable transaction executes the first statement that needs a
> snapshot, it grabs a snapshot and looks for concurrent read write
> transactions.  If there are none, the snapshot is safe and it goes
> ahead with execution; otherwise it goes into a loop where it delays
> (to avoid sucking 100% CPU for a core) by calling
> ProcWaitForSignal().  Each time that function returns it checks
> whether any of the concurrent read write transactions developed a
> read-write conflict out (i.e., it read data which was modified by a
> transaction concurrent to *it*) to a transaction which committed
> before this read only transaction got its snapshot.  If so, this
> snapshot is unsafe and we need to discard it, get a new snapshot,
> and go back into the loop.  If all read write transactions
> concurrent to the read only transaction complete without developing
> such a conflict our snapshot is safe and we can proceed to run
> without taking predicate locks or risking a serialization failure.
>
> Now, that's pretty technical, and you definitely don't need to
> follow all of the above to safely use deferrable transactions.  The
> short version is that your initial statement that needs a snapshot
> cannot begin to execute *at least* until any concurrent
> transactions which are not READ ONLY complete.  One connection
> stuck "idle in transaction" can hold it up indefinitely.  A
> prepared transaction also counts as a transaction which has not yet
> completed.
>
> So if you had any long-running READ WRITE transactions (including
> one stuck "idle in transaction" or prepared but not committed), it
> is expected behavior.  If not, we should try to gather more
> information to figure out what's going on.
>
> One other possibility is that your workload has changed such that
> there are now so many read write transactions reading data modified
> by other transactions that it is just not finding a safe snapshot
> very quickly.  Even when running benchmarks at saturation levels on
> a 16 core machine we rarely saw a delay more than six seconds, but
> there is no upper bound on how long it might take.  In such a case
> you would have a few options, but let's not get ahead of ourselves
> -- the first thing is to check for "idle in transaction" or
> prepared transaction issues.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
On Tuesday, October 6, 2015 4:47 PM, Chris Pacejo <cpacejo@clearskydata.com> wrote:

> but I had no other open connections to that database.  There were
> many connections to *other* databases on that same *server*, but
> that shouldn't affect taking a snapshot on an otherwise unused
> database, correct?

Incorrect.  Like vacuum and other visibility-related operations, long-running transactions in other databases in the same "cluster" (in the sense of a group of backends running under a single postmaster) will be an issue.  I seem to recall that it was non-trivial to limit things in general to a single database, but perhaps the DEFERRABLE feature could be enhanced to do that without too much pain.

> Also note that once this happened to one database, it happened to
> *all* databases on the same server simultaneously.

Right; that is because the check for pending transactions is not database-sensitive.

> It's as if the read-write transaction check is performed
> server-wide, not database-wide as I would expect.

Undoubtedly a database-wide check would be nicer, but it is not how the current implementation works.  Taking a quick look, since we currently have no way (that I know of) to get the database associated with a VirtualTransactionId, we would need to add the OID of the database to the SERIALIZABLEXACT structure and add a test on it to the checks inside this loop:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/predicate.c;h=bad5618341e3995e538edefd17561cf6db33721e;hb=master#l1766

A change like that would need to be considered an enhancement, not a bug fix, so it would not be something we could back-patch to stable branches (or even, at this point, version 9.5).  Given that it is only a few straightforward lines of code, I'll see about getting a patch into the next CommitFest so it has a shot at getting into the release *after* 9.5.
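[Editor's note: to make the proposed enhancement concrete, here is a hypothetical Python sketch of the filtering it would add.  The names (ConcurrentXact, xacts_to_wait_for, db_oid) are invented for illustration; the actual change would be a few lines of C against SERIALIZABLEXACT in predicate.c.]

```python
# Hypothetical sketch: tag each serializable transaction with its database
# OID so a DEFERRABLE transaction waits only for read-write transactions
# in its own database.  Invented names; not PostgreSQL's actual structures.
from dataclasses import dataclass

@dataclass
class ConcurrentXact:
    vxid: str      # stand-in for a VirtualTransactionId
    db_oid: int    # the proposed new field on SERIALIZABLEXACT

def xacts_to_wait_for(all_rw_xacts, my_db_oid):
    # Today every concurrent RW xact in the cluster is considered; with the
    # database OID available, xacts in other databases can be skipped.
    return [x for x in all_rw_xacts if x.db_oid == my_db_oid]

xacts = [ConcurrentXact("1/10", 16385),
         ConcurrentXact("2/7", 16386),   # other database: currently blocks us
         ConcurrentXact("3/4", 16385)]
print([x.vxid for x in xacts_to_wait_for(xacts, 16385)])  # ['1/10', '3/4']
```

With that filter in place, the reporter's scenario (busy sibling databases, idle target database) would find no transactions to wait for and take a safe snapshot immediately.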
If you're comfortable doing your own builds from source, you might want to be an "early adopter" of this patch, once it is written and reviewed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> Incorrect.  Like vacuum and other visibility-related operations,
> long-running transactions in other databases in the same "cluster"
> (in the sense of a group of backends running under a single
> postmaster) will be an issue.  I seem to recall that it was
> non-trivial to limit things in general to a single database, but
> perhaps the DEFERRABLE feature could be enhanced to do that without
> too much pain.

How will that work in the case of serialized access to a shared catalog?

			regards, tom lane
On Wednesday, October 7, 2015 8:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Incorrect.  Like vacuum and other visibility-related operations,
>> long-running transactions in other databases in the same "cluster"
>> (in the sense of a group of backends running under a single
>> postmaster) will be an issue.  I seem to recall that it was
>> non-trivial to limit things in general to a single database, but
>> perhaps the DEFERRABLE feature could be enhanced to do that without
>> too much pain.
>
> How will that work in the case of serialized access to a shared catalog?

The Serializable Snapshot Isolation techniques only work on top of MVCC snapshot isolation.  At the time of the SSI patch, catalogs were not accessed according to those rules, so we excluded them from SSI handling -- it just could not have worked in a meaningful way.  (So catalog access is currently identical under the REPEATABLE READ and SERIALIZABLE transaction isolation levels.)  I am not clear on whether the MVCC snapshot access changes to the catalogs since then have made it possible to extend SSI into the catalogs, or what the pros and cons of that would be.

So, until such time as SSI is changed to cover catalogs, it would be safe to ignore that issue in implementing what the OP wants -- nothing would break.  The question is whether providing this feature would paint us into a corner in terms of the possibility of later extending SSI to cover shared catalogs.  It's worth taking a close look at that; but right now this is a few spots below the top of my TODO list, and I don't have the time available to do the analysis it requires.  Thanks for pointing it out, though.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> The Serializable Snapshot Isolation techniques only work on top of
> MVCC snapshot isolation.  At the time of the SSI patch, catalogs
> were not accessed according to those rules, so we excluded them
> from SSI handling -- it just could not have worked in a meaningful
> way.  (So catalog access is currently identical under REPEATABLE
> READ and SERIALIZABLE transaction isolation levels.)

Really?  "SELECT * FROM pg_database" will ignore SSI?

I agree that the system's own catalog accesses don't pay attention to SSI, and probably can't because of concerns about needing to see fully up-to-date data.  But that's not my question.  My point is that userland snapshots can be used to examine shared catalogs, and even update them if you're a superuser.

			regards, tom lane
On Wednesday, October 7, 2015 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> The Serializable Snapshot Isolation techniques only work on top of
>> MVCC snapshot isolation.  At the time of the SSI patch, catalogs
>> were not accessed according to those rules, so we excluded them
>> from SSI handling -- it just could not have worked in a meaningful
>> way.  (So catalog access is currently identical under REPEATABLE
>> READ and SERIALIZABLE transaction isolation levels.)
>
> Really?  "SELECT * FROM pg_database" will ignore SSI?

Yes.  If that is run in a SERIALIZABLE transaction, the effect will be the same as if it were run in a REPEATABLE READ transaction -- it will use the transaction snapshot but will not create SIReadLocks or examine the MVCC state of the rows read to look for read-write conflicts.

> I agree that the system's own catalog accesses don't pay attention
> to SSI, and probably can't because of concerns about needing to see
> fully up-to-date data.  But that's not my question.  My point is
> that userland snapshots can be used to examine shared catalogs,
> and even update them if you're a superuser.

In such situations a SERIALIZABLE transaction provides no more protection against serialization anomalies involving the catalogs than a REPEATABLE READ transaction does.  If we decided to support the distinction you suggest, I'm not sure how we would distinguish changes made by DDL commands from changes made by superuser DML to catalog tables when examining a row's MVCC data during a scan.  I am also skeptical that the addition of predicate locks (done from inside the heap and index AMs) would have a sensible way to distinguish between the two cases.  Finally, it's not immediately obvious what the benefit of distinguishing between the two forms of catalog modification would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 7, 2015 at 8:41 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
> A change like that would need to be considered an enhancement, not
> a bug fix, so it would not be something we could back-patch to
> stable branches (or even, at this point, version 9.5).  Given that
> it is only a few straightforward lines of code, I'll see about
> getting a patch into the next CommitFest so it has a shot at
> getting into the release *after* 9.5.  If you're comfortable doing
> your own builds from source, you might want to be an "early
> adopter" of this patch, once it is written and reviewed.

Thanks.  We'd love to be an early adopter; we already build some parts from source (we use pg_dump --snapshot).  We require that different databases do not block each other, and we would prefer to avoid the added complexity of running multiple servers, so we'd love to try it out as soon as it is ready.

Thanks for all the info and your help with this issue.