BUG #13660: serializable snapshotting hangs - Mailing list pgsql-bugs

From cpacejo@clearskydata.com
Subject BUG #13660: serializable snapshotting hangs
Msg-id 20151001185144.335.99956@wrigleys.postgresql.org
Responses Re: BUG #13660: serializable snapshotting hangs  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13660
Logged by:          Chris Pacejo
Email address:      cpacejo@clearskydata.com
PostgreSQL version: 9.4.4
Operating system:   CentOS 7 (kernel 3.10.0-123.el7.x86_64)
Description:

After running fine for weeks, we now find that serializable snapshots hang:

our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;
START TRANSACTION
our_db=> SELECT pg_export_snapshot();
(...hangs indefinitely...)

This occurs on all databases in the cluster.

pg_stat_activity reports for one database:

-[ RECORD 1 ]----+------------------------------
datid            | 16385
datname          | our_db
pid              | 31347
usesysid         | 99450
usename          | our_user
application_name | psql
client_addr      | X.X.X.X
client_hostname  |
client_port      | 55975
backend_start    | 2015-10-01 14:24:12.942063-04
xact_start       | 2015-10-01 14:26:54.437046-04
query_start      | 2015-10-01 14:26:56.245404-04
state_change     | 2015-10-01 14:26:56.245407-04
waiting          | f
state            | active
backend_xid      |
backend_xmin     | 222030266
query            | select pg_export_snapshot();

(i.e., this is the only active transaction on this database)

gdb on this backend reports:

(gdb) bt
#0  0x00007f496a5e6bd7 in semop () from /lib64/libc.so.6
#1  0x000000000061ba47 in PGSemaphoreLock (sema=0x7f495f32f930,
    interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:421
#2  0x000000000066e4c5 in ProcWaitForSignal () at proc.c:1641
#3  0x0000000000673b5d in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1534
#4  GetSerializableTransactionSnapshot (snapshot=0xb73aa0 <CurrentSnapshotData>) at predicate.c:1598
#5  0x0000000000782cad in GetTransactionSnapshot () at snapmgr.c:200
#6  0x000000000067df35 in exec_simple_query (query_string=0x24dc0b0 "select pg_export_snapshot();")
    at postgres.c:986
#7  PostgresMain (argc=<optimized out>, argv=argv@entry=0x244a5f8, dbname=0x244a4a8 "pod_10003_1",
    username=<optimized out>) at postgres.c:4074
#8  0x0000000000462922 in BackendRun (port=0x247b1d0) at postmaster.c:4164
#9  BackendStartup (port=0x247b1d0) at postmaster.c:3829
#10 ServerLoop () at postmaster.c:1597
#11 0x000000000062d33c in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2449300)
    at postmaster.c:1244
#12 0x0000000000463569 in main (argc=3, argv=0x2449300) at main.c:228

Killing all backends (i.e. including those accessing other databases)
unblocked serializable snapshotting.
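
A less drastic way to look for whatever the hung backend is waiting on might be to query the cluster-wide statistics and lock views before killing anything. The following is only a sketch and was not run as part of this report. It assumes that the wait in GetSafeSnapshot is for other serializable transactions somewhere in the cluster to finish, and it uses only the pg_stat_activity and pg_locks views as they exist in 9.4.

```sql
-- Sketch, not run for this report: every backend in the cluster that has
-- a transaction open, oldest first. pg_stat_activity covers all databases.
SELECT pid, datname, usename, state, waiting, xact_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- Backends associated with SIReadLock predicate locks, which are taken
-- only by serializable transactions.
SELECT a.pid,
       a.datname,
       a.usename,
       a.state,
       a.xact_start,
       count(*) AS siread_locks
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE l.mode = 'SIReadLock'
GROUP BY a.pid, a.datname, a.usename, a.state, a.xact_start
ORDER BY a.xact_start;
```

Neither query is conclusive. pg_stat_activity does not report a session's isolation level, a serializable transaction that has not read anything yet holds no SIReadLock, and SIReadLock entries can remain visible after the transaction that took them has committed. The results are candidates to inspect, not a definite list of blockers.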

Is this expected behavior?
