Re: BUG #11732: Non-serializable outcomes under serializable isolation - Mailing list pgsql-bugs
From: Peter Bailis
Subject: Re: BUG #11732: Non-serializable outcomes under serializable isolation
Date:
Msg-id: CALgH=-MOEhRm2Lzq8sxeaHDwaJaUnaYSEAV0ZQ7FfZs08_4XKw@mail.gmail.com
In response to: BUG #11732: Non-serializable outcomes under serializable isolation (pbailis@cs.berkeley.edu)
List: pgsql-bugs
I am also able to reproduce the behavior in PL/pgSQL given sufficiently high concurrency:

    CREATE FUNCTION stress_fn(target_key varchar) RETURNS varchar AS $$
    DECLARE
        found_key varchar;
    BEGIN
        PERFORM 1 AS one FROM "stress" WHERE "stress"."key" = target_key;
        IF FOUND THEN
            RAISE EXCEPTION 'already inserted';
        END IF;
        INSERT INTO "stress" ("key") VALUES (target_key) RETURNING "id" INTO found_key;
        RETURN found_key;
    END;
    $$ LANGUAGE plpgsql;

Here's an updated script using this function (still using Python to generate concurrent transactions):
https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-stored-procedure-py

On Tue, Oct 21, 2014 at 12:14 AM, <pbailis@cs.berkeley.edu> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      11732
> Logged by:          Peter Bailis
> Email address:      pbailis@cs.berkeley.edu
> PostgreSQL version: 9.3.5
> Operating system:   Ubuntu 14.04.1 LTS
> Description:
>
> When issuing a high-contention workload under SERIALIZABLE isolation, I am
> able to produce non-serializable outcomes. I am able to reproduce this
> behavior both via a Ruby on Rails deployment and via a standalone Python
> script that generates concurrent transactions
> (https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-py).
>
> BASIC WORKLOAD:
>
> The basic workload consists of a series of concurrent transactions
> performing read-modify-insert operations on a non-indexed varchar column.
> The workload repeatedly issues a set of K transactions in parallel,
> incrementing a sequence number for each wave. Each transaction within a
> wave checks whether the non-indexed varchar column contains the sequence
> number encoded as a string (via a SELECT ... WHERE query). If so, the
> transaction rolls back. If not, it attempts to insert a new row with the
> sequence number. (This is effectively a poor man's version of enforcing
> uniqueness constraints.)
> Under SERIALIZABLE isolation, only one transaction should successfully
> insert an entry into the varchar column per sequence number.
>
> (Reasoning: If, within a wave, a transaction T_1 observes that another
> transaction T_2 has already updated the column, it will not attempt to
> insert. If, within a wave, a transaction T_1 does not observe another
> transaction T_2's successful, committed insertion to the same sequence
> number, T_1 should abort to preserve serializability and will not insert.
> In the event that all transactions within a wave abort, there will be no
> insertions to that sequence number.)
>
> As a minimal working example, consider the following table, with the "key"
> column holding the sequence number varchar column from above:
>
>     TABLE "stress" ("id" serial primary key, "key" character varying(255))
>
> and the following transaction, executed in parallel and parameterized by K,
> a varchar:
>
>     BEGIN TRANSACTION;
>     result = 'SELECT 1 AS one FROM "stress" WHERE "stress"."key" = K;'
>     if result is NULL:
>         'INSERT INTO "stress" ("key") VALUES (K) RETURNING "id";'
>         COMMIT TRANSACTION;
>     else:
>         ROLLBACK TRANSACTION;
>
> (In both implementations I've built, the logic has been written in the
> application, and the single-quoted queries run in PostgreSQL.)
>
> EXPECTED BEHAVIOR: Under serializable isolation, there should be only one
> record per 'key':
>
>     SELECT key, COUNT(*) FROM stress GROUP BY key HAVING COUNT(*) > 1;
>      key | count
>     -----+-------
>     (0 rows)
>
> ACTUAL BEHAVIOR: Under serializable isolation, there are duplicates in the
> "key" column:
>
>     SELECT key, COUNT(*) FROM stress GROUP BY key HAVING COUNT(*) > 1;
>      key | count
>     -----+-------
>      745 |     2
>      274 |     2
>      55  |     2
>     (3 rows)
>
> I realize this is a weird workload, but it's derived from a set of queries
> in Rails's ORM.
>
> ENVIRONMENT DETAILS:
>
> I have reproduced this behavior in two separate environments:
>
> PostgreSQL version: PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu,
> compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
> PostgreSQL installation method: vanilla install via apt ('postgres-9.3')
> OS: Ubuntu 14.04 LTS
> Hardware: EC2 m2.4xlarge instance with 8 cores, 68.4GB memory, and 840GB
> local ephemeral storage
>
> PostgreSQL version: PostgreSQL 9.3.5 on x86_64-apple-darwin13.3.0,
> compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM
> 3.4svn), 64-bit
> PostgreSQL installation method: vanilla install via brew ('postgres')
> OS: Apple OS X 10.9.4
> Hardware: MacBook Pro, 2.8GHz Intel i5, 16GB RAM, SSD
>
> Configuration changes: I've changed default_transaction_isolation in
> postgresql.conf to 'serializable' and increased the number of connections
> and memory to accommodate multiple parallel connections (in the script
> I've provided, 128):
>
>     default_transaction_isolation = 'serializable'
>     max_connections = 200
>     shared_buffers = 1GB
>
> I originally thought this must be a bug in Rails, but I've now implemented
> the transaction logic in both Rails 4 and Python (using psycopg2) and
> reproduced this behavior in both environments. Autocommit is disabled, and
> the logs (see
> https://gist.github.com/pbailis/503430309104e3f7ab79#file-postgres-output-txt)
> demonstrate that serializable isolation is enabled (i.e., "Reason code:
> Canceled on identification as a pivot, during write"). This leads me to
> believe it is a consequence of the SSI implementation.
>
> REPRODUCING:
>
> Set up a new PostgreSQL database. The Python script assumes a database
> called 'stress' and a user called 'stress'.
> The following command will set up a new database in '/tmp/pgdata':
>
>     export PGDATA=/tmp/pgdata; rm -rf $PGDATA; pg_ctl init; printf
>     "default_transaction_isolation = 'serializable'\nshared_buffers=1GB\nmax_connections=200\n"
>     >> $PGDATA/postgresql.conf; postgres -D /tmp/pgdata & sleep 2;
>     psql postgres -c "CREATE USER stress WITH PASSWORD 'stress'";
>     psql postgres -c "CREATE DATABASE stress";
>     psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE stress to stress"; fg
>
> Run a series of concurrent sets of transactions as described above, or just
> run the Python tool
> (https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-py).
>
> Check whether the 'key' column contains duplicate entries (e.g.,
> https://gist.github.com/pbailis/503430309104e3f7ab79#file-state-of-database-after-pg-stress-py).
>
> This may take a few tries. I'd estimate that over half of my runs in both
> environments (going through the whole setup -- database setup plus Python
> script) reproduce this behavior. The current script is a bit of a hack,
> leading to high variance in timing behavior.
>
> OTHER NOTES:
>
> I haven't had much luck reproducing with a non-varchar key (e.g., (key
> integer)) and without RETURNING "id". Putting a unique index on "key" seems
> to work as expected.
> The behavior seems to appear more often when run on a new PostgreSQL
> installation.
> If the script hangs, check the contents of the table -- there may be a
> duplicate. However, not all duplicates freeze the script.
>
> I'm happy to debug further, but I'm sending what I have so far.
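One practical note for anyone reproducing this: under SSI, serialization failures (SQLSTATE 40001, "could not serialize access due to read/write dependencies among transactions") are expected behavior, and the client is responsible for rolling back and retrying the whole transaction. Below is a minimal, hedged sketch of that client-side retry loop in plain Python. The `SerializationFailure` class and `txn_body` callable here are illustrative stand-ins for the psycopg2 error (pgcode '40001') and the SELECT-then-INSERT logic from the report; none of these names come from the gist.

```python
# Sketch of the client-side retry pattern expected under SERIALIZABLE
# isolation. A serialization failure means "re-run the transaction",
# not "the workload is broken". All names here are illustrative.

class SerializationFailure(Exception):
    """Stand-in for a psycopg2 error with pgcode == '40001'."""


def run_with_retries(txn_body, max_attempts=5):
    """Run txn_body(), retrying when it raises SerializationFailure.

    In a real client, each retry would be preceded by conn.rollback()
    before the transaction body is executed again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up after max_attempts tries


if __name__ == "__main__":
    attempts = []

    def flaky_insert():
        # Simulate two SSI aborts followed by a successful commit.
        attempts.append(1)
        if len(attempts) < 3:
            raise SerializationFailure()
        return "inserted"

    print(run_with_retries(flaky_insert))  # inserted
    print(len(attempts))                   # 3
```

This does not explain the duplicates above (a correctly serializable history should never produce them, retries or no retries); it only separates expected 40001 aborts from the anomaly being reported.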