Thread: High rate of transaction failure with the Serializable Isolation Level


From
Reza Taheri
Date:

Hello PGSQL performance community,

[By way of introduction, we are a TPC subcommittee that is developing a benchmark with cloud-like characteristics for virtualized databases. The end-to-end benchmarking kit will be publicly available, and will run on PGSQL]

 

I am running into very high failure rates when I run with the Serializable Isolation Level. I have simplified our configuration to a single database with a constant workload, a TPC-E workload if you will, to focus on this problem. We are running with PGSQL 9.2.4, ODBC 2.2.14 (as well as 2.3.3pre, which didn’t help), RHEL 6.4, and a 6-way VM with 96GB of memory on a 4-socket Westmere server.

 

With our 9 transactions running with a mix of SQL_TXN_READ_COMMITTED and SQL_TXN_REPEATABLE_READ, we get less than 1% deadlocks, all of which occur because each row in one table, BROKER, may be read or written by multiple transactions at the same time. So, there are legitimate conflicts, which we deal with using an exponential backoff algorithm that sleeps for 10ms/30ms/90ms/etc.
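For concreteness, the retry policy is just an exponential backoff loop. Here is a minimal Python sketch of the idea; the function and exception names are illustrative stand-ins, not our actual ODBC client code:

```python
import time

class DeadlockError(Exception):
    """Stand-in for the driver's deadlock/serialization-failure error."""

def run_with_backoff(txn, max_attempts=5, base_delay_ms=10):
    """Run a transaction callable, retrying with 10ms/30ms/90ms/... sleeps.

    `txn` must roll back its own work and raise DeadlockError when the
    database reports a conflict; a successful result is returned as-is.
    """
    delay_ms = base_delay_ms
    for attempt in range(max_attempts):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(delay_ms / 1000.0)
            delay_ms *= 3  # exponential backoff: 10, 30, 90, ...
```

In the real kit the callable re-executes the whole transaction from the beginning, since the server aborts the entire transaction on a serialization failure.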

 

When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we face a storm of conflicts. Out of 37,342 Trade-Result transactions, 15,707 hit an error, and have to be rolled back and retried one or more times. The total failure count (due to many transactions failing more than once) is 31,388.

 

What is unusual is that the majority of the failures occur in a statement that should not have any isolation conflicts. About 17K of the failures are from the statement below:

2014-07-23 11:27:15 PDT 26085 ERROR:  could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:15 PDT 26085 DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2014-07-23 11:27:15 PDT 26085 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:15 PDT 26085 CONTEXT:  SQL statement "update   TRADE
                set     T_COMM = comm_amount,
                        T_DTS = trade_dts,
                        T_ST_ID = st_completed_id,
                        T_TRADE_PRICE = trade_price
                where   T_ID = trade_id"
        PL/pgSQL function traderesultframe5(ident_t,value_t,character,timestamp without time zone,trade_t,s_price_t) line 15 at SQL statement

 

This doesn’t make sense since at any given time, only one transaction might possibly be accessing the row that is being updated. There should be no conflicts if we have row-level locking/isolation.


The second most common conflict happens 7.6K times in the statement below:

2014-07-23 11:27:23 PDT 26039 ERROR:  could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26039 DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2014-07-23 11:27:23 PDT 26039 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26039 CONTEXT:  SQL statement "insert
                into    SETTLEMENT (    SE_T_ID,
                                        SE_CASH_TYPE,
                                        SE_CASH_DUE_DATE,
                                        SE_AMT)
                values (        trade_id,
                                cash_type,
                                due_date,
                                se_amount
                        )"
        PL/pgSQL function traderesultframe6(ident_t,timestamp without time zone,character varying,value_t,timestamp without time zone,trade_t,smallint,s_qty_t,character) line 23 at SQL statement

 

I don’t understand why an insert would hit a serialization conflict.

 

We also have 4.5K conflicts when we try to commit:

2014-07-23 11:27:23 PDT 26037 ERROR:  could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26037 DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
2014-07-23 11:27:23 PDT 26037 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26037 STATEMENT:  COMMIT

 

 

Does PGSQL raise locks to page level when we run with SQL_TXN_SERIALIZABLE? Are there any knobs I can play with to alleviate this?  FWIW, the same transactions on MS SQL Server see almost no conflicts.

 

Thanks,
Reza

Re: High rate of transaction failure with the Serializable Isolation Level

From
Craig Ringer
Date:
On 07/24/2014 09:18 AM, Reza Taheri wrote:
> What is unusual is that the majority of the failures occur in a
> statement that should not have any isolation conflicts. About 17K of
> the failures are from the statement below:

It's not just that statement that is relevant.

At SERIALIZABLE isolation the entire transaction's actions must be
considered, as must the conflicting transaction.

> This doesn’t make sense since at any given time, only one transaction
> might possibly be accessing the row that is being updated. There should
> be no conflicts if we have row-level locking/isolation.

Is that statement run standalone, or as part of a larger transaction?

> The second most common conflict happens 7.6K times in the statement below:
...
> I don’t understand why an insert would hit a serialization conflict

If the INSERTing transaction previously queried for a key that was
created by a concurrent transaction this can occur as there is no
serialization execution order of the transactions that could produce the
same result.

This doesn't produce exactly the same error, but demonstrates one such case:


regress=> CREATE TABLE demo (id integer primary key, value integer);
CREATE TABLE
regress=> INSERT INTO demo(id, value) VALUES (1, 42);
INSERT 0 1

then

regress=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
regress=> SELECT id FROM demo WHERE id = 2;
 id
----
(0 rows)


session1=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
session2=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN

session1=> SELECT id FROM demo WHERE id = 2;
 id
----
(0 rows)

session2=> SELECT id FROM demo WHERE id = 3;
 id
----
(0 rows)


session1=> INSERT INTO demo VALUES (3, 43);
INSERT 0 1

session2=> INSERT INTO demo VALUES (2, 43);
INSERT 0 1

session2=> COMMIT;
COMMIT

session1=> COMMIT;
ERROR:  could not serialize access due to read/write dependencies among
transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during
commit attempt.
HINT:  The transaction might succeed if retried.

> Does PGSQL raise locks to page level when we run with
> SQL_TXN_SERIALIZABLE?

From the documentation
(http://www.postgresql.org/docs/current/static/transaction-iso.html):

> Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock. The particular locks acquired during execution of a query will depend on the plan used by the query, and multiple finer-grained locks (e.g., tuple locks) may be combined into fewer coarser-grained locks (e.g., page locks) during the course of the transaction to prevent exhaustion of the memory used to track the locks.

... so yes, it may raise locks to page level. That doesn't mean that's
necessarily what's happening here.

> Are there any knobs I can play with to alleviate
> this?

A lower FILLFACTOR can spread data out at the cost of wasted space.

> FWIW, the same transactions on MS SQL Server see almost no conflicts.

Many DBMSs don't detect all serialization anomalies. PostgreSQL doesn't
detect all possible anomalies but it detects many that other systems may
not.

To see what's going on and why MS SQL Server (version?) doesn't
complain, it'd be best to boil each case down to a minimal reproducible
test case that can be analyzed in isolation.

PostgreSQL's isolationtester tool, in src/test/isolation, can be handy
for automating this kind of conflict, and provides some useful examples
of cases that are detected.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: High rate of transaction failure with the Serializable Isolation Level

From
Craig Ringer
Date:
On 07/24/2014 09:18 AM, Reza Taheri wrote:
> Does PGSQL raise locks to page level when we run with
> SQL_TXN_SERIALIZABLE? Are there any knobs I can play with to alleviate
> this?  FWIW, the same transactions on MS SQL Server see almost no conflicts.
>

Also, in the documentation
(http://www.postgresql.org/docs/current/static/transaction-iso.html):

> When the system is forced to combine multiple page-level predicate locks into a single relation-level predicate lock because the predicate lock table is short of memory, an increase in the rate of serialization failures may occur. You can avoid this by increasing max_pred_locks_per_transaction.

... so I suggest experimenting with higher
max_pred_locks_per_transaction values.

http://www.postgresql.org/docs/9.1/static/runtime-config-locks.html#GUC-MAX-PRED-LOCKS-PER-TRANSACTION

... though that should only really affect object level locks (tables,
etc) according to the docs. I'd need to dig further to determine how to
reduce or eliminate lock combining of row-level to page-level and
page-level to object-level locks.
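
For reference, the setting lives in postgresql.conf; the value below is purely illustrative (the default is 64, and changing it requires a server restart):

```
# postgresql.conf -- illustrative value only; requires a restart to take effect
max_pred_locks_per_transaction = 640   # default 64; reserves shared memory at startup
```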

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: High rate of transaction failure with the Serializable Isolation Level

From
Kevin Grittner
Date:
Reza Taheri <rtaheri@vmware.com> wrote:

> I am running into very high failure rates when I run with the
> Serializable Isolation Level. I have simplified our configuration
> to a single database with a constant workload, a TPC-E workload
> if you will, to focus on this problem. We are running with
> PGSQL 9.2.4

I don't remember any bug fixes that would be directly related to
what you describe in the last 15 months, but it might be better to
do any testing with fixes for known bugs:

http://www.postgresql.org/support/versioning/

> When we raise the Trade-Result transaction to
> SQL_TXN_SERIALIZABLE, we face a storm of conflicts. Out of
> 37,342 Trade-Result transactions, 15,707 hit an error, and have
> to be rolled back and retried one or more times. The total
> failure count (due to many transactions failing more than once)
> is 31,388.
>
> What is unusual is that the majority of the failures occur in a
> statement that should not have any isolation conflicts.

As already pointed out by Craig, statements don't have
serialization failures; transactions do.  In some cases a
transaction may become "doomed to fail" by the action of a
concurrent transaction, but the actual failure cannot occur until
the next statement is run on the connection with the doomed
transaction; it may have nothing to do with the statement itself.

If you want to understand the theory of how SERIALIZABLE
transactions are implemented in PostgreSQL, these links may help:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=master

http://wiki.postgresql.org/wiki/Serializable

For a more practical set of examples about the differences in
using REPEATABLE READ and SERIALIZABLE transaction isolation levels
in PostgreSQL, see:

http://wiki.postgresql.org/wiki/SSI

If you are just interested in reducing the number of serialization
failures, see the suggestions near the end of this section of the
documentation:

http://www.postgresql.org/docs/9.2/interactive/transaction-iso.html#XACT-SERIALIZABLE

Any of these items (or perhaps a combination of them) may
ameliorate the problem.  Note that I have seen reports of cases
where max_pred_locks_per_transaction needed to be set to 20x the
default to reduce serialization failures to an acceptable level.
The default is intentionally set very low because so many people do
not use this isolation level, and this setting reserves shared
memory for purposes of tracking serializable transactions; the
space is wasted for those who don't choose to use them.

There is still a lot of work possible to reduce the rate of false
positives, which has largely gone undone so far due to a general
lack of problem reports from people which could not be solved
through tuning.  If you have such a case, it would be interesting
to have all relevant details, so that we can target which of the
many enhancements are relevant to your case.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Hi Craig,
> It's not just that statement that is relevant.
> Is that statement run standalone, or as part of a larger transaction?

Yes, the "size" of the transaction seems to matter here. It is a complex transaction (attached). Each "frame" is one stored procedure, and the 6 frames are called one after the other with no pause. After frame6 returns, we call SQLTransact(..., ..., SQL_COMMIT). Below is the failure rate of the various frames:

    112 tid 18883: SQL Failed: DoTradeResultFrame3
    102 tid 18883: SQL Failed: DoTradeResultFrame4
  18188 tid 18883: SQL Failed: DoTradeResultFrame5
   8566 tid 18883: SQL Failed: DoTradeResultFrame6
   4492 tid 18883: ERROR: TradeResultDB: commit failed

So, no failures in frames 1 and 2, and then the failure rate grows as we approach the end of the transaction.

> If the INSERTing transaction previously queried for a key that was created by a concurrent transaction this can occur
> as there is no serialization execution order of the transactions that could produce the same result.

As far as the inserts, your point is well-taken. But in this case, I have eliminated the transactions that query or otherwise manipulate the SETTLEMENT table. The only access to it is the single insert in this transaction.

> A lower FILLFACTOR can spread data out at the cost of wasted space.

Interesting idea!  Let me look into this. Even if this is not practical (our tables are 10s and 100s of GBs), if I can force a single row per page and the problem goes away, then we learn something.

> PostgreSQL's isolationtester tool, in src/test/isolation, can be handy for automating this kind of conflict,
> and provides some useful examples of cases that are detected.

Didn't know about this tool. Let me look into it!

Thanks again for the reply,
Reza

Attachment
Hi Kevin,
Thanks for the reply

> As already pointed out by Craig, statements don't have serialization failures; transactions do.  In some cases a
> transaction may become "doomed to fail" by the action of a concurrent transaction, but the actual failure cannot
> occur until the next statement is run on the connection with the doomed transaction; it may have nothing to do
> with the statement itself.

That's an interesting concept. I suppose I could test it by moving statements around to see what happens.


> Note that I have seen reports of cases where max_pred_locks_per_transaction needed to be set to 20x the default to
> reduce serialization failures to an acceptable level.


I was running with the following two parameters set to 640; I then raised them to 6400, and saw no difference:

max_locks_per_transaction = 6400
max_pred_locks_per_transaction = 6400

Thanks,
Reza



Re: High rate of transaction failure with the Serializable Isolation Level

From
Craig Ringer
Date:
On 07/25/2014 03:50 AM, Reza Taheri wrote:
> Hi Craig,
>> It's not just that statement that is relevant.
>> Is that statement run standalone, or as part of a larger transaction?
>
> Yes, the "size" of the transaction seems to matter here. It is a complex transaction (attached). Each "frame" is one
> stored procedure, and the 6 frames are called one after the other with no pause. After frame6 returns, we call
> SQLTransact(..., ..., SQL_COMMIT). Below is the failure rate of the various frames:
>
>     112 tid 18883: SQL Failed: DoTradeResultFrame3
>     102 tid 18883: SQL Failed: DoTradeResultFrame4
>   18188 tid 18883: SQL Failed: DoTradeResultFrame5
>    8566 tid 18883: SQL Failed: DoTradeResultFrame6
>    4492 tid 18883: ERROR: TradeResultDB: commit failed
>
> So, no failures in frames 1 and 2, and then the failure rate grows as we approach the end of the transaction.

According to the attached SQL, each frame is a separate phase in the
operation and performs many different operations.

There's a *lot* going on here, so identifying possible interdependencies
isn't something I can do in a ten minute skim read over my morning coffee.

I think the most useful thing to do here is to start cutting and
simplifying the case, trying to boil it down to the smallest thing that
still causes the problem.

That'll likely either find a previously unidentified interdependency
between transactions or, if you're unlucky, a Pg bug. Given the
complexity of the operations there I'd be very surprised if it wasn't
the former.

>> If the INSERTing transaction previously queried for a key that was created by a concurrent transaction this can
>> occur as there is no serialization execution order of the transactions that could produce the same result.
>
> As far as the inserts, your point is well-taken. But in this case, I have eliminated the transactions that query or
> otherwise manipulate the SETTLEMENT table. The only access to it is the single insert in this transaction.

If there are foreign keys to it from other tables, they count too.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Hi Craig,

> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations.
> There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute
> skim read over my morning coffee.

You didn't think I was going to bug you all with a trivial problem, did you? :-) :-)

Yes, I am going to have to take an axe to the code and see what pops out. Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E.

So, my hope is that if there is a "previously unidentified interdependency between transactions" as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL.

Just pointing out my uphill battle!!

> If there are foreign keys to it from other tables, they count too.

Yes, we have a lot of foreign keys. I dropped them all a few weeks ago with no impact. But when I start the axing process, they will be one of the first to go.

Thanks,
Reza



Re: High rate of transaction failure with the Serializable Isolation Level

From
Ryan Johnson
Date:
On 25/07/2014 2:58 PM, Reza Taheri wrote:
> Hi Craig,
>
>> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations.
>> There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.
> You didn't think I was going to bug you all with a trivial problem, did you? :-) :-)
>
> Yes, I am going to have to take an axe to the code and see what pops out. Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E.
>
> So, my hope is that if there is a "previously unidentified interdependency between transactions" as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL.
>
> Just pointing out my uphill battle!!
You might compare against dbt-5 [1], just to see if the same problem
occurs. I didn't notice such high abort rates when I ran that workload a
few weeks ago. Just make sure to use the latest commit, because the
"released" version has fatal bugs.

[1] https://github.com/petergeoghegan/dbt5

Ryan



On 07/23/2014 06:18 PM, Reza Taheri wrote:
> [By way of introduction, we are a TPC subcommittee that is developing a
> benchmark with cloud-like characteristics for virtualized databases. The
> end-to-end benchmarking kit will be publicly available, and will run on
> PGSQL]

Awesome!  Any idea when it will be available?  Our community could
really use some updated benchmark tooling ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


We are hoping the spec will get wrapped up in the next 6 months, but industry standard councils move very slowly! However, if there is interest in getting involved and helping, the TPC might be receptive to earlier access.
 

BTW, just to let folks know how large of an undertaking this has been, this is an estimate of the lines of code developed so far:
 

- 700 lines of run-time shell scripts
- 700 lines of build-time shell scripts
- 3.5K lines of DDL and DML
- 4K lines of C code to test the DML
- 22K lines of C, C++, and Java code in the benchmark driver
- 45K lines of C++ code in VGen

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Josh Berkus
> Sent: Friday, July 25, 2014 2:47 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: High rate of transaction failure with the Serializable Isolation
> Level
> 
> On 07/23/2014 06:18 PM, Reza Taheri wrote:
> > [By way of introduction, we are a TPC subcommittee that is developing
> > a benchmark with cloud-like characteristics for virtualized databases.
> > The end-to-end benchmarking kit will be publicly available, and will
> > run on PGSQL]
> 
> Awesome!  Any idea when it will be available?  Our community could really
> use some updated benchmark tooling ...
> 

Hi Ryan,
That's a very good point. We are looking at dbt5. One question: what throughput rate, and how many threads of execution did you use for dbt5?  The failure rates I reported were at ~120 tps with 15 trade-result threads.

Thanks,
Reza

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Ryan Johnson
> Sent: Friday, July 25, 2014 2:36 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: High rate of transaction failure with the Serializable Isolation
> Level
>
> On 25/07/2014 2:58 PM, Reza Taheri wrote:
> > [earlier quoted text trimmed]
> You might compare against dbt-5 [1], just to see if the same problem occurs. I
> didn't notice such high abort rates when I ran that workload a few weeks
> ago. Just make sure to use the latest commit, because the "released" version
> has fatal bugs.
>
> [1] https://github.com/petergeoghegan/dbt5
>
> Ryan


Re: High rate of transaction failure with the Serializable Isolation Level

From
Ryan Johnson
Date:
Dredging through some old run logs, 12 dbt-5 clients gave the following
when everything was run under SSI (fully serializable, even the
transactions that allow repeatable read isolation). Not sure how that
translates to your results. Abort rates were admittedly rather high,
though perhaps lower than what you report.

Transaction             % Average: 90th %   Total Rollbacks    % Warning Invalid
----------------- ------- --------------- ------- -------------- ------- -------
Trade Result        5.568   0.022:  0.056    2118    417  19.69%       0      91
Broker Volume       5.097   0.009:  0.014    1557      0   0.00%       0       0
Customer Position  13.530   0.016:  0.034    4134      1   0.02%       0       0
Market Feed         0.547   0.033:  0.065     212     45  21.23%       0      69
Market Watch       18.604   0.031:  0.061    5683      0   0.00%       0       0
Security Detail    14.462   0.015:  0.020    4418      0   0.00%       0       0
Trade Lookup        8.325   0.059:  0.146    2543      0   0.00%     432       0
Trade Order         9.110   0.006:  0.008    3227    444  13.76%       0       0
Trade Status       19.795   0.030:  0.046    6047      0   0.00%       0       0
Trade Update        1.990   0.064:  0.145     608      0   0.00%     432       0
Data Maintenance      N/A   0.012:  0.012       1      0   0.00%       0       0
----------------- ------- --------------- ------- -------------- ------- -------
28.35 trade-result transactions per second (trtps)

Regards,
Ryan

On 26/07/2014 3:55 PM, Reza Taheri wrote:
> Hi Ryan,
> That's a very good point. We are looking at dbt5. One question: what throughput rate, and how many threads of execution did you use for dbt5?  The failure rates I reported were at ~120 tps with 15 trade-result threads.
>
> Thanks,
> Reza



Hi Ryan,
Thanks a lot for sharing this. When I run with 12 CE threads and 3-5 MEE threads (how many MEE threads do you have?) @ 80-90 tps, I get something in the 20-30% of trade-result transactions rolled back depending on how I count. E.g., in a 5.5-minute run with 3 MEE threads, I saw 87.5 tps. There were 29200 successful trade-result transactions. Of these, 5800 were rolled back, some more than once for a total of 8450 rollbacks. So I'd say your results and ours tell similar stories!

Thanks,
Reza

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Ryan Johnson
> Sent: Saturday, July 26, 2014 2:06 PM
> To: Reza Taheri
> Cc: pgsql-performance@postgresql.org
> Subject: Re: High rate of transaction failure with the Serializable Isolation
> Level
>
> Dredging through some old run logs, 12 dbt-5 clients gave the following when
> everything was run under SSI (fully serializable, even the transactions that
> allow repeatable read isolation). Not sure how that translates to your results.
> Abort rates were admittedly rather high, though perhaps lower than what
> you report.
>
> [per-transaction table and earlier quoted messages trimmed; the table appears
> in full in the original message above]
>
> Regards,
> Ryan


Re: High rate of transaction failure with the Serializable Isolation Level

From
Ryan Johnson
Date:
That does sound pretty similar, modulo the raw performance difference. I
have no idea how many MEE threads there were; it was just a quick run
with exactly zero tuning, so I use whatever dbt5 does out of the box.
Actually, though, if you have any general tuning tips for TPC-E I'd be
interested to learn them (PM if that's off topic for this discussion).

Regards,
Ryan

On 26/07/2014 7:33 PM, Reza Taheri wrote:
> Hi Ryan,
> Thanks a lot for sharing this. When I run with 12 CE threads and 3-5 MEE threads (how many MEE threads do you have?) @ 80-90 tps, I get something in the 20-30% of trade-result transactions rolled back depending on how I count. E.g., in a 5.5-minute run with 3 MEE threads, I saw 87.5 tps. There were 29200 successful trade-result transactions. Of these, 5800 were rolled back, some more than once for a total of 8450 rollbacks. So I'd say your results and ours tell similar stories!
>
> Thanks,
> Reza


Re: High rate of transaction failure with the Serializable Isolation Level

From
Craig Ringer
Date:
On 07/26/2014 02:58 AM, Reza Taheri wrote:
> Hi Craig,
>
>> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations.
>> There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.
>
> You didn't think I was going to bug you all with a trivial problem, did you? :-) :-)

One can hope, but usually in vain...

> Yes, I am going to have to take an axe to the code and see what pops out. Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E.
>
> So, my hope is that if there is a "previously unidentified interdependency between transactions" as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL.

Hopefully so.

Personally I think it's moderately likely that PostgreSQL's much
stricter enforcement of serializable isolation is detecting anomalies
that other products do not, so it's potentially preventing errors.

It would be nice to have the ability to tune this; sometimes there are
anomalies you wish to ignore or permit. At present it is an all or
nothing affair - no predicate locking (REPEATABLE READ isolation) or
strict predicate locking (SERIALIZABLE isolation).

I recommend running some of the examples in the SERIALIZABLE
documentation on other products. If they don't fail where they do in Pg,
then the other products either have less strict (and arguably therefore
less correct) serialisable isolation enforcement or they rely on
blocking predicate locks. In the latter case it should be easy to tell
because statements will block on locks where no ordinary row or table
level lock could be taken.
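The kind of comparative test described here could be a sketch along the lines of the classic write-skew scenario often used to illustrate serializable behaviour (the table name and values below are illustrative, not taken from the benchmark):

```sql
-- Write-skew sketch: each transaction reads the rows the other one
-- modifies, so no serial order can produce the combined result.
CREATE TABLE mytab (class int, value int);
INSERT INTO mytab VALUES (1, 10), (1, 20), (2, 100), (2, 200);

-- Session 1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(value) FROM mytab WHERE class = 1;   -- reads class 1
INSERT INTO mytab VALUES (2, 30);               -- writes class 2

-- Session 2, concurrently:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(value) FROM mytab WHERE class = 2;   -- reads class 2
INSERT INTO mytab VALUES (1, 300);              -- writes class 1
COMMIT;

-- Session 1:
COMMIT;  -- under PostgreSQL's SSI this is expected to fail with
         -- SQLSTATE 40001; a blocking predicate-lock implementation
         -- would instead make one of the SELECTs or INSERTs wait
```

On a product with weaker "serializable" enforcement, both commits may simply succeed, which would distinguish the three behaviours Craig describes.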

If you do run comparative tests I would be quite interested in seeing
the results.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: High rate of transaction failure with the Serializable Isolation Level

From
Reza Taheri
Date:
Hi Ryan,
I just noticed that the mail alias manager has stalled the post below because of the attachment size. But you should have gotten it directly.

If anyone else is interested in a copy, let me know, and I will forward it.

Thanks,
Reza

> -----Original Message-----
> From: Reza Taheri
> Sent: Monday, July 28, 2014 8:57 PM
> To: 'Ryan Johnson'
> Cc: pgsql-performance@postgresql.org
> Subject: RE: High rate of transaction failure with the Serializable Isolation
> Level
>
> Hi Ryan,
> We presented a paper at the TPCTC of last year's VLDB (attached). It
> described the architecture of the kit, and some of the tuning. Another tuning
> change was setting /proc/sys/vm/dirty_background_bytes to a small value
> (like 10000000) on very-large memory machines, which was a problem I
> brought up on this same mailing list a while ago and got great advice. Also,
> make sure you do a SQLFreeStmt(stmt, SQL_DROP) at the end of
> transactions, not SQL_CLOSE.
>
> Let me know if you have any question about the paper
>
> Thanks,
> Reza
>
> > -----Original Message-----
> > From: Ryan Johnson [mailto:ryan.johnson@cs.utoronto.ca]
> > Sent: Saturday, July 26, 2014 4:51 PM
> > To: Reza Taheri
> > Cc: pgsql-performance@postgresql.org
> > Subject: Re: High rate of transaction failure with the Serializable
> > Isolation Level
> >
> > [earlier quoted messages trimmed]
> > >>>> work with
> > PGSQL.
> > >>>>> Just pointing out my uphill battle!!
> > >>>> You might compare against dbt-5 [1], just to see if the same
> > >>>> problem occurs. I didn't notice such high abort rates when I ran
> > >>>> that workload a few weeks ago. Just make sure to use the latest
> > >>>> commit, because the "released" version has fatal bugs.
> > >>>>
> > >>>> [1]
> > >>>>
> > >>
> > https://urldefense.proofpoint.com/v1/url?u=https://github.com/peterge
> > >>>> og
> > >>
> >
> hegan/dbt5&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=b9TKmA0CPjr
> > >>
> >
> oD2HLPTHU27nI9PJr8wgKO2rU9QZyZZU%3D%0A&m=6E%2F9fWJPMGjpMyP
> > >>
> >
> xtY0nsamLLW%2FNsTXu7FP9Wzauj10%3D%0A&s=b3f269216d419410f3f07bb
> > >>>> 774a27b7d377744c9d423df52a3e62324d9279958
> > >>>>
> > >>>> Ryan
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Sent via pgsql-performance mailing list
> > >>>> (pgsql-performance@postgresql.org)
> > >>>> To make changes to your subscription:
> > >>>>
> > >>
> > https://urldefense.proofpoint.com/v1/url?u=http://www.postgresql.org/
> > >>>> m
> > >>>> ailpref/pgsql-
> > >>>>
> > >>
> >
> performance&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=b9TKmA0CP
> > >>
> >
> jroD2HLPTHU27nI9PJr8wgKO2rU9QZyZZU%3D%0A&m=6E%2F9fWJPMGjpMy
> > >>
> >
> PxtY0nsamLLW%2FNsTXu7FP9Wzauj10%3D%0A&s=45ab94ce068dbe28956af
> > >>>> 8bb3f999e9a91138dd1e3c3345c036e87e902da1ef1
> > >>
> > >>
> > >> --
> > >> Sent via pgsql-performance mailing list
> > >> (pgsql-performance@postgresql.org)
> > >> To make changes to your subscription:
> > >>
> > https://urldefense.proofpoint.com/v1/url?u=http://www.postgresql.org/
> > >> m
> > >> ailpref/pgsql-
> > >>
> >
> performance&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=b9TKmA0CP
> > >>
> >
> jroD2HLPTHU27nI9PJr8wgKO2rU9QZyZZU%3D%0A&m=gzdXAra2QlJIiMTFSjH
> > >>
> >
> cKAsSKNR5LST%2FrsLWdeb7Y9c%3D%0A&s=673454322b6239edd9d02472e95
> > >> e8a6c15cb1a095d2afb9c981642e44fb40672



Re: High rate of transaction failure with the Serializable Isolation Level

From
Ryan Johnson
Date:
Great, thanks. I'll look into it when I get a few minutes.

Ryan

On 28/07/2014 11:57 PM, Reza Taheri wrote:
> Hi Ryan,
> We presented a paper at the TPCTC of last year's VLDB (attached). It described the architecture of the kit, and some
> of the tuning. Another tuning change was setting /proc/sys/vm/dirty_background_bytes to a small value (like 10000000) on
> very-large-memory machines, which was a problem I brought up on this same mailing list a while ago and got great advice.
> Also, make sure you do a SQLFreeStmt(stmt, SQL_DROP) at the end of transactions, not SQL_CLOSE.
>
> Let me know if you have any questions about the paper
>
> Thanks,
> Reza
>
>> -----Original Message-----
>> From: Ryan Johnson [mailto:ryan.johnson@cs.utoronto.ca]
>> Sent: Saturday, July 26, 2014 4:51 PM
>> To: Reza Taheri
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: High rate of transaction failure with the Serializable Isolation
>> Level
>>
>> That does sound pretty similar, modulo the raw performance difference. I
>> have no idea how many MEE threads there were; it was just a quick run with
>> exactly zero tuning, so I used whatever dbt5 does out of the box.
>> Actually, though, if you have any general tuning tips for TPC-E I'd be
>> interested to learn them (PM if that's off topic for this discussion).
>>
>> Regards,
>> Ryan
>>
>> On 26/07/2014 7:33 PM, Reza Taheri wrote:
>>> Hi Ryan,
>>> Thanks a lot for sharing this. When I run with 12 CE threads and 3-5 MEE
>> threads (how many MEE threads do you have?) @  80-90 tps, I get something
>> in the 20-30% of trade-result transactions rolled back depending on how I
>> count. E.g., in a 5.5-minute run with 3 MEE threads, I saw 87.5 tps. There
>> were 29200 successful trade-result transactions. Of these, 5800 were rolled
>> back, some more than once for a total of 8450 rollbacks. So I'd say your
>> results and ours tell similar stories!
>>> Thanks,
>>> Reza
>>>
>>>> -----Original Message-----
>>>> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
>>>> performance-owner@postgresql.org] On Behalf Of Ryan Johnson
>>>> Sent: Saturday, July 26, 2014 2:06 PM
>>>> To: Reza Taheri
>>>> Cc: pgsql-performance@postgresql.org
>>>> Subject: Re: High rate of transaction failure with the Serializable
>>>> Isolation Level
>>>>
>>>> Dredging through some old run logs, 12 dbt-5 clients gave the
>>>> following when everything was run under SSI (fully serializable, even
>>>> the transactions that allow repeatable read isolation). Not sure how that
>> translates to your results.
>>>> Abort rates were admittedly rather high, though perhaps lower than
>>>> what you report.
>>>>
>>>> Transaction             % Average: 90th %   Total Rollbacks    % Warning Invalid
>>>> ----------------- ------- --------------- ------- -------------- ------- -------
>>>> Trade Result        5.568   0.022:  0.056    2118    417  19.69%       0      91
>>>> Broker Volume       5.097   0.009:  0.014    1557      0   0.00%       0       0
>>>> Customer Position  13.530   0.016:  0.034    4134      1   0.02%       0       0
>>>> Market Feed         0.547   0.033:  0.065     212     45  21.23%       0      69
>>>> Market Watch       18.604   0.031:  0.061    5683      0   0.00%       0       0
>>>> Security Detail    14.462   0.015:  0.020    4418      0   0.00%       0       0
>>>> Trade Lookup        8.325   0.059:  0.146    2543      0   0.00%     432       0
>>>> Trade Order         9.110   0.006:  0.008    3227    444  13.76%       0       0
>>>> Trade Status       19.795   0.030:  0.046    6047      0   0.00%       0       0
>>>> Trade Update        1.990   0.064:  0.145     608      0   0.00%     432       0
>>>> Data Maintenance      N/A   0.012:  0.012       1      0   0.00%       0       0
>>>> ----------------- ------- --------------- ------- --------------
>>>> ------- -------
>>>> 28.35 trade-result transactions per second (trtps)
>>>>
>>>> Regards,
>>>> Ryan
>>>>
>>>> On 26/07/2014 3:55 PM, Reza Taheri wrote:
>>>>> Hi Ryan,
>>>>> That's a very good point. We are looking at dbt5. One question: what
>>>> throughput rate, and how many threads of execution did you use for
>> dbt5?
>>>> The failure rates I reported were at ~120 tps with 15 trade-result threads.
>>>>> Thanks,
>>>>> Reza
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
>>>>>> performance-owner@postgresql.org] On Behalf Of Ryan Johnson
>>>>>> Sent: Friday, July 25, 2014 2:36 PM
>>>>>> To: pgsql-performance@postgresql.org
>>>>>> Subject: Re: High rate of transaction failure with the Serializable
>>>>>> Isolation Level
>>>>>>
>>>>>> On 25/07/2014 2:58 PM, Reza Taheri wrote:
>>>>>>> Hi Craig,
>>>>>>>
>>>>>>>> According to the attached SQL, each frame is a separate phase in
>>>>>>>> the
>>>>>> operation and performs many different operations.
>>>>>>>> There's a *lot* going on here, so identifying possible
>>>>>>>> interdependencies isn't something I can do in a ten minute skim
>>>>>>>> read over
>>>>>> my morning coffee.
>>>>>>> You didn't think I was going to bug you all with a trivial
>>>>>>> problem, did you? :-) :-)
>>>>>>>
>>>>>>> Yes, I am going to have to take an axe to the code and see what
>>>>>>> pops
>>>> out.
>>>>>> Just to put this in perspective, the transaction flow and its
>>>>>> statements are borrowed verbatim from the TPC-E benchmark. There
>>>> have
>>>>>> been dozens of TPC-E disclosures with MS SQL Server, and there are
>>>>>> Oracle and DB2 kits that, although not used in public disclosures
>>>>>> for various non-technical reasons, are used internally by the DB
>>>>>> and server companies. These 3 products, and perhaps more, were
>> used
>>>> extensively in the prototyping phase of TPC-E.
>>>>>>> So, my hope is that if there is a "previously unidentified
>>>>>>> interdependency
>>>>>> between transactions" as you point out, it will be due to a mistake
>>>>>> we made in coding this for PGSQL. Otherwise, we will have a hard
>>>>>> time convincing all the council member companies that we need to
>>>>>> change the schema or the business logic to make the kit work with
>> PGSQL.
>>>>>>> Just pointing out my uphill battle!!
>>>>>> You might compare against dbt-5 [1], just to see if the same
>>>>>> problem occurs. I didn't notice such high abort rates when I ran
>>>>>> that workload a few weeks ago. Just make sure to use the latest
>>>>>> commit, because the "released" version has fatal bugs.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/petergeoghegan/dbt5
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>>>>>> To make changes to your subscription:
>>>>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>>



Re: High rate of transaction failure with the Serializable Isolation Level

From
Reza Taheri
Date:
An update: following the recommendations on this list, I ran a number of experiments:

- I ran with all foreign keys deleted. There was a 4% drop in the rate of deadlocks/transaction, which went from 0.32
per transaction to 0.31. So we still have pretty much the same failure rate. One interesting side effect was that
throughput went up by 9%. So maintaining (a lot of) foreign key constraints costs us a lot

- I ran with max_pred_locks_per_tran as high as 26,400. No difference

- I rebuilt the database with fillfactor=15 for all the tables and indexes that are involved in the transactions that
fail. This was to see if the problem is PGSQL upgrading row level locks to page level locks. With a low fillfactor, the
chances of two transactions landing on the same page are low. We have around 9 rows per data page instead of the original
~60. (The index pages are more tightly packed.) I ran a range of thread counts from 5 to 60 for the threads that issue
transactions. The failure rate per transaction dropped to around half for the thread count of 5, but that's misleading
since with a fillfactor of 15, our database size went up by around 6X, reducing the effectiveness of PGSQL and OS disk
caches, resulting in a throughput of around half of what we used to see. So the reduced failure rate is just a result of
fewer threads competing for resources. When I run with enough threads to max out the system with fillfactor=15 or 100, I
get the same failure rates

- In case folks hadn't noticed, Ryan Johnson is getting very similar failure rates with dbt-5. So this isn't a case of
our having made a silly mistake in our coding of the app or the stored procedures
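[Editor's note: the ~6X size growth in the fillfactor experiment is consistent with the per-page row counts quoted above. A quick back-of-the-envelope sketch, where the table size is an illustrative assumption and the 60 and 9 rows/page figures come from the experiment:]

```python
def pages_needed(rows: int, rows_per_page: int) -> int:
    """Heap pages needed to hold `rows` rows (ceiling division)."""
    return -(-rows // rows_per_page)

ROWS = 1_000_000                     # illustrative table size
before = pages_needed(ROWS, 60)      # ~60 rows/page at default fillfactor (100)
after = pages_needed(ROWS, 9)        # ~9 rows/page observed at fillfactor=15

# 60/9 ~= 6.7x more heap pages, in line with the ~6X database growth
# reported above (index pages, being more tightly packed, grow less).
print(f"pages before: {before}, after: {after}, growth: {after / before:.1f}x")
```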

The above experiments were the easy ones. I am now working on rewriting the app code and the 6 stored procedures to see if I
can execute the whole transaction in a single stored procedure
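[Editor's note: since these failures carry the hint that the transaction might succeed if retried, the client-side handling discussed in this thread amounts to a retry loop with exponential backoff (10ms/30ms/90ms/...). A minimal sketch; the SerializationFailure class, the retry cap, and the txn callable are illustrative stand-ins — a real ODBC or libpq client would catch the driver's SQLSTATE 40001 error instead:]

```python
import time

class SerializationFailure(Exception):
    """Stand-in for SQLSTATE 40001 (could not serialize access...)."""

def run_with_retry(txn, max_attempts=5, base_sleep=0.010):
    """Run txn(), retrying on serialization failure with exponential backoff."""
    sleep = base_sleep
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()                  # txn must roll back before raising
        except SerializationFailure:
            if attempt == max_attempts:
                raise                     # give up; surface the error
            time.sleep(sleep)
            sleep *= 3                    # 10ms, 30ms, 90ms, ...
```

Note that counting retries separately from distinct failed transactions matters here, since (as the numbers above show) a transaction can be rolled back more than once before it commits.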

Thanks,
Reza

> -----Original Message-----
> From: Craig Ringer [mailto:craig@2ndquadrant.com]
> Sent: Sunday, July 27, 2014 8:58 PM
> To: Reza Taheri; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] High rate of transaction failure with the Serializable
> Isolation Level
>
> On 07/26/2014 02:58 AM, Reza Taheri wrote:
> > Hi Craig,
> >
> >> According to the attached SQL, each frame is a separate phase in the
> operation and performs many different operations.
> >> There's a *lot* going on here, so identifying possible
> >> interdependencies isn't something I can do in a ten minute skim read over
> my morning coffee.
> >
> > You didn't think I was going to bug you all with a trivial problem,
> > did you? :-) :-)
>
> One can hope, but usually in vain...
>
> > Yes, I am going to have to take an axe to the code and see what pops out.
> Just to put this in perspective, the transaction flow and its statements are
> borrowed verbatim from the TPC-E benchmark. There have been dozens of
> TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that,
> although not used in public disclosures for various non-technical reasons, are
> used internally by the DB and server companies. These 3 products, and
> perhaps more, were used extensively in the prototyping phase of TPC-E.
> >
> > So, my hope is that if there is a "previously unidentified interdependency
> between transactions" as you point out, it will be due to a mistake we made
> in coding this for PGSQL. Otherwise, we will have a hard time convincing all
> the council member companies that we need to change the schema or the
> business logic to make the kit work with PGSQL.
>
> Hopefully so.
>
> Personally I think it's moderately likely that PostgreSQL's much stricter
> enforcement of serializable isolation is detecting anomalies that other
> products do not, so it's potentially preventing errors.
>
> It would be nice to have the ability to tune this; sometimes there are
> anomalies you wish to ignore or permit. At present it is an all or nothing affair
> - no predicate locking (REPEATABLE READ isolation) or strict predicate locking
> (SERIALIZABLE isolation).
>
> I recommend running some of the examples in the SERIALIZABLE
> documentation on other products. If they don't fail where they do in Pg,
> then the other products either have less strict (and arguably therefore less
> correct) serialisable isolation enforcement or they rely on blocking predicate
> locks. In the latter case it should be easy to tell because statements will block
> on locks where no ordinary row or table level lock could be taken.
>
> If you do run comparative tests I would be quite interested in seeing the
> results.
>
> --
>  Craig Ringer
> http://www.2ndquadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services