Thread: Serialization errors on single threaded request stream

Serialization errors on single threaded request stream

From
"Kevin Grittner"
Date:
I have an odd one here.  I was unable to find it with a search of the
mailing lists.  I've spent a few hours trying to create a simple test
case, but so far these simple cases aren't showing the problem.  I want
to make sure this isn't a known problem before investing more time
trying to come up with a test case sufficiently complex to expose the
problem.

The problem is this: a single thread is submitting database updates
through a middle tier which has a pool of connections.  There are no
guarantees of which connection will be used for any request.  Each
request is committed as its own database transaction before the middle
tier responds to the requester, which then immediately submits the next
request.  Nothing else is hitting the database.  We are getting
serialization errors.

If we add a 1 ms delay on the client side between requests to the
middle tier, the frequency of these errors drops by about two orders of
magnitude.  With a 100 ms delay, we haven't seen any.

The pattern of activity which causes the problem involves a single
database transaction with inserts and updates to many tables, including
one with a potentially large blob, followed by an update to a numeric
column in a row which tracks progress.  The serialization errors are
happening on this final update.  My simple test cases use a single
thread on two JDBC connections emulating just this final update, and
the problem does not show up.

We have the same behavior on 8.0.3 and the development snapshot from
yesterday.  (I haven't gotten a test run from today's beta release yet
-- I need to coordinate the test with someone else who's not here right
now.  I'll follow up if the beta release changes this behavior.)

The server is SuSE 9.3 with dual Xeons and XFS on a SAN.  The client
and middle tier for these tests have been on Windows XP.  The requests
are going through JDBC.

Does this behavior sound familiar to anyone?

-Kevin

Re: Serialization errors on single threaded request stream

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> The problem is this: a single thread is submitting database updates
> through a middle tier which has a pool of connections.  There are no
> guarantees of which connection will be used for any request.  Each
> request is committed as its own database transaction before the middle
> tier responds to the requester, which then immediately submits the
> next request.  Nothing else is hitting the database.  We are getting
> serialization errors.

Hm.  Are you sure your middle tier is actually waiting for the commit
to come back before it claims the transaction is done?

            regards, tom lane
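
[Editorial sketch] The ordering Tom is asking about can be illustrated
roughly as follows: the middle tier should only build its reply after
Connection.commit() has returned. This is a minimal sketch, not the
poster's actual middle tier. The class name CommitBeforeReply, the Work
interface, and the "OK" reply are hypothetical; only the java.sql
calls (setAutoCommit, commit, rollback) are standard JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical middle-tier handler; illustrates ordering only. */
public class CommitBeforeReply {

    /** Hypothetical unit of work executed inside one transaction. */
    public interface Work {
        void run(Connection conn) throws SQLException;
    }

    /**
     * Runs the work in its own transaction and returns the reply only
     * after commit() has returned without throwing.
     */
    public static String handle(Connection conn, Work work) throws SQLException {
        conn.setAutoCommit(false);
        try {
            work.run(conn);
            conn.commit();      // must finish before any reply is produced
        } catch (SQLException e) {
            conn.rollback();    // undo the partial transaction
            throw e;            // the requester sees a failure, not "OK"
        }
        return "OK";            // built strictly after the commit returned
    }
}
```

If the reply were instead produced on another thread, or before
commit() returned, the requester could submit its next request while
the previous transaction was still in progress on a different pooled
connection.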

Re: Serialization errors on single threaded request stream

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> What happens if the timestamp of the commit is an exact match for the
> timestamp of the next transaction start?  What is the resolution of
> the time sampling?

It's not done via timestamps: rather, each transaction takes a census
of the transaction XIDs that are running in other backends when it
starts (there is an array in shared memory that lets it get this
information cheaply).  Reliability of the system clock is not a factor.
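
[Editorial sketch] The census described above can be pictured with a
toy model. This is not PostgreSQL source code: the class
SnapshotCensus and all of its method names are hypothetical, and the
visibility rule is deliberately simplified (it ignores aborts,
subtransactions, and XID wraparound). It only shows why a transaction
that starts while an earlier one is still listed as running will treat
that earlier transaction's changes as concurrent.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a start-of-transaction census; not PostgreSQL's code. */
public class SnapshotCensus {
    // XIDs currently running; stands in for the shared-memory array.
    private final Set<Integer> running = new HashSet<>();
    private int nextXid = 1;

    /** Start a transaction: assign the next XID and mark it running. */
    public synchronized int begin() {
        int xid = nextXid++;
        running.add(xid);
        return xid;
    }

    /** Commit: the XID leaves the running set. */
    public synchronized void commit(int xid) {
        running.remove(xid);
    }

    /** Census for ownXid: a copy of the XIDs running in other backends. */
    public synchronized Set<Integer> snapshotFor(int ownXid) {
        Set<Integer> census = new HashSet<>(running);
        census.remove(ownXid);
        return census;
    }

    /**
     * Simplified rule: a writer's changes count as committed before the
     * reader only if the writer has a lower XID and was not in the census.
     */
    public static boolean visible(int writerXid, int readerXid, Set<Integer> census) {
        return writerXid < readerXid && !census.contains(writerXid);
    }
}
```

In this model, if a second transaction calls snapshotFor() before the
first has committed, the first XID is in its census, and a later
update to a row written by that XID would be treated as a conflict
with a concurrent transaction, assuming serializable isolation is in
use.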

Are you sure the server is 8.0.3?  There was a bug in prior releases
that might possibly be related:

2005-05-07 17:22  tgl

    * src/backend/utils/time/: tqual.c (REL7_3_STABLE), tqual.c
    (REL7_4_STABLE), tqual.c (REL7_2_STABLE), tqual.c (REL8_0_STABLE),
    tqual.c: Adjust time qual checking code so that we always check
    TransactionIdIsInProgress before we check commit/abort status.
    Formerly this was done in some paths but not all, with the result
    that a transaction might be considered committed for some purposes
    before it became committed for others.    Per example found by Jan
    Wieck.

My recollection though is that this only affected applications that were
using SELECT FOR UPDATE.  In any case, it's pretty hard to see how this
would affect an application that is in fact waiting for the backend to
report commit-done before it launches the next transaction; the
race-condition window we were concerned about no longer exists by the
time the backend sends CommandComplete.  So my suspicion remains fixed
on that point.  Do you have any way of sniffing the network traffic of
the middle-tier to confirm that it's doing what it's supposed to?

            regards, tom lane