Re: Serialization errors on single threaded request - Mailing list pgsql-bugs

From Kevin Grittner
Subject Re: Serialization errors on single threaded request
Date
Msg-id s30f18df.021@gwmta.wicourts.gov
List pgsql-bugs
Unfortunately, the original test environment has been blown away in
favor of testing the 8.1 beta release.  I can confirm that the problem
exists on a build of the 8.1 beta.  If it would be helpful I could set
it up again on 8.0.3 to confirm.  I THINK it was actually the tip of the
8.0 stable branch as opposed to the 8.0.3 release proper.

We have a little more information about the failure pattern -- when we
get these, it is always after there has been a rollback on the thread
which eventually generates the serialization error.  So I think the
pattern is:

ConnectionA:
  -  A series of insert/update/deletes (on tables OTHER than the
     progress table).
  -  Update the progress table.
  -  Commit the transaction.
ConnectionB:
  -  A series of insert/update/deletes (on tables OTHER than the
     progress table) fails.
  -  Rollback the transaction.
  -  Attempt each insert/update/delete individually.  Commit or rollback
     each as we go.
  -  Attempt to update the progress table -- fail on serialization error.

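For anyone trying to reproduce this, the ConnectionB flow above can be
sketched roughly as follows.  This is only an illustration under stated
assumptions: the function name apply_batch and the table names in the
usage note are hypothetical, the real application uses JDBC rather than
Python, and sqlite3 appears only as a stand-in so the sketch can run
without a server.  It will not reproduce the serialization error itself.

```python
import sqlite3  # stand-in DB-API driver; any DB-API connection should fit


def apply_batch(conn, statements, progress_sql):
    """Hypothetical helper mirroring the ConnectionB steps.

    Tries the whole batch plus the progress update in one transaction.
    If anything fails, rolls back, retries each statement in its own
    transaction, and then updates the progress table separately.
    Returns the list of statements that failed individually.
    """
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
        cur.execute(progress_sql)
        conn.commit()
        return []
    except Exception:
        # The batch failed somewhere: discard all of it.
        conn.rollback()

    failed = []
    for sql in statements:
        try:
            cur.execute(sql)
            conn.commit()
        except Exception:
            conn.rollback()
            failed.append(sql)

    # In the pattern described above, this is the step that reports
    # the serialization error on PostgreSQL.
    cur.execute(progress_sql)
    conn.commit()
    return failed
```

With a hypothetical table "item" and a one-row table "progress", calling
apply_batch(conn, [...], "UPDATE progress SET last_batch = 1") with a
duplicate-key insert in the list exercises the rollback-then-retry path.
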
To avoid any ambiguity in my former posts -- introducing even a very
small delay between the operations on ConnectionA and ConnectionB makes
the serialization error very infrequent; introducing a larger delay
seems to make it go away.  I hate to consider that as a solution,
however.

I'm afraid I'm not familiar with a good way to capture the stream of
communications with the database server.  If you could point me in the
right direction, I'll give it my best shot.

I did just have a thought, though -- is there any chance that the JDBC
Connection.commit is returning once the command is written to the TCP
buffer, and I'm getting hurt by some network latency issues -- the Nagle
algorithm or some such?  (I assume that the driver is waiting for a
response from the server before returning, so this shouldn't be the
issue.)  At the point that the commit confirmation is sent by the
server, I assume the shared memory changes are visible to the other
processes?

-Kevin

>>> Tom Lane <tgl@sss.pgh.pa.us> 08/26/05 12:16 PM >>>
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> What happens if the timestamp of the commit is an exact match for the
> timestamp of the next transaction start?  What is the resolution of
> the time sampling?

It's not done via timestamps: rather, each transaction takes a census
of the transaction XIDs that are running in other backends when it
starts (there is an array in shared memory that lets it get this
information cheaply).  Reliability of the system clock is not a factor.
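
As a rough illustration of the census idea, here is a toy model in
Python.  It is not PostgreSQL's code: the names Snapshot, take_snapshot
and is_visible are made up for this sketch, and it deliberately ignores
subtransactions, deleted row versions and XID wraparound.  The only
point is that visibility is decided from sets of transaction IDs, with
no clock involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmax: int            # first XID not yet assigned when the snapshot was taken
    running: frozenset   # XIDs still in progress in other backends at that moment


def take_snapshot(next_xid, in_progress):
    """Record the 'census' of running XIDs (toy model, hypothetical API)."""
    return Snapshot(xmax=next_xid, running=frozenset(in_progress))


def is_visible(writer_xid, snapshot, committed):
    """Would a row written by writer_xid be visible under this snapshot?"""
    if writer_xid >= snapshot.xmax:
        return False     # writer started after the snapshot was taken
    if writer_xid in snapshot.running:
        return False     # writer was still running at snapshot time
    return writer_xid in committed
```

In this model a transaction that was listed as running when the census
was taken stays invisible to that snapshot even after it commits, which
is the situation a serializable transaction has to reject on update.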

Are you sure the server is 8.0.3?  There was a bug in prior releases
that might possibly be related:

2005-05-07 17:22  tgl

    * src/backend/utils/time/: tqual.c (REL7_3_STABLE), tqual.c
    (REL7_4_STABLE), tqual.c (REL7_2_STABLE), tqual.c (REL8_0_STABLE),
    tqual.c: Adjust time qual checking code so that we always check
    TransactionIdIsInProgress before we check commit/abort status.
    Formerly this was done in some paths but not all, with the result
    that a transaction might be considered committed for some purposes
    before it became committed for others.    Per example found by Jan
    Wieck.

My recollection though is that this only affected applications that were
using SELECT FOR UPDATE.  In any case, it's pretty hard to see how this
would affect an application that is in fact waiting for the backend to
report commit-done before it launches the next transaction; the
race-condition window we were concerned about no longer exists by the
time the backend sends CommandComplete.  So my suspicion remains fixed
on that point.  Do you have any way of sniffing the network traffic of
the middle-tier to confirm that it's doing what it's supposed to?
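
If a raw capture of the server-to-client bytes is available, something
along these lines could help check the ordering.  This is a sketch under
several assumptions: the connection uses the version 3 wire protocol
(type byte, then a 4-byte big-endian length that counts itself, then the
payload), the capture is unencrypted, contains only backend-to-frontend
traffic, and starts on a message boundary after startup.  The function
names are hypothetical.

```python
import struct


def parse_backend_messages(data):
    """Split captured backend bytes into (type, payload) pairs."""
    messages = []
    pos = 0
    while pos + 5 <= len(data):
        msg_type = data[pos:pos + 1]
        (length,) = struct.unpack("!I", data[pos + 1:pos + 5])
        if length < 4:
            break                    # not a valid v3 message; stop parsing
        end = pos + 1 + length       # length excludes the type byte
        if end > len(data):
            break                    # capture was cut off mid-message
        messages.append((msg_type, data[pos + 5:end]))
        pos = end
    return messages


def commit_confirmed(messages):
    """True if CommandComplete 'COMMIT' is later followed by ReadyForQuery."""
    saw_commit = False
    for msg_type, payload in messages:
        if msg_type == b"C" and payload.rstrip(b"\x00") == b"COMMIT":
            saw_commit = True
        elif msg_type == b"Z" and saw_commit:
            return True
    return False
```

Comparing the capture timestamp of that CommandComplete on one
connection with the first query sent on the other connection would show
whether the middle tier really waits for the commit to be reported.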

            regards, tom lane
