Re: Transactions over pathological TCP connections - Mailing list pgsql-hackers

From Leon Smith
Subject Re: Transactions over pathological TCP connections
Date
Msg-id CAPwAf1kGXm6YVKgAumdPR+Hu98Lpbzf8fWxwf4xwS6bmk9DRHg@mail.gmail.com
Whole thread Raw
In response to Re: Transactions over pathological TCP connections  (Leon Smith <leon.p.smith@gmail.com>)
List pgsql-hackers
I just realized this is essentially an instance of the Two General's Problem;  which is something I feel should have been more obvious to me.

On Tue, Jun 19, 2012 at 5:50 PM, Leon Smith <leon.p.smith@gmail.com> wrote:
On Tue, Jun 19, 2012 at 11:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The transaction would be committed before a command success report is
> delivered to the client, so I don't think delivered-and-not-marked is
> possible.

...unless you have configured synchronous_commit=off, or fsync=off.

Or unless your disk melts into a heap of slag and you have to restore
from backup.  You can protect against that last case using synchronous
replication.


But hard disk failure isn't in the failure model I was concerned about.   =)    To be perfectly honest,  I'm not too concerned with either hard drive failure or network failure,   as we are deploying on Raid 1+0 database server talking to the client over a redundant LAN,  and using asynchronous (Slony) replication to an identical database server just in case.   No single point of failure is a key philosophy of this app from top to bottom.     Like I said,  this is mostly idle curiosity.

But I'm also accustomed to trying to get work done on shockingly unreliable internet connections.   As a result, network failure is something I think about quite a lot when writing networked applications.   So this is not entirely idle curiosity either.

And thinking about this a bit more,   it's clear that the database has to commit before the result is sent,  on the off chance that the transaction fails and needs to be retried.   And that an explicit transaction block isn't really a solution either,  because   a "BEGIN;  SELECT dequeue_row()"  would get the row to the client without marking it as taken,   but the pathological TCP disconnect could then attack the  following "COMMIT;",  leaving the client to think that the row has not been actually taken when it in fact has.  

It's not clear to me that this is even a solvable problem without modifying the schema to include both a "taken" and a "finished processing" state,  and then letting elements be re-delievered after a period of time.   But this would then allow a pathological demon with the power to cause TCP connects have a single element delivered and processed multiple times.

In any case, thanks for the responses...

Best,
Leon

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Next
From: Simon Riggs
Date:
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node