Re: Batches, error handling and transaction in the protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Batches, error handling and transaction in the protocol
Date
Msg-id CADT4RqDrM6rpOxk59M17vAaMpk_H2EXkUU4DhUwL9ufERiCPsQ@mail.gmail.com
In response to Batches, error handling and transaction in the protocol  (Shay Rojansky <roji@roji.org>)
List pgsql-hackers
> More generally speaking, the protocol appears to couple two different things which may be unrelated. On the one hand, we have a protocol
> sync mechanism for error recovery (skip until Sync). On the other hand, we have an implicit transaction for extended query messages until
> that same Sync. It seems valid to want to have error recovery without an implicit transaction, but this doesn't seem supported by the current
> protocol (I could add a note for v4).

In the absence of any response to my message from September 28th, I've added a todo item for wire protocol v4 (separating transaction delineation from protocol error recovery).


On Wed, Sep 28, 2016 at 6:04 PM, Shay Rojansky <roji@roji.org> wrote:
Hi everyone, I'd appreciate some guidance on an issue that's been raised with Npgsql; input from other driver writers would be especially helpful.

Npgsql currently supports batching (or pipelining) to avoid roundtrips, and sends a Sync message only at the end of the batch (so Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync). The reasoning is that if an earlier statement in the batch fails, the later ones shouldn't be processed. This seems to be the standard approach (the proposed patch for libpq appears to do the same).
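
To make the framing concrete, here is a minimal Python sketch of what such a batch looks like on the wire under the v3 extended-query protocol. The helper names, the two UPDATE statements and the parameter-free Bind are illustrative assumptions, not Npgsql's actual code; the message layouts follow the protocol documentation.

    import struct

    def msg(type_byte: bytes, body: bytes) -> bytes:
        # Every frontend message: 1-byte type, Int32 length (including itself), body.
        return type_byte + struct.pack("!I", 4 + len(body)) + body

    def cstr(s: str) -> bytes:
        return s.encode() + b"\x00"

    def parse(stmt: str, query: str) -> bytes:
        # Parse: statement name, query text, number of pre-specified parameter types (0 here).
        return msg(b"P", cstr(stmt) + cstr(query) + struct.pack("!H", 0))

    def bind(portal: str, stmt: str) -> bytes:
        # Bind with no parameters and default (text) result format.
        return msg(b"B", cstr(portal) + cstr(stmt) + struct.pack("!HHH", 0, 0, 0))

    def describe_portal(portal: str) -> bytes:
        return msg(b"D", b"P" + cstr(portal))

    def execute(portal: str) -> bytes:
        # Execute: portal name, row limit (0 = unlimited).
        return msg(b"E", cstr(portal) + struct.pack("!I", 0))

    SYNC = msg(b"S", b"")

    # Two-statement batch, single Sync at the very end: an error in statement 1 makes
    # the backend skip everything up to the Sync, and the whole batch runs inside one
    # implicit transaction that the Sync commits (or rolls back).
    statements = ["UPDATE t SET x = x + 1 WHERE id = 1",
                  "UPDATE t SET x = x + 1 WHERE id = 2"]
    wire = []
    for i, sql in enumerate(statements):
        wire += [parse(f"s{i}", sql), bind("", f"s{i}"), describe_portal(""), execute("")]
    wire.append(SYNC)
    print([m[:1].decode() for m in wire])  # ['P', 'B', 'D', 'E', 'P', 'B', 'D', 'E', 'S']

b"".join(wire) is what goes out in a single network write; everything up to the Sync is one error-recovery unit and, in the absence of an explicit BEGIN, one implicit transaction.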

At the same time, if the batch doesn't occur within an explicit transaction (i.e. after BEGIN), it is automatically wrapped in an implicit transaction, with Sync committing it. This can, for example, provoke deadlocks if two batches try to update the same rows in reverse order. The problem is that the user didn't request a transaction in any way - they're just using batching to avoid roundtrips and their intention is to be in autocommit mode.

One possible solution for this would be to insert a Sync after every Execute in the batch, rather than a single Sync at the very end. This would make batched statements behave the same as unbatched ones, and would resolve the deadlocks. However, behavior in case of error would become problematic: PostgreSQL would continue executing the later statements even if earlier ones failed, Npgsql would have to deal with multiple errors per batch, etc.
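
Continuing the sketch above (same helpers and statements), the per-statement-Sync variant would be framed like this; again, this is only an illustration of the framing, not a proposal for how Npgsql should implement it.

    # Variant: a Sync after every Execute. Each statement now gets its own implicit
    # transaction and its own error-recovery point, so an error in statement 1 no
    # longer stops statement 2, and the backend sends one ReadyForQuery per Sync.
    wire = []
    for i, sql in enumerate(statements):
        wire += [parse(f"s{i}", sql), bind("", f"s{i}"), describe_portal(""), execute(""), SYNC]
    print([m[:1].decode() for m in wire])  # ['P', 'B', 'D', 'E', 'S', 'P', 'B', 'D', 'E', 'S']

The driver then has to be prepared to collect an error (or a result) for every statement rather than a single failure for the whole batch, which is exactly the complication described above.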

More generally speaking, the protocol appears to couple two different things which may be unrelated. On the one hand, we have a protocol sync mechanism for error recovery (skip until Sync). On the other hand, we have an implicit transaction for extended query messages until that same Sync. It seems valid to want to have error recovery without an implicit transaction, but this doesn't seem supported by the current protocol (I could add a note for v4).

Finally, to give more context, a Microsoft developer ran into this while running ASP.NET benchmarks over Npgsql and its Entity Framework Core ORM provider. One of EFCore's great new features is that it batches database updates into a single roundtrip, but this triggered deadlocks. Whereas in many cases it's OK to tell users to solve the deadlocks by properly ordering their statements, when an ORM is creating the batch it's a more difficult proposition.

Thanks for any thoughts or guidance!

Shay
