Re: Repeatable read and serializable transactions see data committed after tx start - Mailing list pgsql-hackers

From Álvaro Hernández Tortosa
Subject Re: Repeatable read and serializable transactions see data committed after tx start
Date
Msg-id 545ABE14.4090508@8Kdata.com
Whole thread Raw
In response to Re: Repeatable read and serializable transactions see data committed after tx start  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Repeatable read and serializable transactions see data committed after tx start  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 06/11/14 00:42, Robert Haas wrote:
> On Mon, Nov 3, 2014 at 2:14 PM, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>>      Given a transaction started with "BEGIN.... (REPEATABLE READ |
>> SERIALIZABLE)", if a concurrent session commits some data before *any* query
>> within the first transaction, that committed data is seen by the
>> transaction. This is not what I'd expect.
> I think the problem is with your expectation, not the behavior.
    But my expectation is derived from the documentation:

"The Repeatable Read isolation level only sees data committed before the 
transaction began;"
    In PostgreSQL you will see data committed after a BEGIN ... 
(REPEATABLE READ | SERIALIZABLE) statement (only before the first 
query). And it's reasonable to "think" that transaction begins when you 
issue a BEGIN statement. It's also reasonable to think this way as:

- now() is frozen at BEGIN time, as Nasby pointed out
- pg_stat_activity says that the transaction is started, as Kevin mentioned
    So if the behavior is different from what the documentation says 
and what other external indicators may point out, I think at least 
documentation should be clear about this precise behavior, to avoid 
confusing users.

> Serializable means that the transactions execute in such a fashion
> that their parallel execution is equivalent to some serial order of
> execution.  It doesn't say that it must be equivalent to any
> particular order that you might imagine, such as the order in which
> the transactions commit, or, as you propose, the order in which they
> begin.  Generally, I think that's a good thing, because transaction
> isolation is expensive: even at repeatable read, but for the need to
> provide transaction isolation, there would be no such thing as bloat.
> The repeatable read and serializable levels add further overhead of
> various kinds.  We could provide a super-duper serializable level that
> provides even tighter guarantees, but (1) I can't imagine many people
> are keen to make the cost of serialization even higher than it already
> is and (2) if you really want that behavior, just do some trivial
> operation sufficient to cause a snapshot to be taken immediately after
> the BEGIN.
>
    I'm not really asking for a new isolation level, just that either 
BEGIN really freezes (for repeatable read and serializable) or if that's 
expensive and not going to happen, that the documentation clearly states 
the fact that freeze starts at first-query-time, and that if you need to 
freeze before your first real query, you should do a dummy one instead 
(like SELECT 1). Also, if this "early freeze" is a performance hit -and 
for that reason BEGIN is not going to be changed to freeze- then that 
also should be pointed out in the documentation, so that users that 
freeze early with "SELECT 1"-type queries understand that.
    Regards,
    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata




pgsql-hackers by date:

Previous
From: Álvaro Hernández Tortosa
Date:
Subject: Re: Repeatable read and serializable transactions see data committed after tx start
Next
From: Steve Singer
Date:
Subject: Re: tracking commit timestamps