Thread: Repeatable read and serializable transactions see data committed after tx start

Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

03 November 2014, 19:14:17

    Hi!<br /><br />     Given a transaction started with "BEGIN.... (REPEATABLE READ | SERIALIZABLE)", if a concurrent
sessioncommits some data before *any* query within the first transaction, that committed data is seen by the
transaction.This is not what I'd expect. Specifically, the documentation states that:<br /><br /> "The Repeatable Read
isolationlevel only sees data committed before the transaction began;" [1]<br /><br />     IMHO, from a user
perspectivethe transaction begins when the BEGIN command is issued. If I really want to see a "frozen" view of the
databasebefore any real SELECT, I have to issue another query like "SELECT 1". This seems odd to me. I understand tx
snapshotmay be deferred until real execution for performance reasons, but it is confusing from a user perspective. Is
thisreally expected, or is it a bug? Am I having a bad day and missing some point here? ^_^<br /><br />     Regards,<br
/><br/>     Álvaro<br /><br /><br /> [1] <a class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/devel/static/transaction-iso.html">http://www.postgresql.org/docs/devel/static/transaction-iso.html</a><br
/><br/><br /> P.S. In case it wasn't clear what I meant, here's an example:<br /><br /><br /> Session 1               
                                                                           Session 2<br /><br /> CREATE TABLE i (i
integer);<br/> BEGIN ISOLATION LEVEL REPEATABLE READ;<br />                                                            
                                                       INSERT INTO i VALUES (1);<br /> SELECT i FROM i; -- returns 1
row,value 1<br /> -- should return empty set<br />                                                                    
                                               INSERT INTO i VALUES (2);<br /> SELECT i FROM i; -- returns 1 row, value
1<br/> -- returns, as it should, the same as the previous query<br /><br /><br /> In the first select, I'd have
expectedto have no rows. If a "SELECT 1" is issued after BEGIN, there are no rows found.<br /><br /><pre
class="moz-signature"cols="72">-- 
 
Álvaro Hernández Tortosa


-----------
8Kdata

</pre>

Re: Repeatable read and serializable transactions see data committed after tx start

From

Tom Lane

Date:

03 November 2014, 21:19:22

Álvaro Hernández Tortosa <aht@8Kdata.com> writes:
>      IMHO, from a user perspective the transaction begins when the BEGIN 
> command is issued. If I really want to see a "frozen" view of the 
> database before any real SELECT, I have to issue another query like 
> "SELECT 1". This seems odd to me. I understand tx snapshot may be 
> deferred until real execution for performance reasons, but it is 
> confusing from a user perspective. Is this really expected, or is it a 
> bug? Am I having a bad day and missing some point here? ^_^

It's expected.  Without this behavior, you could not take out any locks
before freezing the transaction snapshot, which would be a bad thing.
I think there are some examples in the "concurrency control" chapter
of the manual.
        regards, tom lane

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

03 November 2014, 23:31:54

On 03/11/14 22:19, Tom Lane wrote:
> Álvaro Hernández Tortosa <aht@8Kdata.com> writes:
>>       IMHO, from a user perspective the transaction begins when the BEGIN
>> command is issued. If I really want to see a "frozen" view of the
>> database before any real SELECT, I have to issue another query like
>> "SELECT 1". This seems odd to me. I understand tx snapshot may be
>> deferred until real execution for performance reasons, but it is
>> confusing from a user perspective. Is this really expected, or is it a
>> bug? Am I having a bad day and missing some point here? ^_^
> It's expected.  Without this behavior, you could not take out any locks
> before freezing the transaction snapshot, which would be a bad thing.
    Thank you for your comment, Tom. However I think this behavior, as 
seen from a user perspective, it's not the expected one. There may be 
some internal reasons for it, but for the user executing the 
transaction, it's normal to expect the freezing to happen right after 
the BEGIN, rather than *after* the first query.
    If it is still the intended behavior, I think it should be clearly 
documented as such, and a recommendation similar to "issue a 'SELECT 1' 
right after BEGIN to freeze the data before any own query" or similar 
comment should be added. Again, as I said in my email, the documentation 
clearly says that "only sees data committed before the transaction 
began". And this is clearly not the real behavior.


> I think there are some examples in the "concurrency control" chapter
> of the manual.
    Sure, there are, that was the link I pointed out, but I found no 
explicit mention to the fact that I'm raising here.

    Regards,
    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata

Re: Repeatable read and serializable transactions see data committed after tx start

From

Craig Ringer

Date:

04 November 2014, 08:07:23

On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:
>     Thank you for your comment, Tom. However I think this behavior, as
> seen from a user perspective, it's not the expected one.

That may be the case, but I think it's the SQL-standard behaviour, so we
can't really mess with it.

The spec requires SET TRANSACTION ISOLATION, and you can't implement
that if you take a snapshot at BEGIN.

>     If it is still the intended behavior, I think it should be clearly
> documented as such, and a recommendation similar to "issue a 'SELECT 1'
> right after BEGIN to freeze the data before any own query" or similar
> comment should be added. Again, as I said in my email, the documentation
> clearly says that "only sees data committed before the transaction
> began". And this is clearly not the real behavior.

It's more of a difference in when the transaction "begins".

Arguably, "BEGIN" says "I intend to begin a new transaction with the
next query" rather than "immediately begin executing a new transaction".

This concept could be clearer in the docs.

>     Sure, there are, that was the link I pointed out, but I found no
> explicit mention to the fact that I'm raising here.

I'm sure it's documented *somewhere*, in that I remember reading about
this detail in the docs, but I can't find _where_ in the docs.

It doesn't seem to be in:

http://www.postgresql.org/docs/current/static/transaction-iso.html

where I'd expect.

In any case, we simply cannot take the snapshot at BEGIN time, because
it's permitted to:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

in a DB that has default serializable isolation or has a SET SESSION
CHARACTERISTICS isolation mode of serializable. Note that SET
TRANSACTION is SQL-standard.

AFAIK deferring the snapshot that's consistent with other RDBMSes that
use snapshots, too.

The docs of that command allude to, but doesn't explicitly state, the
behaviour you mention.

http://www.postgresql.org/docs/current/static/sql-set-transaction.html

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

05 November 2014, 00:11:54

On 04/11/14 09:07, Craig Ringer wrote:
> On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:
>>      Thank you for your comment, Tom. However I think this behavior, as
>> seen from a user perspective, it's not the expected one.
> That may be the case, but I think it's the SQL-standard behaviour, so we
> can't really mess with it.
>
> The spec requires SET TRANSACTION ISOLATION, and you can't implement
> that if you take a snapshot at BEGIN.
    It's true that the standard mandates SET TRANSACTION rather than 
setting the isolation level with the BEGIN statement, and in any case 
you can raise/lower the isolation level with SET regardless of what the 
session or the begin command said. However, is it really a problem 
taking a snapshot at BEGIN time --only if the tx is started with BEGIN 
... (REPEATABLE READ | SERIALIZABLE)? AFAIK, and I may be missing some 
internal details here, the worst that can happen is that you took one 
extra, unnecessary snapshot. I don't see that as a huge problem.
    The standard (92) says that transaction is initiated when a 
transaction-initiating SQL-statement is executed. To be fair, that 
sounds to me more of a "SELECT" rather than a "BEGIN", but I may be wrong.
>
>>      If it is still the intended behavior, I think it should be clearly
>> documented as such, and a recommendation similar to "issue a 'SELECT 1'
>> right after BEGIN to freeze the data before any own query" or similar
>> comment should be added. Again, as I said in my email, the documentation
>> clearly says that "only sees data committed before the transaction
>> began". And this is clearly not the real behavior.
> It's more of a difference in when the transaction "begins".
>
> Arguably, "BEGIN" says "I intend to begin a new transaction with the
> next query" rather than "immediately begin executing a new transaction".
>
> This concept could be clearer in the docs.
    If this is really how it should behave, I'd +1000 to make it 
clearer in the docs, and to explicitly suggest the user to perform a 
query discarding the results early after BEGIN if the user wants the 
state freezed if there may span time between BEGIN and the real queries 
to be executed (like doing a SELECT 1).

>
>>      Sure, there are, that was the link I pointed out, but I found no
>> explicit mention to the fact that I'm raising here.
> I'm sure it's documented *somewhere*, in that I remember reading about
> this detail in the docs, but I can't find _where_ in the docs.
>
> It doesn't seem to be in:
>
> http://www.postgresql.org/docs/current/static/transaction-iso.html
>
> where I'd expect.
    Yepp, there's no mention there.

>
> In any case, we simply cannot take the snapshot at BEGIN time, because
> it's permitted to:
>
> SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
>
> in a DB that has default serializable isolation or has a SET SESSION
> CHARACTERISTICS isolation mode of serializable. Note that SET
> TRANSACTION is SQL-standard.
    As I said, AFAIK it shouldn't matter a lot to take the snapshot at 
BEGIN. The worst that can happen is that you end up in read committed 
and you need to take more snapshots, one per query.

>
> AFAIK deferring the snapshot that's consistent with other RDBMSes that
> use snapshots, too.
    I tried Oracle and SQL Server. SQL Server seems to behave as 
PostgreSQL, but just because it locks the table if accessed in a 
serializable transaction, so it definitely waits until select to lock 
it. However, Oracle behaved as I expected: data is frozen at BEGIN time. 
I haven't tested others.

>
>
> The docs of that command allude to, but doesn't explicitly state, the
> behaviour you mention.
>
> http://www.postgresql.org/docs/current/static/sql-set-transaction.html
>
>
    Should we improve then the docs stating this more clearly? Any 
objection to do this?
    Regards,
    Álvaro


-- 
Álvaro Hernández Tortosa


-----------
8Kdata

Re: Repeatable read and serializable transactions see data committed after tx start

From

Jim Nasby

Date:

05 November 2014, 16:46:45

On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote:
>      Should we improve then the docs stating this more clearly? Any objection to do this?

If we go that route we should also mention that now() will no longer be doing what you probably hope it would (AFAIK
it'sdriven by BEGIN and not the first snapshot).
 

Perhaps we should change how now() works, but I'm worried about what that might do to existing applications...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Repeatable read and serializable transactions see data committed after tx start

From

Kevin Grittner

Date:

05 November 2014, 20:49:38

Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

> If we go that route we should also mention that now() will no
> longer be doing what you probably hope it would (AFAIK it's
> driven by BEGIN and not the first snapshot).

There is also the fact that pg_stat_activity shows a connection as
being "idle in transaction" as soon as BEGIN is executed; it does
not wait for the snapshot to be acquired.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Repeatable read and serializable transactions see data committed after tx start

From

Robert Haas

Date:

05 November 2014, 23:43:05

On Mon, Nov 3, 2014 at 2:14 PM, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>     Given a transaction started with "BEGIN.... (REPEATABLE READ |
> SERIALIZABLE)", if a concurrent session commits some data before *any* query
> within the first transaction, that committed data is seen by the
> transaction. This is not what I'd expect.

I think the problem is with your expectation, not the behavior.
Serializable means that the transactions execute in such a fashion
that their parallel execution is equivalent to some serial order of
execution.  It doesn't say that it must be equivalent to any
particular order that you might imagine, such as the order in which
the transactions commit, or, as you propose, the order in which they
begin.  Generally, I think that's a good thing, because transaction
isolation is expensive: even at repeatable read, but for the need to
provide transaction isolation, there would be no such thing as bloat.
The repeatable read and serializable levels add further overhead of
various kinds.  We could provide a super-duper serializable level that
provides even tighter guarantees, but (1) I can't imagine many people
are keen to make the cost of serialization even higher than it already
is and (2) if you really want that behavior, just do some trivial
operation sufficient to cause a snapshot to be taken immediately after
the BEGIN.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

06 November 2014, 00:04:44

On 05/11/14 17:46, Jim Nasby wrote:
> On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote:
>>      Should we improve then the docs stating this more clearly? Any 
>> objection to do this?
>
> If we go that route we should also mention that now() will no longer 
> be doing what you probably hope it would (AFAIK it's driven by BEGIN 
> and not the first snapshot).
    If I understand you correctly, you mean that if we add a note to 
the documentation stating that the transaction really freezes when you 
do the first query, people would expect now() to be also frozen when the 
first query is done, which is not what happens (it's frozen at tx 
start). Then, yes, you're right, probably *both* the isolation levels 
and the now() function documentation should be patched to become more 
precise.

>
> Perhaps we should change how now() works, but I'm worried about what 
> that might do to existing applications...
    Perhaps, I also believe it might not be good for existing 
applications, but it definitely has a different freeze behavior, which 
seems inconsistent too.
    Thanks,
    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

06 November 2014, 00:17:34

On 06/11/14 00:42, Robert Haas wrote:
> On Mon, Nov 3, 2014 at 2:14 PM, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>>      Given a transaction started with "BEGIN.... (REPEATABLE READ |
>> SERIALIZABLE)", if a concurrent session commits some data before *any* query
>> within the first transaction, that committed data is seen by the
>> transaction. This is not what I'd expect.
> I think the problem is with your expectation, not the behavior.
    But my expectation is derived from the documentation:

"The Repeatable Read isolation level only sees data committed before the 
transaction began;"
    In PostgreSQL you will see data committed after a BEGIN ... 
(REPEATABLE READ | SERIALIZABLE) statement (only before the first 
query). And it's reasonable to "think" that transaction begins when you 
issue a BEGIN statement. It's also reasonable to think this way as:

- now() is frozen at BEGIN time, as Nasby pointed out
- pg_stat_activity says that the transaction is started, as Kevin mentioned
    So if the behavior is different from what the documentation says 
and what other external indicators may point out, I think at least 
documentation should be clear about this precise behavior, to avoid 
confusing users.

> Serializable means that the transactions execute in such a fashion
> that their parallel execution is equivalent to some serial order of
> execution.  It doesn't say that it must be equivalent to any
> particular order that you might imagine, such as the order in which
> the transactions commit, or, as you propose, the order in which they
> begin.  Generally, I think that's a good thing, because transaction
> isolation is expensive: even at repeatable read, but for the need to
> provide transaction isolation, there would be no such thing as bloat.
> The repeatable read and serializable levels add further overhead of
> various kinds.  We could provide a super-duper serializable level that
> provides even tighter guarantees, but (1) I can't imagine many people
> are keen to make the cost of serialization even higher than it already
> is and (2) if you really want that behavior, just do some trivial
> operation sufficient to cause a snapshot to be taken immediately after
> the BEGIN.
>
    I'm not really asking for a new isolation level, just that either 
BEGIN really freezes (for repeatable read and serializable) or if that's 
expensive and not going to happen, that the documentation clearly states 
the fact that freeze starts at first-query-time, and that if you need to 
freeze before your first real query, you should do a dummy one instead 
(like SELECT 1). Also, if this "early freeze" is a performance hit -and 
for that reason BEGIN is not going to be changed to freeze- then that 
also should be pointed out in the documentation, so that users that 
freeze early with "SELECT 1"-type queries understand that.
    Regards,
    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata

Re: Repeatable read and serializable transactions see data committed after tx start

From

Jim Nasby

Date:

06 November 2014, 01:06:41

On 11/5/14, 6:04 PM, Álvaro Hernández Tortosa wrote:
>
> On 05/11/14 17:46, Jim Nasby wrote:
>> On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote:
>>>      Should we improve then the docs stating this more clearly? Any objection to do this?
>>
>> If we go that route we should also mention that now() will no longer be doing what you probably hope it would (AFAIK
it'sdriven by BEGIN and not the first snapshot).
 
>
>      If I understand you correctly, you mean that if we add a note to the documentation stating that the transaction
reallyfreezes when you do the first query, people would expect now() to be also frozen when the first query is done,
whichis not what happens (it's frozen at tx start). Then, yes, you're right, probably *both* the isolation levels and
thenow() function documentation should be patched to become more precise.
 

Bingo.

Hrm, is there anything else that differs between the two?

>> Perhaps we should change how now() works, but I'm worried about what that might do to existing applications...
>
>      Perhaps, I also believe it might not be good for existing applications, but it definitely has a different freeze
behavior,which seems inconsistent too.
 

Yeah, I'd really rather fix it...

Hmm... we do have transaction_timestamp(); perhaps we could leave that as the time BEGIN executed and shift everything
elseto use the snapshot time.
 

Or we could do the opposite. I suspect that would be more likely to cause data quality errors, but anyone that treats
timestampsas magic values is going to have those anyway.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Repeatable read and serializable transactions see data committed after tx start

From

Tom Lane

Date:

06 November 2014, 06:13:06

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> Hmm... we do have transaction_timestamp(); perhaps we could leave that as the time BEGIN executed and shift
everythingelse to use the snapshot time.
 

It's not possible to take a timestamp that *exactly* matches the snapshot
time.   We could rearrange the code so that we ask the kernel for
timeofday just before or after capturing some relevant snapshot, but
there's still going to be some skew there.

In any case, I believe we document those timestamps as being the time of
arrival of a command from the client, not in terms of any snapshots.
To the extent that people use now() to represent the time of insertion of
data, I think the command arrival time is arguably the best definition.
Delaying it until the server code gets around to capturing a snapshot
would not be an improvement.
        regards, tom lane

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

06 November 2014, 12:11:19

On 06/11/14 02:06, Jim Nasby wrote:
> On 11/5/14, 6:04 PM, Álvaro Hernández Tortosa wrote:
>>
>> On 05/11/14 17:46, Jim Nasby wrote:
>>> On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote:
>>>>      Should we improve then the docs stating this more clearly? Any 
>>>> objection to do this?
>>>
>>> If we go that route we should also mention that now() will no longer 
>>> be doing what you probably hope it would (AFAIK it's driven by BEGIN 
>>> and not the first snapshot).
>>
>>      If I understand you correctly, you mean that if we add a note to 
>> the documentation stating that the transaction really freezes when 
>> you do the first query, people would expect now() to be also frozen 
>> when the first query is done, which is not what happens (it's frozen 
>> at tx start). Then, yes, you're right, probably *both* the isolation 
>> levels and the now() function documentation should be patched to 
>> become more precise.
>
> Bingo.
>
> Hrm, is there anything else that differs between the two?
>
>>> Perhaps we should change how now() works, but I'm worried about what 
>>> that might do to existing applications...
>>
>>      Perhaps, I also believe it might not be good for existing 
>> applications, but it definitely has a different freeze behavior, 
>> which seems inconsistent too.
>
> Yeah, I'd really rather fix it...
    There has been two comments which seem to state that changing this 
may introduce some performance problems and some limitations when you 
need to take out some locks. I still believe, however, that current 
behavior is confusing for the user. Sure, one option is to patch the 
documentation, as I was suggesting.
    But what about creating a flag to BEGIN and SET TRANSACTION 
commands, called "IMMEDIATE FREEZE" (or something similar), which 
applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set 
(and may be off by default, but of course the default may be 
configurable via a guc parameter), freeze happens when it is present 
(BEGIN or SET TRANSACTION) time. This would be a backwards-compatible 
change, while would provide the option of freezing without the nasty 
hack of having to do a "SELECT 1" prior to your real queries, and 
everything will of course be well documented.
    What do you think?

    Best regards,
    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata

Re: Repeatable read and serializable transactions see data committed after tx start

From

Kevin Grittner

Date:

06 November 2014, 14:00:12

Álvaro Hernández Tortosa <aht@8Kdata.com> wrote:

>     There has been two comments which seem to state that changing this
> may introduce some performance problems and some limitations when you
> need to take out some locks. I still believe, however, that current
> behavior is confusing for the user. Sure, one option is to patch the
> documentation, as I was suggesting.

Yeah, I thought that's what we were talking about, and in that
regard I agree that the docs could be more clear.  I'm not quite
sure what to say where to fix that, but I can see how someone could
be confused and have the expectation that once they have run BEGIN
TRANSACTION ISOLATION LEVEL SERIALIZABLE the transaction will not
see the work of transactions committing after that.  The fact that
this is possible is implied, if one reads carefully and thinks
about it, by the statement right near the start of the "Transaction
Isolation" section which says "any concurrent execution of a set of
Serializable transactions is guaranteed to produce the same effect
as running them one at a time in some order."  As Robert pointed
out, this is not necessarily the commit order or the transaction
start order.

It is entirely possible that if you have serializable transactions
T1 and T2, where T1 executes BEGIN first (and even runs a query
before T2 executes BEGIN) and T1 commits first, that T2 will
"appear" to have run first because it will look at a set of data
which T1 modifies and not see the changes.  If T1 were to *also*
look at a set of data which T2 modifies, then one of the
transactions would be rolled back with a serialization failure, to
prevent a cycle in the apparent order of execution; so the
requirements of the standard (and of most software which is
attempting to handle race conditions) is satisfied.  For many
popular benchmarks (and I suspect most common workloads) this
provides the necessary protections with better performance than is
possible using blocking to provide the required guarantees.[1]

At any rate, the language in that section is a little fuzzy on the
concept of the "start of the transaction."  Perhaps it would be
enough to change language like:

| sees a snapshot as of the start of the transaction, not as of the
| start of the current query within the transaction.

to:

| sees a snapshot as of the start of the first query within the
| transaction, not as of the start of the current query within the
| transaction.

Would that have prevented the confusion here?

>     But what about creating a flag to BEGIN and SET TRANSACTION
> commands, called "IMMEDIATE FREEZE" (or something similar), which
> applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set
> (and may be off by default, but of course the default may be
> configurable via a guc parameter), freeze happens when it is present
> (BEGIN or SET TRANSACTION) time. This would be a backwards-compatible
> change, while would provide the option of freezing without the nasty
> hack of having to do a "SELECT 1" prior to your real queries, and
> everything will of course be well documented.

What is the use case where you are having a problem?  This seems
like an odd solution, so it would be helpful to know what problem
it is attempting to solve.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Dan R. K. Ports and Kevin Grittner.  Serializable Snapshot
Isolation in PostgreSQL.  In VLDB, pages 1850--1861, 2012.
http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf
(see section 8 for performance graphs and numbers)

Re: Repeatable read and serializable transactions see data committed after tx start

From

"Greg Sabino Mullane"

Date:

06 November 2014, 14:24:45

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Kevin Grittner wrote:

(wording change suggestion)
>> | sees a snapshot as of the start of the first query within the
>> | transaction, not as of the start of the current query within the
>> | transaction.
>
> Would that have prevented the confusion here?

I think it may have, but I also think the wording should be much 
stronger and clearer, as this is unintuitive behavior. Consider 
this snippet from Bruce's excellent MVCC Unmasked presentation:

"A snapshot is recorded at the start of each SQL statement in 
READ COMMITTED transaction isolation mode, and at transaction start 
in SERIALIZABLE transaction isolation mode."

This is both correct and incorrect, depending on whether you consider 
a transaction to start with BEGIN; or with the first statement 
after the BEGIN. :) I think most people have always assumed that 
BEGIN starts the transaction and that is the point at which the snapshot 
is obtained.

>>     But what about creating a flag to BEGIN and SET TRANSACTION
>> commands, called "IMMEDIATE FREEZE" (or something similar), which
>> applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set
>> (and may be off by default, but of course the default may be
>> configurable via a guc parameter), freeze happens when it is present
>> (BEGIN or SET TRANSACTION) time. This would be a backwards-compatible
>> change, while would provide the option of freezing without the nasty
>> hack of having to do a "SELECT 1" prior to your real queries, and
>> everything will of course be well documented.

> What is the use case where you are having a problem?  This seems
> like an odd solution, so it would be helpful to know what problem
> it is attempting to solve.

Seems like a decent solution to me. The problem it that having to execute 
a dummy SQL statement to start a serializable transaction, rather 
than simply a BEGIN, is ugly.and error prone. Perhaps their app 
assumes (or even requires) that BEGIN starts the snapshot.

- -- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201411060922
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlRbhD4ACgkQvJuQZxSWSsg/kwCdE9E+d3jDDpLOo4+08wCOMMxE
EHkAnj4uMO8cY6Jl0R19C/6lE6n3bae5
=syg9
-----END PGP SIGNATURE-----

Re: Repeatable read and serializable transactions see data committed after tx start

From

Kevin Grittner

Date:

06 November 2014, 14:46:11

Greg Sabino Mullane <greg@turnstep.com> wrote:
> Kevin Grittner wrote:

> (wording change suggestion)
>>> | sees a snapshot as of the start of the first query within the
>>> | transaction, not as of the start of the current query within the
>>> | transaction.
>>
>> Would that have prevented the confusion here?
>
> I think it may have, but I also think the wording should be much
> stronger and clearer, as this is unintuitive behavior. Consider
> this snippet from Bruce's excellent MVCC Unmasked presentation:
>
> "A snapshot is recorded at the start of each SQL statement in
> READ COMMITTED transaction isolation mode, and at transaction start
> in SERIALIZABLE transaction isolation mode."
>
> This is both correct and incorrect, depending on whether you consider
> a transaction to start with BEGIN; or with the first statement
> after the BEGIN. :) I think most people have always assumed that
> BEGIN starts the transaction and that is the point at which the snapshot
> is obtained.

But there is so much evidence to the contrary.  Not only does the
*name* of the command (BEGIN or START) imply a start, but
pg_stat_activity shows the connection "idle in transaction" after
the command (and before a snapshot is acquired), the Explicit
Locking section of the docs asserts that "Once acquired, a lock is
normally held till end of transaction", and the docs for the SET
command assert that "The effects of SET LOCAL last only till the
end of the current transaction, whether committed or not."  The end
of *which* transaction?  The one that started with BEGIN or START
and which might not (or in some cases *must* not) yet have a
snapshot.

>>>    But what about creating a flag to BEGIN and SET TRANSACTION
>>> commands, called "IMMEDIATE FREEZE" (or something similar), which
>>> applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set
>>> (and may be off by default, but of course the default may be
>>> configurable via a guc parameter), freeze happens when it is present
>>> (BEGIN or SET TRANSACTION) time. This would be a backwards-compatible
>>> change, while would provide the option of freezing without the nasty
>>> hack of having to do a "SELECT 1" prior to your real queries, and
>>> everything will of course be well documented.
>>
>> What is the use case where you are having a problem?  This seems
>> like an odd solution, so it would be helpful to know what problem
>> it is attempting to solve.
>
> Seems like a decent solution to me. The problem it that having to execute
> a dummy SQL statement to start a serializable transaction, rather
> than simply a BEGIN, is ugly.and error prone. Perhaps their app
> assumes (or even requires) that BEGIN starts the snapshot.

Why?  This "fix" might not deal with the bigger issues that I
discussed, like that the later-to-start and
later-to-acquire-a-snapshot transaction might logically be first in
the apparent order of execution.  You can't "fix" that without a
lot of blocking -- that most of us don't want.  Depending on *why*
they think this is important, they might need to be acquiring
various locks to prevent behavior they don't want, in which case
having acquired a snapshot at BEGIN would be exactly the *wrong*
thing to do.  The exact nature of the problem we're trying to solve
here does matter.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Repeatable read and serializable transactions see data committed after tx start

From

Tom Lane

Date:

06 November 2014, 14:56:08

Kevin Grittner <kgrittn@ymail.com> writes:
> Why?  This "fix" might not deal with the bigger issues that I
> discussed, like that the later-to-start and
> later-to-acquire-a-snapshot transaction might logically be first in
> the apparent order of execution.  You can't "fix" that without a
> lot of blocking -- that most of us don't want.  Depending on *why*
> they think this is important, they might need to be acquiring
> various locks to prevent behavior they don't want, in which case
> having acquired a snapshot at BEGIN would be exactly the *wrong*
> thing to do.  The exact nature of the problem we're trying to solve
> here does matter.

Another thing that I think hasn't been mentioned in this thread is
that we used to have severe problems with client libraries that like
to issue BEGIN and then go idle until they have something to do.
Which, for some reason, is a prevalent behavior.  That used to result
in problems like VACUUM not being able to clean up dead rows promptly.
We fixed that some time ago by making sure we didn't acquire an XID until
the first actual statement after BEGIN.  Snapshots as such were never a
problem for this, because we've never acquired a snapshot immediately at
BEGIN ... but if we did so, this problem would come right back.
        regards, tom lane

Re: Repeatable read and serializable transactions see data committed after tx start

From

Robert Haas

Date:

07 November 2014, 20:48:46

On Wed, Nov 5, 2014 at 7:17 PM, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>>>      Given a transaction started with "BEGIN.... (REPEATABLE READ |
>>> SERIALIZABLE)", if a concurrent session commits some data before *any*
>>> query
>>> within the first transaction, that committed data is seen by the
>>> transaction. This is not what I'd expect.
>>
>> I think the problem is with your expectation, not the behavior.
>
>     But my expectation is derived from the documentation:
>
> "The Repeatable Read isolation level only sees data committed before the
> transaction began;"

Yes, that's inaccurate.  We should fix it.

(But we should not change the behavior, because I'm pretty sure that
the villagers would show up with pitchforks when various
currently-harmless scenarios caused massive database bloat.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Repeatable read and serializable transactions see data committed after tx start

From

"Greg Sabino Mullane"

Date:

07 November 2014, 21:02:14

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


Kevin Grittner wrote:
>> I think most people have always assumed that
>> BEGIN starts the transaction and that is the point at 
>> which the snapshot is obtained.

> But there is so much evidence to the contrary.  Not only does the
> *name* of the command (BEGIN or START) imply a start, but
> pg_stat_activity shows the connection "idle in transaction" after
> the command (and before a snapshot is acquired)

Er...I think we are arguing the same thing here. So no contrary 
needed? :)

> Why?  This "fix" might not deal with the bigger issues that I
> discussed, like that the later-to-start and
> later-to-acquire-a-snapshot transaction might logically be first in
> the apparent order of execution.  You can't "fix" that without a
> lot of blocking -- that most of us don't want.

Right, which is why the suggestion of a user-controllable switch, 
that defaults to the current behavior, seems an excellent compromise.

> Depending on *why* they think this is important, they might need to 
> be acquiring various locks to prevent behavior they don't want, in which case
> having acquired a snapshot at BEGIN would be exactly the *wrong*
> thing to do.  The exact nature of the problem we're trying to solve
> here does matter.

I cannot speak to the OP, but I also do not think we should try and 
figure out every possible scenario people may have. Certainly the 
long-standing documentation bug may have caused some unexpected or 
unwanted behavior, so let's start by fixing that.

Tom Lane wrote:
> Another thing that I think hasn't been mentioned in this thread is
> that we used to have severe problems with client libraries that like
> to issue BEGIN and then go idle until they have something to do.
> Which, for some reason, is a prevalent behavior.

I'm not advocating changing the default behavior, but I would not want 
to see bad client libraries used a reason for any change we make. Clients 
should not be doing this, period, and there is no reason for us to 
support that.

- -- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201411071600
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlRdMwwACgkQvJuQZxSWSsh/mgCeMdrj15bNVtzBhecG+QT2SlKh
jboAnAjctUcrlA2aCCQmIsSM87ulmFEn
=U5ld
-----END PGP SIGNATURE-----

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

08 November 2014, 20:48:39

<br /><div class="moz-cite-prefix">On 06/11/14 15:00, Kevin Grittner wrote:<br /></div><blockquote
cite="mid:1415282403.10098.YahooMailNeo@web122304.mail.ne1.yahoo.com"type="cite"><pre wrap="">Álvaro Hernández Tortosa
<aclass="moz-txt-link-rfc2396E" href="mailto:aht@8Kdata.com"><aht@8Kdata.com></a> wrote:
 

</pre><blockquote type="cite"><pre wrap="">    There has been two comments which seem to state that changing this
may introduce some performance problems and some limitations when you
need to take out some locks. I still believe, however, that current
behavior is confusing for the user. Sure, one option is to patch the
documentation, as I was suggesting.
</pre></blockquote><pre wrap="">
Yeah, I thought that's what we were talking about, and in that
regard I agree that the docs could be more clear.  I'm not quite
sure what to say where to fix that, but I can see how someone could
be confused and have the expectation that once they have run BEGIN
TRANSACTION ISOLATION LEVEL SERIALIZABLE the transaction will not
see the work of transactions committing after that.  The fact that
this is possible is implied, if one reads carefully and thinks
about it, by the statement right near the start of the "Transaction
Isolation" section which says "any concurrent execution of a set of
Serializable transactions is guaranteed to produce the same effect
as running them one at a time in some order."  As Robert pointed
out, this is not necessarily the commit order or the transaction
start order.

It is entirely possible that if you have serializable transactions
T1 and T2, where T1 executes BEGIN first (and even runs a query
before T2 executes BEGIN) and T1 commits first, that T2 will
"appear" to have run first because it will look at a set of data
which T1 modifies and not see the changes.  If T1 were to *also*
look at a set of data which T2 modifies, then one of the
transactions would be rolled back with a serialization failure, to
prevent a cycle in the apparent order of execution; so the
requirements of the standard (and of most software which is
attempting to handle race conditions) is satisfied.  For many
popular benchmarks (and I suspect most common workloads) this
provides the necessary protections with better performance than is
possible using blocking to provide the required guarantees.[1]</pre></blockquote><br />     Yes, you're right in that
the"any concurrent execution..." phrase implicitly means that snapshot may not be taken at BEGIN or SET TRANSACTION
time,but it's definitely not clear enough for the average user. Yet this may apply to the serializable case, but it
doesn'tto the repeatable read where the docs read " The Repeatable Read isolation level only sees data committed before
thetransaction began; it never sees either uncommitted data or changes committed during transaction execution by
concurrenttransactions". The first part is confusing, as we discussed; the second part is even more confusing as it
says"during transaction execution", and isn't the transaction -not the snapshot- beginning at BEGIN time?<br /><br />
   Surprisingly, the language is way more clear in the SET TRANSACTION doc page [2].<br />  <br /><blockquote
cite="mid:1415282403.10098.YahooMailNeo@web122304.mail.ne1.yahoo.com"type="cite"><pre wrap="">
 

At any rate, the language in that section is a little fuzzy on the
concept of the "start of the transaction."  Perhaps it would be
enough to change language like:

| sees a snapshot as of the start of the transaction, not as of the
| start of the current query within the transaction.

to:

| sees a snapshot as of the start of the first query within the
| transaction, not as of the start of the current query within the
| transaction.

Would that have prevented the confusion here?</pre></blockquote><br />     I think that definitely helps. But it may be
betterto make it even more clear, more explicit. And offering a solution for the user who may like the snapshot to be
taken"at begin time", like suggesting to do a "SELECT 1" query.<br /><blockquote
cite="mid:1415282403.10098.YahooMailNeo@web122304.mail.ne1.yahoo.com"type="cite"><pre wrap="">
 

</pre><blockquote type="cite"><pre wrap="">    But what about creating a flag to BEGIN and SET TRANSACTION
commands, called "IMMEDIATE FREEZE" (or something similar), which
applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set
(and may be off by default, but of course the default may be
configurable via a guc parameter), freeze happens when it is present
(BEGIN or SET TRANSACTION) time. This would be a backwards-compatible
change, while would provide the option of freezing without the nasty
hack of having to do a "SELECT 1" prior to your real queries, and
everything will of course be well documented.
</pre></blockquote><pre wrap="">
What is the use case where you are having a problem?  This seems
like an odd solution, so it would be helpful to know what problem
it is attempting to solve.</pre></blockquote><br />     I don't have a particular use case. I just came across the
issueand thought the documentation and behavior wasn't consistent. So the first aim is not to have users surprised (in
abad way). But I see a clear use case: users who might want to open a (repeatable read | serializable) transaction to
havetheir view of the database frozen, to perform any later operation on that frozen view. Sure, that comes at a
penalty,but I see that potentially interesting too.<br /><br />     Regards,<br />     <br />     Álvaro<br /><br /><br
/><br/> [1] <a class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/9.4/static/transaction-iso.html">http://www.postgresql.org/docs/9.4/static/transaction-iso.html</a><br
/>[2] <a class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/9.4/static/sql-set-transaction.html">http://www.postgresql.org/docs/9.4/static/sql-set-transaction.html</a><br
/><preclass="moz-signature" cols="72">-- 
 
Álvaro Hernández Tortosa


-----------
8Kdata

</pre>

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

08 November 2014, 20:53:28

On 07/11/14 22:02, Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
>
> Kevin Grittner wrote:
>>> I think most people have always assumed that
>>> BEGIN starts the transaction and that is the point at
>>> which the snapshot is obtained.
>> But there is so much evidence to the contrary.  Not only does the
>> *name* of the command (BEGIN or START) imply a start, but
>> pg_stat_activity shows the connection "idle in transaction" after
>> the command (and before a snapshot is acquired)
> Er...I think we are arguing the same thing here. So no contrary
> needed? :)
    So do we agree to fix the docs? ^_^

>
>> Why?  This "fix" might not deal with the bigger issues that I
>> discussed, like that the later-to-start and
>> later-to-acquire-a-snapshot transaction might logically be first in
>> the apparent order of execution.  You can't "fix" that without a
>> lot of blocking -- that most of us don't want.
> Right, which is why the suggestion of a user-controllable switch,
> that defaults to the current behavior, seems an excellent compromise.
    I also think so. It's backwards-compatible and opt-in. It also 
makes the documentation very clear, as there is an specific option for this.

>
>> Depending on *why* they think this is important, they might need to
>> be acquiring various locks to prevent behavior they don't want, in which case
>> having acquired a snapshot at BEGIN would be exactly the *wrong*
>> thing to do.  The exact nature of the problem we're trying to solve
>> here does matter.
> I cannot speak to the OP, but I also do not think we should try and
> figure out every possible scenario people may have. Certainly the
> long-standing documentation bug may have caused some unexpected or
> unwanted behavior, so let's start by fixing that.
    +1
>
> Tom Lane wrote:
>> Another thing that I think hasn't been mentioned in this thread is
>> that we used to have severe problems with client libraries that like
>> to issue BEGIN and then go idle until they have something to do.
>> Which, for some reason, is a prevalent behavior.
> I'm not advocating changing the default behavior, but I would not want
> to see bad client libraries used a reason for any change we make. Clients
> should not be doing this, period, and there is no reason for us to
> support that.
    If the "IMMEDIATE FREEZE" mode is not the default, as I suggested, 
it shouldn't introduce any problem with past code.
    Regards,
    Álvaro

Re: Repeatable read and serializable transactions see data committed after tx start

From

Bruce Momjian

Date:

20 March 2015, 20:43:50

On Sat, Nov  8, 2014 at 09:53:18PM +0100, Álvaro Hernández Tortosa wrote:
>
> On 07/11/14 22:02, Greg Sabino Mullane wrote:
> >Kevin Grittner wrote:
> >>>I think most people have always assumed that
> >>>BEGIN starts the transaction and that is the point at
> >>>which the snapshot is obtained.
> >>But there is so much evidence to the contrary.  Not only does the
> >>*name* of the command (BEGIN or START) imply a start, but
> >>pg_stat_activity shows the connection "idle in transaction" after
> >>the command (and before a snapshot is acquired)
> >Er...I think we are arguing the same thing here. So no contrary
> >needed? :)
>
>     So do we agree to fix the docs? ^_^

Doc patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

mvcc.diff

Re: Repeatable read and serializable transactions see data committed after tx start

From

Bruce Momjian

Date:

25 March 2015, 00:56:36

On Fri, Mar 20, 2015 at 04:43:42PM -0400, Bruce Momjian wrote:
> On Sat, Nov  8, 2014 at 09:53:18PM +0100, Álvaro Hernández Tortosa wrote:
> > 
> > On 07/11/14 22:02, Greg Sabino Mullane wrote:
> > >Kevin Grittner wrote:
> > >>>I think most people have always assumed that
> > >>>BEGIN starts the transaction and that is the point at
> > >>>which the snapshot is obtained.
> > >>But there is so much evidence to the contrary.  Not only does the
> > >>*name* of the command (BEGIN or START) imply a start, but
> > >>pg_stat_activity shows the connection "idle in transaction" after
> > >>the command (and before a snapshot is acquired)
> > >Er...I think we are arguing the same thing here. So no contrary
> > >needed? :)
> > 
> >     So do we agree to fix the docs? ^_^
> 
> Doc patch attached.

Patch applied.  Thanks for the report.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Repeatable read and serializable transactions see data committed after tx start

From

Álvaro Hernández Tortosa

Date:

25 March 2015, 02:08:57

On 24/03/15 20:56, Bruce Momjian wrote:
> On Fri, Mar 20, 2015 at 04:43:42PM -0400, Bruce Momjian wrote:
>> On Sat, Nov  8, 2014 at 09:53:18PM +0100, Álvaro Hernández Tortosa wrote:
>>> On 07/11/14 22:02, Greg Sabino Mullane wrote:
>>>> Kevin Grittner wrote:
>>>>>> I think most people have always assumed that
>>>>>> BEGIN starts the transaction and that is the point at
>>>>>> which the snapshot is obtained.
>>>>> But there is so much evidence to the contrary.  Not only does the
>>>>> *name* of the command (BEGIN or START) imply a start, but
>>>>> pg_stat_activity shows the connection "idle in transaction" after
>>>>> the command (and before a snapshot is acquired)
>>>> Er...I think we are arguing the same thing here. So no contrary
>>>> needed? :)
>>>      So do we agree to fix the docs? ^_^
>> Doc patch attached.
> Patch applied.  Thanks for the report.


    Awesome! Thanks! :)
    Álvaro


-- 
Álvaro Hernández Tortosa


-----------
8Kdata