Thread: Read Uncommitted
Currently, PostgreSQL implements the SQL Standard transaction isolation level of Read Uncommitted as being equivalent to Read Committed. That is a valid decision, though I wish to propose an alternative.

At the moment, a long-running SQL statement can prevent xmin from moving forward, which can result in VACUUM and HOT being unable to remove row versions effectively. My proposal is that a Read Uncommitted transaction would not prevent row removal, which then offers no guarantee that the "correct" answer would be returned. Which is *exactly* what that transaction isolation level was designed for.

In many cases, an application designer may be able to tell that a particular query will always return the correct answer. For example, we may query against data which is known not to change, even though other data in the same database cluster may be subject to frequent change, e.g. queries against large insert-only tables. By allowing the user to select a non-default isolation level, we would allow long-running queries to cause significantly less disruption to other operations.

Read Uncommitted would imply Read Only status, so various data change operations would be rejected. No errors would result; some data would simply be missing from the answer, iff data had been removed during execution. It might be possible to get a "table does not exist" error, because non-transactional pg_class updates take place; transactional pg_class updates would be prevented by relation-level locking.

I would *not* allow this as an option or default for pg_dump, since it would almost certainly result in error. No footguns allowed in so critical a utility.

The implementation is trivial, namely that the calculation of global xmin would ignore Read Uncommitted transactions. Read Uncommitted would not be the default, so novice users would be in no danger.
Bear in mind that Read Uncommitted is commonly used in other database systems, so this feature is already understood and even expected by many database experts from other backgrounds. Many PostgreSQL experts would also be able to take advantage of this feature. Since it's part of the SQL Standard, this "dirty read" isn't likely to take by surprise anybody who has read any of the PostgreSQL manual, the SQL Standard, or another RDBMS's manual.

Comments?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
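[Editorial sketch: the "trivial implementation" Simon describes -- computing the global xmin while skipping Read Uncommitted backends -- can be modeled in a few lines. This is a hypothetical Python illustration; the function and field names (`global_xmin`, `backends`, `isolation`) are invented for this sketch and do not correspond to PostgreSQL's actual C code.]

```python
# Toy model of the proposal: when computing the oldest xmin that VACUUM
# must respect, ignore any backend running at Read Uncommitted.
# All names here are illustrative, not PostgreSQL internals.

READ_UNCOMMITTED = "read uncommitted"

def global_xmin(backends, next_xid):
    """Oldest xmin among active backends, skipping Read Uncommitted ones."""
    oldest = next_xid  # if nobody holds a snapshot, nothing is pinned
    for b in backends:
        if b["isolation"] == READ_UNCOMMITTED:
            continue  # proposal: this backend no longer holds back VACUUM
        if b["xmin"] is not None:
            oldest = min(oldest, b["xmin"])
    return oldest

backends = [
    {"xmin": 100, "isolation": "read committed"},
    {"xmin": 50,  "isolation": READ_UNCOMMITTED},  # long-running report
]
print(global_xmin(backends, next_xid=200))  # -> 100, not 50
```

In this model the long-running reader's xmin (50) simply drops out of the calculation, so VACUUM may remove row versions that reader might still want -- which is exactly the trade-off the proposal accepts.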
Simon Riggs <simon@2ndquadrant.com> writes:
> At the moment, a long running SQL Statement can prevent xmin moving
> forward, which can result in VACUUM and HOT not being able to remove row
> versions effectively. My proposal is that a Read Uncommitted transaction
> would not prevent row removal, which then offers no guarantee that the
> "correct" answer would be returned. Which is *exactly* what that
> transaction isolation level was designed for.

This seems like a remarkably bad idea. It has no use that I can see for anything except shooting oneself in the foot.

> The implementation is trivial, namely that the calculation of global
> xmin would ignore Read Uncommitted transactions.

This proposed implementation seems to have very little to do with what most people would think Read Uncommitted does. In particular it does not agree with the SQL spec, which says that Read Uncommitted permits transactions to see the results of as-yet-uncommitted transactions. As an example, suppose that old transaction T1 reads a table that always contains exactly one live row, but T2 has modified that row since T1 started. Per spec what Read Uncommitted means is that T1 may see the modified version before T2 commits. This would not implement that behavior, though. What this implementation would mean is that after T2 commits, we might vacuum away the old version, causing T1 to see *nothing* in the table:

* T1 scans the new row version, doesn't return it because T2 not yet committed;
* T2 commits;
* VACUUM removes old row version;
* T1 never finds old row version to return.

Even if we thought that supporting Read Uncommitted would be a good idea, this would be an unacceptably unstable implementation of it.

regards, tom lane
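[Editorial sketch: Tom's four-step anomaly can be replayed in a toy visibility model. This is a hypothetical Python illustration -- the `xmin`/`xmax` fields mimic, but do not match, PostgreSQL's tuple headers, and the visibility rule is deliberately simplified.]

```python
# Toy replay of the T1/T2/VACUUM scenario: T1's snapshot predates T2,
# the proposal lets VACUUM ignore T1's xmin, and the row vanishes.
# Illustrative model only, not PostgreSQL code.

def visible_to_t1(version, t1_snapshot_xid=10):
    # T1 sees versions created by a transaction that committed before
    # its snapshot, and not deleted by such a transaction.
    created_ok = version["xmin_committed"] and version["xmin"] < t1_snapshot_xid
    deleted = (version["xmax"] is not None
               and version["xmax_committed"]
               and version["xmax"] < t1_snapshot_xid)
    return created_ok and not deleted

old = {"xmin": 5,  "xmin_committed": True,  "xmax": 20, "xmax_committed": False}
new = {"xmin": 20, "xmin_committed": False, "xmax": None, "xmax_committed": False}
heap = [old, new]

# Step 1: T1 scans; the new version is invisible (T2 not yet committed),
# so T1 relies on finding the old version.
assert [v for v in heap if visible_to_t1(v)] == [old]

# Step 2: T2 commits.
old["xmax_committed"] = True
new["xmin_committed"] = True

# Step 3: VACUUM, with global xmin ignoring T1, removes the old version
# even though T1's snapshot still needs it.
heap.remove(old)

# Step 4: T1 re-scans; the new version remains invisible to its snapshot
# (xmin 20 is after snapshot xid 10), so the table appears empty.
print([v for v in heap if visible_to_t1(v)])  # -> []
```

The table that "always contains exactly one live row" thus shows T1 zero rows -- the instability Tom objects to.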
On Sun, 2008-05-25 at 20:10 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > At the moment, a long running SQL Statement can prevent xmin moving
> > forward, which can result in VACUUM and HOT not being able to remove row
> > versions effectively. My proposal is that a Read Uncommitted transaction
> > would not prevent row removal, which then offers no guarantee that the
> > "correct" answer would be returned. Which is *exactly* what that
> > transaction isolation level was designed for.
>
> This seems like a remarkably bad idea. It has no use that I can see for
> anything except shooting oneself in the foot.
>
> > The implementation is trivial, namely that the calculation of global
> > xmin would ignore Read Uncommitted transactions.
>
> This proposed implementation seems to have very little to do with
> what most people would think Read Uncommitted does. In particular it
> does not agree with the SQL spec,
> ...(snip)...
>
> Even if we thought that supporting Read Uncommitted would be a good
> idea, this would be an unacceptably unstable implementation of it.

Well, the thought of an unstable "dirty read" seems strange, but I get your point that it isn't the SQL Standard's Read Uncommitted mode.

The use of this is clear though: allowing long running transactions against unchanging data to not interfere with other activities. It will also have importance in a Hot Standby mode.

So, even if this isn't Read Uncommitted exactly, it is a transaction isolation mode that has utility for us, and I would like to support it, well documented and with appropriate cautions.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> wrote:
> The use of this is clear though: allowing long running transactions
> against unchanging data to not interfere with other activities. It will
> also have importance in a Hot Standby mode.

I have a use for the dirty read -- pg_start_backup(). In 8.3, pg_start_backup takes a long time, typically { checkpoint_timeout * checkpoint_completion_target }.

If we have some updating transactions during pg_start_backup, the updated tables cannot be vacuumed well. READ UNCOMMITTED mode could help us in such a situation:

BEGIN;
SET TRANSACTION READ UNCOMMITTED;
SELECT pg_start_backup(timeofday());
END;

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi,

Because Read Uncommitted shows all the "not yet committed" versions, it seems to me that there's no problem in showing the new version of tuples to Read Uncommitted transactions, as follows:

 SQLs    | old version | new version
---------+-------------+-------------
 INSERT  | none        | new tuple
 UPDATE  | old tuple   | new tuple
 DELETE  | old tuple   | none

If the updating transaction fails, then the new version is invalid and the old version should be shown to Read Uncommitted transactions. Therefore, if there are no Read Committed or Serializable transactions which refer to old-version tuples, it could be thought that the old version can be vacuumed or swept away. Obviously, to Read Committed transactions, the old version has to be shown.

Any advice?

2008/5/26 ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
> Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> The use of this is clear though: allowing long running transactions
>> against unchanging data to not interfere with other activities. It will
>> also have importance in a Hot Standby mode.
>
> I have a use for the dirty read -- pg_start_backup().
> ...(snip)...
>
> BEGIN;
> SET TRANSACTION READ UNCOMMITTED;
> SELECT pg_start_backup(timeofday());
> END;

--
------
Koichi Suzuki
"Koichi Suzuki" <koichi.szk@gmail.com> wrote:
> Because Read Uncommitted shows all the "not yet committed" versions, it
> seems to me that there's no problem in showing the new version of tuples
> to Read Uncommitted transactions as follows:

Another transaction could update the returned tuple, which is the newest at that time, and the reading transaction could then read the updated version of the tuple. We might return different versions of identical tuples, even though there is *no point in time* at which multiple tuples exist.

This is another side of the problem shown by Tom, where we get no tuples if we hide new tuples and the old tuples are then removed.

For example, "SELECT count(*) FROM tbl" in a READ UNCOMMITTED transaction should always return correct values even if we only run UPDATEs at the same time.

I guess defining READ UNCOMMITTED is quite difficult -- it should be efficient, but should not be chaotic.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
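[Editorial sketch: the opposite anomaly Itagaki describes -- a dirty read returning two versions of one logical row -- can be shown with a toy sequential scan. This is a hypothetical Python model, not PostgreSQL's scan code.]

```python
# Toy sequential scan under a naive "show uncommitted versions" rule:
# an UPDATE landing mid-scan makes one logical row appear twice, so
# even SELECT count(*) drifts. Illustrative model only.

heap = [
    {"row_id": 1, "dead": False},  # current version of row 1
    {"row_id": 2, "dead": False},
]

seen = []
i = 0
while i < len(heap):
    v = heap[i]
    if not v["dead"]:
        seen.append(v["row_id"])  # dirty read: any live-looking version
    if i == 0:
        # Concurrent UPDATE of row 1 mid-scan: old version is marked
        # dead (too late -- already returned) and the new version is
        # appended beyond the scan position, where it will be visited.
        heap[0]["dead"] = True
        heap.append({"row_id": 1, "dead": False})
    i += 1

print(seen)  # -> [1, 2, 1]: row 1 returned twice, count(*) would say 3
```

No snapshot ever contained both versions of row 1, yet the scan returned both -- Itagaki's "different versions of identical tuples" problem.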
I agree that such an instant freeze of the whole database state is not feasible. We may be able to find a compromise on what to show and what to hide. Anyway, read uncommitted cannot guarantee repeatable reads and can provide only an "approximate" result. Even if we were successful in showing an instant freeze, the same SQL might provide a different result at the next instant.

2008/5/26 ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
> "Koichi Suzuki" <koichi.szk@gmail.com> wrote:
>
>> Because Read Uncommitted shows all the "not yet committed" versions, it
>> seems to me that there's no problem in showing the new version of tuples
>> to Read Uncommitted transactions as follows:
>
> Another transaction could update the returned tuple, which is the newest
> at that time, and the reading transaction could read the updated version
> of the tuple. We might return different versions of identical tuples,
> even though there is *no point in time* at which multiple tuples exist.
> ...(snip)...
>
> I guess defining READ UNCOMMITTED is quite difficult
> -- it should be efficient, but should not be chaotic.

--
------
Koichi Suzuki
On Monday, 26 May 2008, Simon Riggs wrote:
> At the moment, a long running SQL Statement can prevent xmin moving
> forward, which can result in VACUUM and HOT not being able to remove row
> versions effectively. My proposal is that a Read Uncommitted transaction
> would not prevent row removal, which then offers no guarantee that the
> "correct" answer would be returned. Which is *exactly* what that
> transaction isolation level was designed for.
>
> In many cases, an application designer may be able to tell that a
> particular query will always return the correct answer. For example, we
> may query against data which is known not to change, even though other
> data in the same database cluster may be subject to frequent change.
> e.g. queries against large insert-only tables.

If the data in a table never changes, why would VACUUM or HOT need to touch it? The use case isn't clear to me.
On Mon, 2008-05-26 at 16:55 +0200, Peter Eisentraut wrote:
> On Monday, 26 May 2008, Simon Riggs wrote:
> > ...(snip)...
>
> If the data in a table never changes, why would VACUUM or HOT need to touch
> it? The use case isn't clear to me.

I guess the use-case is about a long read-write transaction doing read-only access to an update-only table and thus blocking vacuum on other tables.

--------------
Hannu
Hannu Krosing <hannu@krosing.net> writes:
> On Mon, 2008-05-26 at 16:55 +0200, Peter Eisentraut wrote:
>> If the data in a table never changes, why would VACUUM or HOT need to touch
>> it? The use case isn't clear to me.

> I guess the use-case is about a long read-write transaction doing
> read-only access to an update-only table and thus blocking vacuum on
> other tables.

... in which case the proposed kluge would result in unstable, unpredictable answers, so there is still no plausible use-case.

regards, tom lane
On Mon, 2008-05-26 at 13:25 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@krosing.net> writes:
> > I guess the use-case is about a long read-write transaction doing
> > read-only access to an update-only table and thus blocking vacuum on
> > other tables.
>
> ... in which case the proposed kluge would result in unstable,
> unpredictable answers, so there is still no plausible use-case.

Maybe it was meant as a super-power-user tool (and a big footgun).

Btw, when is a transaction id currently assigned to a transaction -- when an INSERT/UPDATE/DELETE statement is first seen, or when data is actually modified? That is, when doing

INSERT INTO logtable SELECT current_timestamp, count(*) FROM really_huge_table;

will there be a transaction id for just the tiny moment the returned row is inserted, or for the whole count(*) time?

----------------
Hannu
Hannu Krosing <hannu@krosing.net> writes:
> btw, when is a transaction id currently assigned to a transaction - when
> INSERT/UPDATE/DELETE statement is first seen, or when data is actually
> modified ?

AFAIR it doesn't happen until the XID is needed to stamp a tuple with. (There might be some other places that demand the current XID, but that's the normal case.)

regards, tom lane
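[Editorial sketch: the lazy XID assignment Tom describes -- a transaction acquires a real xid only when it first needs to stamp a tuple, not at BEGIN or statement start -- can be illustrated with a small model. Hypothetical Python, with invented names (`Txn`, `get_xid`); not PostgreSQL's actual mechanism.]

```python
# Toy model of lazy XID assignment: pure reads never take an xid; the
# first tuple-stamping operation does. Illustrative only.

import itertools

xid_counter = itertools.count(100)  # stand-in for the global xid counter

class Txn:
    def __init__(self):
        self.xid = None  # no xid yet; reading rows never allocates one

    def get_xid(self):
        # Called the first time this transaction must stamp a tuple.
        if self.xid is None:
            self.xid = next(xid_counter)
        return self.xid

t = Txn()
# While the SELECT count(*) part of Hannu's INSERT ... SELECT runs,
# only reads happen, so in this model no xid exists yet:
assert t.xid is None
# Only when the result row is finally inserted is an xid taken:
stamped = t.get_xid()
print(t.xid)  # -> 100
```

Under this model, Hannu's query holds an xid only for the brief moment the result row is inserted, not for the whole duration of the count(*) -- though note that holding a *snapshot* for the whole scan is a separate matter from holding an xid.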
On Mon, 2008-05-26 at 13:25 -0400, Tom Lane wrote:
> ...(snip)...
>
> ... in which case the proposed kluge would result in unstable,
> unpredictable answers, so there is still no plausible use-case.

Separate databases?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Mon, 2008-05-26 at 13:25 -0400, Tom Lane wrote:
>> ... in which case the proposed kluge would result in unstable,
>> unpredictable answers, so there is still no plausible use-case.
>
> Separate databases?

The OldestXmin calculation only includes transactions in the same database, except when vacuuming shared relations.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
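[Editorial sketch: Heikki's point in toy form -- a hypothetical Python model (invented names, not PostgreSQL code) of an oldest-xmin computation that considers only same-database backends unless a shared relation is being vacuumed.]

```python
# Toy per-database oldest-xmin: backends in other databases don't hold
# back a vacuum of a non-shared relation; shared relations must respect
# every database. Illustrative model only.

def oldest_xmin(backends, dbid, shared_relation, next_xid=1000):
    oldest = next_xid
    for b in backends:
        if not shared_relation and b["db"] != dbid:
            continue  # other databases are ignored for local relations
        oldest = min(oldest, b["xmin"])
    return oldest

backends = [{"db": 1, "xmin": 500}, {"db": 2, "xmin": 100}]
print(oldest_xmin(backends, dbid=1, shared_relation=False))  # -> 500
print(oldest_xmin(backends, dbid=1, shared_relation=True))   # -> 100
```

This is why "separate databases" already gets most of the benefit Simon asks about: the long-running reader in database 2 pins xmin only for shared catalogs, not for database 1's tables.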
On Mon, 2008-05-26 at 16:55 +0200, Peter Eisentraut wrote:
> On Monday, 26 May 2008, Simon Riggs wrote:
> > In many cases, an application designer may be able to tell that a
> > particular query will always return the correct answer. For example, we
> > may query against data which is known not to change, even though other
> > data in the same database cluster may be subject to frequent change.
> > e.g. queries against large insert-only tables.
>
> If the data in a table never changes, why would VACUUM or HOT need to touch
> it? The use case isn't clear to me.

A long-running transaction on an insert-only table will never fail to see a row. It *will* prevent row removal by VACUUM on other tables that are being updated. This proposal would remove the linkage between the two situations, allowing benefit for general users.

Tom's objection has made me think some more, however. The user wouldn't be aware that they had used (the proposal) inappropriately, so it's really too unsafe for general use. That situation could change if we were able to mark tables as READ ONLY, INSERT ONLY or INSERT UPDATE DELETE ALLOWED (the default). That's something I'd like to do in the future, for a variety of optimisations, but it's not on my list yet.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 2008-05-26 at 20:01 +0900, ITAGAKI Takahiro wrote:
> I have a use for the dirty read -- pg_start_backup().
> In 8.3, pg_start_backup takes a long time, typically
> { checkpoint_timeout * checkpoint_completion_target }.
>
> If we have some updating transactions during pg_start_backup,
> the updated tables cannot be vacuumed well. READ UNCOMMITTED mode
> could help us in such a situation:
>
> BEGIN;
> SET TRANSACTION READ UNCOMMITTED;
> SELECT pg_start_backup(timeofday());
> END;

OK, point noted, thanks. Not sure what is possible there yet, but I will think some more.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
As an addendum for the archives only, I would add:

On Sun, 2008-05-25 at 20:10 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > The implementation is trivial, namely that the calculation of global
> > xmin would ignore Read Uncommitted transactions.
>
> This proposed implementation seems to have very little to do with
> what most people would think Read Uncommitted does.
> ...(snip)...
>
> * T1 scans the new row version, doesn't return it because T2 not
>   yet committed;
> * T2 commits;
> * VACUUM removes old row version;
> * T1 never finds old row version to return.
>
> Even if we thought that supporting Read Uncommitted would be a good
> idea, this would be an unacceptably unstable implementation of it.

We might think we can widen the snapshot so that we see the new row versions after an update. That won't work either, because we don't check whether multiple row versions are part of the same row. We always assume that each row can have at most one visible row version per snapshot. So trying to see the new row versions instead would likely result in viewing too many rows in most cases.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support