Thread: Read Uncommitted
Currently, PostgreSQL implements the SQL Standard transaction isolation level of Read Uncommitted as being equivalent to Read Committed. That is a valid decision, though I wish to propose an alternative.

At the moment, a long-running SQL statement can prevent xmin from moving forward, which can result in VACUUM and HOT being unable to remove row versions effectively. My proposal is that a Read Uncommitted transaction would not prevent row removal, which then offers no guarantee that the "correct" answer would be returned. Which is *exactly* what that transaction isolation level was designed for.

In many cases, an application designer may be able to tell that a particular query will always return the correct answer. For example, we may query against data which is known not to change, even though other data in the same database cluster may be subject to frequent change, e.g. queries against large insert-only tables. By allowing the user to select a non-default isolation level, we would allow long-running queries to cause significantly less disruption to other operations.

Read Uncommitted would imply Read Only status, so various data change operations would be rejected. No errors would result; some data would simply be missing from the answer, iff data had been removed during execution. It might be possible to get a "table does not exist" error, because non-transactional pg_class updates take place; transactional pg_class updates would be prevented by relation-level locking.

I would *not* allow this as an option or default for pg_dump, since it would almost certainly result in error. No footguns allowed in so critical a utility.

The implementation is trivial, namely that the calculation of global xmin would ignore Read Uncommitted transactions. Read Uncommitted would not be the default, so novice users would be in no danger.
Bear in mind that Read Uncommitted is commonly used in other database systems, so this feature is already understood and even expected by many database experts from other backgrounds. Many PostgreSQL experts would also be able to take advantage of this feature. Since it's part of the SQL Standard, this "dirty read" isn't likely to take by surprise anybody who has read any of the PostgreSQL manual, the SQL Standard, or another RDBMS's manual.

Comments?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
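[Editorial sketch: the "trivial implementation" Simon describes -- computing the global xmin while skipping Read Uncommitted backends -- can be modeled in a few lines. This is a hypothetical Python illustration; the function and field names (`global_xmin`, `backends`, `isolation`) are invented for this sketch and do not correspond to PostgreSQL's actual C code.]

```python
# Toy model of the proposal: when computing the oldest xmin that VACUUM
# must respect, ignore any backend running at Read Uncommitted.
# All names here are illustrative, not PostgreSQL internals.

READ_UNCOMMITTED = "read uncommitted"

def global_xmin(backends, next_xid):
    """Oldest xmin among active backends, skipping Read Uncommitted ones."""
    oldest = next_xid  # if nobody holds a snapshot, nothing is pinned
    for b in backends:
        if b["isolation"] == READ_UNCOMMITTED:
            continue  # proposal: this backend no longer holds back VACUUM
        if b["xmin"] is not None:
            oldest = min(oldest, b["xmin"])
    return oldest

backends = [
    {"xmin": 100, "isolation": "read committed"},
    {"xmin": 50,  "isolation": READ_UNCOMMITTED},  # long-running report
]
print(global_xmin(backends, next_xid=200))  # -> 100, not 50
```

In this model the long-running reader's xmin (50) simply drops out of the calculation, so VACUUM may remove row versions that reader might still want -- which is exactly the trade-off the proposal accepts.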
Simon Riggs <simon@2ndquadrant.com> writes:
> At the moment, a long running SQL Statement can prevent xmin moving
> forward, which can result in VACUUM and HOT not being able to remove row
> versions effectively. My proposal is that a Read Uncommitted transaction
> would not prevent row removal, which then offers no guarantee that the
> "correct" answer would be returned. Which is *exactly* what that
> transaction isolation level was designed for.

This seems like a remarkably bad idea. It has no use that I can see for anything except shooting oneself in the foot.

> The implementation is trivial, namely that the calculation of global
> xmin would ignore Read Uncommitted transactions.

This proposed implementation seems to have very little to do with what most people would think Read Uncommitted does. In particular it does not agree with the SQL spec, which says that Read Uncommitted permits transactions to see the results of as-yet-uncommitted transactions. As an example, suppose that old transaction T1 reads a table that always contains exactly one live row, but T2 has modified that row since T1 started. Per spec what Read Uncommitted means is that T1 may see the modified version before T2 commits. This would not implement that behavior, though. What this implementation would mean is that after T2 commits, we might vacuum away the old version, causing T1 to see *nothing* in the table:

* T1 scans the new row version, doesn't return it because T2 not yet committed;
* T2 commits;
* VACUUM removes old row version;
* T1 never finds old row version to return.

Even if we thought that supporting Read Uncommitted would be a good idea, this would be an unacceptably unstable implementation of it.

regards, tom lane
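[Editorial sketch: Tom's four-step anomaly can be replayed in a toy visibility model. This is a hypothetical Python illustration -- the `xmin`/`xmax` fields mimic, but do not match, PostgreSQL's tuple headers, and the visibility rule is deliberately simplified.]

```python
# Toy replay of the T1/T2/VACUUM scenario: T1's snapshot predates T2,
# the proposal lets VACUUM ignore T1's xmin, and the row vanishes.
# Illustrative model only, not PostgreSQL code.

def visible_to_t1(version, t1_snapshot_xid=10):
    # T1 sees versions created by a transaction that committed before
    # its snapshot, and not deleted by such a transaction.
    created_ok = version["xmin_committed"] and version["xmin"] < t1_snapshot_xid
    deleted = (version["xmax"] is not None
               and version["xmax_committed"]
               and version["xmax"] < t1_snapshot_xid)
    return created_ok and not deleted

old = {"xmin": 5,  "xmin_committed": True,  "xmax": 20, "xmax_committed": False}
new = {"xmin": 20, "xmin_committed": False, "xmax": None, "xmax_committed": False}
heap = [old, new]

# Step 1: T1 scans; the new version is invisible (T2 not yet committed),
# so T1 relies on finding the old version.
assert [v for v in heap if visible_to_t1(v)] == [old]

# Step 2: T2 commits.
old["xmax_committed"] = True
new["xmin_committed"] = True

# Step 3: VACUUM, with global xmin ignoring T1, removes the old version
# even though T1's snapshot still needs it.
heap.remove(old)

# Step 4: T1 re-scans; the new version remains invisible to its snapshot
# (xmin 20 is after snapshot xid 10), so the table appears empty.
print([v for v in heap if visible_to_t1(v)])  # -> []
```

The table that "always contains exactly one live row" thus shows T1 zero rows -- the instability Tom objects to.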
On Sun, 2008-05-25 at 20:10 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > At the moment, a long running SQL Statement can prevent xmin moving
> > forward, which can result in VACUUM and HOT not being able to remove row
> > versions effectively. My proposal is that a Read Uncommitted transaction
> > would not prevent row removal, which then offers no guarantee that the
> > "correct" answer would be returned. Which is *exactly* what that
> > transaction isolation level was designed for.
>
> This seems like a remarkably bad idea. It has no use that I can see for
> anything except shooting oneself in the foot.
>
> > The implementation is trivial, namely that the calculation of global
> > xmin would ignore Read Uncommitted transactions.
>
> This proposed implementation seems to have very little to do with
> what most people would think Read Uncommitted does. In particular it
> does not agree with the SQL spec,
> ...(snip)...
>
> Even if we thought that supporting Read Uncommitted would be a good
> idea, this would be an unacceptably unstable implementation of it.

Well, the thought of an unstable "dirty read" seems strange, but I get your point that it isn't the SQL Standard's Read Uncommitted mode.

The use of this is clear though: allowing long running transactions against unchanging data to not interfere with other activities. It will also have importance in a Hot Standby mode.

So, even if this isn't Read Uncommitted exactly, it is a transaction isolation mode that has utility for us, and I would like to support it, well documented and with appropriate cautions.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> wrote:
> The use of this is clear though: allowing long running transactions
> against unchanging data to not interfere with other activities. It will
> also have importance in a Hot Standby mode.

I have a use for the dirty read -- pg_start_backup(). In 8.3, pg_start_backup takes a long time, typically { checkpoint_timeout * checkpoint_completion_target }.

If we have some updating transactions during pg_start_backup, the updated tables cannot be vacuumed well. READ UNCOMMITTED mode could help us in such a situation:

BEGIN;
SET TRANSACTION READ UNCOMMITTED;
SELECT pg_start_backup(timeofday());
END;

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi,

Because Read Uncommitted shows all the "not yet committed" versions, it seems to me that there's no problem in showing the new version of tuples to Read Uncommitted transactions, as follows:

 SQLs    | old version | new version
---------+-------------+-------------
 INSERT  | none        | new tuple
 UPDATE  | old tuple   | new tuple
 DELETE  | old tuple   | none

If the updating transaction fails, then the new version is invalid and the old version should be shown to Read Uncommitted transactions. Therefore, if there are no Read Committed or Serializable transactions which refer to old-version tuples, it could be thought that the old version can be vacuumed or swept away. Obviously, to Read Committed transactions, the old version has to be shown.

Any advice?

2008/5/26 ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
> Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> The use of this is clear though: allowing long running transactions
>> against unchanging data to not interfere with other activities. It will
>> also have importance in a Hot Standby mode.
>
> I have a use for the dirty read -- pg_start_backup().
> ...(snip)...
>
> BEGIN;
> SET TRANSACTION READ UNCOMMITTED;
> SELECT pg_start_backup(timeofday());
> END;

--
------
Koichi Suzuki
"Koichi Suzuki" <koichi.szk@gmail.com> wrote:
> Because Read Uncommitted shows all the "not yet committed" versions, it
> seems to me that there's no problem in showing the new version of tuples
> to Read Uncommitted transactions as follows:

Another transaction could update the returned tuple, which is the newest at that time, and the reading transaction could then read the updated version of the tuple. We might return different versions of identical tuples, even though there is *no point in time* at which multiple tuples exist.

This is another side of the problem shown by Tom, where we get no tuples if we hide new tuples and the old tuples are then removed.

For example, "SELECT count(*) FROM tbl" in a READ UNCOMMITTED transaction should always return correct values even if we only run UPDATEs at the same time.

I guess defining READ UNCOMMITTED is quite difficult -- it should be efficient, but should not be chaotic.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
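[Editorial sketch: the opposite anomaly Itagaki describes -- a dirty read returning two versions of one logical row -- can be shown with a toy sequential scan. This is a hypothetical Python model, not PostgreSQL's scan code.]

```python
# Toy sequential scan under a naive "show uncommitted versions" rule:
# an UPDATE landing mid-scan makes one logical row appear twice, so
# even SELECT count(*) drifts. Illustrative model only.

heap = [
    {"row_id": 1, "dead": False},  # current version of row 1
    {"row_id": 2, "dead": False},
]

seen = []
i = 0
while i < len(heap):
    v = heap[i]
    if not v["dead"]:
        seen.append(v["row_id"])  # dirty read: any live-looking version
    if i == 0:
        # Concurrent UPDATE of row 1 mid-scan: old version is marked
        # dead (too late -- already returned) and the new version is
        # appended beyond the scan position, where it will be visited.
        heap[0]["dead"] = True
        heap.append({"row_id": 1, "dead": False})
    i += 1

print(seen)  # -> [1, 2, 1]: row 1 returned twice, count(*) would say 3
```

No snapshot ever contained both versions of row 1, yet the scan returned both -- Itagaki's "different versions of identical tuples" problem.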
I agree that such an instant freeze of the whole database state is not feasible. We may be able to find a compromise on what to show and what to hide. Anyway, read uncommitted cannot guarantee repeatable reads and can provide only an "approximate" result. Even if we were successful in showing an instant freeze, the same SQL might provide a different result at the next instant.

2008/5/26 ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
> "Koichi Suzuki" <koichi.szk@gmail.com> wrote:
>
>> Because Read Uncommitted shows all the "not yet committed" versions, it
>> seems to me that there's no problem in showing the new version of tuples
>> to Read Uncommitted transactions as follows:
>
> Another transaction could update the returned tuple, which is the newest
> at that time, and the reading transaction could read the updated version
> of the tuple. We might return different versions of identical tuples,
> even though there is *no point in time* at which multiple tuples exist.
> ...(snip)...
>
> I guess defining READ UNCOMMITTED is quite difficult
> -- it should be efficient, but should not be chaotic.

--
------
Koichi Suzuki
On Monday, 26 May 2008, Simon Riggs wrote:
> At the moment, a long running SQL Statement can prevent xmin moving
> forward, which can result in VACUUM and HOT not being able to remove row
> versions effectively. My proposal is that a Read Uncommitted transaction
> would not prevent row removal, which then offers no guarantee that the
> "correct" answer would be returned. Which is *exactly* what that
> transaction isolation level was designed for.
>
> In many cases, an application designer may be able to tell that a
> particular query will always return the correct answer. For example, we
> may query against data which is known not to change, even though other
> data in the same database cluster may be subject to frequent change.
> e.g. queries against large insert-only tables.

If the data in a table never changes, why would VACUUM or HOT need to touch it? The use case isn't clear to me.
On Mon, 2008-05-26 at 16:55 +0200, Peter Eisentraut wrote:
> On Monday, 26 May 2008, Simon Riggs wrote:
> > ...(snip)...
>
> If the data in a table never changes, why would VACUUM or HOT need to touch
> it? The use case isn't clear to me.

I guess the use-case is about a long read-write transaction doing read-only access to an update-only table and thus blocking vacuum on other tables.

--------------
Hannu
Hannu Krosing <hannu@krosing.net> writes:
> On Mon, 2008-05-26 at 16:55 +0200, Peter Eisentraut wrote:
>> If the data in a table never changes, why would VACUUM or HOT need to touch
>> it? The use case isn't clear to me.

> I guess the use-case is about a long read-write transaction doing
> read-only access to an update-only table and thus blocking vacuum on
> other tables.

... in which case the proposed kluge would result in unstable, unpredictable answers, so there is still no plausible use-case.

regards, tom lane
On Mon, 2008-05-26 at 13:25 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@krosing.net> writes:
> > I guess the use-case is about a long read-write transaction doing
> > read-only access to an update-only table and thus blocking vacuum on
> > other tables.
>
> ... in which case the proposed kluge would result in unstable,
> unpredictable answers, so there is still no plausible use-case.

Maybe it was meant as a super-power-user tool (and a big footgun).

Btw, when is a transaction id currently assigned to a transaction -- when an INSERT/UPDATE/DELETE statement is first seen, or when data is actually modified? That is, when doing

INSERT INTO logtable SELECT current_timestamp, count(*) FROM really_huge_table;

will there be a transaction id for just the tiny moment the returned row is inserted, or for the whole count(*) time?

----------------
Hannu
Hannu Krosing <hannu@krosing.net> writes:
> btw, when is a transaction id currently assigned to a transaction - when
> INSERT/UPDATE/DELETE statement is first seen, or when data is actually
> modified ?

AFAIR it doesn't happen until the XID is needed to stamp a tuple with. (There might be some other places that demand the current XID, but that's the normal case.)

regards, tom lane
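[Editorial sketch: the lazy XID assignment Tom describes -- a transaction acquires a real xid only when it first needs to stamp a tuple, not at BEGIN or statement start -- can be illustrated with a small model. Hypothetical Python, with invented names (`Txn`, `get_xid`); not PostgreSQL's actual mechanism.]

```python
# Toy model of lazy XID assignment: pure reads never take an xid; the
# first tuple-stamping operation does. Illustrative only.

import itertools

xid_counter = itertools.count(100)  # stand-in for the global xid counter

class Txn:
    def __init__(self):
        self.xid = None  # no xid yet; reading rows never allocates one

    def get_xid(self):
        # Called the first time this transaction must stamp a tuple.
        if self.xid is None:
            self.xid = next(xid_counter)
        return self.xid

t = Txn()
# While the SELECT count(*) part of Hannu's INSERT ... SELECT runs,
# only reads happen, so in this model no xid exists yet:
assert t.xid is None
# Only when the result row is finally inserted is an xid taken:
stamped = t.get_xid()
print(t.xid)  # -> 100
```

Under this model, Hannu's query holds an xid only for the brief moment the result row is inserted, not for the whole duration of the count(*) -- though note that holding a *snapshot* for the whole scan is a separate matter from holding an xid.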
On Mon, 2008-05-26 at 13:25 -0400, Tom Lane wrote:
> ...(snip)...
>
> ... in which case the proposed kluge would result in unstable,
> unpredictable answers, so there is still no plausible use-case.

Separate databases?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Mon, 2008-05-26 at 13:25 -0400, Tom Lane wrote:
>> ... in which case the proposed kluge would result in unstable,
>> unpredictable answers, so there is still no plausible use-case.
>
> Separate databases?

The OldestXmin calculation only includes transactions in the same database, except when vacuuming shared relations.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
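[Editorial sketch: Heikki's point in toy form -- a hypothetical Python model (invented names, not PostgreSQL code) of an oldest-xmin computation that considers only same-database backends unless a shared relation is being vacuumed.]

```python
# Toy per-database oldest-xmin: backends in other databases don't hold
# back a vacuum of a non-shared relation; shared relations must respect
# every database. Illustrative model only.

def oldest_xmin(backends, dbid, shared_relation, next_xid=1000):
    oldest = next_xid
    for b in backends:
        if not shared_relation and b["db"] != dbid:
            continue  # other databases are ignored for local relations
        oldest = min(oldest, b["xmin"])
    return oldest

backends = [{"db": 1, "xmin": 500}, {"db": 2, "xmin": 100}]
print(oldest_xmin(backends, dbid=1, shared_relation=False))  # -> 500
print(oldest_xmin(backends, dbid=1, shared_relation=True))   # -> 100
```

This is why "separate databases" already gets most of the benefit Simon asks about: the long-running reader in database 2 pins xmin only for shared catalogs, not for database 1's tables.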
On Mon, 2008-05-26 at 16:55 +0200, Peter Eisentraut wrote:
> On Monday, 26 May 2008, Simon Riggs wrote:
> > In many cases, an application designer may be able to tell that a
> > particular query will always return the correct answer. For example, we
> > may query against data which is known not to change, even though other
> > data in the same database cluster may be subject to frequent change.
> > e.g. queries against large insert-only tables.
>
> If the data in a table never changes, why would VACUUM or HOT need to touch
> it? The use case isn't clear to me.

A long-running transaction on an insert-only table will never fail to see a row. It *will* prevent row removal by VACUUM on other tables that are being updated. This proposal would remove the linkage between the two situations, allowing benefit for general users.

Tom's objection has made me think some more, however. The user wouldn't be aware that they had used (the proposal) inappropriately, so it's really too unsafe for general use. That situation could change if we were able to mark tables as READ ONLY, INSERT ONLY or INSERT UPDATE DELETE ALLOWED (the default). That's something I'd like to do in the future, for a variety of optimisations, but it's not on my list yet.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 2008-05-26 at 20:01 +0900, ITAGAKI Takahiro wrote:
> I have a use for the dirty read -- pg_start_backup().
> In 8.3, pg_start_backup takes a long time, typically
> { checkpoint_timeout * checkpoint_completion_target }.
>
> If we have some updating transactions during pg_start_backup,
> the updated tables cannot be vacuumed well. READ UNCOMMITTED mode
> could help us in such a situation:
>
> BEGIN;
> SET TRANSACTION READ UNCOMMITTED;
> SELECT pg_start_backup(timeofday());
> END;

OK, point noted, thanks. Not sure what is possible there yet, but I will think some more.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
As an addendum for the archives only, I would add:

On Sun, 2008-05-25 at 20:10 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > The implementation is trivial, namely that the calculation of global
> > xmin would ignore Read Uncommitted transactions.
>
> This proposed implementation seems to have very little to do with
> what most people would think Read Uncommitted does.
> ...(snip)...
>
> * T1 scans the new row version, doesn't return it because T2 not
>   yet committed;
> * T2 commits;
> * VACUUM removes old row version;
> * T1 never finds old row version to return.
>
> Even if we thought that supporting Read Uncommitted would be a good
> idea, this would be an unacceptably unstable implementation of it.

We might think we can widen the snapshot so that we see the new row versions after an update. That won't work either, because we don't check whether multiple row versions are part of the same row. We always assume that each row can have at most one visible row version per snapshot. So trying to see the new row versions instead would likely result in viewing too many rows in most cases.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support