Thread: Commit visibility guarantees
Hello Everyone, I'm looking at an easy real-time application using PostgreSQL. Looking at the pg docs, I see lots of discussion about cases where MVCC may reflect different versions of the data to different connections. For example: http://www.postgresql.org/docs/8.1/static/mvcc.html "while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data" "Read Committed is the default isolation level ... a SELECT query sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. (However, the SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) In effect, a SELECT query sees a snapshot of the database as of the instant that that query begins to run." However, I don't actually see any statements giving guarantees about when the updated data _does_ become visible. The central question: So if I successfully commit an update transaction on one connection, then instantaneously issue a select on another previously-opened connection, under what circumstances am I guaranteed that the select will see the effects of the update? The db is in the default read committed mode. The select is being done on another long-running connection which has never done any updates, just selects within its implicit transaction. Maybe this is the statement I'm looking for: "in Read Committed mode each new command starts with a new snapshot that includes all transactions committed up to that instant, subsequent commands in the same transaction will see the effects of the committed concurrent transaction". But this statement is just an aside when making a different point, and I see other statements like "So the whole concept of "now" is somewhat ill-defined anyway." "This is not normally a big problem if the client applications are isolated from each other, but if the clients can communicate via channels outside the database then serious confusion may ensue." And communication via outside channels is exactly what the app is doing. Thanks, - Marsh
On Mon, May 18, 2009 at 04:38:36PM -0500, Marsh Ray wrote: > The central question: So if I successfully commit an update > transaction on one connection, then instantaneously issue a select on > another previously-opened connection, under what circumstances am I > guaranteed that the select will see the effects of the update? > > Maybe this is the statement I'm looking for: "in Read Committed mode > each new command starts with a new snapshot that includes all > transactions committed up to that instant, subsequent commands in the > same transaction will see the effects of the committed concurrent > transaction". For read committed that sounds like what I'd expect, row level locking buys you a bit more in implementation terms but just complicates the formal side as queries don't attempt to do any locking by default. I'm not aware of any formally defined semantics that PG tries to be a faithful implementation of---i.e. there may be bugs, but when they're fixed where are we aiming for. If somebody could come up with a nice set of inductive definitions I'd be interested in seeing what they implied as well. > But this statement is just an aside when making a > different point, and I see other statements like "So the whole concept > of "now" is somewhat ill-defined anyway." Not sure if it's quite as bad as that; transactional semantics go a long way to making large classes of problems simple. The interactions between two independent systems that both have transactional semantics get very awkward. Unbounded rollback being a term I remember, but can't remember when/why it applies. > "This is not normally a big problem if the client applications are > isolated from each other, but if the clients can communicate via > channels outside the database then serious confusion may ensue." And > communication via outside channels is exactly what the app is doing. Yes, things get awkward when you start doing this. Is there anyway to keep things "inside" the database, using NOTIFY or somesuch? Could you define what you mean by real-time, do you mean the strict academic meaning or just that you want "interactive" things happening and it would be annoying if they were delayed by a few tens of milliseconds (as opposed to someone dieing because something got delayed by a millisecond). -- Sam http://samason.me.uk/
On Mon, May 18, 2009 at 4:53 PM, Ben Chobot <bench@silentmedia.com> wrote: > On Mon, 18 May 2009, Marsh Ray wrote: >> Hello Everyone, >> I'm looking at an easy real-time application using PostgreSQL. > As I understand real-time applications, PostgreSQL is inherintly unsuited > for the task. There is absolutely no timing constraints on your queries, and > "large" sets of working data can sometimes spill to disk, which incurs the > obvious - but not always consistent - performance hit. Definitely true, but I don't think it's really the issue here. The app does have a hard real-time deadline, but the deadline is generally quite "easy" for such a system. Like any web app, there will be a hard deadline to meet before the browser times out, though the timeout value 60 or more seconds. There is a near-instantaneous context switch from the 'update' process to the 'select' process, and I am wondering if some behaviors change with that tight scheduling. - Marsh
On Mon, May 18, 2009 at 05:18:06PM -0500, Marsh Ray wrote: > On Mon, May 18, 2009 at 4:53 PM, Ben Chobot <bench@silentmedia.com> wrote: > > On Mon, 18 May 2009, Marsh Ray wrote: > >> Hello Everyone, > >> I'm looking at an easy real-time application using PostgreSQL. > > As I understand real-time applications, PostgreSQL is inherintly unsuited > > for the task. There is absolutely no timing constraints on your queries, and > > "large" sets of working data can sometimes spill to disk, which incurs the > > obvious - but not always consistent - performance hit. > > Definitely true, but I don't think it's really the issue here. The app > does have a hard real-time deadline, but the deadline is generally > quite "easy" for such a system. Like any web app, there will be a hard > deadline to meet before the browser times out, though the timeout > value 60 or more seconds. Even then it's not useful to class it as real-time; nothing "bad" happens if you don't get a response before timeout the user just gets an error message. Real-time applies when if you don't get a response the plane crashes or a heart stops because the pacemaker hasn't put out a signal in time. -- Sam http://samason.me.uk/
On Mon, May 18, 2009 at 5:24 PM, Sam Mason <sam@samason.me.uk> wrote: > Even then it's not useful to class it as real-time; nothing "bad" > happens if you don't get a response before timeout the user just gets an > error message. Real-time applies when if you don't get a response the > plane crashes or a heart stops because the pacemaker hasn't put out a > signal in time. Think of a network router: it is neither a pacemaker nor an airplane but is definitely a real-time system. If it cannot complete its processing on time, not only is it useless, it may actually bring down significant numbers of other systems. - Marsh
On Mon, May 18, 2009 at 5:14 PM, Sam Mason <sam@samason.me.uk> wrote: > On Mon, May 18, 2009 at 04:38:36PM -0500, Marsh Ray wrote: >> The central question: So if I successfully commit an update >> transaction on one connection, then instantaneously issue a select on >> another previously-opened connection, under what circumstances am I >> guaranteed that the select will see the effects of the update? >> >> Maybe this is the statement I'm looking for: "in Read Committed mode >> each new command starts with a new snapshot that includes all >> transactions committed up to that instant, subsequent commands in the >> same transaction will see the effects of the committed concurrent >> transaction". > > For read committed that sounds like what I'd expect, row level locking > buys you a bit more in implementation terms but just complicates the > formal side as queries don't attempt to do any locking by default. I'm > not aware of any formally defined semantics that PG tries to be a > faithful implementation of---i.e. there may be bugs, but when they're > fixed where are we aiming for. If somebody could come up with a nice > set of inductive definitions I'd be interested in seeing what they > implied as well. Do you know if this kind of concurrency test is included in pg's regression tests? >> But this statement is just an aside when making a >> different point, and I see other statements like "So the whole concept >> of "now" is somewhat ill-defined anyway." > > Not sure if it's quite as bad as that; transactional semantics go a > long way to making large classes of problems simple. The interactions > between two independent systems that both have transactional semantics > get very awkward. Unbounded rollback being a term I remember, but can't > remember when/why it applies. At some point, a committed update has got to show up in other connections' selects or users would obviously complain. However, is the lag guaranteed to be zero? >> "This is not normally a big problem if the client applications are >> isolated from each other, but if the clients can communicate via >> channels outside the database then serious confusion may ensue." And >> communication via outside channels is exactly what the app is doing. > > Yes, things get awkward when you start doing this. Yep, awkward city. I'm just trying to figure out what guarantees I do get out of pg in order to analyze it going forward. > Is there anyway to keep things "inside" the database, using NOTIFY or > somesuch? Unfortunately no, the db really is between external actors that also have their own kernel-object-fast signaling mechanism. The data currently being passed via the db could be duplicated over this side channel, but it would be far more interesting to learn if a basic assumption was wrong. > Could you define what you mean by real-time, do you mean > the strict academic meaning or just that you want "interactive" things > happening and it would be annoying if they were delayed by a few tens of > milliseconds (as opposed to someone dieing because something got delayed > by a millisecond). It is real-time in the academic definition, but most deadlines are measured in seconds. Not too different than a web app really. - Marsh
Marsh Ray <marsh5143@gmail.com> writes: > The central question: So if I successfully commit an update > transaction on one connection, then instantaneously issue a select on > another previously-opened connection, under what circumstances am I > guaranteed that the select will see the effects of the update? If the select is using a snapshot taken later than the commit, it will see the effects of the update. The point that the remark about "ill-defined" behavior is trying to make is that the application could try to compare the results of queries done using different snapshots, and those results might be different. regards, tom lane
On Mon, May 18, 2009 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marsh Ray <marsh5143@gmail.com> writes: >> The central question: So if I successfully commit an update >> transaction on one connection, then instantaneously issue a select on >> another previously-opened connection, under what circumstances am I >> guaranteed that the select will see the effects of the update? > > If the select is using a snapshot taken later than the commit, it will > see the effects of the update. Great! Just the kind of definitive answer I was looking for. Now I just need to find a comprehensive list of all the things that could cause an older snapshot to be retained, and ensure that none of them could possibly be occurring on this connection. This is a connection kept open for extended periods, and used mutithreadedly for selects only. Do you suppose a long-running concurrent select on another thread could be holding back the snapshot for the whole connection? Hmm... - Marsh
Marsh Ray escribió: > On Mon, May 18, 2009 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Marsh Ray <marsh5143@gmail.com> writes: > >> The central question: So if I successfully commit an update > >> transaction on one connection, then instantaneously issue a select on > >> another previously-opened connection, under what circumstances am I > >> guaranteed that the select will see the effects of the update? > > > > If the select is using a snapshot taken later than the commit, it will > > see the effects of the update. > > Great! Just the kind of definitive answer I was looking for. > > Now I just need to find a comprehensive list of all the things that > could cause an older snapshot to be retained, and ensure that none of > them could possibly be occurring on this connection. On a serializable transaction all queries will use the same snapshot taken when the first query is executed. Otherwise (read committed), a new query always gets a fresh one. (Old snapshots are also used for stuff like cursors that remain open, but that's not the case here.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Marsh Ray wrote: >>> The central question: So if I successfully commit an update >>> transaction on one connection, then instantaneously issue a select on >>> another previously-opened connection, under what circumstances am I >>> guaranteed that the select will see the effects of the update? >> >> If the select is using a snapshot taken later than the commit, it will >> see the effects of the update. > > Great! Just the kind of definitive answer I was looking for. > > Now I just need to find a comprehensive list of all the things that > could cause an older snapshot to be retained, and ensure that none of > them could possibly be occurring on this connection. > > This is a connection kept open for extended periods, and used > mutithreadedly for selects only. Do you suppose a long-running > concurrent select on another thread could be holding back the snapshot > for the whole connection? Hmm... You cannot run two selects in one connection at the same time, see http://www.postgresql.org/docs/current/static/libpq-threading.html One connection belongs to one backend process that can do one thing at a time. If you want concurrency, you must use more than one connection. If the isolation mode is "read committed", then the snapshot of the query will be taken at query start time. So there is no need to worry. Yours, Laurenz Albe
On Mon, 2009-05-18 at 18:44 -0500, Marsh Ray wrote: > This is a connection kept open for extended periods, and used > mutithreadedly for selects only. I have some code that will allow you to reuse snapshots from one session to another, which could help your multiple threads see a consistent view of the database. It isn't updated as yet for 8.2+. Please contact me off-list if you're interested. I hope it will be part of PostgreSQL one day. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support