Thread: Commit visibility guarantees

Commit visibility guarantees

From
Marsh Ray
Date:
Hello Everyone,

I'm looking at an easy real-time application using PostgreSQL.

Looking at the pg docs, I see lots of discussion about cases where
MVCC may reflect different versions of the data to different
connections. For example:

http://www.postgresql.org/docs/8.1/static/mvcc.html
"while querying a database each transaction sees a snapshot of data (a
database version) as it was some time ago, regardless of the current
state of the underlying data"

"Read Committed is the default isolation level ... a SELECT query sees
only data committed before the query began; it never sees either
uncommitted data or changes committed during query execution by
concurrent transactions. (However, the SELECT does see the effects of
previous updates executed within its own transaction, even though they
are not yet committed.) In effect, a SELECT query sees a snapshot of
the database as of the instant that that query begins to run."

However, I don't actually see any statements giving guarantees about
when the updated data _does_ become visible.

The central question: So if I successfully commit an update
transaction on one connection, then instantaneously issue a select on
another previously-opened connection, under what circumstances am I
guaranteed that the select will see the effects of the update?

The db is in the default read committed mode. The select is being done
on another long-running connection which has never done any updates,
just selects within its implicit transaction.

Maybe this is the statement I'm looking for: "in Read Committed mode
each new command starts with a new snapshot that includes all
transactions committed up to that instant, subsequent commands in the
same transaction will see the effects of the committed concurrent
transaction". But this statement is just an aside when making a
different point, and I see other statements like "So the whole concept
of "now" is somewhat ill-defined anyway."

"This is not normally a big problem if the client applications are
isolated from each other, but if the clients can communicate via
channels outside the database then serious confusion may ensue." And
communication via outside channels is exactly what the app is doing.

Thanks,

- Marsh

Re: Commit visibility guarantees

From
Sam Mason
Date:
On Mon, May 18, 2009 at 04:38:36PM -0500, Marsh Ray wrote:
> The central question: So if I successfully commit an update
> transaction on one connection, then instantaneously issue a select on
> another previously-opened connection, under what circumstances am I
> guaranteed that the select will see the effects of the update?
>
> Maybe this is the statement I'm looking for: "in Read Committed mode
> each new command starts with a new snapshot that includes all
> transactions committed up to that instant, subsequent commands in the
> same transaction will see the effects of the committed concurrent
> transaction".

For read committed that sounds like what I'd expect, row level locking
buys you a bit more in implementation terms but just complicates the
formal side as queries don't attempt to do any locking by default. I'm
not aware of any formally defined semantics that PG tries to be a
faithful implementation of---i.e. there may be bugs, but when they're
fixed where are we aiming for.  If somebody could come up with a nice
set of inductive definitions I'd be interested in seeing what they
implied as well.

> But this statement is just an aside when making a
> different point, and I see other statements like "So the whole concept
> of "now" is somewhat ill-defined anyway."

Not sure if it's quite as bad as that; transactional semantics go a
long way to making large classes of problems simple.  The interactions
between two independent systems that both have transactional semantics
get very awkward.  Unbounded rollback being a term I remember, but can't
remember when/why it applies.

> "This is not normally a big problem if the client applications are
> isolated from each other, but if the clients can communicate via
> channels outside the database then serious confusion may ensue." And
> communication via outside channels is exactly what the app is doing.

Yes, things get awkward when you start doing this.

Is there anyway to keep things "inside" the database, using NOTIFY or
somesuch?  Could you define what you mean by real-time, do you mean
the strict academic meaning or just that you want "interactive" things
happening and it would be annoying if they were delayed by a few tens of
milliseconds (as opposed to someone dieing because something got delayed
by a millisecond).

--
  Sam  http://samason.me.uk/

Re: Commit visibility guarantees

From
Marsh Ray
Date:
On Mon, May 18, 2009 at 4:53 PM, Ben Chobot <bench@silentmedia.com> wrote:
> On Mon, 18 May 2009, Marsh Ray wrote:
>> Hello Everyone,
>> I'm looking at an easy real-time application using PostgreSQL.
> As I understand real-time applications, PostgreSQL is inherintly unsuited
> for the task. There is absolutely no timing constraints on your queries, and
> "large" sets of working data can sometimes spill to disk, which incurs the
> obvious - but not always consistent - performance hit.

Definitely true, but I don't think it's really the issue here. The app
does have a hard real-time deadline, but the deadline is generally
quite "easy" for such a system. Like any web app, there will be a hard
deadline to meet before the browser times out, though the timeout
value 60 or more seconds.

There is a near-instantaneous context switch from the 'update' process
to the 'select' process, and I am wondering if some behaviors change
with that tight scheduling.

- Marsh

Re: Commit visibility guarantees

From
Sam Mason
Date:
On Mon, May 18, 2009 at 05:18:06PM -0500, Marsh Ray wrote:
> On Mon, May 18, 2009 at 4:53 PM, Ben Chobot <bench@silentmedia.com> wrote:
> > On Mon, 18 May 2009, Marsh Ray wrote:
> >> Hello Everyone,
> >> I'm looking at an easy real-time application using PostgreSQL.
> > As I understand real-time applications, PostgreSQL is inherintly unsuited
> > for the task. There is absolutely no timing constraints on your queries, and
> > "large" sets of working data can sometimes spill to disk, which incurs the
> > obvious - but not always consistent - performance hit.
>
> Definitely true, but I don't think it's really the issue here. The app
> does have a hard real-time deadline, but the deadline is generally
> quite "easy" for such a system. Like any web app, there will be a hard
> deadline to meet before the browser times out, though the timeout
> value 60 or more seconds.

Even then it's not useful to class it as real-time; nothing "bad"
happens if you don't get a response before timeout the user just gets an
error message.  Real-time applies when if you don't get a response the
plane crashes or a heart stops because the pacemaker hasn't put out a
signal in time.

--
  Sam  http://samason.me.uk/

Re: Commit visibility guarantees

From
Marsh Ray
Date:
On Mon, May 18, 2009 at 5:24 PM, Sam Mason <sam@samason.me.uk> wrote:
> Even then it's not useful to class it as real-time; nothing "bad"
> happens if you don't get a response before timeout the user just gets an
> error message.  Real-time applies when if you don't get a response the
> plane crashes or a heart stops because the pacemaker hasn't put out a
> signal in time.

Think of a network router: it is neither a pacemaker nor an airplane
but is definitely a real-time system. If it cannot complete its
processing on time, not only is it useless, it may actually bring down
significant numbers of other systems.

- Marsh

Re: Commit visibility guarantees

From
Marsh Ray
Date:
On Mon, May 18, 2009 at 5:14 PM, Sam Mason <sam@samason.me.uk> wrote:
> On Mon, May 18, 2009 at 04:38:36PM -0500, Marsh Ray wrote:
>> The central question: So if I successfully commit an update
>> transaction on one connection, then instantaneously issue a select on
>> another previously-opened connection, under what circumstances am I
>> guaranteed that the select will see the effects of the update?
>>
>> Maybe this is the statement I'm looking for: "in Read Committed mode
>> each new command starts with a new snapshot that includes all
>> transactions committed up to that instant, subsequent commands in the
>> same transaction will see the effects of the committed concurrent
>> transaction".
>
> For read committed that sounds like what I'd expect, row level locking
> buys you a bit more in implementation terms but just complicates the
> formal side as queries don't attempt to do any locking by default. I'm
> not aware of any formally defined semantics that PG tries to be a
> faithful implementation of---i.e. there may be bugs, but when they're
> fixed where are we aiming for.  If somebody could come up with a nice
> set of inductive definitions I'd be interested in seeing what they
> implied as well.

Do you know if this kind of concurrency test is included in pg's
regression tests?

>> But this statement is just an aside when making a
>> different point, and I see other statements like "So the whole concept
>> of "now" is somewhat ill-defined anyway."
>
> Not sure if it's quite as bad as that; transactional semantics go a
> long way to making large classes of problems simple.  The interactions
> between two independent systems that both have transactional semantics
> get very awkward.  Unbounded rollback being a term I remember, but can't
> remember when/why it applies.

At some point, a committed update has got to show up in other
connections' selects or users would obviously complain. However, is
the lag guaranteed to be zero?

>> "This is not normally a big problem if the client applications are
>> isolated from each other, but if the clients can communicate via
>> channels outside the database then serious confusion may ensue." And
>> communication via outside channels is exactly what the app is doing.
>
> Yes, things get awkward when you start doing this.

Yep, awkward city. I'm just trying to figure out what guarantees I do
get out of pg in order to analyze it going forward.

> Is there anyway to keep things "inside" the database, using NOTIFY or
> somesuch?

Unfortunately no, the db really is between external actors that also
have their own kernel-object-fast signaling mechanism. The data
currently being passed via the db could be duplicated over this side
channel, but it would be far more interesting to learn if a basic
assumption was wrong.

> Could you define what you mean by real-time, do you mean
> the strict academic meaning or just that you want "interactive" things
> happening and it would be annoying if they were delayed by a few tens of
> milliseconds (as opposed to someone dieing because something got delayed
> by a millisecond).

It is real-time in the academic definition, but most deadlines are
measured in seconds. Not too different than a web app really.

- Marsh

Re: Commit visibility guarantees

From
Tom Lane
Date:
Marsh Ray <marsh5143@gmail.com> writes:
> The central question: So if I successfully commit an update
> transaction on one connection, then instantaneously issue a select on
> another previously-opened connection, under what circumstances am I
> guaranteed that the select will see the effects of the update?

If the select is using a snapshot taken later than the commit, it will
see the effects of the update.

The point that the remark about "ill-defined" behavior is trying to make
is that the application could try to compare the results of queries done
using different snapshots, and those results might be different.

            regards, tom lane

Re: Commit visibility guarantees

From
Marsh Ray
Date:
On Mon, May 18, 2009 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marsh Ray <marsh5143@gmail.com> writes:
>> The central question: So if I successfully commit an update
>> transaction on one connection, then instantaneously issue a select on
>> another previously-opened connection, under what circumstances am I
>> guaranteed that the select will see the effects of the update?
>
> If the select is using a snapshot taken later than the commit, it will
> see the effects of the update.

Great! Just the kind of definitive answer I was looking for.

Now I just need to find a comprehensive list of all the things that
could cause an older snapshot to be retained, and ensure that none of
them could possibly be occurring on this connection.

This is a connection kept open for extended periods, and used
mutithreadedly for selects only. Do you suppose a long-running
concurrent select on another thread could be holding back the snapshot
for the whole connection? Hmm...

- Marsh

Re: Commit visibility guarantees

From
Alvaro Herrera
Date:
Marsh Ray escribió:
> On Mon, May 18, 2009 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Marsh Ray <marsh5143@gmail.com> writes:
> >> The central question: So if I successfully commit an update
> >> transaction on one connection, then instantaneously issue a select on
> >> another previously-opened connection, under what circumstances am I
> >> guaranteed that the select will see the effects of the update?
> >
> > If the select is using a snapshot taken later than the commit, it will
> > see the effects of the update.
>
> Great! Just the kind of definitive answer I was looking for.
>
> Now I just need to find a comprehensive list of all the things that
> could cause an older snapshot to be retained, and ensure that none of
> them could possibly be occurring on this connection.

On a serializable transaction all queries will use the same snapshot
taken when the first query is executed.  Otherwise (read committed), a
new query always gets a fresh one.

(Old snapshots are also used for stuff like cursors that remain open,
but that's not the case here.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Commit visibility guarantees

From
"Albe Laurenz"
Date:
Marsh Ray wrote:
>>> The central question: So if I successfully commit an update
>>> transaction on one connection, then instantaneously issue a select on
>>> another previously-opened connection, under what circumstances am I
>>> guaranteed that the select will see the effects of the update?
>>
>> If the select is using a snapshot taken later than the commit, it will
>> see the effects of the update.
>
> Great! Just the kind of definitive answer I was looking for.
>
> Now I just need to find a comprehensive list of all the things that
> could cause an older snapshot to be retained, and ensure that none of
> them could possibly be occurring on this connection.
>
> This is a connection kept open for extended periods, and used
> mutithreadedly for selects only. Do you suppose a long-running
> concurrent select on another thread could be holding back the snapshot
> for the whole connection? Hmm...

You cannot run two selects in one connection at the same time,
see http://www.postgresql.org/docs/current/static/libpq-threading.html

One connection belongs to one backend process that can do one thing
at a time. If you want concurrency, you must use more than one
connection.

If the isolation mode is "read committed", then the snapshot of the
query will be taken at query start time.

So there is no need to worry.

Yours,
Laurenz Albe

Re: Commit visibility guarantees

From
Simon Riggs
Date:
On Mon, 2009-05-18 at 18:44 -0500, Marsh Ray wrote:

> This is a connection kept open for extended periods, and used
> mutithreadedly for selects only.

I have some code that will allow you to reuse snapshots from one session
to another, which could help your multiple threads see a consistent view
of the database. It isn't updated as yet for 8.2+.

Please contact me off-list if you're interested.

I hope it will be part of PostgreSQL one day.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support