MVCC and concurrent clients - Mailing list pgsql-sql

From Tom Lane
Subject MVCC and concurrent clients
Date
Msg-id 12132.933365440@sss.pgh.pa.us
Whole thread Raw
Responses Re: [SQL] MVCC and concurrent clients
List pgsql-sql
I have an application that involves multiple processes accessing and
updating a shared database.  I have been running it successfully on
6.4, but I am wondering whether I can use 6.5's MVCC features to
improve performance.  The user's guide doesn't tell me a whole lot
about the behavior of MVCC.  Comments on this problem would be
appreciated.

The processes must all be aware of the current state of every active
tuple in the database, so each one needs to find out when any other one
changes the database.  However, there are plenty of tuples in each table
that are *not* getting modified at any instant, so repeatedly SELECT'ing
the whole table and looking for differences is not what I want.

The solution I'm using now: each shared table has a column that is
an "update sequence number", plus an associated sequence object.
Whenever any client wants to insert or update rows, it does
something like this:BEGIN TRANSACTION;LOCK TABLE shared_table;INSERT and/or UPDATE, setting each new or modified
row'supdateSequenceNumber to    nextval('shared_sequence_object');NOTIFY shared_table;END TRANSACTION;
 

All clients are listeners for the NOTIFY messages, and whenever
a notify is received, a client will doSELECT * FROM shared_table WHERE updateSequenceNumber >
highest_sequence_number_seen;
where highest_sequence_number_seen is a state variable in each
client that is updated while scanning the SELECT results.  This brings
the client up-to-date about all new and modified rows.  There is an
index on updateSequenceNumber so that this kind of query is fast.

In 6.4 this works pretty well; it doesn't allow for concurrent updates
but 6.4 didn't allow concurrent writes even without a LOCK command.
I am wondering if I can use MVCC in 6.5 to increase the level of
concurrency.  The main problem I see is that in the 6.4 design, the
writer sequencing provided by the LOCK TABLE statement guarantees that
updates will be assigned strictly increasing updateSequenceNumber
values.  If I allow concurrent writes in 6.5, it seems that writer
transactions might commit while other writers that obtained lower
sequence values haven't committed yet.  If that happens, a reader might
scan the committed tuples and advance its highest_sequence_number_seen
beyond the uncommitted values.  When the other transaction(s) commit and
NOTIFY, the reader will repeat its SELECT, but will fail to retrieve the
newly-committed tuples.  Oops.

The documentation doesn't say anything about the behavior of sequence
objects under MVCC, so I don't know if there's a way to prevent this
problem.  Do I have to stick to 6.4-equivalent total locking of the
table to prevent concurrent use of the sequence object?

Alternatively, is there a better way to design the table structure for
this application?  I would think "give me all recently modified tuples"
is a pretty standard problem.

Thanks for any info...
        regards, tom lane


pgsql-sql by date:

Previous
From: Thomas Good
Date:
Subject: date_arithmetic revisited
Next
From: Jens Glaser
Date:
Subject: Re: [SQL] Outer Joins, Foreign keys, subselects