Vance Maverick wrote:
> I want to create a table with the semantics of a log. There may be
> multiple concurrent writers, because log entries will be generated by
> triggers. And there will be multiple concurrent readers -- mostly
> remote processes watching the log over time. I'd like to guarantee that
> each of those readers will see the same, complete sequence of changes.
>
> For a first attempt, I created a table with a serial column (actually I
> created the sequence ID separately, but that's a detail). The readers
> connect remotely, scan through the table in sequence order, and remember
> the ID of the last row they read. When they read again, they start from
> after that ID -- the query is roughly
>
> SELECT * FROM logtable WHERE id > ? ORDER BY id
>
> But there's a problem with this -- the rows may not be inserted into the
> log in ID order. For example, if two concurrent sessions get the
> sequence IDs 3 and 4, the one with ID 4 might commit first. If a reader
> queries the table at that moment, it will see the sequence (1, 2, 4).
> Later, if the other session commits, a log entry with ID 3 will be
> added. The IDs will now be (1, 2, 4, 3) -- but if the reader comes back
> to poll the log again, it will miss entry 3. In order not to miss it,
> the reader would have to remember much more information than just a
> high-water mark. (I think it would have to remember the complete set of
> IDs it had processed already, and then scan the whole table to find
> entries not in the set.)
>
> Is there an approach that will give the semantics I need? In
> particular, I'd like a reader that has already read up to a certain
> point in the log to be able to restart at or near that same point, so it
> wouldn't have to reread everything.
>
> (I recognize that these log semantics are closer to those provided by
> files, not database tables. A file is an option I'll consider, but
> obviously it lacks transactional semantics.)
>
Use a timestamp column also.