Re: table as log (multiple writers and readers) - Mailing list pgsql-general

From Craig Ringer
Subject Re: table as log (multiple writers and readers)
Msg-id 4806D5C3.9070106@postnewspapers.com.au
In response to Re: table as log (multiple writers and readers)  (Andrew Sullivan <ajs@crankycanuck.ca>)
Responses Re: table as log (multiple writers and readers)  (Andrew Sullivan <ajs@crankycanuck.ca>)
Re: table as log (multiple writers and readers)  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-general
Andrew Sullivan wrote:
> On Thu, Apr 17, 2008 at 12:35:33AM +0800, Craig Ringer wrote:
>> That's subject to the same issues, because a transaction's
>> current_timestamp() is determined at transaction start.
>
> But clock_timestamp() (and its ancestors in Postgres) don't have that
> restriction.

True, as I noted later, but it doesn't help. AFAIK you can't guarantee
that multiple concurrent INSERTs will be committed in the same order
that their clock_timestamp() calls were evaluated.

Consequently, you can still have a situation where a record with a lower
timestamp becomes visible to readers after a record with a higher
timestamp has, and after the reader has already recorded the higher
timestamp as their cutoff.
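The race can be sketched outside the database. In this toy Python simulation (not Postgres code; the delays and names are made up for illustration), `ts` plays the role of a `clock_timestamp()` evaluated inside the INSERT, and appending to the shared list plays the role of the commit becoming visible:

```python
import threading
import time

log = []                      # stands in for the committed rows of the log table
log_lock = threading.Lock()

def writer(ts, commit_delay):
    # ts was taken at call time, like clock_timestamp() inside the INSERT...
    time.sleep(commit_delay)  # ...but the commit only becomes visible later.
    with log_lock:
        log.append(ts)

# Writer A gets the earlier timestamp (1) but commits last;
# writer B gets the later timestamp (2) and commits immediately.
a = threading.Thread(target=writer, args=(1, 0.3))
b = threading.Thread(target=writer, args=(2, 0.0))
a.start()
b.start()

time.sleep(0.15)              # the reader polls while A is still "uncommitted"
with log_lock:
    cutoff = max(log)         # reader records ts=2 as its cutoff

a.join()
b.join()
# The ts=1 row is now visible, but a "WHERE ts > cutoff" poll skips it forever.
```

The reader did nothing wrong: at the moment it polled, ts=2 really was the newest visible row.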

> I dunno that it's enough for you, though, since you have
> visibility issues as well.  You seem to want both the benefits of files and
> relational database transactions, and I don't think you can really have both
> at once without paying in reader complication.

Or writer complication. In the end, the idea that a file-based log
wouldn't have this problem rests on the implicit assumption that the
file-based logging mechanism provides some sort of serialization of
writes.

Since POSIX requires the write() call to be thread-safe, write() would
have to do its own internal locking (or clever lock-free queueing, etc.)
to ensure writes are serialized.

However, at least on Linux until fairly recently, concurrent write()
calls were not actually serialized, so you have to do the locking
yourself. See:

http://lwn.net/Articles/180387/
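Doing it yourself amounts to something like the following sketch (Python rather than C, and the function name is made up), where a single application-level lock is held across the whole append:

```python
import threading

_log_lock = threading.Lock()

def append_log(f, record):
    # Serialize whole-record appends in the application, rather than
    # relying on the kernel making concurrent write() calls atomic.
    # Holding the lock across write + flush guarantees each record
    # lands whole, in a single serial order.
    with _log_lock:
        f.write(record + "\n")
        f.flush()
```

A lock per log file would be more general; one module-level lock keeps the sketch short.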

In any case, logging to a file with some sort of writer serialization
isn't significantly different from logging to a database table, outside
your transaction, with some sort of writer serialization. Both
mechanisms must serialize writers to work, and both must operate
outside the transactional rules of the transaction invoking the logging
operation; otherwise every transaction that writes to the log would be
serialized on the log itself.
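Stripped to its essentials, what both mechanisms provide is an append where a record's position is assigned at the same instant it becomes visible. A toy model (plain Python, nothing Postgres-specific; the class and method names are invented):

```python
import threading

class SerializedLog:
    """Toy model of a writer-serialized log: a record's position is
    assigned under the same lock that makes it visible, so a reader's
    cutoff (highest position seen) can never race past an earlier
    record that is still waiting to commit."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def append(self, payload):
        # Position assignment and visibility happen atomically.
        with self._lock:
            self._entries.append(payload)
            return len(self._entries) - 1   # usable as a reader cutoff

    def read_after(self, cutoff):
        # Everything past the cutoff, in commit order.
        with self._lock:
            return list(self._entries[cutoff + 1:])
```

This is exactly the property that per-transaction timestamps (or sequence values) lack: there, the "position" is chosen long before the row becomes visible.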

> One way I can think of doing it is to write a seen_log that notes what the
> client has already seen with a timestamp of (say) 1 minute.  Then you can
> say "go forward from this time excluding ids (ids here)".

It won't work with multiple concurrent writers. There is no guarantee
that an INSERT with a timestamp older than the one you just saw isn't
waiting to commit.

--
Craig Ringer
