Re: temporal support patch - Mailing list pgsql-hackers

From Gavin Flower
Subject Re: temporal support patch
Date
Msg-id 5033F4ED.9030900@archidevsys.co.nz
Whole thread Raw
In response to Re: temporal support patch  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: temporal support patch
Re: temporal support patch
List pgsql-hackers
<div class="moz-cite-prefix">On 22/08/12 02:16, Kevin Grittner wrote:<br /></div><blockquote
cite="mid:503351DA0200002500049981@gw.wicourts.gov" type="cite"><pre wrap="">Jeff Davis <a class="moz-txt-link-rfc2396E"
href="mailto:pgsql@j-davis.com"><pgsql@j-davis.com></a> wrote:
 
</pre><blockquote type="cite"><pre wrap="">On Mon, 2012-08-20 at 19:32 -0500, Kevin Grittner wrote:
</pre><blockquote type="cite"><pre wrap="">Josh Berkus <a class="moz-txt-link-rfc2396E"
href="mailto:josh@agliodbs.com"><josh@agliodbs.com></a> wrote:
 
</pre><blockquote type="cite"><pre wrap="">This is sounding like a completely runaway spec on what should
be a simple feature.
</pre></blockquote><pre wrap=""> 
I hate to contribute to scope creep (or in this case scope
screaming down the tracks at full steam), but I've been watching
this with a queasy feeling about interaction with Serializable
Snapshot Isolation (SSI).
</pre></blockquote><pre wrap="">
There are all kinds of challenges here, and I'm glad you're
thinking about them. I alluded to some problems here:


</pre></blockquote><pre wrap=""><a class="moz-txt-link-freetext"
href="http://archives.postgresql.org/message-id/1345415312.20987.56.camel@jdavis">http://archives.postgresql.org/message-id/1345415312.20987.56.camel@jdavis</a>
</pre><blockquote type="cite"><pre wrap="">
But those might be a subset of the problems you're talking about.

It sounds like, at a high level, there are two problems:

1. capturing the apparent order of execution in the audit log
2. assigning meaningful times to the changes that are consistent
with the apparent order of execution
</pre></blockquote><pre wrap=""> 
As far as I can see, transactions which execute DML at any
transaction isolation level other than serializable can be
considered to have occurred in commit order.  Transactions which
don't write to the database don't need to be considered as part of
the history, at least in terms of viewing prior state.  Same with
transactions which roll back.  (Now, failed transactions and reads
might be of interest for some audit reports, but that seems to me
like a different issue than a temporal database.)
The funny bit is for a serializable transaction (TN) which commits
after writing to the database -- you can't know the apparent order
of execution as long as there are any serializable transactions
active which can't see the work of TN (i.e., the transactions
overlap).  If such a transaction (TX) executes a read which
conflicts with a TN write, TX appears to have executed first, since
it doesn't see the work of TN, so I think the sequence number or
timestamp for TN has to follow that for TX even though TN committed
first.  On the other hand, TX might write something that conflicts
with a TN read, in which case TN will appear to have executed first
and must get a sequence number or timestamp before TX.
If there is a cycle, SSI will cancel one of the transactions
involved, so that can't occur anywhere in the timeline.
So, if you want to allow serializable temporal queries, the timing
of a read-write serializable transaction can't be locked down until
all overlapping read-write serializable transactions complete; and
the apparent order of execution must be based on read-write
conflicts, which are tracked within SSI.  I think that if we can
generate a list of committed transactions in order based on this
logic, it could feed into replication systems -- hot standby as well
as trigger-based systems.  I think we could generate snapshots which
exclude the transactions for which the order of execution has not
yet been determined, and avoid the delays involved in other possible
solutions.
There's a lot of detail missing here in terms of what the API would
be, and how we handle the summarization that can occur within SSI so
that it can continue to function within bounded memory even in
pessimal circumstances, but that's the general outline of my
concerns and suggested solution.
-Kevin


</pre></blockquote><p class="western" style="margin-bottom: 0cm">So if I understand correctly...<br /><br /> If there
is a very long running transaction, say 1 hour, then all (or just some? - depending) transactions that nominally start
and finish within that time cannot have definitive start times until the very long running transaction finishes, even
if they are successfully committed?<br /><br /> So if someone looks at the audit log they might not see all the
transactions they expect to see.<br /><br /> So, if I had an automatic query A which updated statistics based on
transactions committed over the last 10 minutes, then many (all?) transactions starting and successfully completing
during the time of the very long running transaction will never show up! Here I am envisioning a query fired off every
ten minutes looking for audit records with timestamps within the previous ten minutes. However, if I ran a query B
looking at audit record numbers within 10-minute intervals for a week, but kicked off 24 hours after the week finished
– then I would see the records I did not see in query A.<p class="western" style="margin-bottom: 0cm">Hmm... if I am at
all right, then probably best to have some suitably worded 'government health warning' prominent in the
documentation!<br /><p class="western" style="margin-bottom: 0cm"><br /> Cheers,<br /> Gavin<br /><br /><br /><br /><br />
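The apparent-order rule Kevin describes — a transaction that reads data an overlapping writer modified, without seeing that write, must sort before the writer regardless of commit order — amounts to a topological sort over rw-conflict edges. The following is a hypothetical sketch only, not PostgreSQL's actual SSI code; the `apparent_order` function and its inputs are invented for illustration:

```python
# Hypothetical sketch (not PostgreSQL internals): assign an apparent
# order of execution to committed serializable transactions from
# rw-conflict edges. An edge (earlier, later) means "earlier appears
# to have executed before later": if TX read data that TN wrote but
# did not see TN's write, TX precedes TN even if TN committed first.
from collections import defaultdict, deque

def apparent_order(txns, rw_conflicts):
    """txns: committed transaction ids; rw_conflicts: (earlier, later)
    pairs. SSI cancels one transaction in any dangerous cycle, so the
    committed graph is acyclic and the sort always completes."""
    succ = defaultdict(list)
    indegree = {t: 0 for t in txns}
    for earlier, later in rw_conflicts:
        succ[earlier].append(later)
        indegree[later] += 1
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in succ[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

# TN committed first, but TX read without seeing TN's write,
# so TX must come earlier in the apparent history:
print(apparent_order(["TN", "TX"], [("TX", "TN")]))  # ['TX', 'TN']
```

This also shows why timestamps can stay undetermined: until every overlapping read-write serializable transaction has completed, new conflict edges can still arrive and reorder the tail of the history.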
