Thread: [RFC] extended txid docs

[RFC] extended txid docs

From
"Marko Kreen"
Date:
Although the new txid functions are a very clean 1:1 interface
to the internal MVCC info and don't need much documentation
in that respect, their "killer" usage comes from the
ability to query txids committed between 2 snapshots.

But how to do that (efficiently) is far from obvious when
just looking at the API.

So with the attached docs patch I try to fill the gap.  Here I
also show 2 variants of the common query helper function.

But I'm pretty bad at SGML, English and writing docs, so
please review it.  In addition to English/typos/SGML,
the suspicious aspects are:

- code style
- writing style
- used mostly PgQ terminology (ticks), could there be
  something better?
- giving two variants of helper function may be too much

Even the realistic code may be too much for general docs,
but considering this is not a functionality covered
by general SQL textbooks, I think it is worth having.

I also put rendered pages up here:

 http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html
 http://skytools.projects.postgresql.org/txid/functions-txid.html

--
marko

Attachment

Re: [RFC] extended txid docs

From
Chris Browne
Date:
markokr@gmail.com ("Marko Kreen") writes:
> Even the realistic code may be too much for general docs,
> but considering this is not a functionality covered
> by general SQL textbooks, I think it is worth having.
>
> I also put rendered pages up here:
>
>  http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html

"The data type txid_snapshot stores info about what transaction ids
are visible in a particular moment of time. Components are described
in..."

I'd suggest instead:

"The data type txid_snapshot stores info about transaction ID
visibility at a particular moment in time. The components are
described in..."

"Smallest txid that may be active. Below it all txids are visible."

I'd suggest instead:

"Earliest transaction ID that is still active.  All earlier
transactions will either be committed and visible, or rolled back and
dead."

"Next unassigned txid. Above it all txids are unassigned, thus invisible."

I'd suggest instead:

"Next unassigned txid.  All txids later than this one are unassigned,
and thus invisible."
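
To make the xmin/xmax semantics concrete, here is a minimal Python sketch of the visibility rule. It assumes the textual form xmin:xmax:xip_list (for example 10:20:10,14,15) and the xip_list rule given in the patch applied at the end of this thread. The names Snapshot, parse_snapshot and is_visible are hypothetical helpers for illustration, not part of PostgreSQL.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Snapshot:
    xmin: int            # earliest txid that is still active
    xmax: int            # next unassigned txid
    xip: FrozenSet[int]  # txids in progress at snapshot time


def parse_snapshot(text: str) -> Snapshot:
    """Parse the 'xmin:xmax:xip_list' form, e.g. '10:20:10,14,15'."""
    xmin, xmax, xip = text.split(":")
    active = frozenset(int(t) for t in xip.split(",") if t)
    return Snapshot(int(xmin), int(xmax), active)


def is_visible(txid: int, snap: Snapshot) -> bool:
    """True if txid had finished (committed or rolled back) at snapshot time."""
    if txid < snap.xmin:
        return True       # everything below xmin has finished
    if txid >= snap.xmax:
        return False      # not yet assigned at snapshot time
    return txid not in snap.xip


snap = parse_snapshot("10:20:10,14,15")
# prints [8, 9, 11, 12, 13, 16, 17, 18, 19]
print([t for t in range(8, 22) if is_visible(t, snap)])
```

Note that "visible" here only means "no longer in progress"; rows written by a rolled-back transaction are dead and never show up in a table anyway.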


>  http://skytools.projects.postgresql.org/txid/functions-txid.html

"The main use of the functions comes from the fact that user can query txids that were committed between 2 snapshots.
As this is slightly tricky, it is described here in details on the example of simple queue table."

I'd suggest instead:

"The main use of the functions is to determine which transactions were
committed between 2 snapshots.  As this is somewhat tricky, a
demonstration of their use with a simple queue table is provided."

"Then let there be table for snapshots, into which a separate process
inserts a row with current snapshot after each 5 seconds (for
example). Lets call it 'ticks' table:"

I'd suggest instead:

"We define a table to store snapshots, called 'ticks', into which a
separate process inserts a row indicating a current transaction
snapshot every 5 seconds."

"Now if someone wants to read events from the queue table, then at
first he needs to get 2 rows with snapshots from ticks table, then
query for txids that were committed between those 2 snapshots on
events table.

Because the txids and snapshots are tied to PostgreSQL internal MVCC
mechanism, the reader can be certain that the txid range queried stays
constant."

I'd suggest instead:

"In order to consistently read event data for a particular period,
the user must first read 2 rows from the 'ticks' table whose snapshots
bracket that period, and then search the event table for the txids
that were committed between those 2 snapshots.

Since the txid and snapshot values are tied to PostgreSQL's internal
MVCC mechanism, the reader may be certain that the txid range queried
is consistent."
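
The idea of "txids committed between 2 snapshots" can be sketched in Python as follows: a txid qualifies if it was not yet visible in the first snapshot but is visible in the second. This is only an illustration under assumptions; Snapshot, is_visible and finished_between are hypothetical names, and the sample snapshots and event txids are made up.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Snapshot(NamedTuple):
    xmin: int            # earliest txid still active
    xmax: int            # next unassigned txid
    xip: FrozenSet[int]  # txids in progress at snapshot time


def is_visible(txid: int, snap: Snapshot) -> bool:
    """True if txid had finished when the snapshot was taken."""
    return txid < snap.xmin or (txid < snap.xmax and txid not in snap.xip)


def finished_between(txids: Iterable[int], snap1: Snapshot,
                     snap2: Snapshot) -> List[int]:
    """Txids not yet visible in snap1 that have become visible in snap2."""
    return [t for t in txids
            if snap1.xmin <= t < snap2.xmax   # candidate range
            and not is_visible(t, snap1)
            and is_visible(t, snap2)]


snap1 = Snapshot(10, 20, frozenset({10, 14, 15}))
snap2 = Snapshot(14, 25, frozenset({14, 22}))
event_txids = [9, 10, 12, 14, 15, 20, 22, 24, 25]

# prints [10, 15, 20, 24]
print(finished_between(event_txids, snap1, snap2))
```

Because both snapshots are fixed values, running the same selection again later gives the same result, which is the consistency property described above.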

"But it will have problems if there are long transactions
running. That means the snap1.xmin will stay at the position of
running transaction and the range will get very large.

This can be fixed by fetching only [snap1.xmax..snap2.xmax] by range
and fetching possible txids below snap1.xmax explicitly:"

I'd suggest instead:

"But the query may be processed inefficiently if there are
long-running transactions during the period.  That would have the
result that the snap1.xmin value would continue to refer to the
elderly running transaction, and the range will grow very large.

This may be rectified by fetching only [snap1.xmax..snap2.xmax] by
range, and fetching candidate txids earlier than snap1.xmax
explicitly:"

"But that is also slightly inefficient as long transactions can be open during several snapshots. So it would be good
to pick out exact transactions that were open at the time of snap1 and committed before snap2. That can be done with
following query:"

I'd suggest instead:

"But that query is also somewhat inefficient because long-running
transactions may be open across multiple snapshots.  As a result, it
may be more efficient to pick out exact transactions that were open at
the time of snap1 and committed before snap2.  That can be done with
the following query:"

"As txids returned by last query are certainly interesting, their visiblity does not need additional checks. That means
the final query can be in form:"

I'd suggest instead:

"As txids returned by that last query are certainly of interest,
they do not require additional visibility checks.  That means
the final query may be of the form:"

"Although the above queries are technically correct, PostgreSQL fails to plan them efficiently. The actual query should
always be made with actual values written in."

I'd suggest instead:

"Although the above queries are all technically correct, PostgreSQL
will not plan them efficiently unless specific values are used.  The
actual query should always be executed using specific values."
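
The two-part approach discussed above (a range scan over [snap1.xmax..snap2.xmax] plus an explicit list of txids that were open at snap1 and finished by snap2) can be sketched in Python, with the final condition rendered using literal values. This is a rough sketch under assumptions, not the code from the patch: the column name ev_txid and the helpers open_then_finished and where_clause are hypothetical. The only PostgreSQL function referenced is txid_visible_in_snapshot().

```python
from typing import FrozenSet, List, NamedTuple


class Snapshot(NamedTuple):
    xmin: int            # earliest txid still active
    xmax: int            # next unassigned txid
    xip: FrozenSet[int]  # txids in progress at snapshot time

    def text(self) -> str:
        """Render as 'xmin:xmax:xip_list'."""
        return f"{self.xmin}:{self.xmax}:" + ",".join(map(str, sorted(self.xip)))


def is_visible(txid: int, snap: Snapshot) -> bool:
    return txid < snap.xmin or (txid < snap.xmax and txid not in snap.xip)


def open_then_finished(snap1: Snapshot, snap2: Snapshot) -> List[int]:
    """Txids in progress at snap1 that had finished by snap2."""
    return sorted(t for t in snap1.xip if is_visible(t, snap2))


def where_clause(snap1: Snapshot, snap2: Snapshot, col: str = "ev_txid") -> str:
    """Build the condition with literal values written in (col is hypothetical)."""
    cond = (f"({col} >= {snap1.xmax} AND {col} < {snap2.xmax} "
            f"AND txid_visible_in_snapshot({col}, '{snap2.text()}'))")
    explicit = open_then_finished(snap1, snap2)
    if explicit:  # these need no further visibility check
        cond += f" OR {col} IN ({', '.join(map(str, explicit))})"
    return cond


snap1 = Snapshot(10, 20, frozenset({10, 14, 15}))
snap2 = Snapshot(14, 25, frozenset({14, 22}))
print(open_then_finished(snap1, snap2))   # prints [10, 15]
print(where_clause(snap1, snap2))
```

The range part stays narrow even when an old transaction holds snap1.xmin back, since long-running txids are handled by the explicit list instead.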

I believe that those suggested texts describe what you intended, and
they should represent better English text for this.
--
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
"What you  said you   want to do  is  roughly  equivalent to   nailing
horseshoes to the tires of your Buick."  -- danceswithcrows@usa.net on
the question "Why can't Linux use Windows Drivers?"

Re: [RFC] extended txid docs

From
"Marko Kreen"
Date:
On 10/16/07, Chris Browne <cbbrowne@acm.org> wrote:
> markokr@gmail.com ("Marko Kreen") writes:
> > Even the realistic code may be too much for general docs,
> > but considering this is not a functionality covered
> > by general SQL textbooks, I think it is worth having.
> >
> > I also put rendered pages up here:
> > http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html
>
> >  http://skytools.projects.postgresql.org/txid/functions-txid.html
>
> I believe that those suggested texts describe what you intended, and
> they should represent better English text for this.

Thanks.  Here is a version with your changes applied, plus
minor code cleanup and example output.

I uploaded full docs to above urls, should be easier to browse.

--
marko

Attachment

Re: [RFC] extended txid docs

From
Tom Lane
Date:
"Marko Kreen" <markokr@gmail.com> writes:
> Thanks.  Here is a version with your changes applied, plus
> minor code cleanup and example output.

I can't really see the reasoning for putting this into the PG
documentation.  It's tremendously complicated and doesn't seem like
something very many people would want to read about.  In any case
it seems rather out of place where it is --- we don't have large
code examples elsewhere in func.sgml.

It almost looks like something that should be turned into a pgfoundry
or contrib module.

            regards, tom lane

Re: [RFC] extended txid docs

From
"Marko Kreen"
Date:
On 10/17/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > Thanks.  Here is a version with your changes applied, plus
> > minor code cleanup and example output.
>
> I can't really see the reasoning for putting this into the PG
> documentation.  It's tremendously complicated and doesn't seem like
> something very many people would want to read about.  In any case
> it seems rather out of place where it is --- we don't have large
> code examples elsewhere in func.sgml.
>
> It almost looks like something that should be turned into a pgfoundry
> or contrib module.

The whole point of the functions is to allow doing snapshot-based
queries.  It is indeed tricky, but that increases the need for
documentation, no?

I think the last "more realistic code" section can be dropped;
it shows a more user-friendly function but adds nothing new,
and the code is rather unreadable.

--
marko

Re: [RFC] extended txid docs

From
Bruce Momjian
Date:
Marko Kreen wrote:
> On 10/16/07, Chris Browne <cbbrowne@acm.org> wrote:
> > markokr@gmail.com ("Marko Kreen") writes:
> > > Even the realistic code may be too much for general docs,
> > > but considering this is not a functionality covered
> > > by general SQL textbooks, I think it is worth having.
> > >
> > > I also put rendered pages up here:
> > > http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html
> >
> > >  http://skytools.projects.postgresql.org/txid/functions-txid.html
> >
> > I believe that those suggested texts describe what you intended, and
> > they should represent better English text for this.
>
> Thanks.  Here is a version with your changes applied, plus
> minor code cleanup and example output.
>
> I uploaded full docs to above urls, should be easier to browse.

I have applied the part of your patch that documents the txid components in
the datatype section.  I didn't apply any of your example usage.  I just
added the mention that:

    The main use of these functions is to determine which transactions
    were committed between two snapshots.

If you want to put those examples on a web site or pgfoundry, we can
link to it from the documentation.

Applied patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.211
diff -c -c -r1.211 datatype.sgml
*** doc/src/sgml/datatype.sgml    21 Oct 2007 20:04:37 -0000    1.211
--- doc/src/sgml/datatype.sgml    5 Nov 2007 14:35:49 -0000
***************
*** 3437,3442 ****
--- 3437,3513 ----

    </sect1>

+   <sect1 id="datatype-txid-snapshot">
+    <title>Transaction Snapshot Type</title>
+
+    <indexterm zone="datatype-txid-snapshot">
+     <primary>txid_snapshot</primary>
+    </indexterm>
+
+    <para>
+     The data type <type>txid_snapshot</type> stores info about transaction ID
+     visibility at a particular moment in time. The components are
+     described in <xref linkend="datatype-txid-snapshot-parts">.
+    </para>
+
+    <table id="datatype-txid-snapshot-parts">
+     <title>Snapshot components</title>
+     <tgroup cols="2">
+      <thead>
+       <row>
+        <entry>Name</entry>
+        <entry>Query Function</entry>
+        <entry>Description</entry>
+       </row>
+      </thead>
+
+      <tbody>
+
+       <row>
+        <entry><type>xmin</type></entry>
+        <entry>txid_snapshot_xmin()</entry>
+        <entry>
+          Earliest transaction ID that is still active.  All earlier
+          transactions will either be committed and visible, or rolled
+          back and dead.
+        </entry>
+       </row>
+
+       <row>
+        <entry><type>xmax</type></entry>
+        <entry>txid_snapshot_xmax()</entry>
+        <entry>
+         Next unassigned txid.  All txids later than this one are
+         unassigned, and thus invisible.
+        </entry>
+       </row>
+
+       <row>
+        <entry><type>xip_list</type></entry>
+        <entry>txid_snapshot_xip()</entry>
+        <entry>
+         Active txids at the time of snapshot.  All of them are between
+         xmin and xmax.  A txid that is <literal>xmin <= txid <
+         xmax</literal> and not in this list is visible.
+        </entry>
+       </row>
+
+      </tbody>
+     </tgroup>
+    </table>
+
+    <para>
+     Snapshot's textual representation is <literal>[xmin]:[xmax]:[xip_list]</literal>
+     for example <literal>10:20:10,14,15</literal> means
+     <literal>xmin=10 xmax=20 xip_list=10,14,15</literal>.
+    </para>
+
+    <para>
+     Functions for getting and querying transaction ids and snapshots are
+     described in <xref linkend="functions-txid">.
+    </para>
+   </sect1>
+
    <sect1 id="datatype-uuid">
     <title><acronym>UUID</acronym> Type</title>

Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.406
diff -c -c -r1.406 func.sgml
*** doc/src/sgml/func.sgml    30 Oct 2007 19:06:56 -0000    1.406
--- doc/src/sgml/func.sgml    5 Nov 2007 14:35:50 -0000
***************
*** 11490,11495 ****
--- 11490,11500 ----
      as well.
     </para>

+   </sect1>
+
+   <sect1 id="functions-txid">
+    <title>Transaction ID and Snapshot Functions</title>
+
     <indexterm>
      <primary>txid_current</primary>
     </indexterm>
***************
*** 11562,11581 ****
     </table>

     <para>
!     The internal transaction ID type (<type>xid</>) is 32 bits wide and so
!     it wraps around every 4 billion transactions.  However, these functions
!     export a 64-bit format that is extended with an <quote>epoch</> counter
!     so that it will not wrap around for the life of an installation.
     </para>
    </sect1>

!  <sect1 id="functions-admin">
!   <title>System Administration Functions</title>

!   <para>
!    <xref linkend="functions-admin-set-table"> shows the functions
!    available to query and alter run-time configuration parameters.
!   </para>

     <table id="functions-admin-set-table">
      <title>Configuration Settings Functions</title>
--- 11567,11589 ----
     </table>

     <para>
!     The internal transaction ID type (<type>xid</>) is 32 bits wide and
!     so it wraps around every 4 billion transactions.  However, these
!     functions export a 64-bit format that is extended with an
!     <quote>epoch</> counter so that it will not wrap around for the life
!     of an installation.  The main use of these functions is to determine
!     which transactions were committed between two snapshots.
     </para>
+
    </sect1>

!   <sect1 id="functions-admin">
!    <title>System Administration Functions</title>

!    <para>
!     <xref linkend="functions-admin-set-table"> shows the functions
!     available to query and alter run-time configuration parameters.
!    </para>

     <table id="functions-admin-set-table">
      <title>Configuration Settings Functions</title>