Re: tracking commit timestamps - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: tracking commit timestamps
Date
Msg-id CAB7nPqSC7GDjMV=u5PW0N4ZKreduLAVz8gc7wMu78ig-HaycBQ@mail.gmail.com
Whole thread Raw
In response to Re: tracking commit timestamps  (Petr Jelinek <petr@2ndquadrant.com>)
Responses Re: tracking commit timestamps  (Petr Jelinek <petr@2ndquadrant.com>)
Re: tracking commit timestamps  (Michael Paquier <michael.paquier@gmail.com>)
Re: tracking commit timestamps  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers


On Sat, Nov 1, 2014 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,

On 28/10/14 13:25, Simon Riggs wrote:
On 13 October 2014 10:05, Petr Jelinek <petr@2ndquadrant.com> wrote:

I worked bit on this patch to make it closer to committable state.

Here is updated version that works with current HEAD for the October
committfest.

I've reviewed this and it looks good to me. Clean, follows existing
code neatly, documented and tested.


Thanks for looking at this.

Please could you document a few things

* ExtendCommitTS() works only because commit_ts_enabled can only be
set at server start.
We need that documented so somebody doesn't make it more easily
enabled and break something.
(Could we make it enabled at next checkpoint or similar?)

Maybe we could, but that means some kind of two step enabling facility and I don't want to write that as part of the initial patch since that will need separate discussion (i.e. do we want to have new GUC flag for this, or hack solution just for committs?). So maybe later in a follow-up patch.

* The SLRU tracks timestamps of both xids and subxids. We need to
document that it does this because Subtrans SLRU is not persistent. If
we made Subtrans persistent we might need to store only the top level
xid's commitTS, but that's very useful for typical use cases and
wouldn't save much time at commit.

Attached version with the above comments near the relevant code.
On a personal note, I think that this is a useful feature, particularly useful for replication solutions to resolve commit conflicts by using the method of the first-transaction-that-commits-wins, but this has already been mentioned on this thread. So yes I am a fan of it, and yes let's keep the GUC controlling it at off by default.
 
Now here are a couple of comments at code level, this code seems not enough baked for a commit:
1) The following renaming should be done:
- pg_get_transaction_committime to pg_get_transaction_commit_time
- pg_get_transaction_extradata to pg_get_transaction_extra_data
- pg_get_transaction_committime_data to pg_get_transaction_commit_time_data
- pg_get_latest_transaction_committime_data to pg_get_latest_transaction_commit_time_data
2) This patch adds a new option -c in pg_resetxlog to set the transaction XID of the transaction from which can be retrieved a commit timestamp, but the documentation of pg_resetxlog is not updated.
3) General remark: committs is not a name suited IMO (ts for transaction??). What if the code is changed to use commit_time instead? This remark counts as well for the file names committs.c and committs.h, and for pg_committs.
4) Nitpicky remark in pg_resetxlog, let's try to respect the alphabetical order (not completely related to this patch), so not:
+       while ((c = getopt(argc, argv, "D:fl:m:no:O:x:e:c:")) != -1)
but:
+       while ((c = getopt(argc, argv, "c:e:D:fl:m:no:O:x:")) != -1)
5) --help message should be reworked (alphabetical order of the entries), this will avoid some cycles of Peter as he usually spends time revisiting and cleaning up such things:
        printf(_("  -x XID           set next transaction ID\n"));
+       printf(_("  -c XID           set the oldest retrievable commit timestamp\n"));
        printf(_("  -?, --help       show this help, then exit\n"));
6) To be consistent with everything, shouldn't track_commit_timestamp be renamed to track_commit_time
7) This documentation portion should be reworked from that:
+       <para>
+        Record commit time of transactions.  This parameter
+        can only be set in
+        the <filename>postgresql.conf</> file or on the server command line.
+        The default value is off.
+       </para>
To roughly that (not the rewording and the use of <literal>):
+       <para>
+        Record commit time of transactions.  This parameter
+        can only be set in <filename>postgresql.conf</> or on the server command line.
+        The default value is <literal>off</literal>.
+       </para>
8) Let's update this file list more consistently:
-OBJS = clogdesc.o dbasedesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+OBJS = clogdesc.o committsdesc.o dbasedesc.o gindesc.o gistdesc.o hashdesc.o \
+       heapdesc.o \
           mxactdesc.o nbtdesc.o relmapdesc.o seqdesc.o smgrdesc.o spgdesc.o \
           standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
9) Hm?! "oldest commit time xid", no?
-                                                "oldest running xid %u; %s",
+                                                "oldest CommitTs xid: %u; oldest running xid %u; %s",
10) I don't see why this diff is in the patch:
        /*
-        * See how many subxids, if any, are on the same page as the parent, if
-        * any.
+        * See how many subxids, if any, are on the same page as the parent.
         */
11) contrib/Makefile has not been updated with the new module test_committs that this patch introduces.
12) In committs_desc@committsdesc.c, isn't this block overthinking a bit:
+       else
+               appendStringInfo(buf, "UNKNOWN");
It may be better to remove it, no?
13) Isn't that 2014?
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
14) I'd put the two checks in the reverse order:
+       Assert(xid != InvalidTransactionId);
+
+       if (!commit_ts_enabled)
+               return;
15) The multiple calls to PG_FUNCTION_INFO_V1 in committs.c are not necessary. Those functions are already defined in pg_proc.h.
16) make installcheck (test committs_off) fails on a server that has track_committs set to on. You should use an alternate output. I would recommend removing as well the _off suffix in the test name. Let's use commit_time. Also, it should be mentioned in parallel_schedule with a comment that this test should always run alone and never in parallel with other tests. Honestly, I also think that test_committs brings no additional value and results in duplication code between src/test/regress and contrib/test_committs. So I'd just rip it off. On top of that, I think that "SHOW track_committs" should be added in the list of commands run in the test. We actually want to check of commit time are really registered if the feature switch is on or off.

I am still planning to do more extensive tests, and study a bit more committs.c (with even more comments) as it is the core part of the feature. For now I'd recommend to hold on commit fire for this patch.
Regards,
--
Michael

pgsql-hackers by date:

Previous
From: Eric Ridge
Date:
Subject: Re: Let's drop two obsolete features which are bear-traps for novices
Next
From: Simon Riggs
Date:
Subject: Re: group locking: incomplete patch, just for discussion