Thread: tracking commit timestamps

tracking commit timestamps

From
Alvaro Herrera
Date:
Hi,

There has been some interest in keeping track of the timestamps of
transaction commits.  This patch implements that.

There are some seemingly curious choices here.  First, this module can
be disabled, and in fact it's turned off by default.  At startup, we
verify whether it's enabled, and create the necessary SLRU segments if
so.  And if the server is started with this disabled, we set the oldest
value we know about to avoid trying to read the commit TS of
transactions of which we didn't keep record.  The ability to turn this
off is there to avoid imposing the overhead on systems that don't need
this feature.

Another thing of note is that we allow for some extra data alongside the
timestamp proper.  This might be useful for a replication system that
wants to keep track of the origin node ID of a committed transaction,
for example.  Exactly what we will do with the bit space we have is
unclear, so I have kept it generic and called it "commit extra data".

This offers the chance for outside modules to set the commit TS of a
transaction; there is support for WAL-logging such values.  But the core
user of the feature (RecordTransactionCommit) doesn't use it, because
xact.c's WAL logging itself is enough.  For systems that are replicating
transactions from remote nodes, it is useful.

We also keep track of the latest committed transaction.  This is
supposed to be useful to calculate replication lag.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Amit Kapila
Date:
On Wed, Oct 23, 2013 at 3:46 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Hi,
>
> There has been some interest in keeping track of timestamp of
> transaction commits.  This patch implements that.

Some of the use cases I could think of are:
1. Cases where a user wants to read all data of a table
where transaction commit_ts <= '2012-04-04 09:30:00'?
2. For replication systems, maybe the middleware can use it to replay
transactions in some remote system.
3. Is there any use of this feature in logical-rep/change data extraction?

> There are some seemingly curious choices here.  First, this module can
> be disabled, and in fact it's turned off by default.  At startup, we
> verify whether it's enabled, and create the necessary SLRU segments if
> so.  And if the server is started with this disabled, we set the oldest
> value we know about to avoid trying to read the commit TS of
> transactions of which we didn't keep record.  The ability to turn this
> off is there to avoid imposing the overhead on systems that don't need
> this feature.
>
> Another thing of note is that we allow for some extra data alongside the
> timestamp proper.  This might be useful for a replication system that
> wants to keep track of the origin node ID of a committed transaction,
> for example.  Exactly what will we do with the bit space we have is
> unclear, so I have kept it generic and called it "commit extra data".

"commit extra data" can be the LSN of the commit log record, but I think it
will also depend on how someone wants to use this feature.
To suggest storing the LSN, I referred to the information at the page
below, which describes similar information for each transaction.
http://technet.microsoft.com/en-us/library/cc645959.aspx


> This offers the chance for outside modules to set the commit TS of a
> transaction; there is support for WAL-logging such values.  But the core
> user of the feature (RecordTransactionCommit) doesn't use it, because
> xact.c's WAL logging itself is enough.

I have one question for the case when the commit timestamp is set from
RecordTransactionCommit().

*** 1118,1123 **** RecordTransactionCommit(void)
--- 1119,1132 ----
 }
 /*
+ * We don't need to log the commit timestamp separately since the commit
+ * record logged above has all the necessary action to set the timestamp
+ * again.
+ */
+ TransactionTreeSetCommitTimestamp(xid, nchildren, children,
+  xactStopTimestamp, 0, false);
+

Here for CLOG, we are doing XLogFlush() before writing to the clog page,
but committs writes the timestamp before XLogFlush().
Won't that create a problem for synchronous commit? A checkpoint can
take care of flushing XLOG for relation pages before a page is flushed,
but for clog/committs, RecordTransactionCommit() should take care of doing it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: tracking commit timestamps

From
Jaime Casanova
Date:
On Tue, Oct 22, 2013 at 5:16 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Hi,
>
> There has been some interest in keeping track of timestamp of
> transaction commits.  This patch implements that.
>

Hi,

Sorry for the delay on the review.

First, because of the recent fixes the patch doesn't apply cleanly
anymore, but the changes seem easy.

=== performance ===

I expected a performance regression with the module turned on
because of the new XLOG records and the writes of files in pg_committs,
but the performance drop is excessive.

Master             437.835674 tps
Patch, guc off   436.4340982 tps
Patch, guc on       0.370524 tps

This is in a pgbench db initialized with scale=50 and run with
"pgbench -c 64 -j 16 -n -T 300" 5 times (the values above are the
averages of the runs).

postgresql changes:

shared_buffers = 1GB
full_page_writes = off
checkpoint_segments = 30
checkpoint_timeout = 15min
random_page_cost = 2.0

== functionality ==

I started the server with the module off, then after some work turned
the module on and restarted the server, ran a few benchmarks, then
turned it off again, restarted the server, and got a message like:

"""
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/3112AE58
LOG:  redo is not required
FATAL:  cannot make new WAL entries during recovery
LOG:  startup process (PID 24876) exited with exit code 1
LOG:  aborting startup due to startup process failure
"""

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566         Cell: +593 987171157



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2013-12-02 02:39:55 -0500, Jaime Casanova wrote:
> === performance ===
> 
> i expected a regression on performance with the module turned on
> because of the new XLOG records and wrote of files in pg_committs but
> the performance drop is excessive.
> Master             437.835674 tps
> Patch, guc off   436.4340982 tps
> Patch, guc on       0.370524 tps

There clearly is something wrong. The additional amount of xlog records
should be nearly unnoticeable because committs piggybacks on commit
records.


> I started the server with the module off, then after some work turned
> the module on and restarted the server and run a few benchs then
> turned it off again and restart the server and get a message like:
> 
> """
> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  record with zero length at 0/3112AE58
> LOG:  redo is not required
> FATAL:  cannot make new WAL entries during recovery
> LOG:  startup process (PID 24876) exited with exit code 1
> LOG:  aborting startup due to startup process failure
> """

Alvaro: That's because of the location you call StartupCommitts - a)
it's called at the beginning of recovery if HS is enabled b) it's called
before StartupXLOG() does LocalSetXLogInsertAllowed().

So I think you need to split StartupCommitts into StartupCommitts() and
TrimCommitts() where only the latter does the trimming if committs is
disabled.
I also wonder if track_commit_timestamp should be tracked by the
XLOG_PARAMETER_CHANGE stuff, so it's not disabled on the standby when
it's enabled on the primary?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Heikki Linnakangas
Date:
On 10/23/2013 01:16 AM, Alvaro Herrera wrote:
> There has been some interest in keeping track of timestamp of
> transaction commits.  This patch implements that.
>
> There are some seemingly curious choices here.  First, this module can
> be disabled, and in fact it's turned off by default.  At startup, we
> verify whether it's enabled, and create the necessary SLRU segments if
> so.  And if the server is started with this disabled, we set the oldest
> value we know about to avoid trying to read the commit TS of
> transactions of which we didn't keep record.  The ability to turn this
> off is there to avoid imposing the overhead on systems that don't need
> this feature.
>
> Another thing of note is that we allow for some extra data alongside the
> timestamp proper.  This might be useful for a replication system that
> wants to keep track of the origin node ID of a committed transaction,
> for example.  Exactly what will we do with the bit space we have is
> unclear, so I have kept it generic and called it "commit extra data".
>
> This offers the chance for outside modules to set the commit TS of a
> transaction; there is support for WAL-logging such values.  But the core
> user of the feature (RecordTransactionCommit) doesn't use it, because
> xact.c's WAL logging itself is enough.  For systems that are replicating
> transactions from remote nodes, it is useful.
>
> We also keep track of the latest committed transaction.  This is
> supposed to be useful to calculate replication lag.

Generally speaking, I'm not in favor of adding dead code, even if it 
might be useful to someone in the future. For one, it's going to get 
zero testing. Once someone comes up with an actual use case, let's add 
that stuff at that point. Otherwise there's a good chance that we build 
something that's almost but not quite useful.

Speaking of the functionality this does offer, it seems pretty limited. 
A commit timestamp is nice, but it isn't very interesting on its own. 
You really also want to know what the transaction did, who ran it, etc. 
ISTM some kind of auditing or log-parsing system that could tell you 
all that would be much more useful, but this patch doesn't get us any 
closer to that.

Does this handle XID wraparound correctly? SLRU has a maximum of 64k 
segments with 32 SLRU pages each. With 12 bytes per each commit entry, 
that's not enough to hold the timestamp and "commit extra data" of the 
whole 2^31 XID range: (8192 * 32 * 65536) / 12 = 1431655765. And that's 
with the default page size, with smaller pages you run into the limit 
quicker.

It would be nice to teach SLRU machinery how to deal with more than 64k 
segments. SSI code in twophase.c ran into the same limit, and all you 
get is a warning there.

- Heikki



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2013-12-10 11:56:45 +0200, Heikki Linnakangas wrote:
> Speaking of the functionality this does offer, it seems pretty limited. A
> commit timestamp is nice, but it isn't very interesting on its own. You
> really also want to know what the transaction did, who ran it, etc. ISTM
> some kind of a auditing or log-parsing system that could tell you all that
> would be much more useful, but this patch doesn't get us any closer to
> that.

It's useful for last-update-wins for async multimaster. Currently
several userspace solutions try to approximate it by inserting a
timestamp into a column when a row is inserted or updated, but that is
quite limiting because either the number is out of date wrt. the commit
and/or it will differ between the rows.

I don't see how you could get an accurate timestamp in a significantly
different way?

> Does this handle XID wraparound correctly? SLRU has a maximum of 64k
> segments with 32 SLRU pages each. With 12 bytes per each commit entry,
> that's not enough to hold the timestamp and "commit extra data" of the whole
> 2^31 XID range: (8192 * 32 * 65536) / 12 = 1431655765. And that's with the
> default page size, with smaller pages you run into the limit quicker.
> 
> It would be nice to teach SLRU machinery how to deal with more than 64k
> segments. SSI code in twophase.c ran into the same limit, and all you get is
> a warning there.

Yea, 9.3 is already running afoul of that, because of the changed size
for the multixact member pages. Came up just yesterday in the course of
#8673.

(gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
$10 = 0x14078

Is this limitation actually documented anywhere? And is there anything
that needs to be changed but SlruScanDirectory()?


Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Tue, Dec 10, 2013 at 4:56 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Generally speaking, I'm not in favor of adding dead code, even if it might
> be useful to someone in the future. For one, it's going to get zero testing.
> Once someone comes up with an actual use case, let's add that stuff at that
> point. Otherwise there's a good chance that we build something that's almost
> but not quite useful.

Fair.

> Speaking of the functionality this does offer, it seems pretty limited. A
> commit timestamp is nice, but it isn't very interesting on its own. You
> really also want to know what the transaction did, who ran it, etc. ISTM
> some kind of a auditing or log-parsing system that could tell you all that
> would be much more useful, but this patch doesn't get us any closer to that.

For what it's worth, I think that this has been requested numerous
times over the years by numerous developers of replication solutions.
My main question (apart from whether or not it may have bugs) is
whether it makes a noticeable performance difference.  If it does,
that sucks.  If it does not, maybe we ought to enable it by default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Tue, Dec 10, 2013 at 4:56 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:

> > Speaking of the functionality this does offer, it seems pretty limited. A
> > commit timestamp is nice, but it isn't very interesting on its own. You
> > really also want to know what the transaction did, who ran it, etc. ISTM
> > some kind of a auditing or log-parsing system that could tell you all that
> > would be much more useful, but this patch doesn't get us any closer to that.
> 
> For what it's worth, I think that this has been requested numerous
> times over the years by numerous developers of replication solutions.
> My main question (apart from whether or not it may have bugs) is
> whether it makes a noticeable performance difference.  If it does,
> that sucks.  If it does not, maybe we ought to enable it by default.

I expect it will have some performance impact -- this is why we made it
disable-able in the first place, and why I went to the trouble of
ensuring it can be turned on after initdb.  Normal pg_clog entries are 2
bits per transaction, whereas the commit timestamp stuff adds 12 *bytes*
per transaction.  Not something to be taken lightly, hence it's off by
default.  Presumably people who are using one of those replication
systems are okay with taking some (reasonable) performance hit.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Tue, Dec 10, 2013 at 4:04 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas escribió:
>> On Tue, Dec 10, 2013 at 4:56 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>> > Speaking of the functionality this does offer, it seems pretty limited. A
>> > commit timestamp is nice, but it isn't very interesting on its own. You
>> > really also want to know what the transaction did, who ran it, etc. ISTM
>> > some kind of a auditing or log-parsing system that could tell you all that
>> > would be much more useful, but this patch doesn't get us any closer to that.
>>
>> For what it's worth, I think that this has been requested numerous
>> times over the years by numerous developers of replication solutions.
>> My main question (apart from whether or not it may have bugs) is
>> whether it makes a noticeable performance difference.  If it does,
>> that sucks.  If it does not, maybe we ought to enable it by default.
>
> I expect it will have some performance impact -- this is why we made it
> disable-able in the first place, and why I went to the trouble of
> ensuring it can be turned on after initdb.  Normal pg_clog entries are 2
> bits per transaction, whereas the commit timestamp stuff adds 12 *bytes*
> per transaction.  Not something to be taken lightly, hence it's off by
> default.  Presumably people who is using one of those replication
> systems is okay with taking some (reasonable) performance hit.

Well, writing 12 extra bytes (why not 8?) on each commit is not
intrinsically that expensive.  The (poor) design of SLRU might make it
expensive, though: it has no fsync absorption queue, so sometimes you
end up waiting for an fsync, and doing that 48x more often will indeed
have some cost.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Petr Jelinek
Date:
Hi,

I worked a bit on this patch to make it closer to a committable state.

There are several bugs fixed, including the ones mentioned by Jaime
(writing WAL during recovery).

Also support for pg_resetxlog/pg_upgrade has been implemented by Andres.

I added simple regression test and regression contrib module to cover
both off and on settings.

The SLRU issue Heikki mentioned should be also gone mainly thanks to
638cf09e7 (I did test it too).

One notable thing missing is documentation for the three SQL-level
interfaces provided; I plan to add that soon.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 09/09/14 19:05, Petr Jelinek wrote:
> Hi,
>
> I worked bit on this patch to make it closer to committable state.
>
> There are several bugs fixed, including ones mentioned by Jamie (writing
> WAL during recovery).
>
> Also support for pg_resetxlog/pg_upgrade has been implemented by Andres.
>
> I added simple regression test and regression contrib module to cover
> both off and on settings.
>
> The SLRU issue Heikki mentioned should be also gone mainly thanks to
> 638cf09e7 (I did test it too).
>

Here is an updated version that works with current HEAD for the October
commitfest.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Simon Riggs
Date:
On 13 October 2014 10:05, Petr Jelinek <petr@2ndquadrant.com> wrote:

>> I worked bit on this patch to make it closer to committable state.

> Here is updated version that works with current HEAD for the October
> committfest.

I've reviewed this and it looks good to me. Clean, follows existing
code neatly, documented and tested.

Please could you document a few things

* ExtendCommitTS() works only because commit_ts_enabled can only be
set at server start.
We need that documented so somebody doesn't make it more easily
enabled and break something.
(Could we make it enabled at next checkpoint or similar?)

* The SLRU tracks timestamps of both xids and subxids. We need to
document that it does this because Subtrans SLRU is not persistent. If
we made Subtrans persistent we might need to store only the top level
xid's commitTS, but that's very useful for typical use cases and
wouldn't save much time at commit.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:


On Tue, Oct 28, 2014 at 9:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 13 October 2014 10:05, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> >> I worked bit on this patch to make it closer to committable state.
>
> > Here is updated version that works with current HEAD for the October
> > committfest.
>
> I've reviewed this and it looks good to me. Clean, follows existing
> code neatly, documented and tested.
>
> Please could you document a few things
>
> * ExtendCommitTS() works only because commit_ts_enabled can only be
> set at server start.
> We need that documented so somebody doesn't make it more easily
> enabled and break something.
> (Could we make it enabled at next checkpoint or similar?)
>
> * The SLRU tracks timestamps of both xids and subxids. We need to
> document that it does this because Subtrans SLRU is not persistent. If
> we made Subtrans persistent we might need to store only the top level
> xid's commitTS, but that's very useful for typical use cases and
> wouldn't save much time at commit.

Hm. What is the performance impact of this feature using the latest version of this patch? I imagine that the penalty of the additional operations this feature introduces is not zero, particularly for loads with lots of short transactions.
--
Michael

Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-10-31 14:55:11 +0900, Michael Paquier wrote:
> On Tue, Oct 28, 2014 at 9:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> 
> > On 13 October 2014 10:05, Petr Jelinek <petr@2ndquadrant.com> wrote:
> >
> > >> I worked bit on this patch to make it closer to committable state.
> >
> > > Here is updated version that works with current HEAD for the October
> > > committfest.
> >
> > I've reviewed this and it looks good to me. Clean, follows existing
> > code neatly, documented and tested.
> >
> > Please could you document a few things
> >
> > * ExtendCommitTS() works only because commit_ts_enabled can only be
> > set at server start.
> > We need that documented so somebody doesn't make it more easily
> > enabled and break something.
> > (Could we make it enabled at next checkpoint or similar?)
> >
> > * The SLRU tracks timestamps of both xids and subxids. We need to
> > document that it does this because Subtrans SLRU is not persistent. If
> > we made Subtrans persistent we might need to store only the top level
> > xid's commitTS, but that's very useful for typical use cases and
> > wouldn't save much time at commit.
> >
> 
> Hm. What is the performance impact of this feature using the latest version
> of this patch?

I haven't measured it recently, but it wasn't large, though it was measurable.

> I imagine that the penalty of the additional operations this
> feature introduces is not zero, particularly for loads with lots of short
> transactions.

Which is why you can disable it...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Merlin Moncure
Date:
On Tue, Dec 10, 2013 at 2:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Speaking of the functionality this does offer, it seems pretty limited. A
>> commit timestamp is nice, but it isn't very interesting on its own. You
>> really also want to know what the transaction did, who ran it, etc. ISTM
>> some kind of a auditing or log-parsing system that could tell you all that
>> would be much more useful, but this patch doesn't get us any closer to that.
>
> For what it's worth, I think that this has been requested numerous
> times over the years by numerous developers of replication solutions.
> My main question (apart from whether or not it may have bugs) is
> whether it makes a noticeable performance difference.  If it does,
> that sucks.  If it does not, maybe we ought to enable it by default.

+1

It's also requested now and then in the context of auditing and
forensic analysis of application problems.  But I also agree that the
tolerance for performance overhead has got to be quite low.  If a GUC
is introduced to manage the tradeoff, it should default to 'on'.

merlin



Re: tracking commit timestamps

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> It's also requested now and then in the context of auditing and
> forensic analysis of application problems.  But I also agree that the
> tolerance for performance overhead is got to be quite low.  If a GUC
> is introduced to manage the tradeoff, it should be defaulted to 'on'.

Alvaro's original submission specified that the feature defaults to "off".
Since there's no use-case for it unless you've installed some third-party
code (eg an external replication solution), I think that should stay true.
The feature's overhead might be small, but it is most certainly not zero,
and people shouldn't be paying for it unless they need it.
        regards, tom lane



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 31/10/14 15:07, Tom Lane wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> It's also requested now and then in the context of auditing and
>> forensic analysis of application problems.  But I also agree that the
>> tolerance for performance overhead is got to be quite low.  If a GUC
>> is introduced to manage the tradeoff, it should be defaulted to 'on'.
>
> Alvaro's original submission specified that the feature defaults to "off".
> Since there's no use-case for it unless you've installed some third-party
> code (eg an external replication solution), I think that should stay true.
> The feature's overhead might be small, but it is most certainly not zero,
> and people shouldn't be paying for it unless they need it.
>

Agreed, that's why it stayed 'off' in my version too.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
Hi,

On 28/10/14 13:25, Simon Riggs wrote:
> On 13 October 2014 10:05, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>>> I worked bit on this patch to make it closer to committable state.
>
>> Here is updated version that works with current HEAD for the October
>> committfest.
>
> I've reviewed this and it looks good to me. Clean, follows existing
> code neatly, documented and tested.
>

Thanks for looking at this.

> Please could you document a few things
>
> * ExtendCommitTS() works only because commit_ts_enabled can only be
> set at server start.
> We need that documented so somebody doesn't make it more easily
> enabled and break something.
> (Could we make it enabled at next checkpoint or similar?)

Maybe we could, but that means some kind of two-step enabling facility,
and I don't want to write that as part of the initial patch since it
will need separate discussion (i.e. do we want a new GUC flag for
this, or a hacky solution just for committs?). So maybe later in a
follow-up patch.

> * The SLRU tracks timestamps of both xids and subxids. We need to
> document that it does this because Subtrans SLRU is not persistent. If
> we made Subtrans persistent we might need to store only the top level
> xid's commitTS, but that's very useful for typical use cases and
> wouldn't save much time at commit.

Attached version with the above comments near the relevant code.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Simon Riggs
Date:
On 31 October 2014 15:46, Petr Jelinek <petr@2ndquadrant.com> wrote:

> Attached version with the above comments near the relevant code.

Looks cooked and ready to serve. Who's gonna commit this? Alvaro, or
do you want me to?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Sat, Nov 1, 2014 at 1:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 31 October 2014 15:46, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> > Attached version with the above comments near the relevant code.
>
> Looks cooked and ready to serve. Who's gonna commit this? Alvaro, or
> do you want me to?

Could you hold on a bit? I'd like to have a look at it more deeply, and by
looking quickly at the code there are a couple of things that could be improved.
Regards,
--
Michael

Re: tracking commit timestamps

From
Michael Paquier
Date:


On Sat, Nov 1, 2014 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Hi,
>
> On 28/10/14 13:25, Simon Riggs wrote:
> > On 13 October 2014 10:05, Petr Jelinek <petr@2ndquadrant.com> wrote:
> >
> >>> I worked bit on this patch to make it closer to committable state.
> >
> >> Here is updated version that works with current HEAD for the October
> >> committfest.
> >
> > I've reviewed this and it looks good to me. Clean, follows existing
> > code neatly, documented and tested.
>
> Thanks for looking at this.
>
> > Please could you document a few things
> >
> > * ExtendCommitTS() works only because commit_ts_enabled can only be
> > set at server start.
> > We need that documented so somebody doesn't make it more easily
> > enabled and break something.
> > (Could we make it enabled at next checkpoint or similar?)
>
> Maybe we could, but that means some kind of two step enabling facility
> and I don't want to write that as part of the initial patch since that
> will need separate discussion (i.e. do we want to have new GUC flag for
> this, or hack solution just for committs?). So maybe later in a
> follow-up patch.
>
> > * The SLRU tracks timestamps of both xids and subxids. We need to
> > document that it does this because Subtrans SLRU is not persistent. If
> > we made Subtrans persistent we might need to store only the top level
> > xid's commitTS, but that's very useful for typical use cases and
> > wouldn't save much time at commit.
>
> Attached version with the above comments near the relevant code.
On a personal note, I think this is a useful feature, particularly for replication solutions that resolve commit conflicts with the first-transaction-that-commits-wins method, but this has already been mentioned on this thread. So yes, I am a fan of it, and yes, let's keep the GUC controlling it off by default.
 
Now here are a couple of comments at the code level; this code does not seem baked enough yet for a commit:
1) The following renaming should be done:
- pg_get_transaction_committime to pg_get_transaction_commit_time
- pg_get_transaction_extradata to pg_get_transaction_extra_data
- pg_get_transaction_committime_data to pg_get_transaction_commit_time_data
- pg_get_latest_transaction_committime_data to pg_get_latest_transaction_commit_time_data
2) This patch adds a new option -c to pg_resetxlog to set the XID of the oldest transaction whose commit timestamp can be retrieved, but the documentation of pg_resetxlog is not updated.
3) General remark: committs is not a well-suited name IMO (ts for transaction??). What if the code were changed to use commit_time instead? This remark applies as well to the file names committs.c and committs.h, and to pg_committs.
4) Nitpicky remark in pg_resetxlog, let's try to respect the alphabetical order (not completely related to this patch), so not:
+       while ((c = getopt(argc, argv, "D:fl:m:no:O:x:e:c:")) != -1)
but:
+       while ((c = getopt(argc, argv, "c:e:D:fl:m:no:O:x:")) != -1)
5) The --help message should be reworked (alphabetical order of the entries); this will save Peter some cycles, as he usually spends time revisiting and cleaning up such things:
        printf(_("  -x XID           set next transaction ID\n"));
+       printf(_("  -c XID           set the oldest retrievable commit timestamp\n"));
        printf(_("  -?, --help       show this help, then exit\n"));
6) To be consistent with everything, shouldn't track_commit_timestamp be renamed to track_commit_time?
7) This documentation portion should be reworked from that:
+       <para>
+        Record commit time of transactions.  This parameter
+        can only be set in
+        the <filename>postgresql.conf</> file or on the server command line.
+        The default value is off.
+       </para>
To roughly that (note the rewording and the use of <literal>):
+       <para>
+        Record commit time of transactions.  This parameter
+        can only be set in <filename>postgresql.conf</> or on the server command line.
+        The default value is <literal>off</literal>.
+       </para>
8) Let's update this file list more consistently:
-OBJS = clogdesc.o dbasedesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+OBJS = clogdesc.o committsdesc.o dbasedesc.o gindesc.o gistdesc.o hashdesc.o \
+       heapdesc.o \
           mxactdesc.o nbtdesc.o relmapdesc.o seqdesc.o smgrdesc.o spgdesc.o \
           standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
9) Hm?! "oldest commit time xid", no?
-                                                "oldest running xid %u; %s",
+                                                "oldest CommitTs xid: %u; oldest running xid %u; %s",
10) I don't see why this diff is in the patch:
        /*
-        * See how many subxids, if any, are on the same page as the parent, if
-        * any.
+        * See how many subxids, if any, are on the same page as the parent.
         */
11) contrib/Makefile has not been updated with the new module test_committs that this patch introduces.
12) In committs_desc@committsdesc.c, isn't this block overthinking a bit:
+       else
+               appendStringInfo(buf, "UNKNOWN");
It may be better to remove it, no?
13) Isn't that 2014?
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
14) I'd put the two checks in the reverse order:
+       Assert(xid != InvalidTransactionId);
+
+       if (!commit_ts_enabled)
+               return;
15) The multiple calls to PG_FUNCTION_INFO_V1 in committs.c are not necessary. Those functions are already defined in pg_proc.h.
16) make installcheck (test committs_off) fails on a server that has track_committs set to on. You should use an alternate output. I would recommend removing as well the _off suffix in the test name. Let's use commit_time. Also, it should be mentioned in parallel_schedule with a comment that this test should always run alone and never in parallel with other tests. Honestly, I also think that test_committs brings no additional value and results in duplicated code between src/test/regress and contrib/test_committs. So I'd just rip it out. On top of that, I think that "SHOW track_committs" should be added to the list of commands run in the test. We actually want to check whether commit timestamps are really registered when the feature switch is on or off.

I am still planning to do more extensive tests, and to study committs.c a bit more (with even more comments) as it is the core part of the feature. For now I'd recommend holding fire on committing this patch.
Regards,
--
Michael

Re: tracking commit timestamps

From
Petr Jelinek
Date:
Hi,

thanks for review.

On 01/11/14 05:45, Michael Paquier wrote:
> Now here are a couple of comments at code level, this code seems not
> enough baked for a commit:
> 1) The following renaming should be done:
> - pg_get_transaction_committime to pg_get_transaction_commit_time
> - pg_get_transaction_extradata to pg_get_transaction_extra_data
> - pg_get_transaction_committime_data to pg_get_transaction_commit_time_data
> - pg_get_latest_transaction_committime_data to
> pg_get_latest_transaction_commit_time_data

Makes sense.

> 3) General remark: committs is not a name suited IMO (ts for
> transaction??). What if the code is changed to use commit_time instead?
> This remark counts as well for the file names committs.c and committs.h,
> and for pg_committs.

The ts is for timestamp; tx would be the shorthand for transaction. Looking 
at your remarks, it seems there is some general inconsistency between time 
and timestamp in this patch; we should pick one and stick with it.

> 6) To be consistent with everything, shouldn't track_commit_timestamp be
> renamed to track_commit_time

(see above)

> 9) Hm?! "oldest commit time xid", no?
> -                                                "oldest running xid %u;
> %s",
> +                                                "oldest CommitTs xid:
> %u; oldest running xid %u; %s",

Again, timestamp vs time.

> 12) In committs_desc@committsdesc.c, isn't this block overthinking a bit:
> +       else
> +               appendStringInfo(buf, "UNKNOWN");
> It may be better to remove it, no?

Should be safe, indeed.

> 13) Isn't that 2014?
> + * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group

Hah, I forgot to update that (shows how long this patch has been waiting 
:) )

> 16) make installcheck (test committs_off) fails on a server that has
> track_committs set to on. You should use an alternate output. I would

Well, it is supposed to fail, that's the whole point, the output should 
be different depending on the value of the GUC.

> recommend removing as well the _off suffix in the test name. Let's use
> commit_time. Also, it should be mentioned in parallel_schedule with a
> comment that this test should always run alone and never in parallel
> with other tests. Honestly, I also think that test_committs brings no
> additional value and results in duplication code between
> src/test/regress and contrib/test_committs. So I'd just rip it off. On

Those tests are different though: one tests that the default (off) works 
as expected, while the contrib one tests that the feature works as 
expected when turned on. Since we can only set config values for contrib 
tests, I don't see how else to do this, but I am open to suggestions.


-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 01/11/14 12:19, Petr Jelinek wrote:
> Hi,
>
> thanks for review.
>
> On 01/11/14 05:45, Michael Paquier wrote:
>> Now here are a couple of comments at code level, this code seems not
>> enough baked for a commit:
>> 1) The following renaming should be done:
>> - pg_get_transaction_committime to pg_get_transaction_commit_time
>> - pg_get_transaction_extradata to pg_get_transaction_extra_data
>> - pg_get_transaction_committime_data to
>> pg_get_transaction_commit_time_data
>> - pg_get_latest_transaction_committime_data to
>> pg_get_latest_transaction_commit_time_data
>
> Makes sense.
>

On second thought, maybe those should be pg_get_transaction_committs, 
pg_get_transaction_committs_data, etc.

For me the commit time thing feels problematic in the way I perceive it 
- I see commit time as a point in time, whereas I see commit timestamp (or 
committs for short) as something that can be recorded. So I would prefer to 
stick with commit timestamp/committs.

-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Sat, Nov 1, 2014 at 9:04 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 01/11/14 12:19, Petr Jelinek wrote:
>> Hi,
>>
>> thanks for review.
>>
>> On 01/11/14 05:45, Michael Paquier wrote:
>>> Now here are a couple of comments at code level, this code seems not
>>> enough baked for a commit:
>>> 1) The following renaming should be done:
>>> - pg_get_transaction_committime to pg_get_transaction_commit_time
>>> - pg_get_transaction_extradata to pg_get_transaction_extra_data
>>> - pg_get_transaction_committime_data to pg_get_transaction_commit_time_data
>>> - pg_get_latest_transaction_committime_data to pg_get_latest_transaction_commit_time_data
>>
>> Makes sense.
>
> On second thought, maybe those should be pg_get_transaction_committs, pg_get_transaction_committs_data, etc.
> For me the commit time thing feels problematic in the way I perceive it - I see commit time as a point in time, whereas I see commit timestamp (or committs for short) as something that can be recorded. So I would prefer to stick with commit timestamp/committs.

Hehe, I got exactly the opposite impression while reading the patch, but let's rely on your judgement for the namings. I am not the one writing this code.
-- 
Michael

Re: tracking commit timestamps

From
Michael Paquier
Date:
On Sat, Nov 1, 2014 at 1:45 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> I am still planning to do more extensive tests, and study a bit more committs.c (with even more comments) as it is the core part of the feature.
More comments:
- Heikki already mentioned it, but after reading the code I see little point in having the extra field implementing like that in core for many reasons even if it is *just* 4 bytes:
1) It is untested and actually there is no direct use for it in core.
2) Pushing code that we know as dead is no good, that's a feature more or less defined as maybe-useful-but-we-are-not-sure-yet-what-to-do-with-it.
3) If you're going to re-use this API in BDR, which is a fork of Postgres. You'd better complete this API in BDR by yourself and not bother core with that.
For those reasons I think that this extra field should be ripped off from the patch.
- The API to get the commit timestamp is not that user-friendly, and I think it could really be improved, to something like that for example:
pg_get_commit_timestamp(from_xact xid, number_of_xacts int);
pg_get_commit_timestamp(from_xact xid);
pg_get_commit_timestamp(); or pg_get_latest_commit_timestamp();
from_xact to NULL means latest. number_of_xacts to NULL means 1.
Comment in line with the fact that extra field is well, not really useful.
Regards,
--
Michael

Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 01/11/14 14:00, Michael Paquier wrote:
>
> More comments:
> - Heikki already mentioned it, but after reading the code I see little
> point in having the extra field implementing like that in core for many
> reasons even if it is *just* 4 bytes:
> 1) It is untested and actually there is no direct use for it in core.
> 2) Pushing code that we know as dead is no good, that's a feature more
> or less defined as maybe-useful-but-we-are-not-sure-yet-what-to-do-with-it.
> 3) If you're going to re-use this API in BDR, which is a fork of
> Postgres. You'd better complete this API in BDR by yourself and not
> bother core with that.
> For those reasons I think that this extra field should be ripped off
> from the patch.

Well, this is not a BDR-specific thing; the idea is that with logical 
replication, the commit timestamp is not enough for conflict handling, you 
also need additional info in order to identify some types of 
conflicts (local update vs remote update, for example). So the 
extradata field was meant as something that could be used to add that 
additional info to the xid.

But I see your point, I think solving this issue can be left to the 
replication identifier patch that is discussed in separate thread.

-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-01 13:45:44 +0900, Michael Paquier wrote:
> 14) I'd put the two checks in the reverse order:
> +       Assert(xid != InvalidTransactionId);
> +
> +       if (!commit_ts_enabled)
> +               return;

Please don't. The order is correct right now. Why, you ask? This way the
correctness of the callsites is checked even when committs is
disabled. Which it'll likely be on the majority of developer setups. And
what's the upside of changing the order? There's no difference in the
generated code in production builds, and the overhead in assert-enabled
ones is negligible.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-01 22:00:40 +0900, Michael Paquier wrote:
> On Sat, Nov 1, 2014 at 1:45 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
> 
> > I am still planning to do more extensive tests, and study a bit more
> > committs.c (with even more comments) as it is the core part of the feature.
> >
> More comments:
> - Heikki already mentioned it, but after reading the code I see little
> point in having the extra field implementing like that in core for many
> reasons even if it is *just* 4 bytes:
> 1) It is untested and actually there is no direct use for it in core.

Meh. The whole feature is only there for extensions, not core.

> 2) Pushing code that we know as dead is no good, that's a feature more or
> less defined as maybe-useful-but-we-are-not-sure-yet-what-to-do-with-it.

Uh. It's not more/less dead than the whole of committs.

> 3) If you're going to re-use this API in BDR, which is a fork of Postgres.
> You'd better complete this API in BDR by yourself and not bother core with
> that.

I think that's a fundamentally wrong position. The only reason BDR isn't
purely stock postgres is that some features couldn't sanely be made work
without patches. I *hate* the fact that we had to do so. And I really
hope that we don't need any of the patches we have when building against
9.5.

So, now you might argue that the additional data is useless. But I think
that's just not thought far enough. If you think about it, in which
scenarios do you want to map xids to the commit timestamp? Primarily
that's going to be replication, right? One of the most obvious usecases
is allowing to detect/analyze/resolve conflicts in a multimaster setup,
right? To make sensible decisisons you'll often want to have more
information about the involved transactions. Makes sense so far?

Now, you might argue that could just be done with some table
transaction_metadata(xid DEFAULT txid_current(), meta, data). But that
has *significant* disadvantages: For one, it'll not work correctly once
subtransactions are used. Not good.  For another it has about a
magnitude higher overhead than the committs way.

And it's not like the extra field is in any way bdr specific - even
if you actually want to store much more information about the
transaction than just the origin (which is what bdr does), you can use
it to correctly solve the subtransaction problem and refer to some
transaction metadata table.

> - The API to get the commit timestamp is not that user-friendly, and I
> think it could really be improved, to something like that for example:
> pg_get_commit_timestamp(from_xact xid, number_of_xacts int);

What'd be the point of this?

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-01 14:41:02 +0100, Petr Jelinek wrote:
> On 01/11/14 14:00, Michael Paquier wrote:
> >
> >More comments:
> >- Heikki already mentioned it, but after reading the code I see little
> >point in having the extra field implementing like that in core for many
> >reasons even if it is *just* 4 bytes:
> >1) It is untested and actually there is no direct use for it in core.
> >2) Pushing code that we know as dead is no good, that's a feature more
> >or less defined as maybe-useful-but-we-are-not-sure-yet-what-to-do-with-it.
> >3) If you're going to re-use this API in BDR, which is a fork of
> >Postgres. You'd better complete this API in BDR by yourself and not
> >bother core with that.
> >For those reasons I think that this extra field should be ripped off
> >from the patch.
> 
> Well this is not BDR specific thing, the idea is that with logical
> replication, commit timestamp is not enough for conflict handling, you also
> need to have additional info in order to identify some types of
> conflicts (local update vs remote update for example). So the extradata
> field was meant as something that could be used to add the additional info
> to the xid.
>
> But I see your point, I think solving this issue can be left to the
> replication identifier patch that is discussed in separate thread.

For me this really hasn't anything directly to do with replication
identifiers, so delaying this decision doesn't make sense to me.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 01/11/14 18:44, Andres Freund wrote:
> On 2014-11-01 22:00:40 +0900, Michael Paquier wrote:
>> On Sat, Nov 1, 2014 at 1:45 PM, Michael Paquier <michael.paquier@gmail.com>
>> wrote:
>>
>>> I am still planning to do more extensive tests, and study a bit more
>>> committs.c (with even more comments) as it is the core part of the feature.
>>>
>> More comments:
>> - Heikki already mentioned it, but after reading the code I see little
>> point in having the extra field implementing like that in core for many
>> reasons even if it is *just* 4 bytes:
>> 1) It is untested and actually there is no direct use for it in core.
>
> Meh. The whole feature is only there for extensions, not core.
>
>> 2) Pushing code that we know as dead is no good, that's a feature more or
>> less defined as maybe-useful-but-we-are-not-sure-yet-what-to-do-with-it.
>
> Uh. It's not more/less dead than the whole of committs.
>
>> 3) If you're going to re-use this API in BDR, which is a fork of Postgres.
>> You'd better complete this API in BDR by yourself and not bother core with
>> that.
>
> I think that's a fundamentally wrong position. The only reason BDR isn't
> purely stock postgres is that some features couldn't sanely be made work
> without patches. I *hate* the fact that we had to do so. And I really
> hope that we don't need any of the patches we have when building against
> 9.5.
>
> So, now you might argue that the additional data is useless. But I think
> that's just not thought far enough. If you think about it, in which
> scenarios do you want to map xids to the commit timestamp? Primarily
> that's going to be replication, right? One of the most obvious usecases
> is allowing to detect/analyze/resolve conflicts in a multimaster setup,
> right? To make sensible decisions you'll often want to have more
> information about the involved transactions. Makes sense so far?
>
> Now, you might argue that could just be done with some table
> transaction_metadata(xid DEFAULT txid_current(), meta, data). But that
> has *significant* disadvantages: For one, it'll not work correctly once
> subtransactions are used. Not good.  For another it has about a
> magnitude higher overhead than the committs way.
>
> And it's not like the extra field is in any way bdr specific - even
> if you actually want to store much more information about the
> transaction than just the origin (which is what bdr does), you can use
> it to correctly solve the subtransaction problem and refer to some
> transaction metadata table.
>

Well, Michael has a point that the extradata is pretty much useless 
currently; perhaps it would help to add an interface to set extradata?


-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Jim Nasby
Date:
On 11/1/14, 8:41 AM, Petr Jelinek wrote:
> Well this is not BDR specific thing, the idea is that with logical replication, commit timestamp is not enough for conflict handling, you also need to have additional info in order to identify some types of conflicts (local update vs remote update for example). So the extradata field was meant as something that could be used to add the additional info to the xid.

Related to this... is there any way to deal with 2 transactions that commit in the same microsecond? It seems silly to try and handle that for every commit since it should be quite rare, but perhaps we could store the LSN as extradata if we detect a conflict?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 11/1/14, 8:41 AM, Petr Jelinek wrote:
> >Well this is not BDR specific thing, the idea is that with logical replication, commit timestamp is not enough for conflict handling, you also need to have additional info in order to identify some types of conflicts (local update vs remote update for example). So the extradata field was meant as something that could be used to add the additional info to the xid.
> 
> Related to this... is there any way to deal with 2 transactions that commit in the same microsecond? It seems silly to try and handle that for every commit since it should be quite rare, but perhaps we could store the LSN as extradata if we detect a conflict?

Well, two things.  One, LSN is 8 bytes and extradata (at least in this
patch when I last saw it) is only 4 bytes.  But secondly and more
important is that detecting a conflict is done in node B *after* node A
has recorded the transaction's commit time; there is no way to know at
commit time that there is going to be a conflict caused by that
transaction in the future.  (If there was a way to tell, you could just
as well not commit the transaction in the first place.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Peter Eisentraut
Date:
On 11/1/14 8:04 AM, Petr Jelinek wrote:
> On second thought, maybe those should be pg_get_transaction_committs,
> pg_get_transaction_committs_data, etc.

Please don't name anything "committs".  That looks like a misspelling of
something.

There is nothing wrong with

pg_get_transaction_commit_timestamp()

If you want to reduce the length, lose the "get".

> For me the commit time thing feels problematic in the way I perceive it
> - I see commit time as a point in time, where I see commit timestamp (or
> committs for short) as something that can be recorded. So I would prefer to
> stick with commit timestamp/committs.

In PostgreSQL, it is pretty clearly established that time is hours,
minutes, seconds, and timestamp is years, months, days, hours, minutes,
seconds.  So unless this feature only records the hour, minute, and
second of a commit, it should be "timestamp".




Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 03/11/14 22:26, Peter Eisentraut wrote:
> On 11/1/14 8:04 AM, Petr Jelinek wrote:
>> On second thought, maybe those should be pg_get_transaction_committs,
>> pg_get_transaction_committs_data, etc.
>
> Please don't name anything "committs".  That looks like a misspelling of
> something.
>
> There is nothing wrong with
>
> pg_get_transaction_commit_timestamp()
>
> If you want to reduce the length, lose the "get".
>

I am fine with that, I only wonder if your definition of "anything" only 
concerns the SQL interfaces or also the internals.

-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Merlin Moncure
Date:
On Mon, Nov 3, 2014 at 3:26 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 11/1/14 8:04 AM, Petr Jelinek wrote:
>> On second thought, maybe those should be pg_get_transaction_committs,
>> pg_get_transaction_committs_data, etc.
>
> Please don't name anything "committs".  That looks like a misspelling of
> something.
>
> There is nothing wrong with
>
> pg_get_transaction_commit_timestamp()
>
> If you want to reduce the length, lose the "get".

+1: all non void returning functions 'get' something.

>> For me the commit time thing feels problematic in the way I perceive it
>> - I see commit time as a point in time, where I see commit timestamp (or
>> committs for short) as something that can be recorded. So I would prefer to
>> stick with commit timestamp/committs.
>
> In PostgreSQL, it is pretty clearly established that time is hours,
> minutes, seconds, and timestamp is years, months, days, hours, minutes,
> seconds.  So unless this feature only records the hour, minute, and
> second of a commit, it should be "timestamp".

Elsewhere, for example, we have: "pg_last_xact_replay_timestamp()".
So, in keeping with that, maybe,

pg_xact_commit_timestamp(xid)

merlin



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
> Well, Michael has a point that the extradata is pretty much useless currently,
> perhaps it would help to add the interface to set extradata?

Only accessible via C and useless aren't the same thing. But sure, add
it.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:


On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
More comments:
 
I have done a couple of tests on my laptop with pgbench like that to generate a maximum of transaction commits:
$ pgbench --no-vacuum -f ~/Desktop/commit.sql -T 60 -c 24 -j 24
$ cat ~/Desktop/commit.sql
SELECT txid_current()
Here is an average of 5 runs:
- master: 49842.44
- GUC off, patched = 49688.71
- GUC on, patched = 49459.73
So there is little noise.

Here are also more comments about the code that I found while focusing on committs.c:
1) When the GUC is not enabled, or when the transaction ID provided is not in the correct range, a dummy value based on InvalidTransactionId (?!) is returned, like that:
+       /* Return empty if module not enabled */
+       if (!commit_ts_enabled)
+       {
+               if (ts)
+                       *ts = InvalidTransactionId;
+               if (data)
+                       *data = (CommitExtraData) 0;
+               return;
+       }
This leads to some incorrect results:
=# select pg_get_transaction_committime('1');
 pg_get_transaction_committime
-------------------------------
 2000-01-01 09:00:00+09
(1 row)
Or for example:
=# SELECT txid_current();
 txid_current
--------------
         1006
(1 row)
=# select pg_get_transaction_committime('1006');
 pg_get_transaction_committime
-------------------------------
 2014-11-04 14:54:37.589381+09
(1 row)
=# select pg_get_transaction_committime('1007');
 pg_get_transaction_committime
-------------------------------
 2000-01-01 09:00:00+09
(1 row)
=# SELECT txid_current();
 txid_current
--------------
         1007
(1 row)
And at other times the error is not that helpful for the user:
=# select pg_get_transaction_committime('10000');
ERROR:  XX000: could not access status of transaction 10000
DETAIL:  Could not read from file "pg_committs/0000" at offset 114688: Undefined error: 0.
LOCATION:  SlruReportIOError, slru.c:896
I think that you should simply return an error in TransactionIdGetCommitTsData when xid is not in the range supported. That will be more transparent for the user.
2) You may as well want to return a different error if the GUC track_commit_timestamps is disabled.
3) This comment should be updated in committs.c, we are not talking about CLOG here:
+/*
+ * Link to shared-memory data structures for CLOG control
+ */
4) Similarly, I think more comments should be put in here. It is OK to truncate the commit timestamp stuff similarly to CLOG to have a consistent status context available, but let's explain it.
         * multixacts; that will be done by the next checkpoint.
         */
        TruncateCLOG(frozenXID);
+       TruncateCommitTs(frozenXID)
5) Reading the code, TransactionTreeSetCommitTimestamp is always called with do_xlog = false, meaning that no timestamps are actually WAL-logged... Hence WriteSetTimestampXlogRec is just dead code with the latest version of the patch. IMO, this should always be WAL-logged when track_commit_timestamp is on.
6) Shouldn't any value update of track_commit_timestamp be tracked in XLogReportParameters? That's thinking about making the commit timestamp available on standbys as well..
7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would be useful for developers.
8) The redo and xlog routines of committs should be moved out into a dedicated file, like committsxlog.c or similar. The other rmgrs do so, so let's be consistent.

Regards,
--
Michael

Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> with do_xlog = false, making that actually no timestamps are WAL'd... Hence
> WriteSetTimestampXlogRec is just dead code with the latest version of the
> patch. IMO, this should be always WAL-logged when track_commit_timestamp is
> on.

It's callable via a 'extern' function. So, I'd not consider it dead. And
the WAL logging is provided by xact.c's own WAL logging - it always does
the corresponding committs calls.

> 6) Shouldn't any value update of track_commit_timestamp be tracked in
> XLogReportParameters? That's thinking about making the commit timestamp
> available on standbys as well..

Yes, it should.

> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would be
> useful for developers.

What do you mean by that? There's the corresponding rmgrdesc.c support I
think?

> 8) The redo and xlog routines of committs should be out in a dedicated
> file, like committsxlog.c or similar. The other rmgr do so, so let's be
> consistent.

Seems pointless to me. The file isn't that big and the other SLRUs don't
do it.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
> > Well, Michael has a point that the extradata is pretty much useless currently,
> > perhaps it would help to add the interface to set extradata?
>
> Only accessible via C and useless aren't the same thing. But sure, add
> it.

I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy.

If this extra data field is going to be used to identify from which node a commit comes from, then it is another feature than what is written on the subject of this thread. In this case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done.
-- 
Michael

Re: tracking commit timestamps

From
Michael Paquier
Date:


On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> with do_xlog = false, making that actually no timestamps are WAL'd... Hence
> WriteSetTimestampXlogRec is just dead code with the latest version of the
> patch. IMO, this should be always WAL-logged when track_commit_timestamp is
> on.

It's callable via a 'extern' function. So, I'd not consider it dead. And
the WAL logging is provided by xact.c's own WAL logging - it always does
the corresponding committs calls.
The code path is unused. We'd better make the XLOG record mandatory if tracking is enabled, as this information is useful on standbys as well.
 
> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would be
> useful for developers.

What do you mean by that? There's the corresponding rmgrdesc.c support I
think?
Oops sorry. I thought there was some big switch in pg_xlogdump when writing this comment. Yeah that's fine.
--
Michael

Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-04 17:29:04 +0900, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> > On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> > > 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> > > with do_xlog = false, making that actually no timestamps are WAL'd...
> > Hence
> > > WriteSetTimestampXlogRec is just dead code with the latest version of the
> > > patch. IMO, this should be always WAL-logged when track_commit_timestamp
> > is
> > > on.
> >
> > It's callable via a 'extern' function. So, I'd not consider it dead. And
> > the WAL logging is provided by xact.c's own WAL logging - it always does
> > the corresponding committs calls.
> >
> The code path is unused.

No. It is not. It can be called by extensions?

> We'd better make the XLOG record mandatory if
> tracking is enabled, as this information is useful on standbys as well.

Did you read what I wrote? To quote "And the WAL logging is provided by
xact.c's own WAL logging - it always does the corresponding committs
calls.".

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Michael Paquier wrote:

> I'm still on a -1 for that. You mentioned that there is perhaps no reason
> to delay a decision on this matter, but IMO there is no reason to rush
> either in doing something we may regret. And I am not the only one on this
> thread expressing concern about this extra data thingy.
> 
> If this extra data field is going to be used to identify from which node a
> commit comes from, then it is another feature than what is written on the
> subject of this thread. In this case let's discuss it in the thread
> dedicated to replication identifiers, or come up with an extra patch once
> the feature for commit timestamps is done.

Introducing the extra data field in a later patch would mean an on-disk
representation change, i.e. pg_upgrade trouble.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-04 10:01:00 -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
> 
> > I'm still on a -1 for that. You mentioned that there is perhaps no reason
> > to delay a decision on this matter, but IMO there is no reason to rush
> > either in doing something we may regret. And I am not the only one on this
> > thread expressing concern about this extra data thingy.
> > 
> > If this extra data field is going to be used to identify from which node a
> > commit comes from, then it is another feature than what is written on the
> > subject of this thread. In this case let's discuss it in the thread
> > dedicated to replication identifiers, or come up with an extra patch once
> > the feature for commit timestamps is done.
> 
> Introducing the extra data field in a later patch would mean an on-disk
> representation change, i.e. pg_upgrade trouble.

It's also simply not necessarily related to replication
identifiers. This is useful whether replication identifiers make it in
or not. It makes it possible to implement something like replication
identifiers outside of core (albeit with a hefty overhead in OLTP workloads).
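As a rough illustration of the kind of out-of-core use the 4-byte extradata field could have (entirely hypothetical, not part of the patch: the function names and the little-endian layout are made up for this sketch), a replication extension might stash a 32-bit origin-node identifier there and recover it later when inspecting a commit:

```python
import struct

def node_id_to_extradata(node_id):
    # Pack a 32-bit origin-node id into the 4-byte "commit extra data".
    # "<I" = little-endian unsigned 32-bit int, exactly 4 bytes.
    return struct.pack("<I", node_id)

def extradata_to_node_id(extradata):
    # Recover the origin-node id from a stored extradata value.
    (node_id,) = struct.unpack("<I", extradata)
    return node_id

blob = node_id_to_extradata(42)
```

The point of the sketch is only that 4 bytes is enough to key into richer per-node metadata kept elsewhere, which is the "build infrastructure on top" argument made later in the thread.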

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 04/11/14 09:05, Andres Freund wrote:
> On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
>> Well, Michael has point that the extradata is pretty much useless currently,
>> perhaps it would help to add the interface to set extradata?
>
> Only accessible via C and useless aren't the same thing. But sure, add
> it.
>

I actually meant a nicer C API - one that makes it possible to say 
"for this transaction, use this extradata" (or "for all transactions from 
now on done by this backend, use this extradata"), instead of the current API 
where you have to overwrite what RecordCommit already wrote.


--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 04/11/14 09:19, Michael Paquier wrote:
> On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier
> <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:
>
>     More comments:
>
> Here are also more comments about the code that I found while focusing
> on committs.c:
> 1) When the GUC is not enabled, and when the transaction ID provided is
> not in a correct range, a dummy value based on InvalidTransactionId (?!)
> is returned, like that:
> +       /* Return empty if module not enabled */
> +       if (!commit_ts_enabled)
> +       {
> +               if (ts)
> +                       *ts = InvalidTransactionId;
> +               if (data)
> +                       *data = (CommitExtraData) 0;
> +               return;
> +       }
> This leads to some incorrect results:
> =# select pg_get_transaction_committime('1');
>   pg_get_transaction_committime
> -------------------------------
>   2000-01-01 09:00:00+09
> (1 row)

Oh, I had this on my TODO and somehow forgot about it, I am leaning 
towards throwing an error when calling any committs "get" function while 
the tracking is disabled.
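The behaviour Petr leans towards could look something like this sketch (Python pseudologic; `get_commit_timestamp`, `oldest_xid`, and `next_xid` are illustrative names, not the patch's actual API): instead of handing back a dummy epoch value, the lookup refuses requests when the module is off or the xid is out of the tracked range.

```python
class CommitTsError(Exception):
    pass

def get_commit_timestamp(xid, *, enabled, oldest_xid, next_xid, store):
    """Return the recorded commit timestamp for xid, or raise.

    Mirrors the proposed fix: error out instead of returning a
    dummy epoch value when tracking is disabled or xid is out of range.
    """
    if not enabled:
        raise CommitTsError("commit timestamp tracking is disabled")
    if not (oldest_xid <= xid < next_xid):
        raise CommitTsError("transaction %d is outside the tracked range" % xid)
    return store[xid]

# Example: only xids in [1000, 1007) have recorded timestamps.
store = {1006: "2014-11-04 14:54:37.589381+09"}
```

With this shape, the confusing `2000-01-01 09:00:00+09` results shown below simply cannot occur; the caller gets a clear error instead.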

> Or for example:
> =# SELECT txid_current();
>   txid_current
> --------------
>           1006
> (1 row)
> =# select pg_get_transaction_committime('1006');
>   pg_get_transaction_committime
> -------------------------------
>   2014-11-04 14:54:37.589381+09
> (1 row)
> =# select pg_get_transaction_committime('1007');
>   pg_get_transaction_committime
> -------------------------------
>   2000-01-01 09:00:00+09
> (1 row)
> =# SELECT txid_current();
>   txid_current
> --------------
>           1007
> (1 row)
> And at other times error is not that helpful for user:
> =# select pg_get_transaction_committime('10000');
> ERROR:  XX000: could not access status of transaction 10000
> DETAIL:  Could not read from file "pg_committs/0000" at offset 114688:
> Undefined error: 0.
> LOCATION:  SlruReportIOError, slru.c:896
> I think that you should simply return an error in
> TransactionIdGetCommitTsData when xid is not in the range supported.
> That will be more transparent for the user.

Agreed.

> 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> with do_xlog = false, making that actually no timestamps are WAL'd...
> Hence WriteSetTimestampXlogRec is just dead code with the latest version
> of the patch. IMO, this should be always WAL-logged when
> track_commit_timestamp is on.

As Andres explained, this is already WAL-logged by the current caller, so 
we don't want it double-logged; that's why do_xlog = false. But when an 
extension needs to call it, it will most likely need do_xlog = true.

> 8) The redo and xlog routines of committs should be out in a dedicated
> file, like committsxlog.c or similar. The other rmgr do so, so let's be
> consistent.
>

Most actually don't AFAICS.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Peter Eisentraut
Date:
On 11/3/14 5:17 PM, Petr Jelinek wrote:
>> Please don't name anything "committs".  That looks like a misspelling of
>> something.
>>
>> There is nothing wrong with
>>
>> pg_get_transaction_commit_timestamp()
>>
>> If you want to reduce the length, lose the "get".
>>
> 
> I am fine with that, I only wonder if your definition of "anything" only
> concerns the SQL interfaces or also the internals.

I'd be fine with commit_ts for internals, but not committs.

One day, you'll need a function or data structure that works with
multiple of these, and then you'll really be in naming trouble. ;-)




Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 04/11/14 09:25, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund <andres@2ndquadrant.com
> <mailto:andres@2ndquadrant.com>> wrote:
>
>     On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
>     > Well, Michael has point that the extradata is pretty much useless currently,
>     > perhaps it would help to add the interface to set extradata?
>
>     Only accessible via C and useless aren't the same thing. But sure, add
>     it.
>
> I'm still on a -1 for that. You mentioned that there is perhaps no
> reason to delay a decision on this matter, but IMO there is no reason to
> rush either in doing something we may regret. And I am not the only one
> on this thread expressing concern about this extra data thingy.
>
> If this extra data field is going to be used to identify from which node
> a commit comes from, then it is another feature than what is written on
> the subject of this thread. In this case let's discuss it in the thread
> dedicated to replication identifiers, or come up with an extra patch
> once the feature for commit timestamps is done.

The issue as I see it is that both of those features are closely related 
and just one without the other has very limited use. What I learned from 
working on UDR is that for conflict handling, I was actually missing the 
extradata more than the timestamp itself - in other words I have an 
extension for 9.4 where I have a use for this API already, so the argument 
about dead code or forks or whatever does not really hold.

The other problem is that if we add extradata later we will either break 
upgrade-ability or will have to write essentially the same code again, which 
will store just the extradata instead of the timestamp; I don't really 
like either of those options, to be honest.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 04/11/14 22:20, Peter Eisentraut wrote:
> On 11/3/14 5:17 PM, Petr Jelinek wrote:
>>> Please don't name anything "committs".  That looks like a misspelling of
>>> something.
>>>
>>> There is nothing wrong with
>>>
>>> pg_get_transaction_commit_timestamp()
>>>
>>> If you want to reduce the length, lose the "get".
>>>
>>
>> I am fine with that, I only wonder if your definition of "anything" only
>> concerns the SQL interfaces or also the internals.
>
> I'd be fine with commit_ts for internals, but not committs.
>
> One day, you'll need a function or data structure that works with
> multiple of these, and then you'll really be in naming trouble. ;-)
>

Hmm, we use CommitTs in interfaces that use CamelCase naming, so I guess 
commit_ts is indeed a natural expansion of that.


--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Michael Paquier wrote:

> I'm still on a -1 for that. You mentioned that there is perhaps no reason
> to delay a decision on this matter, but IMO there is no reason to rush
> either in doing something we may regret. And I am not the only one on this
> thread expressing concern about this extra data thingy.
>
> If this extra data field is going to be used to identify from which node a
> commit comes from, then it is another feature than what is written on the
> subject of this thread. In this case let's discuss it in the thread
> dedicated to replication identifiers, or come up with an extra patch once
> the feature for commit timestamps is done.

Introducing the extra data field in a later patch would mean an on-disk
representation change, i.e. pg_upgrade trouble.
Then why specifically 4 bytes for the extra field? Why not 8 or 16?
--
Michael

Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-05 08:57:07 +0900, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > Michael Paquier wrote:
> >
> > > I'm still on a -1 for that. You mentioned that there is perhaps no reason
> > > to delay a decision on this matter, but IMO there is no reason to rush
> > > either in doing something we may regret. And I am not the only one on
> > this
> > > thread expressing concern about this extra data thingy.
> > >
> > > If this extra data field is going to be used to identify from which node
> > a
> > > commit comes from, then it is another feature than what is written on the
> > > subject of this thread. In this case let's discuss it in the thread
> > > dedicated to replication identifiers, or come up with an extra patch once
> > > the feature for commit timestamps is done.
> >
> > Introducing the extra data field in a later patch would mean an on-disk
> > representation change, i.e. pg_upgrade trouble.
> 
> Then why especially 4 bytes for the extra field? Why not 8 or 16?

It's sufficiently long that you can build infrastructure for storing more
transaction metadata on top. I could live with making it 8 bytes, but I
don't see a clear advantage.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Jim Nasby
Date:
On 11/3/14, 2:36 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>> On 11/1/14, 8:41 AM, Petr Jelinek wrote:
>>> Well this is not BDR specific thing, the idea is that with logical replication, commit timestamp is not enough for
>>> conflict handling, you also need to have additional info in order to identify some types of conflicts (local
>>> update vs remote update for example). So the extradata field was meant as something that could be used to add the
>>> additional info to the xid.
>>
>> Related to this... is there any way to deal with 2 transactions that commit in the same microsecond? It seems silly
>> to try and handle that for every commit since it should be quite rare, but perhaps we could store the LSN as extradata
>> if we detect a conflict?
 
>
> Well, two things.  One, LSN is 8 bytes and extradata (at least in this
> patch when I last saw it) is only 4 bytes.  But secondly and more
> important is that detecting a conflict is done in node B *after* node A
> has recorded the transaction's commit time; there is no way to know at
> commit time that there is going to be a conflict caused by that
> transaction in the future.  (If there was a way to tell, you could just
> as well not commit the transaction in the first place.)

I'm worried about 2 commits in the same microsecond on the same system, not on 2 different systems. Or, put another
way, if we're going to expose this I think it should also provide a guaranteed unique commit ordering for a single
cluster. Presumably, this shouldn't be that hard since we do know the exact order in which things committed.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Anssi Kääriäinen
Date:
On Tue, 2014-11-04 at 23:43 -0600, Jim Nasby wrote:

> I'm worried about 2 commits in the same microsecond on the same
> system, not on 2 different systems. Or, put another way, if we're
> going to expose this I think it should also provide a guaranteed
> unique commit ordering for a single cluster. Presumably, this
> shouldn't be that hard since we do know the exact order in which
> things committed.

Addition of LSN when the timestamps for two transactions are exactly
same isn't enough. There isn't any guarantee that a later commit gets a
later timestamp than an earlier commit.
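Anssi's point can be shown with a toy example (hypothetical values): even on one system, the order of commit records (LSN) and the order of commit timestamps can disagree, so appending the LSN as a tiebreaker only resolves exact timestamp ties, not inversions.

```python
# Each entry: (name, commit-record LSN, timestamp in microseconds).
# Commit B wrote its commit record after A, but grabbed its
# timestamp slightly earlier, so the two orderings disagree.
commits = [
    ("A", 100, 2),
    ("B", 101, 1),
]

# True commit order: sort by commit-record LSN.
by_lsn = [name for name, lsn, ts in sorted(commits, key=lambda c: c[1])]

# Timestamp order with LSN as tiebreaker: still wrong for B.
by_ts = [name for name, lsn, ts in sorted(commits, key=lambda c: (c[2], c[1]))]
```

Here `by_lsn` puts A first while `by_ts` puts B first: the tiebreaker never fires because the timestamps differ, yet the timestamp order is not the commit order.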

In addition, I wonder if this feature would be misused. Record
transaction ids to a table to find out commit order (use case could be
storing historical row versions for example). Do a dump and restore on
another cluster, and all the transaction ids are completely meaningless
to the system.

Having the ability to record commit order into an audit table would be
extremely welcome, but as is, this feature doesn't provide it.
- Anssi





Re: tracking commit timestamps

From
Michael Paquier
Date:


On Wed, Nov 5, 2014 at 5:24 PM, Anssi Kääriäinen <anssi.kaariainen@thl.fi> wrote:
On Tue, 2014-11-04 at 23:43 -0600, Jim Nasby wrote:

> I'm worried about 2 commits in the same microsecond on the same
> system, not on 2 different systems. Or, put another way, if we're
> going to expose this I think it should also provide a guaranteed
> unique commit ordering for a single cluster. Presumably, this
> shouldn't be that hard since we do know the exact order in which
> things committed.

Addition of LSN when the timestamps for two transactions are exactly
same isn't enough. There isn't any guarantee that a later commit gets a
later timestamp than an earlier commit.
True, as the WAL record ID is not globally consistent. Two-level commit ordering can be done with (timestamp or LSN, nodeID): at equal timestamps, we could say as well that the node with the lowest systemID wins, for example. That's not something
 
In addition, I wonder if this feature would be misused. Record
transaction ids to a table to find out commit order (use case could be
storing historical row versions for example). Do a dump and restore on
another cluster, and all the transaction ids are completely meaningless
to the system.
I think you are forgetting the ability to take a consistent dump using an exported snapshot. In that case the commit order may not be that meaningless...
 
Having the ability to record commit order into an audit table would be
extremely welcome, but as is, this feature doesn't provide it.
That's something that can actually be achieved with this feature, if the SQL interface is able to query all the timestamps in an xid range - with, for example, a background worker that tracks this data periodically. Now the question is: how much timestamp history do we want to keep? The patch truncating SLRU files with frozenID may cover a sufficient range...
--
Michael

Re: tracking commit timestamps

From
Jim Nasby
Date:

On 11/5/14, 6:10 AM, Michael Paquier wrote:
>     In addition, I wonder if this feature would be misused. Record
>     transaction ids to a table to find out commit order (use case could be
>     storing historical row versions for example). Do a dump and restore on
>     another cluster, and all the transaction ids are completely meaningless
>     to the system.
>
> I think you are forgetting the fact to be able to take a consistent dump using an exported snapshot. In this case the
> commit order may not be that meaningless..

Anssi's point is that you can't use xmin because it can change, but I think anyone working with this feature would
understand that.
 

>     Having the ability to record commit order into an audit table would be
>     extremely welcome, but as is, this feature doesn't provide it.
>
> That's something that can actually be achieved with this feature if the SQL interface is able to query all the
> timestamps in a xid range with for example a background worker that tracks this data periodically. Now the thing is as
> well: how much timestamp history do we want to keep? The patch truncating SLRU files with frozenID may cover a
> sufficient range...

Except that commit time is not guaranteed unique *even on a single system*. That's my whole point. If we're going to
bother with all the commit time machinery it seems really silly not to provide a way to uniquely order every commit.
 

Clearly trying to uniquely order commits across multiple systems is a far larger problem, and I'm not suggesting we
attempt that. But for a single system AIUI all we need to do is expose the LSN of each commit record and that will give
you the exact and unique order in which transactions committed.

This isn't a hypothetical feature either; if we had this, logical replication systems wouldn't have to try and fake
this via batches. You could actually recreate exactly what data was visible at what time to all transactions, not just
repeatable read ones (as long as you kept snapshot data as well, which isn't hard).
 

As for how much data to keep, if you have a process that's doing something to record this information permanently, all
it needs to do is keep an old enough snapshot around. That's not that hard to do, even from user space.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-05 10:23:15 -0600, Jim Nasby wrote:
> 
> 
> On 11/5/14, 6:10 AM, Michael Paquier wrote:
> >    In addition, I wonder if this feature would be misused. Record
> >    transaction ids to a table to find out commit order (use case could be
> >    storing historical row versions for example). Do a dump and restore on
> >    another cluster, and all the transaction ids are completely meaningless
> >    to the system.
> >
> >I think you are forgetting the fact to be able to take a consistent dump using an exported snapshot. In this case
> >the commit order may not be that meaningless..
> 
> Anssi's point is that you can't use xmin because it can change, but I think anyone working with this feature would
> understand that.
 
>
> >    Having the ability to record commit order into an audit table would be
> >    extremely welcome, but as is, this feature doesn't provide it.
> >
> >That's something that can actually be achieved with this feature if
> >the SQL interface is able to query all the timestamps in a xid range
> >with for example a background worker that tracks this data
> >periodically. Now the thing is as well: how much timestamp history do
> >we want to keep? The patch truncating SLRU files with frozenID may
> >cover a sufficient range...


> Except that commit time is not guaranteed unique *even on a single
> system*. That's my whole point. If we're going to bother with all the
> commit time machinery it seems really silly not to provide a way to
> uniquely order every commit.

Well. I think that's the misunderstanding here. That's absolutely not
what committs is supposed to be used for. For the replication stream
you'd hopefully use logical decoding. That gives you the transaction
data exactly in commit order.

> Clearly trying to uniquely order commits across multiple systems is a
> far larger problem, and I'm not suggesting we attempt that. But for a
> single system AIUI all we need to do is expose the LSN of each commit
> record and that will give you the exact and unique order in which
> transactions committed.

I don't think that's something you should attempt. That's what logical
decoding is for. Hence I see little point in exposing the LSN that way.

Where I think committs is useful is a method for analyzing and resolving
conflicts between multiple systems. In that case you likely can't use
the LSN for anything as it won't be very meaningful. If you get
conflicts below the accuracy of the timestamps you better use another
deterministic method of resolving them - BDR e.g. compares the system
identifier, timeline id, database oid, and a user defined name. While
enforcing that those aren't the same between systems.
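A resolution scheme along the lines Andres describes might look like this sketch (the field names and the "greater tuple wins" direction are illustrative assumptions, not BDR's actual code): compare commit timestamps first, and fall back to a fixed tuple of node properties only on an exact tie.

```python
def pick_winner(a, b):
    """Last-update-wins conflict resolution with a deterministic tiebreak.

    Each update is (commit_ts, (sysid, timeline_id, db_oid, node_name), payload).
    The identity tuple is required to be unique across nodes, so two
    conflicting updates can never compare fully equal.
    """
    if a[0] != b[0]:
        # Normal case: the newer commit timestamp wins.
        return a if a[0] > b[0] else b
    # Tie below timestamp accuracy: fall back to deterministic
    # comparison of node identity (direction is an arbitrary choice
    # here; it only needs to be the same on every node).
    return a if a[1] > b[1] else b

local  = (1000, (7001, 1, 16384, "node_a"), "local row")
remote = (1000, (7002, 1, 16384, "node_b"), "remote row")
```

Because every node applies the same comparison, all nodes converge on the same winner even when the timestamps tie exactly.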

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Jim Nasby
Date:
On 11/5/14, 10:30 AM, Andres Freund wrote:
>> Except that commit time is not guaranteed unique *even on a single
>> >system*. That's my whole point. If we're going to bother with all the
>> >commit time machinery it seems really silly not to provide a way to
>> >uniquely order every commit.
> Well. I think that's the misunderstanding here. That's absolutely not
> what committs is supposed to be used for. For the replication stream
> you'd hopefully use logical decoding. That gives you the transaction
> data exactly in commit order.

So presumably you'd want to use logical decoding to insert into a table with a sequence on it, or similar?

I agree, that sounds like a better way to handle this. I think it's worth mentioning in the docs for commit_ts, because
people WILL mistakenly try and use it to determine commit ordering.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-05 10:34:40 -0600, Jim Nasby wrote:
> On 11/5/14, 10:30 AM, Andres Freund wrote:
> >>Except that commit time is not guaranteed unique *even on a single
> >>>system*. That's my whole point. If we're going to bother with all the
> >>>commit time machinery it seems really silly not to provide a way to
> >>>uniquely order every commit.

> >Well. I think that's the misunderstanding here. That's absolutely not
> >what committs is supposed to be used for. For the replication stream
> >you'd hopefully use logical decoding. That gives you the transaction
> >data exactly in commit order.
> 
> So presumably you'd want to use logical decoding to insert into a
> table with a sequence on it, or similar?

I'm not following. I'd use logical decoding to replicate the data to
another system, thereby guaranteeing its done in commit order. Then,
when applying the data on the other side, I can detect/resolve some
forms of conflicts by looking at the timestamps of rows via committs.

> I agree, that sounds like a better way to handle this. I think it's
> worth mentioning in the docs for commit_ts, because people WILL
> mistakenly try and use it to determine commit ordering.

Ok, sounds sensible.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Kevin Grittner
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

> for a single system AIUI all we need to do is expose the LSN of
> each commit record and that will give you the exact and unique
> order in which transactions committed.
>
> This isn't a hypothetical feature either; if we had this,
> logical replication systems wouldn't have to try and fake this
> via batches. You could actually recreate exactly what data was
> visible at what time to all transactions, not just repeatable
> read ones (as long as you kept snapshot data as well, which isn't
> hard).

Well, that not entirely true for serializable transactions; there
are points in time where reading the committed state could cause a
transaction to roll back[1] -- either a writing transaction which
would make that visible state inconsistent with the later committed
state or the reading transaction if it views something which is not
(yet) consistent.

That's not to say that this feature is a bad idea; part of the
serializable implementation itself depends on being able to
accurately determine commit order, and this feature could allow
that to work more efficiently.  I'm saying that, like hot standby,
a replicated database could not provide truly serializable
transactions (even read only ones) without something else in
addition to this.  We've discussed various ways of doing that.
Perhaps the most promising is to include in the stream some
indication of which points in the transaction stream are safe for a
serializable transaction to read.  If there's a way to implement
commit order recording such that a two-state flag could be
associated with each commit, I think it could be made to work for
serializable transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions



Re: tracking commit timestamps

From
Steve Singer
Date:
On 11/05/2014 11:23 AM, Jim Nasby wrote:
>
>
> Except that commit time is not guaranteed unique *even on a single 
> system*. That's my whole point. If we're going to bother with all the 
> commit time machinery it seems really silly not to provide a way to 
> uniquely order every commit.
>
> Clearly trying to uniquely order commits across multiple systems is a 
> far larger problem, and I'm not suggesting we attempt that. But for a 
> single system AIUI all we need to do is expose the LSN of each commit 
> record and that will give you the exact and unique order in which 
> transactions committed.
>
> This isn't a hypothetical feature either; if we had this, logical 
> replication systems wouldn't have to try and fake this via batches. 
> You could actually recreate exactly what data was visible at what time 
> to all transactions, not just repeatable read ones (as long as you 
> kept snapshot data as well, which isn't hard).
>
> As for how much data to keep, if you have a process that's doing 
> something to record this information permanently all it needs to do is 
> keep an old enough snapshot around. That's not that hard to do, even 
> from user space.

+1 for this.

It isn't just 'replication' systems that have a need for getting the 
commit order of transactions on a single system.  I have an application 
(not slony) where we want to query a table but order the output based on 
the transaction commit order of when the insert into the table was done 
(think of a queue). I'm not replicating the output but passing the data 
to other applications for further processing.  If I just had the commit 
timestamp I would need to put in some other condition to break ties in a 
consistent way.  I think being able to get an ordering by commit LSN is 
what I really want in this case not the timestamp.

Logical decoding is one solution to this (that I was considering) but 
being able to do something like
select * FROM event_log order by commit_id would be a lot simpler.







Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-05 17:17:05 -0500, Steve Singer wrote:
> It isn't just 'replication' systems that have a need for getting the commit
> order of transactions on a single system.  I have a application (not slony)
> where we want to query a table but order the output based on the transaction
> commit order of when the insert into the table was done (think of a queue).
> I'm not replicating the output but passing the data to other applications
> for further processing.  If I just had the commit timestamp I would need to
> put in some other condition to break ties in a consistent way.  I think
> being able to get an ordering by commit LSN is what I really want in this
> case not the timestamp.
> 
> Logical decoding is one solution to this (that I was considering) but being
> able to do something like
> select * FROM event_log order by commit_id would be a lot simpler.

Imo that's essentially a different feature. What you essentially would
need here is a 'commit sequence number' - but no timestamps. And
probably to be useful that number has to be 8 bytes in itself.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Steve Singer
Date:
On 11/05/2014 05:43 PM, Andres Freund wrote:
> On 2014-11-05 17:17:05 -0500, Steve Singer wrote:
> Imo that's essentially a different feature. What you essentially would
> need here is a 'commit sequence number' - but no timestamps. And
> probably to be useful that number has to be 8 bytes in itself.

I think this gets to the heart of some of the differing views people 
have expressed on this patch

Is this patch supposed to:

A)  Add commit timestamp tracking but nothing more

B) Add infrastructure to store commit timestamps and provide a facility 
for storing additional bits of data extensions might want to be 
associated with the commit

C).  Add commit timestamps and node identifiers to commits

If the answer is (A) then I can see why some people are objecting to 
adding extradata.    If the answer is (B) then it's fair to ask how well 
does this patch handle various types of things people might want to 
attach to the commit record (such as the LSN).   I think the problem is 
that once you start providing a facility extensions can use to store 
data along with the commit record then being restricted to 4 or 8 bytes 
is very limiting.  It also doesn't allow you to load two extensions at 
once on a system.   You wouldn't be able to have both the  
'steve_commit_order' extension and BDR installed at the same time.  I 
don't think this patch does a very good job at (B) but it wasn't 
intended to.

If what we are really doing is C, and just calling the space 'extradata' 
until we get the logical identifier stuff in and then we are going to 
rename extradata to nodeid, then we should say so.  If we are really 
concerned about the pg_upgrade impact of expanding the record later then 
maybe we should add 4 bytes of padding to the CommitTimeStampEntry now 
but leave manipulating the node id until later.

Steve




> Greetings,
>
> Andres Freund
>




Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-05 19:31:52 -0500, Steve Singer wrote:
> On 11/05/2014 05:43 PM, Andres Freund wrote:
> >On 2014-11-05 17:17:05 -0500, Steve Singer wrote:
> >Imo that's essentially a different feature. What you essentially would
> >need here is a 'commit sequence number' - but no timestamps. And
> >probably to be useful that number has to be 8 bytes in itself.
> 
> I think this gets to the heart of some of the differing views people have
> expressed on this patch

I think it's actually beside the heart...

> Is this patch supposed to:
> 
> A)  Add commit timestamp tracking but nothing more
> 
> B) Add infrastructure to store commit timestamps and provide a facility for
> storing additional bits of data extensions might want to be associated with
> the commit
> 
> C).  Add commit timestamps and node identifiers to commits

> If the answer is (A) then I can see why some people are objecting to adding
> extradata.    If the answer is (B) then it's fair to ask how well does this
> patch handle various types of things people might want to attach to the
> commit record (such as the LSN).

I think there's a mistake exactly here. The LSN of the commit isn't just
some extra information about the commit. You can't say 'here, also
attach this piece of information'. Instead you need special case code in
xact.c to add it. Thus prohibiting that space to be used for something
else.

> I think the problem is that once you
> start providing a facility extensions can use to store data along with the
> commit record then being restricted to 4 or 8 bytes is very limiting.

Well, you can easily use those 4/8 bytes to start adding more data to
the transaction. By referring to some table with transaction metadata
for example.

> It also doesn't allow you to load two extensions at once on a system.
> You wouldn't be able to have both the 'steve_commit_order' extension
> and BDR installed at the same time.  I don't think this patch does a
> very good job at (B) but It wasn't intended to.

Well, I don't agree that steve_commit_order makes much sense in this
context. But there actually is a real problem here, namely that there's
no namespacing in those bytes. I'd be ok with saying that we split the 
extradata into four bytes for the namespace and four for the actual 
data. That's roughly how it works for advisory locks already.
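The split Andres proposes could look something like this (a minimal sketch, not anything from the patch; the function names and the sample namespace value are made up):

```python
# Pack a 4-byte namespace tag and a 4-byte payload into one 8-byte
# "extradata" word, in the same spirit as two-key advisory locks, so
# consumers can detect when another extension wrote the slot.

def pack_extradata(namespace: int, value: int) -> int:
    """Combine namespace (high 32 bits) and value (low 32 bits)."""
    assert 0 <= namespace < 2**32 and 0 <= value < 2**32
    return (namespace << 32) | value

def unpack_extradata(extradata: int) -> tuple:
    """Split an 8-byte extradata word back into (namespace, value)."""
    return extradata >> 32, extradata & 0xFFFFFFFF

word = pack_extradata(0xBD12, 42)   # 0xBD12: hypothetical extension tag
print(unpack_extradata(word))       # (48402, 42)
```

With only one record per txid this doesn't let two extensions coexist, but as Andres notes below, it at least lets each one detect that the slot belongs to someone else.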

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Craig Ringer
Date:
On 11/01/2014 09:00 PM, Michael Paquier wrote:
> 1) It is untested and actually there is no direct use for it in core.
> 2) Pushing code that we know as dead is no good, that's a feature more
> or less defined as maybe-useful-but-we-are-not-sure-yet-what-to-do-with-it.
> 3) If you're going to re-use this API in BDR, which is a fork of
> Postgres.

I would like to emphasise that BDR is not really a fork at all, no more
than any big topic branch would be a fork.

BDR is two major parts:

- A collection of patches to core (commit timestamps/extradata, sequence
AM, replication identifiers, logical decoding, DDL deparse, event
triggers, etc). These are being progressively submitted to core, 
maintained as multiple feature branches plus a merged version; and

- An extension that uses core features and, where necessary, the
additions to core to implement bi-directional logical replication.

Because of the time scales involved in getting things into core it's
been necessary to *temporarily* get the 9.4-based feature branch into
wider use so that it can be used to run the BDR extension, but if we can
get the required features into core that need will go away.

Event triggers and logical decoding were already merged in 9.4.

If we can get things like commit timestamps, commit extradata / logical
replication identifiers, the sequence access method, etc merged in 9.5
then it should be possible to do away with the need for the patches to
core entirely and run BDR on top of stock 9.5. I'd be delighted if that
were possible, as doing away with the patched 9.4 would get rid of a
great deal of work and frustration on my part.

Note that the BDR extension itself is PostgreSQL-licensed. Externally
maintained extensions have been brought in-core before. It's a lot of
code though, and I can't imagine that being a quick process.



> You'd better complete this API in BDR by yourself and not
> bother core with that.

This argument would've prevented the inclusion of logical decoding,
which is rapidly becoming the headline feature for 9.4, or at least
shortly behind jsonb. Slony is being adapted to use it, multiple people
are working on auditing systems based on it, and AFAIK EDB's xDB is
being adapted to take advantage of it too.

As development gets more complex and people attempt bigger features, the
One Big Patch that adds a feature and an in-core user of the feature is
not going to be a viable approach all the time. In my view it's already
well past that, and some recent patches (like RLS) really should've been
split up into patch series.

If we want to avoid unreviewable monster-patches it will, IMO, be
necessary to have progressive, iterative enhancement. That may sometimes
mean that there's code in core that's only used by future
yet-to-be-merged patches and/or by extensions.

Of course it's desirable to have an in-tree user of the code wherever
possible/practical - but sometimes it may *not* be possible or
practical. It seems to me like the benefits of committing work in
smaller, more reviewable chunks outweigh the benefits of merging
multiple related but separate changes just so everything can have an
immediate in-tree user.

That's not to say that extradata must remain glued to commit timestamps.
It might make more sense as a separate patch with an API to allow
extensions to manipulate it directly, plus a dummy extension showing how
it works, like we do with various hooks and with APIs like FDWs.
However, just like the various hooks that we have, it *does* make sense
to have something in-core that has no "real world" in-core users.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 06/11/14 08:50, Andres Freund wrote:
> On 2014-11-05 19:31:52 -0500, Steve Singer wrote:
>> It also doesn't allow you to load two extensions at once on a system.
>> You wouldn't be able to have both the 'steve_commit_order' extension
>> and BDR installed at the same time.  I don't think this patch does a
>> very good job at (B) but It wasn't intended to.
>
> Well, I don't agree that steve_commit_order makes much sense in this
> context. But there actually is a real problem here, namely that there's
> no namespacing in those bytes. I'd be ok with saying that we split the
> extradata in for bytes for the namespace and four for the actual
> data. That's roughly how it works for advisory locks already.
>

I am not sure how this would solve the problem of 2 extensions using the 
extradata, given that there can be only 1 record per txid anyway.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 06/11/14 01:31, Steve Singer wrote:
> On 11/05/2014 05:43 PM, Andres Freund wrote:
>
> Is this patch supposed to:
>
> A)  Add commit timestamp tracking but nothing more
>
> B) Add infrastructure to store commit timestamps and provide a facility
> for storing additional bits of data extensions might want to be
> associated with the commit
>
> C).  Add commit timestamps and node identifiers to commits
>
> If the answer is (A) then I can see why some people are objecting to
> adding extradata.    If the answer is (B) then it's fair to ask how well
> does this patch handle various types of things people might want to
> attach to the commit record (such as the LSN).   I think the problem is
> that once you start providing a facility extensions can use to store
> data along with the commit record then being restricted to 4 or 8 bytes
> is very limiting.  It also doesn't allow you to load two extensions at
> once on a system.   You wouldn't be able to have both the
> 'steve_commit_order' extension and BDR installed at the same time.  I
> don't think this patch does a very good job at (B) but It wasn't
> intended to.
>

I would love to have (B) but I don't think that's realistic, at least 
not to the extent some people on this thread would like. I mean you can 
already do (B) by using a table, it just isn't that great when it comes 
to the performance of that solution.

This patch is aimed at doing a limited version of (B) where you don't 
have a dynamic record for storing whatever you might desire, but on the 
upside the performance is good. And yes, so far this looks more like we 
are actually doing (C), since the main purpose of the patch is enabling 
conflict detection and resolution of those conflicts, which is useful in 
many replication scenarios that are not limited to the classical 
multi-master setup.

> If what we are really doing is C, and just calling the space 'extradata'
> until we get the logical identifier stuff in and then we are going to
> rename extradata  to nodeid then we should say so.  If we are really
> concerned about the pg_upgrade impact of expanding the record later then
> maybe we should add 4 bytes of padding to the CommitTimeStampEntry now
> and but leave the manipulating the node id until later.
>

This might not be a bad idea. I don't see the extradata being useful for 
multiple extensions at the same time given that there is a single record 
per txid, unless we enforced some kind of limitation that an extension 
can only set the extradata for txids produced by that extension.

The namespacing idea that Andres has would again work fine for various 
replication solutions as it would make it easier for them to coexist but 
it would still not work for your 'steve_commit_order' (which I also 
think should be done differently anyway).

In general I do see this patch to be similar in purpose to what we did 
with replica triggers or logical decoding, those features also didn't 
really have in-core use, were optional and enabled us to take a step 
forward with replication and had some side uses besides replication just 
like commit timestamps do.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-07 17:54:32 +0100, Petr Jelinek wrote:
> On 06/11/14 08:50, Andres Freund wrote:
> >On 2014-11-05 19:31:52 -0500, Steve Singer wrote:
> >>It also doesn't allow you to load two extensions at once on a system.
> >>You wouldn't be able to have both the 'steve_commit_order' extension
> >>and BDR installed at the same time.  I don't think this patch does a
> >>very good job at (B) but It wasn't intended to.
> >
> >Well, I don't agree that steve_commit_order makes much sense in this
> >context. But there actually is a real problem here, namely that there's
> >no namespacing in those bytes. I'd be ok with saying that we split the
> >extradata in for bytes for the namespace and four for the actual
> >data. That's roughly how it works for advisory locks already.
> >
> 
> I am not sure how would this solve problem of 2 extensions using the
> extradata given that there can be only 1 record per txid anyway.

It'd help you to detect problems.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
> On Nov 5, 2014, at 7:31 PM, Steve Singer <steve@ssinger.info> wrote:
>> On 11/05/2014 05:43 PM, Andres Freund wrote:
>> On 2014-11-05 17:17:05 -0500, Steve Singer wrote:
>> Imo that's essentially a different feature. What you essentially would
>> need here is a 'commit sequence number' - but no timestamps. And
>> probably to be useful that number has to be 8 bytes in itself.
>
> I think this gets to the heart of some of the differing views people have expressed on this patch
>
> Is this patch supposed to:
>
> A)  Add commit timestamp tracking but nothing more
>
> B) Add infrastructure to store commit timestamps and provide a facility for storing additional bits of data
extensions might want to be associated with the commit 
>
> C).  Add commit timestamps and node identifiers to commits

Well put.

I think the authors of this patch are suffering from a certain amount of myopia.  Commit timestamps are useful, but so
are commit LSNs, and it makes little sense to me to suppose that we should have two different systems for those
closely-related needs.

Like Andres, I think B is impractical, so let's just be honest and admit that C is what we're really doing. But let's
add LSNs so the people who want that can be happy too.

...Robert


Re: tracking commit timestamps

From
Robert Haas
Date:
On Nov 4, 2014, at 4:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 11/3/14 5:17 PM, Petr Jelinek wrote:
>>> Please don't name anything "committs".  That looks like a misspelling of
>>> something.
>>>
>>> There is nothing wrong with
>>>
>>> pg_get_transaction_commit_timestamp()
>>>
>>> If you want to reduce the length, lose the "get".
>>
>> I am fine with that, I only wonder if your definition of "anything" only
>> concerns the SQL interfaces or also the internals.
>
> I'd be fine with commit_ts for internals, but not committs.

I agree that committs is poor.  But I'd argue for spelling out commit_timestamp everywhere. It is more clear and easier
to grep.

...Robert


Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 08/11/14 00:35, Robert Haas wrote:
>> On Nov 5, 2014, at 7:31 PM, Steve Singer <steve@ssinger.info> wrote:
>>> On 11/05/2014 05:43 PM, Andres Freund wrote:
>>> On 2014-11-05 17:17:05 -0500, Steve Singer wrote:
>>> Imo that's essentially a different feature. What you essentially would
>>> need here is a 'commit sequence number' - but no timestamps. And
>>> probably to be useful that number has to be 8 bytes in itself.
>>
>> I think this gets to the heart of some of the differing views people have expressed on this patch
>>
>> Is this patch supposed to:
>>
>> A)  Add commit timestamp tracking but nothing more
>>
>> B) Add infrastructure to store commit timestamps and provide a facility for storing additional bits of data
extensions might want to be associated with the commit
 
>>
>> C).  Add commit timestamps and node identifiers to commits
>
> Well put.
>
> I think the authors of this patch are suffering from a certain amount of myopia.  Commit timestamps are useful, but
so are commit LSNs, and it makes little sense to me to suppose that we should have two different systems for those
closely-related needs.
 
>
> Like Andres, I think B is impractical, so let's just be honest and admit that C is what we're really doing. But let's
add LSNs so the people who want that can be happy too.
 
>

The list of what is useful might be long, but we can't have everything 
there as there are space constraints, and LSN is another 8 bytes and I 
still want to have some bytes for storing the "origin" or whatever you 
want to call it there, as that's the one I personally have the biggest 
use-case for.
So this would be ~24 bytes per txid already; hmm, I wonder if we can pull 
some tricks to lower that a bit.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Fri, Nov 7, 2014 at 7:07 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> The list of what is useful might be long,

That's FUD.  It might also be short.

> but we can't have everything there
> as there are space constraints, and LSN is another 8 bytes and I still want
> to have some bytes for storing the "origin" or whatever you want to call it
> there, as that's the one I personally have biggest use-case for.
> So this would be ~24bytes per txid already, hmm I wonder if we can pull some
> tricks to lower that a bit.

It won't do to say "let's do the things that I want, and foreclose
forever the things that other people want".  I find it quite hard to
believe that 16 bytes per transaction is a perfectly tolerable
overhead but 24 bytes per transaction will break the bank.  But if
that is really true then we ought to reject this patch altogether,
because it's unacceptable, in any arena, for a patch that only
benefits extensions to consume all of the available bit-space, 
leaving none for future core needs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 08/11/14 03:05, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 7:07 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> but we can't have everything there
>> as there are space constraints, and LSN is another 8 bytes and I still want
>> to have some bytes for storing the "origin" or whatever you want to call it
>> there, as that's the one I personally have biggest use-case for.
>> So this would be ~24bytes per txid already, hmm I wonder if we can pull some
>> tricks to lower that a bit.
>
> It won't do to say "let's do the things that I want, and foreclose
> forever the things that other people want".  I find it quite hard to
> believe that 16 bytes per transaction is a perfectly tolerable
> overhead but 24 bytes per transaction will break the bank.  But if
> that is really true then we ought to reject this patch altogether,
> because it's unacceptable, in any arena, for a patch that only
> benefits extensions to consume all of the available bit-space in,
> leaving none for future core needs.
>

That's not what I said. I am actually ok with adding the LSN if people 
see it as useful.
I was just wondering if we can make the record smaller somehow - 24 bytes 
per txid is around 96GB of data for the whole txid range and won't work 
with pages smaller than ~4kB unless we add 6-char filename support to 
SLRU (which is not hard, and we could also disallow turning 
track_commit_timestamps on with a smaller page size...).
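Petr's figures can be checked with back-of-the-envelope arithmetic (the 24-byte entry layout below is a hypothetical breakdown for illustration, not the patch's actual struct):

```python
# Hypothetical 24-byte entry: 8-byte timestamp + 8-byte LSN + 8 bytes of
# extradata, one entry for each of the 2^32 possible xids.
ENTRY_SIZE = 24
XID_SPACE = 2**32

total_bytes = ENTRY_SIZE * XID_SPACE
print(total_bytes // 2**30)  # 96 -- GiB for the full xid range

# How many whole entries fit on one SLRU page at various page sizes;
# smaller pages mean more pages and thus more/larger-numbered segment files.
for page_size in (8192, 4096, 2048):
    print(page_size, page_size // ENTRY_SIZE)  # 8192->341, 4096->170, 2048->85
```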

I remember somebody was worried about this already during the original 
patch submission and it can't be completely ignored in the discussion 
about adding more stuff into the record.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Sat, Nov 8, 2014 at 5:35 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> That's not what I said. I am actually ok with adding the LSN if people see
> it useful.
> I was just wondering if we can make the record smaller somehow - 24bytes per
> txid is around 96GB of data for whole txid range and won't work with pages
> smaller than ~4kBs unless we add 6 char support to SLRU (which is not hard
> and we could also not allow track_commit_timestamps to be turned on with
> smaller pagesize...).
>
> I remember somebody was worried about this already during the original patch
> submission and it can't be completely ignored in the discussion about adding
> more stuff into the record.

Fair point.  Sorry I misunderstood.

I think the key question here is the time for which the data needs to
be retained.  2^32 of anything is a lot, but why keep around that
number of records rather than more (after all, we have epochs to
distinguish one use of a given txid from another) or fewer?  Obvious
alternatives include:

- Keep the data for some period of time; discard the data when it's
older than some threshold.
- Keep a certain amount of total data; every time we create a new
file, discard the oldest one.
- Let consumers of the data say how much they need, and throw away
data when it's no longer needed by the oldest consumer.
- Some combination of the above.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Steve Singer
Date:
On 11/07/2014 07:07 PM, Petr Jelinek wrote:


> The list of what is useful might be long, but we can't have everything 
> there as there are space constraints, and LSN is another 8 bytes and I 
> still want to have some bytes for storing the "origin" or whatever you 
> want to call it there, as that's the one I personally have biggest 
> use-case for.
> So this would be ~24bytes per txid already, hmm I wonder if we can 
> pull some tricks to lower that a bit.
>

The reason why Jim and I are asking for the LSN and not just the 
timestamp is that I want to be able to order the transactions. Jim 
pointed out earlier in the thread that just ordering on timestamp allows 
for multiple transactions with the same timestamp.

Maybe we don't need the entire LSN to solve that.  If you already have 
the commit timestamp maybe you only need another byte or two of 
granularity to order transactions that are within the same microsecond.




Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Robert Haas wrote:

> I think the key question here is the time for which the data needs to
> be retained.  2^32 of anything is a lot, but why keep around that
> number of records rather than more (after all, we have epochs to
> distinguish one use of a given txid from another) or fewer?

The problem is not how much data we retain; it's about how much data we
can address.  We must be able to address the data for the transaction with
xid=2^32-1, even if you only retain the 1000 most recent transactions.
In fact, we already only retain data back to RecentXmin, if I recall
correctly.  All slru.c users work that way.

Back when pg_multixact/members had the 5-char issue, I came up with a
patch that had each slru.c user declare how many chars maximum were the
filenames.  I didn't push further with that because there was an issue
with it, I don't remember what it was offhand (but I don't think I
posted it).  But this is only needed so that the filenames are all equal
width, which is mostly cosmetic; the rest of the module works fine
with 4- or 5-char filenames, and can be trivially expanded to support 6
or more.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Anssi Kääriäinen
Date:
On Sun, 2014-11-09 at 11:57 -0500, Steve Singer wrote:

> The reason why Jim and myself are asking for the LSN and not just the 
> timestamp is that I want to be able to order the transactions. Jim 
> pointed out earlier in the thread that just ordering on timestamp allows 
> for multiple transactions with the same timestamp.
> 
> Maybe we don't need the entire LSN to solve that.  If you already have 
> the commit timestamp maybe you only need another byte or two of 
> granularity to order transactions that are within the same microsecond.

There is no guarantee that a commit with later LSN has a later
timestamp. There are cases where the clock could move significantly
backwards.

A robust solution to storing transaction commit information (including
commit order) in a way that can be referenced from other tables, can be
loaded to another cluster, and survives crashes would be a great
feature. But this feature doesn't have those properties.
- Anssi





Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 10/11/14 08:01, Anssi Kääriäinen wrote:
> On Sun, 2014-11-09 at 11:57 -0500, Steve Singer wrote:
>
>> The reason why Jim and myself are asking for the LSN and not just the
>> timestamp is that I want to be able to order the transactions. Jim
>> pointed out earlier in the thread that just ordering on timestamp allows
>> for multiple transactions with the same timestamp.
>>
>> Maybe we don't need the entire LSN to solve that.  If you already have
>> the commit timestamp maybe you only need another byte or two of
>> granularity to order transactions that are within the same microsecond.
>
> There is no guarantee that a commit with later LSN has a later
> timestamp. There are cases where the clock could move significantly
> backwards.
>
> A robust solution to storing transaction commit information (including
> commit order) in a way that can be referenced from other tables, can be
> loaded to another cluster, and survives crashes would be a great
> feature. But this feature doesn't have those properties.
>

It has the property of surviving crashes.

Not sure what you mean by referencing from other tables?

And about loading to another cluster, the txid does not really have any 
meaning on another cluster, so the info about it does not either?

But anyway this patch is targeting extensions, not DBAs - you could write 
an extension that provides that feature on top of this patch (although 
given what I wrote above I don't see how it's useful).

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Sun, Nov 9, 2014 at 8:41 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I think the key question here is the time for which the data needs to
>> be retained.  2^32 of anything is a lot, but why keep around that
>> number of records rather than more (after all, we have epochs to
>> distinguish one use of a given txid from another) or fewer?
>
> The problem is not how much data we retain; is about how much data we
> can address.

I thought I was responding to a concern about disk space utilization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Robert Haas
Date:
On Mon, Nov 10, 2014 at 2:01 AM, Anssi Kääriäinen
<anssi.kaariainen@thl.fi> wrote:
> There is no guarantee that a commit with later LSN has a later
> timestamp. There are cases where the clock could move significantly
> backwards.

Good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 09/11/14 17:57, Steve Singer wrote:
> On 11/07/2014 07:07 PM, Petr Jelinek wrote:
>> The list of what is useful might be long, but we can't have everything
>> there as there are space constraints, and LSN is another 8 bytes and I
>> still want to have some bytes for storing the "origin" or whatever you
>> want to call it there, as that's the one I personally have biggest
>> use-case for.
>> So this would be ~24bytes per txid already, hmm I wonder if we can
>> pull some tricks to lower that a bit.
>>
>
> The reason why Jim and myself are asking for the LSN and not just the
> timestamp is that I want to be able to order the transactions. Jim
> pointed out earlier in the thread that just ordering on timestamp allows
> for multiple transactions with the same timestamp.
>
> Maybe we don't need the entire LSN to solve that.  If you already have
> the commit timestamp maybe you only need another byte or two of
> granularity to order transactions that are within the same microsecond.
>

Hmm, maybe just one part of the LSN, but I don't really like that either; 
if we want to store the LSN we should probably store it as is, as 
somebody might want to map it to a txid for other reasons.

I did the calculation above wrong btw, it's actually 20 bytes not 24 
bytes per record; I am inclined to just say we can live with that.

Since we agreed that the (B) case is not really feasible and we are 
doing (C), I also wonder if extradata should be renamed to nodeid 
(even if it's not used as a nodeid at this point). And then there is the 
question of its size, since the nodeid itself can probably live with 2 
bytes ("64k of nodes ought to be enough for everybody" ;) ).
Or leave the extradata as is but use it as reserved space for future use 
and not expose it at the SQL level at all at this time?

I guess Andres could answer what suits him better here.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Sun, Nov 9, 2014 at 8:41 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Robert Haas wrote:
> >> I think the key question here is the time for which the data needs to
> >> be retained.  2^32 of anything is a lot, but why keep around that
> >> number of records rather than more (after all, we have epochs to
> >> distinguish one use of a given txid from another) or fewer?
> >
> > The problem is not how much data we retain; is about how much data we
> > can address.
> 
> I thought I was responding to a concern about disk space utilization.

Ah, right.  So AFAIK we don't need to keep anything older than
RecentXmin or something like that -- which is not too old.  If I recall
correctly Josh Berkus was saying in a thread about pg_multixact that it
used about 128kB or so in <= 9.2 for his customers; that one was also
limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
would be pretty acceptable.  More so considering that it's turned off by
default.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Mon, Nov 10, 2014 at 8:39 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> I did the calculation above wrong btw, it's actually 20 bytes not 24 bytes
> per record, I am inclined to just say we can live with that.

If you do it as 20 bytes, you'll have to do some work to squeeze out
the alignment padding.  I'm inclined to think it's fine to have a few
extra padding bytes here; someone might want to use those for
something in the future, and they probably don't cost much.

> Since we agreed that the (B) case is not really feasible and we are doing
> the (C), I also wonder if extradata should be renamed to nodeid (even if
> it's not used at this point as nodeid). And then there is question about the
> size of it, since the nodeid itself can live with 2 bytes probably ("64k of
> nodes ought to be enough for everybody" ;) ).
> Or leave the extradata as is but use as reserved space for future use and
> not expose it at this time on SQL level at all?

I vote for calling it node-ID, and for allowing at least 4 bytes for
it.  Penny wise, pound foolish.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Robert Haas
Date:
On Mon, Nov 10, 2014 at 8:40 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Ah, right.  So AFAIK we don't need to keep anything older than
> RecentXmin or something like that -- which is not too old.  If I recall
> correctly Josh Berkus was saying in a thread about pg_multixact that it
> used about 128kB or so in <= 9.2 for his customers; that one was also
> limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
> would be pretty acceptable.  Moreso considering that it's turned off by
> default.

I'm not sure whether keeping it just back to RecentXmin will be enough
for everybody's needs.  But we certainly don't need to keep the last
2^32 records as someone-or-other was suggesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Steve Singer
Date:
On 11/10/2014 08:39 AM, Petr Jelinek wrote:
> On 09/11/14 17:57, Steve Singer wrote:
>> On 11/07/2014 07:07 PM, Petr Jelinek wrote:
>>> The list of what is useful might be long, but we can't have everything
>>> there as there are space constraints, and LSN is another 8 bytes and I
>>> still want to have some bytes for storing the "origin" or whatever you
>>> want to call it there, as that's the one I personally have biggest
>>> use-case for.
>>> So this would be ~24bytes per txid already, hmm I wonder if we can
>>> pull some tricks to lower that a bit.
>>>
>>
>> The reason why Jim and myself are asking for the LSN and not just the
>> timestamp is that I want to be able to order the transactions. Jim
>> pointed out earlier in the thread that just ordering on timestamp allows
>> for multiple transactions with the same timestamp.
>>
>> Maybe we don't need the entire LSN to solve that.  If you already have
>> the commit timestamp maybe you only need another byte or two of
>> granularity to order transactions that are within the same microsecond.
>>
>
> Hmm maybe just one part of LSN, but I don't really like that either, 
> if we want to store LSN we should probably store it as is as somebody 
> might want to map it to txid for other reasons.
>
> I did the calculation above wrong btw, it's actually 20 bytes not 24 
> bytes per record, I am inclined to just say we can live with that.
>
> Since we agreed that the (B) case is not really feasible and we are 
> doing the (C), I also wonder if extradata should be renamed to nodeid 
> (even if it's not used at this point as nodeid). And then there is 
> question about the size of it, since the nodeid itself can live with 2 
> bytes probably ("64k of nodes ought to be enough for everybody" ;) ).
> Or leave the extradata as is but use as reserved space for future use 
> and not expose it at this time on SQL level at all?
>
> I guess Andres could answer what suits him better here.
>

I am happy with renaming extradata to nodeid and not exposing it at this 
time.

If we feel that commit-order (ie LSN or something equivalent) is really 
a different patch/feature than commit-timestamp then I am okay with that 
also but we should make sure to warn users of the commit-timestamp in 
the documentation that two transactions might have the same timestamp 
and that the commit order might not be the same as ordering by the 
commit timestamp.






Re: tracking commit timestamps

From
Simon Riggs
Date:
On 4 November 2014 08:23, Andres Freund <andres@2ndquadrant.com> wrote:

>> 6) Shouldn't any value update of track_commit_timestamp be tracked in
>> XLogReportParameters? That's thinking about making the commit timestamp
>> available on standbys as well..
>
> Yes, it should.

Agree committs should be able to run on standby, but it seems possible
to do that without it running on the master. The two should be
unconnected.

Not sure why we'd want to have parameter changes on master reported?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-11 16:10:47 +0000, Simon Riggs wrote:
> On 4 November 2014 08:23, Andres Freund <andres@2ndquadrant.com> wrote:
> 
> >> 6) Shouldn't any value update of track_commit_timestamp be tracked in
> >> XLogReportParameters? That's thinking about making the commit timestamp
> >> available on standbys as well..
> >
> > Yes, it should.
> 
> Agree committs should be able to run on standby, but it seems possible
> to do that without it running on the master.

I don't think that's realistic. It requires WAL to be written in some
cases, so that's not going to work. I also don't think it's a
particularly interesting ability?

> The two should be unconnected.

Why?

> Not sure why we'd want to have parameter changes on master reported?

So it works correctly. We're currently truncating the slru on startup
when the guc is disabled, which would cause problems with WAL records
coming in from the primary. I think the code also needs some TLC to
correctly work after a failover.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 9 November 2014 16:57, Steve Singer <steve@ssinger.info> wrote:
> On 11/07/2014 07:07 PM, Petr Jelinek wrote:
>
>
>> The list of what is useful might be long, but we can't have everything
>> there as there are space constraints, and LSN is another 8 bytes and I still
>> want to have some bytes for storing the "origin" or whatever you want to
>> call it there, as that's the one I personally have biggest use-case for.
>> So this would be ~24bytes per txid already, hmm I wonder if we can pull
>> some tricks to lower that a bit.
>>
>
> The reason why Jim and myself are asking for the LSN and not just the
> timestamp is that I want to be able to order the transactions. Jim pointed
> out earlier in the thread that just ordering on timestamp allows for
> multiple transactions with the same timestamp.
>
> Maybe we don't need the entire LSN to solve that.  If you already have the
> commit timestamp maybe you only need another byte or two of granularity to
> order transactions that are within the same microsecond.

It looks like there are quite a few potential uses for this.

If we include everything it will be too fat to use for any of the
potential uses, since each will be pulled down by the others.

Sounds like it needs to be configurable in some way.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 11 November 2014 16:19, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-11-11 16:10:47 +0000, Simon Riggs wrote:
>> On 4 November 2014 08:23, Andres Freund <andres@2ndquadrant.com> wrote:
>>
>> >> 6) Shouldn't any value update of track_commit_timestamp be tracked in
>> >> XLogReportParameters? That's thinking about making the commit timestamp
>> >> available on standbys as well..
>> >
>> > Yes, it should.
>>
>> Agree committs should be able to run on standby, but it seems possible
>> to do that without it running on the master.
>
> I don't think that's realistic. It requires WAL to be written in some
> cases, so that's not going to work. I also don't think it's a
> particularly interesting ability?

OK, so we are saying commit timestamp will NOT be available on Standbys.

I'm fine with that, since data changes aren't generated there.


> So it works correctly. We're currently truncating the slru on startup
> when the guc is disabled which would cause problems WAL records coming
> in from the primary. I think the code also needs some TLC to correctly
> work after a failover.

OK

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Andres Freund
Date:
On 2014-11-11 17:09:54 +0000, Simon Riggs wrote:
> On 11 November 2014 16:19, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-11-11 16:10:47 +0000, Simon Riggs wrote:
> >> On 4 November 2014 08:23, Andres Freund <andres@2ndquadrant.com> wrote:
> >>
> >> >> 6) Shouldn't any value update of track_commit_timestamp be tracked in
> >> >> XLogReportParameters? That's thinking about making the commit timestamp
> >> >> available on standbys as well..
> >> >
> >> > Yes, it should.
> >>
> >> Agree committs should be able to run on standby, but it seems possible
> >> to do that without it running on the master.
> >
> > I don't think that's realistic. It requires WAL to be written in some
> > cases, so that's not going to work. I also don't think it's a
> > particularly interesting ability?
> 
> OK, so we are saying commit timestamp will NOT be available on Standbys.

Hm? They should be available - xact.c WAL replay will redo the setting
of the timestamps and explicitly overwritten timestamps will generate
their own WAL records. What I mean is just that you can't use commit
timestamps without also using it on the primary.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Jim Nasby
Date:
On 11/10/14, 7:40 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sun, Nov 9, 2014 at 8:41 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Robert Haas wrote:
>>>> I think the key question here is the time for which the data needs to
>>>> be retained.  2^32 of anything is a lot, but why keep around that
>>>> number of records rather than more (after all, we have epochs to
>>>> distinguish one use of a given txid from another) or fewer?
>>>
>>> The problem is not how much data we retain; is about how much data we
>>> can address.
>>
>> I thought I was responding to a concern about disk space utilization.
>
> Ah, right.  So AFAIK we don't need to keep anything older than
> RecentXmin or something like that -- which is not too old.  If I recall
> correctly Josh Berkus was saying in a thread about pg_multixact that it
> used about 128kB or so in <= 9.2 for his customers; that one was also
> limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
> would be pretty acceptable.  Moreso considering that it's turned off by
> default.

FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process
is able to advance datminmxid, which will (now) only happen when an
entire relation has been scanned (which should be infrequent).

I believe the low normal space usage is just an indication that most databases don't use many MultiXacts.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 11/10/14, 7:40 AM, Alvaro Herrera wrote:

> >Ah, right.  So AFAIK we don't need to keep anything older than
> >RecentXmin or something like that -- which is not too old.  If I recall
> >correctly Josh Berkus was saying in a thread about pg_multixact that it
> >used about 128kB or so in <= 9.2 for his customers; that one was also
> >limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
> >would be pretty acceptable.  Moreso considering that it's turned off by
> >default.
> 
> FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process
> is able to advance datminmxid, which will (now) only happen when an
> entire relation has been scanned (which should be infrequent).
> 
> I believe the low normal space usage is just an indication that most databases don't use many MultiXacts.

That's in 9.3.  Prior to that, they were truncated much more often.
Maybe you've not heard enough about this commit:

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Wed Jan 23 12:04:59 2013 -0300
   Improve concurrency of foreign key locking

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Jim Nasby
Date:
On 11/11/14, 2:03 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>> On 11/10/14, 7:40 AM, Alvaro Herrera wrote:
>
>>> Ah, right.  So AFAIK we don't need to keep anything older than
>>> RecentXmin or something like that -- which is not too old.  If I recall
>>> correctly Josh Berkus was saying in a thread about pg_multixact that it
>>> used about 128kB or so in <= 9.2 for his customers; that one was also
>>> limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
>>> would be pretty acceptable.  Moreso considering that it's turned off by
>>> default.
>>
>> FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process
>> is able to advance datminmxid, which will (now) only happen when an
>> entire relation has been scanned (which should be infrequent).
>>
>> I believe the low normal space usage is just an indication that most databases don't use many MultiXacts.
>
> That's in 9.3.  Prior to that, they were truncated much more often.

Well, we're talking about a new feature, so I wasn't looking in back branches. ;P

> Maybe you've not heard enough about this commit:
>
> commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182

Interestingly, git.postgresql.org hasn't either:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=search&h=HEAD&st=commit&s=0ac5ad5134f2769ccbaefec73844f8504c4d6182

The commit is certainly there though...
decibel@decina:[15:12]~/pgsql/HEAD/src/backend (master=)$git log 0ac5ad5134f2769ccbaefec73844f8504c4d6182|head -n1
commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 11/11/14, 2:03 PM, Alvaro Herrera wrote:
> >Jim Nasby wrote:
> >>On 11/10/14, 7:40 AM, Alvaro Herrera wrote:
> >
> >>>Ah, right.  So AFAIK we don't need to keep anything older than
> >>>RecentXmin or something like that -- which is not too old.  If I recall
> >>>correctly Josh Berkus was saying in a thread about pg_multixact that it
> >>>used about 128kB or so in <= 9.2 for his customers; that one was also
> >>>limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
> >>>would be pretty acceptable.  Moreso considering that it's turned off by
> >>>default.
> >>
> >>FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process
> >>is able to advance datminmxid, which will (now) only happen when an
> >>entire relation has been scanned (which should be infrequent).
> >>
> >>I believe the low normal space usage is just an indication that most databases don't use many MultiXacts.
> >
> >That's in 9.3.  Prior to that, they were truncated much more often.
> 
> Well, we're talking about a new feature, so I wasn't looking in back branches. ;P

Well, I did mention <= 9.2 in the text above ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 10/11/14 14:53, Robert Haas wrote:
> On Mon, Nov 10, 2014 at 8:39 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> I did the calculation above wrong btw, it's actually 20 bytes not 24 bytes
>> per record, I am inclined to just say we can live with that.
>
> If you do it as 20 bytes, you'll have to do some work to squeeze out
> the alignment padding.  I'm inclined to think it's fine to have a few
> extra padding bytes here; someone might want to use those for
> something in the future, and they probably don't cost much.
>

I did get around the alignment via memcpy, so it is still 20 bytes.

>> Since we agreed that the (B) case is not really feasible and we are doing
>> the (C), I also wonder if extradata should be renamed to nodeid (even if
>> it's not used at this point as nodeid). And then there is question about the
>> size of it, since the nodeid itself can live with 2 bytes probably ("64k of
>> nodes ought to be enough for everybody" ;) ).
>> Or leave the extradata as is but use as reserved space for future use and
>> not expose it at this time on SQL level at all?
>
> I vote for calling it node-ID, and for allowing at least 4 bytes for
> it.  Penny wise, pound foolish.
>

Ok, I went this way.

Anyway here is v8 version of the patch, I think I addressed all the
concerns mentioned, it's also rebased against current master (BRIN
commit added some conflicts).

Brief list of changes:
  - the commit timestamp record now stores timestamp, lsn and nodeid
  - added code to disallow turning track_commit_timestamp on with too
small pagesize
  - the get interfaces error out when track_commit_timestamp is off
  - if the xid passed to get interface is out of range -infinity
timestamp is returned (I think it's bad idea to throw errors here as the
valid range is not static and same ID can start throwing errors between
calls theoretically)
  - renamed the sql interfaces to pg_xact_commit_timestamp,
pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't
expose the nodeid atm, I personally am not big fan of the "xact" but it
seems more consistent with existing naming
  - documented pg_resetxlog changes and make all the pg_resetxlog
options alphabetically ordered
  - committs is not used anymore, it's commit_ts (and CommitTs in
camelcase), I think it's not really good idea to spell the timestamp
everywhere as some interface then get 30+ chars long names...
  - added WAL logging of the track_commit_timestamp GUC
  - added alternative expected output of the regression test so that it
works with make installcheck when track_commit_timestamp is on
  - added C interface to set default nodeid for current backend
  - several minor comment and naming adjustments mostly suggested by Michael


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Jim Nasby
Date:
On 11/12/14, 7:06 AM, Petr Jelinek wrote:
>   - if the xid passed to get interface is out of range -infinity timestamp is returned (I think it's bad idea to
> throw errors here as the valid range is not static and same ID can start throwing errors between calls theoretically)

Wouldn't NULL be more appropriate?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Thu, Nov 13, 2014 at 7:56 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 11/12/14, 7:06 AM, Petr Jelinek wrote:
>>
>>   - if the xid passed to get interface is out of range -infinity timestamp
>> is returned (I think it's bad idea to throw errors here as the valid range
>> is not static and same ID can start throwing errors between calls
>> theoretically)
>
>
> Wouldn't NULL be more appropriate?
Definitely. Defining a given value for information not valid is awkward.
-- 
Michael



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Wed, Nov 12, 2014 at 10:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Brief list of changes:
>  - the commit timestamp record now stores timestamp, lsn and nodeid
Now that not only the commit timestamp is stored, calling that "commit
timestamp", "committs" or "commit_timestamp" is strange, no? If this
patch is moving toward being a more complex information provider,
calling it commit information or commit data is more adapted, no?
Documentation would need a fresh brush as well in this case.

>  - added code to disallow turning track_commit_timestamp on with too small
> pagesize
>  - the get interfaces error out when track_commit_timestamp is off
OK, that's sane.

>  - if the xid passed to get interface is out of range -infinity timestamp is
> returned (I think it's bad idea to throw errors here as the valid range is
> not static and same ID can start throwing errors between calls
> theoretically)
Already mentioned by Jim in a previous mail: this would be better as NULL.

>  - renamed the sql interfaces to pg_xact_commit_timestamp,
> pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't expose
> the nodeid atm, I personally am not big fan of the "xact" but it seems more
> consistent with existing naming
pg_xact_commit_timestamp and pg_xact_commit_timestamp_data are
overlapping. What's wrong with a single function able to return the
whole set (node ID, commit timestamp, commit LSN)? Let's say
pg_xact_commit_information or pg_xact_commit_data. Already mentioned,
but I also find using a SRF able to return all the available
information from a given XID value quite useful. And this does not
conflict with what is proposed currently, you would need just to call
the function with XID + number of entries wanted to get a single one.
Comments from other folks about that?

>  - documented pg_resetxlog changes and make all the pg_resetxlog options
> alphabetically ordered
>  - added WAL logging of the track_commit_timestamp GUC
>  - added alternative expected output of the regression test so that it works
> with make installcheck when track_commit_timestamp is on
>  - added C interface to set default nodeid for current backend
>  - several minor comment and naming adjustments mostly suggested by Michael
Thanks for those adjustments.

Then more input about the latest patch:
1) This block is not needed, option -e is listed twice:
    The <option>-o</>, <option>-x</>, <option>-e</>,
-   <option>-m</>, <option>-O</>,
-   and <option>-l</>
+   <option>-m</>, <option>-O</>, <option>-l</>
+   and <option>-e</>
2) Very small thing: a couple of files have no newlines at the end,
among them committs.conf and test_committs/Makefile.
3) pg_last_committed_xact and not pg_last_xact_commit_information or similar?
4) storage.sgml needs to be updated with the new folder pg_committs
5) Er.. node ID is missing in pg_last_committed_xact, no?
6) This XXX notice can be removed:
+       /*
+        * Return empty if the requested value is older than what we have or
+        * newer than newest we have.
+        *
+        * XXX: should this be error instead?
+        */
We are moving toward returning invalid information in the SQL
interface when the information is not in history instead of an error,
no? (Note that I am still a partisan of an error message to let the
caller know that commit info history does not have the information
requested).
7) Note that TransactionTreeSetCommitTsData still never sets do_xlog
at true and that WriteSetTimestampXlogRec never gets called. So no
information is WAL-logged with this patch. Wouldn't that be useful for
standbys as well? Perhaps I am missing why this is disabled? This code
should be activated IMO or it would be just untested.
8) As a more general point, the node ID stuff makes me uncomfortable
and is just added on top of the existing patch without much
thinking... So I am really skeptical about it. The need here is to
pass on demand a int8 that is a node ID that can only be set through a
C interface, so only extensions could play with it. The data passed to
a WAL record is always built and determined by the system and entirely
transparent to the user; inserting user-defined data like that is
inconsistent with what we've been doing until now, no?

Also, a question particularly for BDR and Slony folks: do you
sometimes add a new node using the base backup of an existing node :)
See what I come up with: a duplication of this new node ID system with
the already present system ID, no?
Similarly, the LSN is added to the WAL record containing the commit
timestamp, but cannot the LSN of the WAL record containing the commit
timestamp itself be used as a point of reference for a better
ordering? That's not exactly the same as the LSN of the transaction
commit, still it provides a WAL-based reference.
Regards,
-- 
Michael



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 13/11/14 07:04, Michael Paquier wrote:
> On Wed, Nov 12, 2014 at 10:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> Brief list of changes:
>>   - the commit timestamp record now stores timestamp, lsn and nodeid
> Now that not only the commit timestamp is stored, calling that "commit
> timestamp", "committs" or "commit_timestamp" is strange, no? If this
> patch is moving toward being a more complex information provider,
> calling it commit information or commit data is more adapted, no?

It's not, since adding more info would break upgrades; I doubt we will 
add more anytime soon. I was thinking about it too tbh, but don't have 
a better name (I don't like commit data as it seems confusing - isn't 
commit data the DML you just committed?).
Maybe commit_metadata, commit_information is probably ok also, this 
would need input from others, I am personally fine with keeping the 
commit_timestamp name too.

>
>>   - if the xid passed to get interface is out of range -infinity timestamp is
>> returned (I think it's bad idea to throw errors here as the valid range is
>> not static and same ID can start throwing errors between calls
>> theoretically)
> Already mentioned by Jim in a previous mail: this would be better as NULL.

Yeah that makes sense, I have no idea what I was thinking :)

>
>>   - renamed the sql interfaces to pg_xact_commit_timestamp,
>> pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't expose
>> the nodeid atm, I personally am not big fan of the "xact" but it seems more
>> consistent with existing naming
> pg_xact_commit_timestamp and pg_xact_commit_timestamp_data are
> overlapping. What's wrong with a single function able to return the
> whole set (node ID, commit timetamp, commit LSN)? Let's say

That's what pg_xact_commit_timestamp_data does (it does not show nodeid 
because we agreed that it should not be exposed yet on sql level). Might 
make sense to rename, but let's wait for input about the general 
renaming point at the beginning of the mail.

> pg_xact_commit_information or pg_xact_commit_data. Already mentioned,
> but I also find using a SRF able to return all the available
> information from a given XID value quite useful. And this does not
> conflict with what is proposed currently, you would need just to call
> the function with XID + number of entries wanted to get a single one.
> Comments from other folks about that?

No idea what you mean by this to be honest, there is exactly one record 
stored for single XID.

>
> Then more input about the latest patch:
> 1) This block is not needed, option -e is listed twice:
>      The <option>-o</>, <option>-x</>, <option>-e</>,
> -   <option>-m</>, <option>-O</>,
> -   and <option>-l</>
> +   <option>-m</>, <option>-O</>, <option>-l</>
> +   and <option>-e</>
> 2) Very small thing: a couple of files have no newlines at the end,
> among them committs.conf and test_committs/Makefile.
> 3) pg_last_committed_xact and not pg_last_xact_commit_information or similar?

Just inspiration from DB2's rollforward (which shows among other things 
"last committed transaction: <timestamp>"), but I don't feel strongly 
about naming so can be changed.

> 4) storage.sgml needs to be updated with the new folder pg_committs

Right.

> 5) Er.. node ID is missing in pg_last_committed_xact, no?

That's intentional (for now).

> 6) This XXX notice can be removed:
> +       /*
> +        * Return empty if the requested value is older than what we have or
> +        * newer than newest we have.
> +        *
> +        * XXX: should this be error instead?
> +        */

Ok.

> (Note that I am still a partisan of an error message to let the
> caller know that commit info history does not have the information
> requested).

IMHO throwing error there would be same as throwing error when WHERE 
clause in SELECT does not match anything. As the xid range for which we 
store data is dynamic we need to accept any xid as valid input because 
the caller has no way of validating if the xid passed is out of range or 
not.

> 7) Note that TransactionTreeSetCommitTsData still never sets do_xlog
> at true and that WriteSetTimestampXlogRec never gets called. So no
> information is WAL-logged with this patch. Wouldn't that be useful for
> standbys as well? Perhaps I am missing why this is disabled? This code
> should be activated IMO or it would be just untested.

True is only needed here when you are setting this info for a different 
transaction than the one you are in, since the info can be reconstructed 
from the normal transaction WAL record (see that it's actually called from 
xact_redo_commit_internal, which is how we get the WAL safety and why it 
works on the slave). So the true is for use by extensions only; it's not 
completely uncommon that we have APIs that are used only by extensions.

> 8) As a more general point, the node ID stuff makes me uncomfortable
> and is just added on top of the existing patch without much
> thinking... So I am really skeptical about it. The need here is to
> pass on demand a int8 that is a node ID that can only be set through a
> C interface, so only extensions could play with it. The data passed to
> a WAL record is always built and determined by the system and entirely
> transparent to the user, inserting user-defined data like that
> inconsistent with what we've been doing until now, no?

Again it's not exposed to SQL because I thought there was agreement to 
not do that yet since we might want to build some more core stuff on top 
of that before exposing it. It's part of the record now because it can 
be useful already the way it is and because adding it later would break 
pg_upgrade (it's int4 btw).

Also I would really not say it was added without thought, I am one of 
the BDR developers and I was before one of the Londiste developers so I 
did think about what I would want when in those shoes.

That being said I think I will remove the CommitTsSetDefaultNodeId 
interface in next revision, as extension can already set nodeid via 
TransactionTreeSetCommitTsData call and we might want to revisit the 
CommitTsSetDefaultNodeId stuff once we start implementing the 
replication identifiers. Not to mention that I realized in meantime that 
CommitTsSetDefaultNodeId the way it's done currently isn't crash safe 
(it's not hard to make it crash safe). And since it's quite simple we 
can add it at later date easily if needed.

>
> Also, a question particularly for BDR and Slony folks: do you
> sometimes add a new node using the base backup of an existing node :)
> See what I come up with: a duplication of this new node ID system with
> the already present system ID, no?

Yes, we do use basebackup sometimes, and no, it's not possible to use 
systemid here:
- the point of nodeid is to be able to store the *remote* nodeid as well 
as the local one (depending on where the change actually originated), so 
your local systemid is quite useless there
- systemid is per Postgres instance; you need a per-db identifier when 
doing logical rep (2 dbs can have a single db as destination or the other 
way around)

> Similarly, the LSN is added to the WAL record containing the commit
> timestamp, but cannot the LSN of the WAL record containing the commit
> timestamp itself be used as a point of reference for a better
> ordering? That's not exactly the same as the LSN of the transaction
> commit, still it provides a WAL-based reference.

No, again for several reasons:
- as you pointed out yourself, the LSN might not be the same as the LSN 
for the xid
- more importantly, we normally don't do special WAL logging for commit 
timestamps

How it works is that because TransactionTreeSetCommitTsData is currently 
always called with the xid of the current transaction, the WAL record for 
the commit of the current transaction can be used to get the info we need 
(both the timestamp and the LSN are used, in fact).
As I said above, see how TransactionTreeSetCommitTsData is called from 
xact_redo_commit_internal.


--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 9 November 2014 16:57, Steve Singer <steve@ssinger.info> wrote:
> On 11/07/2014 07:07 PM, Petr Jelinek wrote:
>
>
>> The list of what is useful might be long, but we can't have everything
>> there as there are space constraints, and LSN is another 8 bytes and I still
>> want to have some bytes for storing the "origin" or whatever you want to
>> call it there, as that's the one I personally have biggest use-case for.
>> So this would be ~24bytes per txid already, hmm I wonder if we can pull
>> some tricks to lower that a bit.
>>
>
> The reason why Jim and myself are asking for the LSN and not just the
> timestamp is that I want to be able to order the transactions. Jim pointed
> out earlier in the thread that just ordering on timestamp allows for
> multiple transactions with the same timestamp.

I think we need to be clear about the role and function of components here.

Xid timestamps allow a replication system to do post-commit conflict
resolution based upon timestamp, i.e. last update wins. That is
potentially usable by BDR, Slony, xdb and anything else that wants
that.

Ordering transactions in LSN order is very precisely the remit of the
existing logical decoding API. Any user that wishes to see commits
in sequence can do so using that API. BDR already does this, as do
other users of the decoding API. So Slony already has access to a
useful ordering if it wishes it. We do not need to do anything *on this
patch* to enable LSNs for Slony or anyone else. I don't see any reason
to provide the same facility twice, in two different ways.

So in summary... the components are
* Commit LSN order is useful for applying changes - available by
logical decoding
* Commit timestamps and nodeid are useful for conflict resolution -
available from this patch

Both components have been designed in ways that allow multiple
replication systems to use these facilities.

So, -1 to including commit LSN in the SLRU alongside commit timestamp
and nodeid.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 13/11/14 14:18, Simon Riggs wrote:
>
> So in summary... the components are
> * Commit LSN order is useful for applying changes - available by
> logical decoding
> * Commit timestamps and nodeid are useful for conflict resolution -
> available from this patch
>
> Both components have been designed in ways that allow multiple
> replication systems to use these facilities.
>
> So, -1 to including commit LSN in the SLRU alongside commit timestamp
> and nodeid.
>

I am of the same opinion, I added the LSN "by popular demand", but I 
still personally don't see the value in having it there as it does *not* 
enable us to do something that would be impossible otherwise.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Thu, Nov 13, 2014 at 8:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Ordering transactions in LSN order is very precisly the remit of the
> existing logical decoding API. Any user that wishes to see a commits
> in sequence can do so using that API. BDR already does this, as do
> other users of the decoding API. So Slony already has access to a
> useful ordering if it wishes it. We do not need to anything *on this
> patch* to enable LSNs for Slony or anyone else. I don't see any reason
> to provide the same facility twice, in two different ways.

Perhaps you could respond more specifically to concerns expressed
upthread, like:

http://www.postgresql.org/message-id/BLU436-SMTP28B68B9312CBE5D9125441DC870@phx.gbl

I don't see that as a dumb argument on the face of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 13 November 2014 21:24, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 13, 2014 at 8:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Ordering transactions in LSN order is very precisly the remit of the
>> existing logical decoding API. Any user that wishes to see a commits
>> in sequence can do so using that API. BDR already does this, as do
>> other users of the decoding API. So Slony already has access to a
>> useful ordering if it wishes it. We do not need to anything *on this
>> patch* to enable LSNs for Slony or anyone else. I don't see any reason
>> to provide the same facility twice, in two different ways.
>
> Perhaps you could respond more specifically to concerns expressed
> upthread, like:
>
> http://www.postgresql.org/message-id/BLU436-SMTP28B68B9312CBE5D9125441DC870@phx.gbl
>
> I don't see that as a dumb argument on the face of it.

We have a clear "must have" argument for timestamps to support
replication conflicts.

Adding LSNs, when we already have a way to access them, is much more
of a nice to have. I don't really see it as a particularly nice to
have either, since the SLRU is in no way ordered.

Scope creep is a dangerous thing, so we shouldn't, and elsewhere
don't, collect up ideas like a bag of mixed sweets. It's easy to
overload things to the point where they don't fly at all. The whole
point of this is that we're building something faster than trigger
based systems.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Thu, Nov 13, 2014 at 6:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 13 November 2014 21:24, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Nov 13, 2014 at 8:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Ordering transactions in LSN order is very precisly the remit of the
>>> existing logical decoding API. Any user that wishes to see a commits
>>> in sequence can do so using that API. BDR already does this, as do
>>> other users of the decoding API. So Slony already has access to a
>>> useful ordering if it wishes it. We do not need to anything *on this
>>> patch* to enable LSNs for Slony or anyone else. I don't see any reason
>>> to provide the same facility twice, in two different ways.
>>
>> Perhaps you could respond more specifically to concerns expressed
>> upthread, like:
>>
>> http://www.postgresql.org/message-id/BLU436-SMTP28B68B9312CBE5D9125441DC870@phx.gbl
>>
>> I don't see that as a dumb argument on the face of it.
>
> We have a clear "must have" argument for timestamps to support
> replication conflicts.
>
> Adding LSNs, when we already have a way to access them, is much more
> of a nice to have. I don't really see it as a particularly nice to
> have either, since the SLRU is in no way ordered.
>
> Scope creep is a dangerous thing, so we shouldn't, and elsewhere
> don't, collect up ideas like a bag of mixed sweets. It's easy to
> overload things to the point where they don't fly at all. The whole
> point of this is that we're building something faster than trigger
> based systems.

I think that's slamming the door closed and nailing it shut behind
you.  If we add the feature without LSNs, how will someone go back and
add that later?  It would change the on-disk format of the SLRU, so to
avoid breaking pg_upgrade, someone would have to write a conversion
utility.  Even at that, it would slow pg_upgrade down when the feature
has been used.

By way of contrast, adding the feature now is quite easy.  It just
requires storing a few more bytes per transaction.

I am all in favor of incremental development when possible, but not
when it so greatly magnifies the work that needs to be done.  People
have been asking for the ability to determine the commit ordering for
years, and this is a way to do that, very inexpensively, as part of a
patch that is needed anyway.  We are not talking about loading 20 new
requirements on top of this patch; that would be intolerable.  We're
talking about adding one additional piece of information that has been
requested multiple times over the years.

The way I see it, there are at least three uses for this information.
One, trigger-based replication solutions.  While logical decoding will
doubtless be preferable, I don't think trigger-based replication
solutions will go away completely, and certainly not right away.
They've wanted this since forever.  Two, for user applications that
want to know the commit order for their own purposes, as in Steve's
example.  Three, for O(1) snapshots.  Heikki's patch to make that
happen seems to have stalled a bit, but if it's ever to go anywhere it
will need something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 14 November 2014 17:12, Robert Haas <robertmhaas@gmail.com> wrote:

> We are not talking about loading 20 new
> requirements on top of this patch; that would be intolerable.  We're
> talking about adding one additional piece of information that has been
> requested multiple times over the years.

The requested information is already available, as discussed. Logical
decoding adds commit ordering for *exactly* the purpose of using it
for replication, available to all solutions. This often requested
feature has now been added and doesn't need to be added twice.

So what we are discussing is adding a completely superfluous piece of
information.

Not including the LSN info does nothing to trigger based replication,
which will no doubt live on happily for many years. But adding LSN
will slow down logical replication, for no purpose at all.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Steve Singer
Date:
On 11/14/2014 08:21 PM, Simon Riggs wrote:
> The requested information is already available, as discussed. Logical
> decoding adds commit ordering for *exactly* the purpose of using it
> for replication, available to all solutions. This often requested
> feature has now been added and doesn't need to be added twice.
>
> So what we are discussing is adding a completely superfluous piece of
> information.
>
> Not including the LSN info does nothing to trigger based replication,
> which will no doubt live on happily for many years. But adding LSN
> will slow down logical replication, for no purpose at all.
>

Simon,
The use cases I'm talking about aren't really replication related. Often 
I have come across systems that want to do something such as 'select * 
from orders where X > the_last_row_I_saw order by X' and then do further 
processing on the order.

This is kind of awkward to do today because you don't have a good 
candidate for 'X' to order on.  Using either a sequence or insert-row 
timestamp doesn't work well because a transaction with a lower value for 
X might end up committing after the higher value is in a query result.

Yes, you could set up a logical wal slot and listen on the stream of 
inserts into your order table, but that's a lot of administration overhead 
compared to just issuing an SQL query for what really is a query-type 
operation.

Using the commit timestamp for my X sounded very tempting but could 
allow duplicates.
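The duplicate-timestamp hazard Steve describes can be sketched with some made-up data (none of these names or values come from the patch):

```python
from datetime import datetime

# Hypothetical committed rows: (commit_timestamp, commit_lsn, order_id).
# Two transactions can legitimately share one commit timestamp, so paging
# with "WHERE ts > last_seen ORDER BY ts" can silently skip rows.
rows = [
    (datetime(2014, 11, 9, 12, 0, 0), 1001, "order-A"),
    (datetime(2014, 11, 9, 12, 0, 0), 1002, "order-B"),  # duplicate timestamp
    (datetime(2014, 11, 9, 12, 0, 5), 1003, "order-C"),
]

def after_ts(last_ts):
    """Timestamp-only paging: misses order-B once order-A has been seen."""
    return [oid for ts, lsn, oid in rows if ts > last_ts]

def after_ts_lsn(last_ts, last_lsn):
    """Paging on (timestamp, lsn): a total order, so nothing is skipped."""
    return [oid for ts, lsn, oid in rows if (ts, lsn) > (last_ts, last_lsn)]

last_ts, last_lsn = rows[0][0], rows[0][1]   # processed up to order-A
print(after_ts(last_ts))                     # ['order-C'] -- order-B lost
print(after_ts_lsn(last_ts, last_lsn))       # ['order-B', 'order-C']
```

This is why the requests upthread pair the timestamp with some unique tiebreaker (such as the commit LSN) rather than ordering on the timestamp alone.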

One could argue that this patch is about replication features, and 
providing commit ordering for query purposes should be a separate patch 
to add that on top of this infrastructure. I see merit to smaller more 
focused patches but that requires leaving the door open to easily 
extending things later.

It could also be that I'm the only one who wants to order and filter 
queries in this manner (but that would surprise me).  If the commit lsn 
has limited appeal and we decide we don't want it at all, then we 
shouldn't add it.  I've seen this type of requirement in a number of 
different systems at a number of different companies.  I've generally 
seen it dealt with either by selecting rows behind the last now() 
timestamp seen and then filtering out already-processed rows, or by 
tracking the 'processed' state of each row individually (i.e. performing 
an update on each row once it's been processed), which performs poorly.

Steve








Re: tracking commit timestamps

From
Simon Riggs
Date:
On 15 November 2014 04:32, Steve Singer <steve@ssinger.info> wrote:

> The use cases I'm talking about aren't really replication related. Often I
> have come across systems that want to do something such as 'select * from
> orders where X > the_last_row_I_saw order by X' and then do further
> processing on the order.

Yes, existing facilities provide mechanisms for different types of
application change queues.

If you want to write a processing queue in SQL, that isn't the best
way. You'll need some way to keep track of whether or not it's been
successfully processed. That's either a column in the table, or a
column in a queue table maintained by triggers, with the row
write-locked on read. You can then have multiple readers from this queue
using the new SKIP LOCKED feature, which was specifically designed to
facilitate that.
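As a toy illustration of the semantics Simon is referring to (the real feature is the FOR UPDATE SKIP LOCKED clause in PostgreSQL's SELECT; this Python model is only a sketch of the idea):

```python
# Each worker claims the first unprocessed row that no other worker
# currently holds, instead of blocking on a locked row.
queue = [{"id": i, "locked": False, "processed": False} for i in range(3)]

def claim_next():
    for row in queue:
        if not row["processed"] and not row["locked"]:
            row["locked"] = True          # row write-locked on read
            return row["id"]
    return None                           # nothing claimable: skip, don't block

a = claim_next()   # worker 1 claims row 0
b = claim_next()   # worker 2 skips the locked row 0 and claims row 1
print(a, b)        # 0 1
```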

Logical decoding was intended for much more than just replication. It
provides commit-order access to changed data in a form that is both
usable and efficient for high-volume application needs.

I don't see any reason to add LSN into a SLRU updated at commit to
support those application needs.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 15/11/14 13:36, Simon Riggs wrote:
> On 15 November 2014 04:32, Steve Singer <steve@ssinger.info> wrote:
>
>> The use cases I'm talking about aren't really replication related. Often I
>> have come across systems that want to do something such as 'select * from
>> orders where X > the_last_row_I_saw order by X' and then do further
>> processing on the order.
>
> Yes, existing facilities provide mechanisms for different types of
> application change queues.
>
> If you want to write a processing queue in SQL, that isn't the best
> way. You'll need some way to keep track of whether or not its been
> successfully processed. That's either a column in the table, or a
> column in a queue table maintained by triggers, with the row write
> locked on read. You can then have multiple readers from this queue
> using the new SKIP LOCKED feature, which was specifically designed to
> facilitate that.
>
> Logical decoding was intended for much more than just replication. It
> provides commit order access to changed data in a form that is both
> usable and efficient for high volume applicatiion needs.
>
> I don't see any reason to add LSN into a SLRU updated at commit to
> support those application needs.
>

I am still on the fence about the LSN issue, I don't mind it from code 
perspective, it's already written anyway, but I am not sure if we really 
want it in the SLRU as Simon says.

Mainly because of three things:
One, this patch is not really a feature patch, as you can do most of what 
it does via tables already, but rather a performance improvement, so we 
should try to make it perform as well as possible; adding more 
things does not really improve performance (according to my benchmarks 
the performance difference with/without LSN is under 1%, so it's not 
terrible, but it's there), not to mention the additional disk space.

Two, the LSN use-cases still seem only theoretical to me, while the 
timestamp use-case has been a production problem for at least a decade.

Three, even if we add LSN, I am still not convinced that the use-cases 
presented here wouldn't be better served by putting that info into an 
actual table instead of the SLRU - as people want to use it as a filter 
in a WHERE clause, somebody mentioned exporting to a different db, etc.

Maybe we need better explanation of the LSN use-case(s) to understand 
why it should be stored here and why the other solutions are 
significantly worse.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 19 November 2014 02:12, Petr Jelinek <petr@2ndquadrant.com> wrote:

> Maybe we need better explanation of the LSN use-case(s) to understand why it
> should be stored here and why the other solutions are significantly worse.

We should apply the same standard that has been applied elsewhere. If
someone can show some software that could actually make use of LSN and
there isn't a better way, then we can include it.

PostgreSQL isn't a place where we speculate about possible future needs.

I don't see why it should take 2+ years of prototypes, designs and
discussions to get something in from BDR, but then we simply wave a
hand and include something else at the last minute without careful
thought. Even if that means that later additions might need to think
harder about upgrades.

Timestamp and nodeid are useful for a variety of cases; LSN doesn't
meet the same standard and should not be included now.

We still have many months before even beta for people that want LSN to
make a *separate* case for its inclusion as a separate feature.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 19/11/14 12:20, Simon Riggs wrote:
> On 19 November 2014 02:12, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> Maybe we need better explanation of the LSN use-case(s) to understand why it
>> should be stored here and why the other solutions are significantly worse.
>
> We should apply the same standard that has been applied elsewhere. If
> someone can show some software that could actually make use of LSN and
> there isn't a better way, then we can include it.
>
> ...
>
> We still have many months before even beta for people that want LSN to
> make a *separate* case for its inclusion as a separate feature.
>

This is a good point; we are not so late in the cycle that the LSN 
couldn't be added later if we find that it is indeed needed (and we don't 
have to care about pg_upgrade until beta).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Petr Jelinek wrote:

> This is good point, we are not too late in the cycle that LSN couldn't be
> added later if we find that it is indeed needed (and we don't have to care
> about pg_upgrade until beta).

I think we're overblowing the pg_upgrade issue.  Surely we don't need to
preserve commit_ts data when upgrading across major versions; and
pg_upgrade is perfectly prepared to remove old data when upgrading
(actually it just doesn't copy it; consider pg_subtrans or pg_serial,
for instance.)  If we need to change binary representation in a future
major release, we can do so without any trouble.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Wed, Nov 19, 2014 at 8:22 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Petr Jelinek wrote:
>> This is good point, we are not too late in the cycle that LSN couldn't be
>> added later if we find that it is indeed needed (and we don't have to care
>> about pg_upgrade until beta).
>
> I think we're overblowing the pg_upgrade issue.  Surely we don't need to
> preserve commit_ts data when upgrading across major versions; and
> pg_upgrade is perfectly prepared to remove old data when upgrading
> (actually it just doesn't copy it; consider pg_subtrans or pg_serial,
> for instance.)  If we need to change binary representation in a future
> major release, we can do so without any trouble.

Actually, that's a good point.  I still don't understand what the
resistance is to adding something quite inexpensive that multiple people
obviously want, but at least if we don't, we still have the option to
change it later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Steve Singer
Date:
On 11/19/2014 08:22 AM, Alvaro Herrera wrote:
>
> I think we're overblowing the pg_upgrade issue.  Surely we don't need to
> preserve commit_ts data when upgrading across major versions; and
> pg_upgrade is perfectly prepared to remove old data when upgrading
> (actually it just doesn't copy it; consider pg_subtrans or pg_serial,
> for instance.)  If we need to change binary representation in a future
> major release, we can do so without any trouble.
>

That sounds reasonable. I am okay with Petr removing the LSN portion of 
this patch.




Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 19/11/14 17:30, Steve Singer wrote:
> On 11/19/2014 08:22 AM, Alvaro Herrera wrote:
>>
>> I think we're overblowing the pg_upgrade issue.  Surely we don't need to
>> preserve commit_ts data when upgrading across major versions; and
>> pg_upgrade is perfectly prepared to remove old data when upgrading
>> (actually it just doesn't copy it; consider pg_subtrans or pg_serial,
>> for instance.)  If we need to change binary representation in a future
>> major release, we can do so without any trouble.
>>
>
> That sounds reasonable. I am okay with Petr removing the LSN portion
> this patch.
>

I did that then, v9 attached with following changes:
  - removed lsn info (obviously)

  - the pg_xact_commit_timestamp and pg_last_committed_xact return NULLs
when timestamp data was not found

  - made the default nodeid crash safe - this also makes use of the
do_xlog parameter in TransactionTreeSetCommitTsData if nodeid is set,
although that still does not happen without extension actually using the API

  - added some more regression tests

  - some small comment and docs adjustments based on Michael's last email

I didn't change the pg_last_committed_xact function name and I didn't
make nodeid visible from SQL level interfaces and don't plan to in this
patch as I think it's very premature to do so before we have some C code
using it.

Just to explain once more, and hopefully more clearly, how the crash
safety/WAL logging is handled, since there was some confusion in the last
review:
We only do WAL logging when nodeid is also logged (when nodeid is not 0)
because the timestamp itself can be read from WAL record of transaction
commit so it's pointless to log another WAL record just to store the
timestamp again.
Extension can either set default nodeid which is then logged
transparently, or can override the default logging mechanism by calling
TransactionTreeSetCommitTsData with whatever data it wants and do_xlog
set to true which will then write the WAL record with this overriding
information.
During WAL replay the commit timestamp is set from transaction commit
record and then if committs WAL record is found it's used to overwrite
the commit timestamp/nodeid for given xid.
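That replay rule can be sketched roughly as follows (illustrative Python, not the patch's actual C code; the function and record names here are made up):

```python
# xid -> (timestamp, nodeid); stands in for the commit-ts SLRU during replay.
commit_ts = {}

def redo_xact_commit(xid, commit_time):
    # Every transaction commit record carries a timestamp; nodeid defaults to 0,
    # so no separate committs WAL record is needed for the common case.
    commit_ts[xid] = (commit_time, 0)

def redo_committs_setts(xid, commit_time, nodeid):
    # Separate committs record, written only when a nodeid (or overriding
    # data) was supplied; it overwrites the default entry for that xid.
    commit_ts[xid] = (commit_time, nodeid)

redo_xact_commit(100, "2014-11-21 00:17:00")
redo_committs_setts(100, "2014-11-21 00:17:00", 42)  # extension set nodeid 42
print(commit_ts[100])   # ('2014-11-21 00:17:00', 42)
```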

Also, there is exactly one record in SLRU for each xid so there is no
point in making the SQL interfaces return multiple results.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 21/11/14 00:17, Petr Jelinek wrote:
> On 19/11/14 17:30, Steve Singer wrote:
>> On 11/19/2014 08:22 AM, Alvaro Herrera wrote:
>>>
>>> I think we're overblowing the pg_upgrade issue.  Surely we don't need to
>>> preserve commit_ts data when upgrading across major versions; and
>>> pg_upgrade is perfectly prepared to remove old data when upgrading
>>> (actually it just doesn't copy it; consider pg_subtrans or pg_serial,
>>> for instance.)  If we need to change binary representation in a future
>>> major release, we can do so without any trouble.
>>>
>>
>> That sounds reasonable. I am okay with Petr removing the LSN portion
>> this patch.
>>
>
> I did that then, v9 attached with following changes:
>   - removed lsn info (obviously)
>
>   - the pg_xact_commit_timestamp and pg_last_committed_xact return NULLs
> when timestamp data was not found
>
>   - made the default nodeid crash safe - this also makes use of the
> do_xlog parameter in TransactionTreeSetCommitTsData if nodeid is set,
> although that still does not happen without extension actually using the
> API
>
>   - added some more regression tests
>
>   - some small comment and docs adjustments based on Michael's last email
>
> I didn't change the pg_last_committed_xact function name and I didn't
> make nodeid visible from SQL level interfaces and don't plan to in this
> patch as I think it's very premature to do so before we have some C code
> using it.
>
> Just to explain once more and hopefully more clearly how the crash
> safety/WAL logging is handled since there was some confusion in last
> review:
> We only do WAL logging when nodeid is also logged (when nodeid is not 0)
> because the timestamp itself can be read from WAL record of transaction
> commit so it's pointless to log another WAL record just to store the
> timestamp again.
> Extension can either set default nodeid which is then logged
> transparently, or can override the default logging mechanism by calling
> TransactionTreeSetCommitTsData with whatever data it wants and do_xlog
> set to true which will then write the WAL record with this overriding
> information.
> During WAL replay the commit timestamp is set from transaction commit
> record and then if committs WAL record is found it's used to overwrite
> the commit timestamp/nodeid for given xid.
>
> Also, there is exactly one record in SLRU for each xid so there is no
> point in making the SQL interfaces return multiple results.
>

And here is v10 which fixes conflicts with Heikki's WAL API changes (no
changes otherwise).


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Alvaro Herrera
Date:
> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
> changes otherwise).

After some slight additional changes, here's v11, which I intend to
commit early tomorrow.  The main change is moving the test module from
contrib to src/test/modules.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Fujii Masao
Date:
On Tue, Nov 25, 2014 at 7:58 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
>> changes otherwise).
>
> After some slight additional changes, here's v11, which I intend to
> commit early tomorrow.  The main change is moving the test module from
> contrib to src/test/modules.

When I specify the XID of the aborted transaction in pg_xact_commit_timestamp(),
it always returns 2000-01-01 09:00:00+09. Is this intentional?

Can I check my understanding? Probably we cannot use this feature to calculate
the actual replication lag by, for example, comparing the result of
pg_last_committed_xact() in the master and that of
pg_last_xact_replay_timestamp()
in the standby. Because pg_last_xact_replay_timestamp() can return even
the timestamp of aborted transaction, but pg_last_committed_xact()
cannot. Right?

Regards,

-- 
Fujii Masao



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Fujii Masao wrote:
> On Tue, Nov 25, 2014 at 7:58 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> >> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
> >> changes otherwise).
> >
> > After some slight additional changes, here's v11, which I intend to
> > commit early tomorrow.  The main change is moving the test module from
> > contrib to src/test/modules.
> 
> When I specify the XID of the aborted transaction in pg_xact_commit_timestamp(),
> it always returns 2000-01-01 09:00:00+09. Is this intentional?

Well, when a transaction has not committed, nothing is written, so on
reading we get all zeroes, which corresponds to the timestamp you saw.
So yeah, it is intentional.  We could alternatively check pg_clog and
raise an error if the transaction is not marked either COMMITTED or
SUBCOMMITTED, but I'm not really sure there's much point.
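For context on that particular value: PostgreSQL stores timestamps as microseconds since its own epoch, 2000-01-01 00:00:00 UTC, so an all-zeroes entry decodes to exactly that instant (displayed as 09:00:00+09 in Fujii's UTC+9 session). A minimal sketch of the decoding:

```python
from datetime import datetime, timedelta, timezone

# TimestampTz is microseconds since the Postgres epoch, 2000-01-01 00:00 UTC.
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def decode_timestamptz(usec: int) -> datetime:
    return PG_EPOCH + timedelta(microseconds=usec)

zero = decode_timestamptz(0)   # what an all-zeroes SLRU entry reads as
print(zero)                    # 2000-01-01 00:00:00+00:00
print(zero.astimezone(timezone(timedelta(hours=9))))  # 2000-01-01 09:00:00+09:00
```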

The other option is to record a "commit" time for aborted transactions
too, but that doesn't seem very good either: first, this doesn't do
anything for crashed or for in-progress transactions; and second, how 
does it make sense to have a "commit" time for a transaction that
doesn't actually commit?

> Can I check my understanding? Probably we cannot use this feature to calculate
> the actual replication lag by, for example, comparing the result of
> pg_last_committed_xact() in the master and that of
> pg_last_xact_replay_timestamp()
> in the standby. Because pg_last_xact_replay_timestamp() can return even
> the timestamp of aborted transaction, but pg_last_committed_xact()
> cannot. Right?

I don't think it's suited for that.  I guess if you recorded the time
of the last transaction that actually committed, you can use that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Fujii Masao
Date:
On Tue, Nov 25, 2014 at 11:19 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Fujii Masao wrote:
>> On Tue, Nov 25, 2014 at 7:58 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>> >> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
>> >> changes otherwise).
>> >
>> > After some slight additional changes, here's v11, which I intend to
>> > commit early tomorrow.  The main change is moving the test module from
>> > contrib to src/test/modules.
>>
>> When I specify the XID of the aborted transaction in pg_xact_commit_timestamp(),
>> it always returns 2000-01-01 09:00:00+09. Is this intentional?
>
> Well, when a transaction has not committed, nothing is written so on
> reading we get all zeroes which corresponds to the timestamp you give.
> So yeah, it is intentional.  We could alternatively check pg_clog and
> raise an error if the transaction is not marked either COMMITTED or
> SUBCOMMITTED, but I'm not real sure there's much point.
>
> The other option is to record a "commit" time for aborted transactions
> too, but that doesn't seem very good either: first, this doesn't do
> anything for crashed or for in-progress transactions; and second, how
> does it make sense to have a "commit" time for a transaction that
> doesn't actually commit?

What about the PREPARE and COMMIT PREPARED transactions?
ISTM that this feature tracks neither of them.

Regards,

-- 
Fujii Masao



Re: tracking commit timestamps

From
Robert Haas
Date:
On Tue, Nov 25, 2014 at 9:19 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Fujii Masao wrote:
>> On Tue, Nov 25, 2014 at 7:58 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>> >> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
>> >> changes otherwise).
>> >
>> > After some slight additional changes, here's v11, which I intend to
>> > commit early tomorrow.  The main change is moving the test module from
>> > contrib to src/test/modules.
>>
>> When I specify the XID of the aborted transaction in pg_xact_commit_timestamp(),
>> it always returns 2000-01-01 09:00:00+09. Is this intentional?
>
> Well, when a transaction has not committed, nothing is written so on
> reading we get all zeroes which corresponds to the timestamp you give.
> So yeah, it is intentional.  We could alternatively check pg_clog and
> raise an error if the transaction is not marked either COMMITTED or
> SUBCOMMITTED, but I'm not real sure there's much point.

Maybe 0 should get translated to a NULL return, instead of a bogus timestamp.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Tue, Nov 25, 2014 at 9:19 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Fujii Masao wrote:
> >> On Tue, Nov 25, 2014 at 7:58 AM, Alvaro Herrera
> >> <alvherre@2ndquadrant.com> wrote:
> >> >> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
> >> >> changes otherwise).
> >> >
> >> > After some slight additional changes, here's v11, which I intend to
> >> > commit early tomorrow.  The main change is moving the test module from
> >> > contrib to src/test/modules.
> >>
> >> When I specify the XID of the aborted transaction in pg_xact_commit_timestamp(),
> >> it always returns 2000-01-01 09:00:00+09. Is this intentional?
> >
> > Well, when a transaction has not committed, nothing is written so on
> > reading we get all zeroes which corresponds to the timestamp you give.
> > So yeah, it is intentional.  We could alternatively check pg_clog and
> > raise an error if the transaction is not marked either COMMITTED or
> > SUBCOMMITTED, but I'm not real sure there's much point.
> 
> Maybe 0 should get translated to a NULL return, instead of a bogus timestamp.

That's one idea --- surely no transaction is going to commit at 00:00:00
on 2000-01-01 anymore.  Yet this is somewhat discomforting.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 25/11/14 16:23, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Tue, Nov 25, 2014 at 9:19 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> Fujii Masao wrote:
>>>> On Tue, Nov 25, 2014 at 7:58 AM, Alvaro Herrera
>>>> <alvherre@2ndquadrant.com> wrote:
>>>>>> And here is v10 which fixes conflicts with Heikki's WAL API changes (no
>>>>>> changes otherwise).
>>>>>
>>>>> After some slight additional changes, here's v11, which I intend to
>>>>> commit early tomorrow.  The main change is moving the test module from
>>>>> contrib to src/test/modules.
>>>>
>>>> When I specify the XID of the aborted transaction in pg_xact_commit_timestamp(),
>>>> it always returns 2000-01-01 09:00:00+09. Is this intentional?
>>>
>>> Well, when a transaction has not committed, nothing is written so on
>>> reading we get all zeroes which corresponds to the timestamp you give.
>>> So yeah, it is intentional.  We could alternatively check pg_clog and
>>> raise an error if the transaction is not marked either COMMITTED or
>>> SUBCOMMITTED, but I'm not real sure there's much point.
>>
>> Maybe 0 should get translated to a NULL return, instead of a bogus timestamp.
>
> That's one idea --- surely no transaction is going to commit at 00:00:00
> on 2000-01-01 anymore.  Yet this is somewhat discomforting.
>

I solved it for xids that are out of range by returning -infinity and 
then changing that to NULL in the SQL interface, but I have no idea how 
to do that for aborted transactions.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Petr Jelinek wrote:
> On 25/11/14 16:23, Alvaro Herrera wrote:
> >Robert Haas wrote:

> >>Maybe 0 should get translated to a NULL return, instead of a bogus timestamp.
> >
> >That's one idea --- surely no transaction is going to commit at 00:00:00
> >on 2000-01-01 anymore.  Yet this is somewhat discomforting.
> 
> I solved it for xids that are out of range by returning -infinity and then
> changing that to NULL in sql interface, but no idea how to do that for
> aborted transactions.

I guess the idea is that we just read the value from the slru and if it
exactly matches allballs we do the same -infinity return and translation
to NULL.  (Do we really love this -infinity idea?  If it's just an
internal API we can use a boolean instead.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 25/11/14 16:30, Alvaro Herrera wrote:
> Petr Jelinek wrote:
>> On 25/11/14 16:23, Alvaro Herrera wrote:
>>> Robert Haas wrote:
>
>>>> Maybe 0 should get translated to a NULL return, instead of a bogus timestamp.
>>>
>>> That's one idea --- surely no transaction is going to commit at 00:00:00
>>> on 2000-01-01 anymore.  Yet this is somewhat discomforting.
>>
>> I solved it for xids that are out of range by returning -infinity and then
>> changing that to NULL in sql interface, but no idea how to do that for
>> aborted transactions.
>
> I guess the idea is that we just read the value from the slru and if it
> exactly matches allballs we do the same -infinity return and translation
> to NULL.  (Do we really love this -infinity idea?  If it's just an
> internal API we can use a boolean instead.)
>

As in returning boolean instead of void as "found"? That works for me 
(for the C interface).

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 25 November 2014 at 13:35, Fujii Masao <masao.fujii@gmail.com> wrote:

> Can I check my understanding? Probably we cannot use this feature to calculate
> the actual replication lag by, for example, comparing the result of
> pg_last_committed_xact() in the master and that of
> pg_last_xact_replay_timestamp()
> in the standby. Because pg_last_xact_replay_timestamp() can return even
> the timestamp of aborted transaction, but pg_last_committed_xact()
> cannot. Right?

It was intended for that, but I forgot that
pg_last_xact_replay_timestamp() includes abort as well.

I suggest we add a function that returns both the xid and timestamp on
the standby:
* pg_last_commit_replay_info() - which returns both the xid and
timestamp of the last commit replayed on standby
* then we use the xid from the standby to lookup the commit timestamp
on the master.
We then have two timestamps that refer to the same xid and can
subtract to give us an exact replication lag.

That can be done manually by user if requested.
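A manual version of that calculation might look like the following sketch. Note that pg_last_commit_replay_info() is only proposed above and does not exist yet; the xid value and psql-variable wiring are purely illustrative:

```sql
-- On the standby (hypothetical function, as proposed above):
--   SELECT xid, timestamp FROM pg_last_commit_replay_info();
-- Suppose it reports xid 1234 replayed at a time captured in :replay_ts.

-- On the master, look up the commit time of that same xid and subtract:
SELECT pg_xact_commit_timestamp('1234'::xid) - :'replay_ts'::timestamptz
       AS replication_lag;
```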

We can also do that by sending the replay info back as a feedback
message from standby to master, so the information can be calculated
by pg_stat_replication when requested.

I'll work on that once we have this current patch committed.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 25/11/14 17:16, Simon Riggs wrote:
> On 25 November 2014 at 13:35, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> Can I check my understanding? Probably we cannot use this feature to calculate
>> the actual replication lag by, for example, comparing the result of
>> pg_last_committed_xact() in the master and that of
>> pg_last_xact_replay_timestamp()
>> in the standby. Because pg_last_xact_replay_timestamp() can return even
>> the timestamp of aborted transaction, but pg_last_committed_xact()
>> cannot. Right?
>
> It was intended for that, but I forgot that
> pg_last_xact_replay_timestamp() includes abort as well.
>
> I suggest we add a function that returns both the xid and timestamp on
> the standby:
> * pg_last_commit_replay_info() - which returns both the xid and
> timestamp of the last commit replayed on standby
> * then we use the xid from the standby to lookup the commit timestamp
> on the master.
> We then have two timestamps that refer to the same xid and can
> subtract to give us an exact replication lag.
>

Won't the pg_last_committed_xact() on slave + pg_xact_commit_timestamp() 
on master with the xid returned by slave accomplish the same thing?


-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 25 November 2014 at 16:18, Petr Jelinek <petr@2ndquadrant.com> wrote:

> Won't the pg_last_committed_xact() on slave + pg_xact_commit_timestamp() on
> master with the xid returned by slave accomplish the same thing?

Surely the pg_last_committed_xact() will return the same value on
standby as it did on the master?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Wed, Nov 26, 2014 at 1:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 25 November 2014 at 16:18, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> Won't the pg_last_committed_xact() on slave + pg_xact_commit_timestamp() on
>> master with the xid returned by slave accomplish the same thing?
>
> Surely the pg_last_committed_xact() will return the same value on
> standby as it did on the master?

It should. But it needs some extra help as well: in its current
shape this patch will WAL-log a commit timestamp only if the node ID is
valid, per RecordTransactionCommit. The node ID can be set only
through CommitTsSetDefaultNodeId, which is actually called nowhere, so
an extension or an extra library needs to do some legwork to allow
this information to be replayed on other nodes.
Regards,
-- 
Michael



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Petr Jelinek wrote:
> On 25/11/14 16:30, Alvaro Herrera wrote:
> >Petr Jelinek wrote:
> >>On 25/11/14 16:23, Alvaro Herrera wrote:
> >>>Robert Haas wrote:
> >
> >>>>Maybe 0 should get translated to a NULL return, instead of a bogus timestamp.
> >>>
> >>>That's one idea --- surely no transaction is going to commit at 00:00:00
> >>>on 2000-01-01 anymore.  Yet this is somewhat discomforting.
> >>
> >>I solved it for xids that are out of range by returning -infinity and then
> >>changing that to NULL in sql interface, but no idea how to do that for
> >>aborted transactions.
> >
> >I guess the idea is that we just read the value from the slru and if it
> >exactly matches allballs we do the same -infinity return and translation
> >to NULL.  (Do we really love this -infinity idea?  If it's just an
> >internal API we can use a boolean instead.)
>
> As in returning boolean instead of void as "found"? That works for me
> (for the C interface).

Petr sent me privately some changes to implement this idea.  I also
reworked the tests so that they only happen on src/test/modules (getting
rid of the one in core regress) and made them work with both enabled and
disabled track_commit_timestamps; in make installcheck, they pass
regardless of the setting of the installed server, and in make check,
they run a server with the setting enabled.

I made two more changes:
1. introduce newestCommitTs.  Original code was using lastCommitXact to
check that no "future" transaction is asked for, but this doesn't really
work if a long-running transaction is committed, because asking for
transactions with a higher Xid but which were committed earlier would
raise an error.

2. change CommitTsControlLock to CommitTsLock as the lock that protects
the stuff we keep in ShmemVariableCache.  The Control one is for SLRU
access, and so it might be slow at times.  This is important because we
fill the checkpoint struct from values protected by that lock, so a
checkpoint might be delayed if it happens to land in the middle of an
SLRU I/O operation.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Pushed with some extra cosmetic tweaks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 03/12/14 15:54, Alvaro Herrera wrote:
> Pushed with some extra cosmetic tweaks.
>

Cool, thanks!

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Mon, Dec 1, 2014 at 5:34 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I made two more changes:
> 1. introduce newestCommitTs.  Original code was using lastCommitXact to
> check that no "future" transaction is asked for, but this doesn't really
> work if a long-running transaction is committed, because asking for
> transactions with a higher Xid but which were committed earlier would
> raise an error.

I'm kind of disappointed that, in spite of previous review comments,
this got committed with extensive use of the CommitTs naming.  I think
that's confusing, but it's also something that will be awkward if we
want to add other data, such as the much-discussed commit LSN, to the
facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Mon, Dec 1, 2014 at 5:34 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > I made two more changes:
> > 1. introduce newestCommitTs.  Original code was using lastCommitXact to
> > check that no "future" transaction is asked for, but this doesn't really
> > work if a long-running transaction is committed, because asking for
> > transactions with a higher Xid but which were committed earlier would
> > raise an error.
> 
> I'm kind of disappointed that, in spite of previous review comments,
> this got committed with extensive use of the CommitTs naming.  I think
> that's confusing, but it's also something that will be awkward if we
> want to add other data, such as the much-discussed commit LSN, to the
> facility.

I never saw a comment that CommitTs was an unwanted name.  There were
some that said that committs wasn't liked because it looked like a
misspelling, so we added an underscore -- stuff in lower case is
commit_ts everywhere.  Stuff in camel case didn't get the underscore
because it didn't seem necessary.  But other than that issue, the name
wasn't questioned, as far as I'm aware.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Robert Haas wrote:

> > I'm kind of disappointed that, in spite of previous review comments,
> > this got committed with extensive use of the CommitTs naming.  I think
> > that's confusing, but it's also something that will be awkward if we
> > want to add other data, such as the much-discussed commit LSN, to the
> > facility.
> 
> I never saw a comment that CommitTs was an unwanted name.  There were
> some that said that committs wasn't liked because it looked like a
> misspelling, so we added an underscore -- stuff in lower case is
> commit_ts everywhere.  Stuff in camel case didn't get the underscore
> because it didn't seem necessary.  But other than that issue, the name
> wasn't questioned, as far as I'm aware.

I found one email where you said you didn't like committs and preferred
commit_timestamp instead.  I don't see how making that change would have
made you happy wrt the concern you just expressed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Wed, Dec 3, 2014 at 2:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera wrote:
>> Robert Haas wrote:
>> > I'm kind of disappointed that, in spite of previous review comments,
>> > this got committed with extensive use of the CommitTs naming.  I think
>> > that's confusing, but it's also something that will be awkward if we
>> > want to add other data, such as the much-discussed commit LSN, to the
>> > facility.
>>
>> I never saw a comment that CommitTs was an unwanted name.  There were
>> some that said that committs wasn't liked because it looked like a
>> misspelling, so we added an underscore -- stuff in lower case is
>> commit_ts everywhere.  Stuff in camel case didn't get the underscore
>> because it didn't seem necessary.  But other than that issue, the name
>> wasn't questioned, as far as I'm aware.
>
> I found one email where you said you didn't like committs and preferred
> commit_timestamp instead.  I don't see how making that change would have
> made you happy wrt the concern you just expressed.

Fair point.

I'm still not sure we got this one right, but I don't know that I want
to spend more time wrangling about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Fujii Masao
Date:
On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Pushed with some extra cosmetic tweaks.

I got the following assertion failure when I executed pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
Line: 315)
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the standby
#5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the standby

BTW, at the step #4, I got the following log messages. This might be a hint for
this problem.

LOG:  file "pg_commit_ts/0000" doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384

Regards,

-- 
Fujii Masao



Re: tracking commit timestamps

From
Simon Riggs
Date:
On 4 December 2014 at 03:08, Fujii Masao <masao.fujii@gmail.com> wrote:

> #1. set up and start the master and standby servers with
> track_commit_timestamp disabled
> #2. enable track_commit_timestamp in the master and restart the master
> #3. run some write transactions
> #4. enable track_commit_timestamp in the standby and restart the standby
> #5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the standby

I'm not sure what step 4 is supposed to do?

Surely if steps 1-3 generate any WAL then the standby should replay
it, whether or not track_commit_timestamp is enabled.

So what effect does setting that parameter have on the standby?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Fujii Masao
Date:
On Thu, Dec 4, 2014 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 4 December 2014 at 03:08, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> #1. set up and start the master and standby servers with
>> track_commit_timestamp disabled
>> #2. enable track_commit_timestamp in the master and restart the master
>> #3. run some write transactions
>> #4. enable track_commit_timestamp in the standby and restart the standby
>> #5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the standby
>
> I'm not sure what step4 is supposed to do?
>
> Surely if steps 1-3 generate any WAL then the standby should replay
> it, whether or not track_commit_timestamp is enabled.
>
> So what effect does setting that parameter on the standby?

At least track_commit_timestamp seems to need to be enabled even in the standby
when we want to call pg_xact_commit_timestamp() and pg_last_committed_xact()
in the standby. I'm not sure if this is good design, though.
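In other words, per the observation above, using these functions on a standby would require configuring both servers along these lines (minimal sketch; the parameter is off by default):

```
# postgresql.conf, on the primary and on the standby alike
track_commit_timestamp = on
```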

Regards,

-- 
Fujii Masao



Re: tracking commit timestamps

From
Noah Misch
Date:
On Wed, Dec 03, 2014 at 11:54:38AM -0300, Alvaro Herrera wrote:
> Pushed with some extra cosmetic tweaks.

The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
running on 64-bit Windows Server 2003.  I have not checked other Windows
configurations; the suite does pass on GNU/Linux.

Attachment

Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 08/12/14 00:56, Noah Misch wrote:
> On Wed, Dec 03, 2014 at 11:54:38AM -0300, Alvaro Herrera wrote:
>> Pushed with some extra cosmetic tweaks.
>
> The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
> running on 64-bit Windows Server 2003.  I have not checked other Windows
> configurations; the suite does pass on GNU/Linux.
>

Hmm, I wonder if "< now()" needs to be changed to "<= now()" in those 
queries to make them work correctly on that platform. I don't have a 
machine with that environment handy right now, so I would appreciate it 
if you could try that; in case you don't have time, I will try to 
set up something later...
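The suspected failure mode: with a coarse clock tick, a transaction's recorded commit timestamp can be exactly equal to a later now(), so a strict comparison flips. Schematically (table and column usage are illustrative; xmin is the row's inserting transaction id):

```sql
-- On a platform with a coarse timer tick, the commit time and a
-- subsequent now() can be identical:
SELECT pg_xact_commit_timestamp(xmin) <  now() FROM t;  -- may be false
SELECT pg_xact_commit_timestamp(xmin) <= now() FROM t;  -- robust
```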

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Noah Misch
Date:
On Mon, Dec 08, 2014 at 02:23:39AM +0100, Petr Jelinek wrote:
> On 08/12/14 00:56, Noah Misch wrote:
> >The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
> >running on 64-bit Windows Server 2003.  I have not checked other Windows
> >configurations; the suite does pass on GNU/Linux.
> 
> Hmm I wonder if "< now()" needs to be changed to "<= now()" in those queries
> to make them work correctly on that plarform, I don't have machine with that
> environment handy right now, so I would appreciate if you could try that, in
> case you don't have time for that, I will try to setup something later...

I will try that, though perhaps not until next week.



Re: tracking commit timestamps

From
Michael Paquier
Date:
On Wed, Dec 10, 2014 at 6:50 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Dec 08, 2014 at 02:23:39AM +0100, Petr Jelinek wrote:
>> On 08/12/14 00:56, Noah Misch wrote:
>> >The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
>> >running on 64-bit Windows Server 2003.  I have not checked other Windows
>> >configurations; the suite does pass on GNU/Linux.
>>
>> Hmm I wonder if "< now()" needs to be changed to "<= now()" in those queries
>> to make them work correctly on that plarform, I don't have machine with that
>> environment handy right now, so I would appreciate if you could try that, in
>> case you don't have time for that, I will try to setup something later...
>
> I will try that, though perhaps not until next week.

FWIW, I just tried that with MinGW-32 and I can see the error on Win7.
I also checked that changing "< now()" to "<= now()" fixed the
problem, so your assumption was right, Petr.
Regards,
--
Michael

Attachment

Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 15/12/14 09:12, Michael Paquier wrote:
> On Wed, Dec 10, 2014 at 6:50 PM, Noah Misch <noah@leadboat.com> wrote:
>> On Mon, Dec 08, 2014 at 02:23:39AM +0100, Petr Jelinek wrote:
>>> On 08/12/14 00:56, Noah Misch wrote:
>>>> The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
>>>> running on 64-bit Windows Server 2003.  I have not checked other Windows
>>>> configurations; the suite does pass on GNU/Linux.
>>>
>>> Hmm I wonder if "< now()" needs to be changed to "<= now()" in those queries
>>> to make them work correctly on that plarform, I don't have machine with that
>>> environment handy right now, so I would appreciate if you could try that, in
>>> case you don't have time for that, I will try to setup something later...
>>
>> I will try that, though perhaps not until next week.
>
> FWIW, I just tried that with MinGW-32 and I can see the error on Win7.
> I also checked that changing "< now()" to "<= now()" fixed the
> problem, so your assumption was right, Petr.
> Regards,
>

Cool, thanks. I think it was the timer granularity problem on Windows.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Noah Misch
Date:
On Mon, Dec 15, 2014 at 12:12:10AM -0800, Michael Paquier wrote:
> On Wed, Dec 10, 2014 at 6:50 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Dec 08, 2014 at 02:23:39AM +0100, Petr Jelinek wrote:
> >> On 08/12/14 00:56, Noah Misch wrote:
> >> >The commit_ts test suite gives me the attached diff on a 32-bit MinGW build
> >> >running on 64-bit Windows Server 2003.  I have not checked other Windows
> >> >configurations; the suite does pass on GNU/Linux.
> >>
> >> Hmm I wonder if "< now()" needs to be changed to "<= now()" in those queries
> >> to make them work correctly on that plarform, I don't have machine with that
> >> environment handy right now, so I would appreciate if you could try that, in
> >> case you don't have time for that, I will try to setup something later...
> >
> > I will try that, though perhaps not until next week.
> 
> FWIW, I just tried that with MinGW-32 and I can see the error on Win7.
> I also checked that changing "< now()" to "<= now()" fixed the
> problem, so your assumption was right, Petr.

Committed, after fixing the alternate expected output.



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Noah Misch wrote:
> On Mon, Dec 15, 2014 at 12:12:10AM -0800, Michael Paquier wrote:

> > FWIW, I just tried that with MinGW-32 and I can see the error on Win7.
> > I also checked that changing "< now()" to "<= now()" fixed the
> > problem, so your assumption was right, Petr.
> 
> Committed, after fixing the alternate expected output.

Thanks.  I admit I don't understand what the issue is.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: tracking commit timestamps

From
Noah Misch
Date:
On Tue, Dec 16, 2014 at 01:05:31AM -0300, Alvaro Herrera wrote:
> Noah Misch wrote:
> > On Mon, Dec 15, 2014 at 12:12:10AM -0800, Michael Paquier wrote:
> 
> > > FWIW, I just tried that with MinGW-32 and I can see the error on Win7.
> > > I also checked that changing "< now()" to "<= now()" fixed the
> > > problem, so your assumption was right, Petr.
> > 
> > Committed, after fixing the alternate expected output.
> 
> Thanks.  I admit I don't understand what the issue is.

The test assumed that no two transactions of a given backend will get the same
timestamp value from now().  That holds so long as ticks of the system time
are small enough.  Not so on at least some Windows configurations.  Notice the
repeated timestamp values:
   Windows Server 2003 x64, 32-bit build w/ VS2010

localhost template1=# select clock_timestamp(), pg_sleep(.1 * (n % 2)) from generate_series(0,7) t(n);
        clock_timestamp        | pg_sleep
-------------------------------+----------
 2014-12-18 08:34:34.522126+00 |
 2014-12-18 08:34:34.522126+00 |
 2014-12-18 08:34:34.631508+00 |
 2014-12-18 08:34:34.631508+00 |
 2014-12-18 08:34:34.74089+00  |
 2014-12-18 08:34:34.74089+00  |
 2014-12-18 08:34:34.850272+00 |
 2014-12-18 08:34:34.850272+00 |
(8 rows)

   GNU/Linux

[local] test=# select clock_timestamp(), pg_sleep(.1 * (n % 2)) from generate_series(0,7) t(n);
        clock_timestamp        | pg_sleep
-------------------------------+----------
 2014-12-19 06:49:47.590556+00 |
 2014-12-19 06:49:47.590611+00 |
 2014-12-19 06:49:47.691488+00 |
 2014-12-19 06:49:47.691508+00 |
 2014-12-19 06:49:47.801483+00 |
 2014-12-19 06:49:47.801502+00 |
 2014-12-19 06:49:47.921486+00 |
 2014-12-19 06:49:47.921505+00 |
(8 rows)



Re: tracking commit timestamps

From
Fujii Masao
Date:
On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Pushed with some extra cosmetic tweaks.
>
> I got the following assertion failure when I executed pg_xact_commit_timestamp()
> in the standby server.
>
> =# select pg_xact_commit_timestamp('1000'::xid);
> TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
> ((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
> Line: 315)
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: 2014-12-04
> 12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
> signal 6: Aborted
> 2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
> select pg_xact_commit_timestamp('1000'::xid);
>
> The way to reproduce this problem is
>
> #1. set up and start the master and standby servers with
> track_commit_timestamp disabled
> #2. enable track_commit_timestamp in the master and restart the master
> #3. run some write transactions
> #4. enable track_commit_timestamp in the standby and restart the standby
> #5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the standby
>
> BTW, at the step #4, I got the following log messages. This might be a hint for
> this problem.
>
> LOG:  file "pg_commit_ts/0000" doesn't exist, reading as zeroes
> CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
> inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
> 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
> relcache 16384

This problem still happens in the master.

Regards,

-- 
Fujii Masao



Re: tracking commit timestamps

From
Craig Ringer
Date:
On 12/19/2014 02:53 PM, Noah Misch wrote:
> The test assumed that no two transactions of a given backend will get the same
> timestamp value from now().  That holds so long as ticks of the system time
> are small enough.  Not so on at least some Windows configurations.

Most Windows systems with nothing else running will have 15 ms timer
granularity, so multiple timestamps captured within the same timer
tick will all have the same value.

If you're running other programs that use the multimedia timer APIs
(including Google Chrome, MS SQL Server, and all sorts of other apps you
might not expect) you'll probably have 1ms timer granularity instead.

Since PostgreSQL 9.4 and below capture time on Windows using
GetSystemTime the sub-millisecond part is lost anyway. On 9.5 it's
retained but will usually be some fixed value because the timer tick is
still 1ms.

If you're on Windows 8 or Windows 2012 and running PostgreSQL 9.5
(master), but not earlier versions, you'll get sub-microsecond
resolution like on sensible platforms.

Some details here: https://github.com/2ndQuadrant/pg_sysdatetime
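To make the failure mode concrete, here is a minimal sketch (mine, not
from the patch) that quantizes timestamps to a 15 ms tick the way a
coarse Windows system clock does; two "commits" 2 ms apart become
indistinguishable:

```python
def quantize(ts_us, tick_us=15_000):
    """Round a timestamp (in microseconds) down to the timer tick,
    mimicking a 15 ms Windows system-clock granularity."""
    return ts_us - (ts_us % tick_us)

# Two commits 2 ms apart land on the same tick and get equal timestamps:
a = quantize(1_000_000)   # commit at t = 1.000000 s
b = quantize(1_002_000)   # commit at t = 1.002000 s
assert a == b             # same tick: the now() values collide
```

This is exactly why a test assuming distinct now() values per backend
transaction cannot hold on such configurations.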

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 05/01/15 07:28, Fujii Masao wrote:
> On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> Pushed with some extra cosmetic tweaks.
>>
>> I got the following assertion failure when I executed pg_xact_commit_timestamp()
>> in the standby server.
>>
>> =# select pg_xact_commit_timestamp('1000'::xid);
>> TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
>> ((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
>> Line: 315)
>> server closed the connection unexpectedly
>>      This probably means the server terminated abnormally
>>      before or while processing the request.
>> The connection to the server was lost. Attempting reset: 2014-12-04
>> 12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
>> signal 6: Aborted
>> 2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
>> select pg_xact_commit_timestamp('1000'::xid);
>>
>> The way to reproduce this problem is
>>
>> #1. set up and start the master and standby servers with
>> track_commit_timestamp disabled
>> #2. enable track_commit_timestamp in the master and restart the master
>> #3. run some write transactions
>> #4. enable track_commit_timestamp in the standby and restart the standby
>> #5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the standby
>>
>> BTW, at the step #4, I got the following log messages. This might be a hint for
>> this problem.
>>
>> LOG:  file "pg_commit_ts/0000" doesn't exist, reading as zeroes
>> CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
>> inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
>> 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
>> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
>> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
>> relcache 16384
>
> This problem still happens in the master.
>
> Regards,
>

Attached patch fixes it, I am not sure how happy I am with the way I did
it though.

And while at it I noticed that the redo code for XLOG_PARAMETER_CHANGE sets
        ControlFile->wal_log_hints = wal_log_hints;
shouldn't it be
        ControlFile->wal_log_hints = xlrec.wal_log_hints;
instead?
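To illustrate why that matters, here is a tiny model (mine; the names
mirror xlog.c but the structures and values are illustrative
assumptions, not the real code): on redo, the standby must take the
value recorded in the WAL record, not its own local GUC, which may
differ:

```python
# Model of the XLOG_PARAMETER_CHANGE redo path. "wal_log_hints" is the
# replaying server's local GUC; "xlrec" carries the value the primary
# recorded in the WAL record being replayed.
wal_log_hints = False                      # standby's own setting

class XlParameterChange:                   # models xl_parameter_change
    def __init__(self, wal_log_hints):
        self.wal_log_hints = wal_log_hints

class ControlFile:                         # models ControlFileData
    wal_log_hints = None

xlrec = XlParameterChange(wal_log_hints=True)   # replayed from WAL

# Buggy form: ControlFile.wal_log_hints = wal_log_hints  (local GUC)
ControlFile.wal_log_hints = xlrec.wal_log_hints  # correct: use the record

assert ControlFile.wal_log_hints is True
```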

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 05/01/15 16:17, Petr Jelinek wrote:
> On 05/01/15 07:28, Fujii Masao wrote:
>> On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao <masao.fujii@gmail.com>
>> wrote:
>>> On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
>>> <alvherre@2ndquadrant.com> wrote:
>>>> Pushed with some extra cosmetic tweaks.
>>>
>>> I got the following assertion failure when I executed
>>> pg_xact_commit_timestamp()
>>> in the standby server.
>>>
>>> =# select pg_xact_commit_timestamp('1000'::xid);
>>> TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
>>> ((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
>>> Line: 315)
>>> server closed the connection unexpectedly
>>>      This probably means the server terminated abnormally
>>>      before or while processing the request.
>>> The connection to the server was lost. Attempting reset: 2014-12-04
>>> 12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
>>> signal 6: Aborted
>>> 2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
>>> select pg_xact_commit_timestamp('1000'::xid);
>>>
>>> The way to reproduce this problem is
>>>
>>> #1. set up and start the master and standby servers with
>>> track_commit_timestamp disabled
>>> #2. enable track_commit_timestamp in the master and restart the master
>>> #3. run some write transactions
>>> #4. enable track_commit_timestamp in the standby and restart the standby
>>> #5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the
>>> standby
>>>
>>> BTW, at the step #4, I got the following log messages. This might be
>>> a hint for
>>> this problem.
>>>
>>> LOG:  file "pg_commit_ts/0000" doesn't exist, reading as zeroes
>>> CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
>>> inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
>>> 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
>>> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
>>> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
>>> relcache 16384
>>
>> This problem still happens in the master.
>>
>> Regards,
>>
>
> Attached patch fixes it, I am not sure how happy I am with the way I did
> it though.
>

Added more comments and made the error message a bit clearer.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: tracking commit timestamps

From
Michael Paquier
Date:
On Fri, Dec 19, 2014 at 3:53 PM, Noah Misch <noah@leadboat.com> wrote:
> localhost template1=# select clock_timestamp(), pg_sleep(.1 * (n % 2)) from generate_series(0,7) t(n);
>         clock_timestamp        | pg_sleep
> -------------------------------+----------
>  2014-12-18 08:34:34.522126+00 |
>  2014-12-18 08:34:34.522126+00 |
>  2014-12-18 08:34:34.631508+00 |
>  2014-12-18 08:34:34.631508+00 |
>  2014-12-18 08:34:34.74089+00  |
>  2014-12-18 08:34:34.74089+00  |
>  2014-12-18 08:34:34.850272+00 |
>  2014-12-18 08:34:34.850272+00 |
> (8 rows)
So, we would need additional information other than the node ID *and*
the timestamp to ensure proper transaction commit ordering on Windows.
That's not cool and makes this feature very limited on this platform.
-- 
Michael



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 06/01/15 08:58, Michael Paquier wrote:
> On Fri, Dec 19, 2014 at 3:53 PM, Noah Misch <noah@leadboat.com> wrote:
>> localhost template1=# select clock_timestamp(), pg_sleep(.1 * (n % 2)) from generate_series(0,7) t(n);
>>          clock_timestamp        | pg_sleep
>> -------------------------------+----------
>>   2014-12-18 08:34:34.522126+00 |
>>   2014-12-18 08:34:34.522126+00 |
>>   2014-12-18 08:34:34.631508+00 |
>>   2014-12-18 08:34:34.631508+00 |
>>   2014-12-18 08:34:34.74089+00  |
>>   2014-12-18 08:34:34.74089+00  |
>>   2014-12-18 08:34:34.850272+00 |
>>   2014-12-18 08:34:34.850272+00 |
>> (8 rows)
> So, we would need additional information other than the node ID *and*
> the timestamp to ensure proper transaction commit ordering on Windows.
> That's not cool and makes this feature very limited on this platform.
>

Well, that's the Windows time API for you; it affects everything that
deals with timestamps, not just commit ts. Note that the precision
depends on the hardware and on other software running on the computer
(there is an undocumented API to increase the resolution, use of the
multimedia timer also increases the resolution, etc.).

The good news is that MS provides a new high-precision time API in
Windows 8 and Windows Server 2012, which we are using thanks to
519b0757a37254452e013ea0ac95f4e56391608c, so we are good at least on
modern systems.

-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Robert Haas
Date:
On Tue, Jan 6, 2015 at 2:58 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> So, we would need additional information other than the node ID *and*
> the timestamp to ensure proper transaction commit ordering on Windows.
> That's not cool and makes this feature very limited on this platform.

You can't use the timestamp alone for commit ordering on any platform.
Eventually, two transactions will manage to commit in a single clock
tick, no matter how short that is.

Now, if we'd included the LSN in there...
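A quick sketch of that idea (mine, with made-up XIDs and LSNs): since
the LSN is strictly increasing in WAL order, pairing it with the
timestamp breaks ties that the clock alone cannot:

```python
# Commits whose timestamps collide on the same clock tick are untied by
# LSN, which is strictly increasing in WAL (i.e. true commit) order.
# XIDs and LSNs below are illustrative, not from a real server.
commits = [
    {"xid": 101, "ts": 1000, "lsn": 0x16B3EA0},
    {"xid": 102, "ts": 1000, "lsn": 0x16B3D58},  # same tick, earlier WAL
    {"xid": 103, "ts": 1001, "lsn": 0x16B3F70},
]

ordered = sorted(commits, key=lambda c: (c["ts"], c["lsn"]))
assert [c["xid"] for c in ordered] == [102, 101, 103]
```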

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tracking commit timestamps

From
Petr Jelinek
Date:
On 05/01/15 17:50, Petr Jelinek wrote:
> On 05/01/15 16:17, Petr Jelinek wrote:
>> On 05/01/15 07:28, Fujii Masao wrote:
>>> On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao <masao.fujii@gmail.com>
>>> wrote:
>>>> On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
>>>> <alvherre@2ndquadrant.com> wrote:
>>>>> Pushed with some extra cosmetic tweaks.
>>>>
>>>> I got the following assertion failure when I executed
>>>> pg_xact_commit_timestamp()
>>>> in the standby server.
>>>>
>>>> =# select pg_xact_commit_timestamp('1000'::xid);
>>>> TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
>>>> ((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
>>>> Line: 315)
>>>> server closed the connection unexpectedly
>>>>      This probably means the server terminated abnormally
>>>>      before or while processing the request.
>>>> The connection to the server was lost. Attempting reset: 2014-12-04
>>>> 12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
>>>> signal 6: Aborted
>>>> 2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
>>>> select pg_xact_commit_timestamp('1000'::xid);
>>>>
>>>> The way to reproduce this problem is
>>>>
>>>> #1. set up and start the master and standby servers with
>>>> track_commit_timestamp disabled
>>>> #2. enable track_commit_timestamp in the master and restart the master
>>>> #3. run some write transactions
>>>> #4. enable track_commit_timestamp in the standby and restart the
>>>> standby
>>>> #5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the
>>>> standby
>>>>
>>>> BTW, at the step #4, I got the following log messages. This might be
>>>> a hint for
>>>> this problem.
>>>>
>>>> LOG:  file "pg_commit_ts/0000" doesn't exist, reading as zeroes
>>>> CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
>>>> inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
>>>> 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
>>>> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
>>>> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
>>>> relcache 16384
>>>
>>> This problem still happens in the master.
>>>
>>> Regards,
>>>
>>
>> Attached patch fixes it, I am not sure how happy I am with the way I did
>> it though.
>>
>
> Added more comments and made the error message bit clearer.
>

Fujii, Alvaro, did one of you have a chance to look at this fix?


-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: tracking commit timestamps

From
Alvaro Herrera
Date:
Petr Jelinek wrote:

> >>On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao <masao.fujii@gmail.com>
> >>wrote:

> >>>I got the following assertion failure when I executed
> >>>pg_xact_commit_timestamp()
> >>>in the standby server.
> >>>
> >>>=# select pg_xact_commit_timestamp('1000'::xid);
> >>>TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
> >>>((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
> >>>Line: 315)

> >Attached patch fixes it, I am not sure how happy I am with the way I did
> >it though.

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services