Re: tracking commit timestamps - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: tracking commit timestamps
Date
Msg-id 5464A9D7.2010006@2ndquadrant.com
Whole thread Raw
In response to Re: tracking commit timestamps  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 13/11/14 07:04, Michael Paquier wrote:
> On Wed, Nov 12, 2014 at 10:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> Brief list of changes:
>>   - the commit timestamp record now stores timestamp, lsn and nodeid
> Now that not only the commit timestamp is stored, calling that "commit
> timestamp", "committs" or "commit_timestamp" is strange, no? If this
> patch is moving toward being a more complex information provider,
> calling it commit information or commit data is more adapted, no?

It's not, since adding more info will break upgrades, I doubt we will 
add more anytime soon. I was thinking about it too tbh, but don't have 
better name (I don't like commit data as it seems confusing - isn't 
commit data the dml you just committed?).
Maybe commit_metadata, commit_information is probably ok also, this 
would need input from others, I am personally fine with keeping the 
commit_timestamp name too.

>
>>   - if the xid passed to get interface is out of range -infinity timestamp is
>> returned (I think it's bad idea to throw errors here as the valid range is
>> not static and same ID can start throwing errors between calls
>> theoretically)
> Already mentioned by Jim in a previous mail: this would be better as NULL.

Yeah that make sense, I have no idea what I was thinking :)

>
>>   - renamed the sql interfaces to pg_xact_commit_timestamp,
>> pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't expose
>> the nodeid atm, I personally am not big fan of the "xact" but it seems more
>> consistent with existing naming
> pg_xact_commit_timestamp and pg_xact_commit_timestamp_data are
> overlapping. What's wrong with a single function able to return the
> whole set (node ID, commit timetamp, commit LSN)? Let's say

That's what pg_xact_commit_timestamp_data does (it does not show nodeid 
because we agreed that it should not be exposed yet on sql level). Might 
make sense to rename, but let's wait for input about the general 
renaming point at the beginning of the mail.

> pg_xact_commit_information or pg_xact_commit_data. Already mentioned,
> but I also find using a SRF able to return all the available
> information from a given XID value quite useful. And this does not
> conflict with what is proposed currently, you would need just to call
> the function with XID + number of entries wanted to get a single one.
> Comments from other folks about that?

No idea what you mean by this to be honest, there is exactly one record 
stored for single XID.

>
> Then more input about the latest patch:
> 1) This block is not needed, option -e is listed twice:
>      The <option>-o</>, <option>-x</>, <option>-e</>,
> -   <option>-m</>, <option>-O</>,
> -   and <option>-l</>
> +   <option>-m</>, <option>-O</>, <option>-l</>
> +   and <option>-e</>
> 2) Very small thing: a couple of files have no newlines at the end,
> among them committs.conf and test_committs/Makefile.
> 3) pg_last_committed_xact and not pg_last_xact_commit_information or similar?

Just inspiration from DB2's rollforward (which shows among other things 
"last committed transaction: <timestamp>"), but I don't feel strongly 
about naming so can be changed.

> 4) storage.sgml needs to be updated with the new folder pg_committs

Right.

> 5) Er.. node ID is missing in pg_last_committed_xact, no?

That's intentional (for now).

> 6) This XXX notice can be removed:
> +       /*
> +        * Return empty if the requested value is older than what we have or
> +        * newer than newest we have.
> +        *
> +        * XXX: should this be error instead?
> +        */

Ok.

> (Note that I am still a partisan of an error message to let the
> caller know that commit info history does not have the information
> requested).

IMHO throwing error there would be same as throwing error when WHERE 
clause in SELECT does not match anything. As the xid range for which we 
store data is dynamic we need to accept any xid as valid input because 
the caller has no way of validating if the xid passed is out of range or 
not.

> 7) Note that TransactionTreeSetCommitTsData still never sets do_xlog
> at true and that WriteSetTimestampXlogRec never gets called. So no
> information is WAL-logged with this patch. Wouldn't that be useful for
> standbys as well? Perhaps I am missing why this is disabled? This code
> should be activated IMO or it would be just untested.

True is only needed here when you are setting this info to different 
transaction than the one you are in since the info can be reconstructed 
from normal transaction WAL record (see that it's actually called from 
xact_redo_commit_internal, which is how we get the WAL safety and why it 
works on slave). So the true is for use by extensions only, it's not 
completely uncommon that we have APIs that are used only by extensions.

> 8) As a more general point, the node ID stuff makes me uncomfortable
> and is just added on top of the existing patch without much
> thinking... So I am really skeptical about it. The need here is to
> pass on demand a int8 that is a node ID that can only be set through a
> C interface, so only extensions could play with it. The data passed to
> a WAL record is always built and determined by the system and entirely
> transparent to the user, inserting user-defined data like that
> inconsistent with what we've been doing until now, no?

Again it's not exposed to SQL because I thought there was agreement to 
not do that yet since we might want to build some more core stuff on top 
of that before exposing it. It's part of the record now because it can 
be useful already the way it is and because adding it later would break 
pg_upgrade (it's int4 btw).

Also I would really not say it was added without thought, I am one of 
the BDR developers and I was before one of the Londiste developers so I 
did think about what I would want when in those shoes.

That being said I think I will remove the CommitTsSetDefaultNodeId 
interface in next revision, as extension can already set nodeid via 
TransactionTreeSetCommitTsData call and we might want to revisit the 
CommitTsSetDefaultNodeId stuff once we start implementing the 
replication identifiers. Not to mention that I realized in meantime that 
CommitTsSetDefaultNodeId the way it's done currently isn't crash safe 
(it's not hard to make it crash safe). And since it's quite simple we 
can add it at later date easily if needed.

>
> Also, a question particularly for BDR and Slony folks: do you
> sometimes add a new node using the base backup of an existing node :)
> See what I come up with: a duplication of this new node ID system with
> the already present system ID, no?

Yes we do use basebackup sometimes and no it's not possible to use 
systemid here: - the point of nodeid is to be able to store *remote* nodeid as well 
as local one (depending where the change actually originated from) so 
your local systemid is quite useless there - systemid is per Postgres instance, you need per-db identifier when 
doing logical rep (2 dbs can have single db as destination or the other 
way around)

> Similarly, the LSN is added to the WAL record containing the commit
> timestamp, but cannot the LSN of the WAL record containing the commit
> timestamp itself be used as a point of reference for a better
> ordering? That's not exactly the same as the LSN of the transaction
> commit, still it provides a WAL-based reference.

No, again for several reasons: - as you pointed out yourself the LSN might not be same as LSN for the xid - more
importantlywe normally don't do special WAL logging for commit 
 
timestamp

How it works is that because currently the 
TransactionTreeSetCommitTsData is always called with xid of the current 
transaction, the WAL record for commit of current transaction can be 
used to get the info we need (both timestamp and lsn are used in fact).
As I said above, see how TransactionTreeSetCommitTsData is called from 
xact_redo_commit_internal.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Add CREATE support to event triggers
Next
From: "Tomas Vondra"
Date:
Subject: Re: WIP: multivariate statistics / proof of concept