Re: Exposing the Xact commit order to the user - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Exposing the Xact commit order to the user
Date
Msg-id AANLkTilVW00VQBPxBpQ6n3_XknZRI_W6H0LXkiqk7EUA@mail.gmail.com
Whole thread Raw
In response to Re: Exposing the Xact commit order to the user  (Greg Stark <gsstark@mit.edu>)
Responses Re: Exposing the Xact commit order to the user
List pgsql-hackers
On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I find the skeptical attitude on this thread altogether unwarranted.
>> Jan made his case and, at least IMHO, presented it pretty clearly.
>
> Just to be clear I think the idea of exposing commit order is a
> no-brainer.  The specific interface is what I was questioning.

OK, thanks for that clarification.

> A function which takes a starting xid and a number of transactions to
> return seems very tied to one particular application. I could easily
> see other systems such as a multi-master system instead only wanting
> to compare two transactions to find out which committed first. Or
> non-replication applications where you have an LSN and want to know
> whether a given transaction had committed by that time.
>
> So one possible interface would be to do something like
> xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with
> an optional argument to limit the number or records returned.

I'm imagining that the backend data storage for this would be a file
containing, essentially, a struct for each commit repeated over and
over again, packed tightly.  It's easy to index into such a file using
a sequence number (give me the 1000'th commit) but searching by LSN
would require (a) storing the LSNs and (b) binary search.  Maybe it's
worth adding that complexity, but I'm not sure that it is.  Keeping
the size of this file small is important for ensuring that it has
minimal performance impact (which is also why I'm not sold on trying
to include the tuple counters that Jan proposed - I think we can solve
the problem he's worried about there more cleanly in other ways).

> So you could do:
>
> old := select pg_current_xlog_location();
> while (1)
> {
>  sleep 60s;
>  new := select pg_current_xlog_location()
>  process(select xids_committed_between(old,new))
>  old := new
> }
>
> This might be more useful for PITR recovery for example where you want
> to find out what transactions committed between now and some known
> point of corruption.

This could also be done by selecting the current commit sequence
number, getting the XIDs committed between the two commit sequence
numbers, etc.

> I could also see it being useful to have a function
> pg_xlog_location_of_commit(xid). That would let you run recovery until
> a particular transaction committed or test whether your replica is
> caught up to a particular commit. It could be useful for monitoring
> Hot Standby slaves.

Well, you'd need to index the commit data to make that work, I think,
so that adds a lot of complexity.  The implementation as proposed lets
you find the commits after a known point in order of occurrence, but
it doesn't let you inquire about the location of a particular commit.
If you want to run recovery until a particular transaction commits, we
could teach the recovery code to look for the commit record for that
XID and then pause at that point (or just before that point, if
someone wanted that as an alternative behavior), which would be much
simpler than using this mechanism.  And if you want to check whether
slaves are caught up, it would probably be better to use LSN rather
than commits, because you could be caught up on commits but way behind
on WAL replay.

>>  I think
>> we should be very careful about assuming that we understand
>> replication and its needs better than someone who has spent many years
>> developing one of the major PostgreSQL replication solutions.
>
> Well the flip side of that is that we want an interface that's useful
> for more than just one replication system. This is something basic
> enough that I think it will be useful for more than just replication
> if we design it generally enough. It should be useful for
> backup/restore processes and monitoring as well as various forms of
> replication including master-slave trigger based systems but also
> including PITR-based replication, log-parsing systems, multi-master
> trigger based systems, 2PC-based systems, etc.

Making it general enough to serve multiple needs is good, but we've
got to make sure that the extra complexity is buying us something.
Jan seems pretty confident that this could be used by Londiste also,
though it would be nice to have some confirmation from the Londiste
developer(s) on that.  I think it may also have applications for
distributed transactions and multi-master replication, but I am not
too sure it helps much for PITR-based replication or log-parsing
systems.  We want to design something that is good, but trying to
solve too many problems may end up solving none of them well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Idea for getting rid of VACUUM FREEZE on cold pages
Next
From: Robert Haas
Date:
Subject: Re: Did we really want to force an initdb in beta2?