Re: Exposing the Xact commit order to the user - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Exposing the Xact commit order to the user |
| Date | |
| Msg-id | AANLkTilVW00VQBPxBpQ6n3_XknZRI_W6H0LXkiqk7EUA@mail.gmail.com |
| In response to | Re: Exposing the Xact commit order to the user (Greg Stark <gsstark@mit.edu>) |
| Responses | Re: Exposing the Xact commit order to the user |
| List | pgsql-hackers |
On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I find the skeptical attitude on this thread altogether unwarranted.
>> Jan made his case and, at least IMHO, presented it pretty clearly.
>
> Just to be clear I think the idea of exposing commit order is a
> no-brainer. The specific interface is what I was questioning.

OK, thanks for that clarification.

> A function which takes a starting xid and a number of transactions to
> return seems very tied to one particular application. I could easily
> see other systems such as a multi-master system instead only wanting
> to compare two transactions to find out which committed first. Or
> non-replication applications where you have an LSN and want to know
> whether a given transaction had committed by that time.
>
> So one possible interface would be to do something like
> xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with
> an optional argument to limit the number of records returned.

I'm imagining that the backend data storage for this would be a file
containing, essentially, a struct for each commit repeated over and
over again, packed tightly. It's easy to index into such a file using
a sequence number (give me the 1000th commit), but searching by LSN
would require (a) storing the LSNs and (b) binary search. Maybe it's
worth adding that complexity, but I'm not sure that it is. Keeping
the size of this file small is important for ensuring that it has
minimal performance impact (which is also why I'm not sold on trying
to include the tuple counters that Jan proposed - I think we can solve
the problem he's worried about there more cleanly in other ways).
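To make that layout concrete, here is a minimal sketch in C. This is not
PostgreSQL source; the record layout and the names CommitRecord,
read_commit, and search_by_lsn are all illustrative assumptions. It shows
why the sequence-number lookup is a single seek into a file of fixed-size
records, while LSN lookup additionally needs the LSN stored per record
plus a binary search:

```c
/*
 * Sketch only: fixed-size commit records appended to a file in commit
 * order. Field names and types are assumptions for illustration.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct CommitRecord
{
    uint32_t    xid;            /* transaction that committed */
    uint32_t    pad;            /* explicit padding; a real layout would pack */
    uint64_t    commit_lsn;     /* WAL position of the commit -- the extra
                                 * data that searching by LSN would need */
} CommitRecord;

/* The Nth commit is one seek away: seqno maps directly to a byte offset. */
static int
read_commit(FILE *fp, uint64_t seqno, CommitRecord *out)
{
    if (fseek(fp, (long) (seqno * sizeof(CommitRecord)), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof(CommitRecord), 1, fp) == 1 ? 0 : -1;
}

/*
 * LSN lookup: binary search over the file. Returns the sequence number
 * of the first commit with commit_lsn >= lsn, or nrecords if none.
 */
static uint64_t
search_by_lsn(FILE *fp, uint64_t nrecords, uint64_t lsn)
{
    uint64_t    lo = 0;
    uint64_t    hi = nrecords;

    while (lo < hi)
    {
        uint64_t    mid = lo + (hi - lo) / 2;
        CommitRecord rec;

        if (read_commit(fp, mid, &rec) != 0)
            return nrecords;    /* treat I/O error as "not found" */
        if (rec.commit_lsn < lsn)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

int
main(void)
{
    FILE       *fp = tmpfile();
    CommitRecord recs[] = {
        {100, 0, 0x1000},
        {101, 0, 0x2000},
        {102, 0, 0x3000},
    };
    CommitRecord out;

    if (fp == NULL)
        return 1;
    fwrite(recs, sizeof(CommitRecord), 3, fp);

    if (read_commit(fp, 1, &out) == 0)  /* "give me the 2nd commit" */
        printf("commit #2: xid=%" PRIu32 " lsn=%" PRIx64 "\n",
               out.xid, out.commit_lsn);
    printf("first commit at or after lsn 0x1800: seqno %" PRIu64 "\n",
           search_by_lsn(fp, 3, 0x1800));
    fclose(fp);
    return 0;
}
```

With fixed-size records the sequence-number lookup is O(1), while the LSN
search costs O(log n) record reads -- the extra complexity being weighed
above.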
> So you could do:
>
> old := select pg_current_xlog_location();
> while (1)
> {
>     sleep 60s;
>     new := select pg_current_xlog_location();
>     process(select xids_committed_between(old, new));
>     old := new;
> }
>
> This might be more useful for PITR recovery, for example, where you
> want to find out what transactions committed between now and some
> known point of corruption.

This could also be done by selecting the current commit sequence
number, getting the XIDs committed between the two commit sequence
numbers, etc.

> I could also see it being useful to have a function
> pg_xlog_location_of_commit(xid). That would let you run recovery
> until a particular transaction committed or test whether your replica
> is caught up to a particular commit. It could be useful for
> monitoring Hot Standby slaves.

Well, you'd need to index the commit data to make that work, I think,
so that adds a lot of complexity. The implementation as proposed lets
you find the commits after a known point in order of occurrence, but
it doesn't let you inquire about the location of a particular commit.
If you want to run recovery until a particular transaction commits, we
could teach the recovery code to look for the commit record for that
XID and then pause at that point (or just before that point, if
someone wanted that as an alternative behavior), which would be much
simpler than using this mechanism. And if you want to check whether
slaves are caught up, it would probably be better to use LSN rather
than commits, because you could be caught up on commits but way behind
on WAL replay.

>> I think we should be very careful about assuming that we understand
>> replication and its needs better than someone who has spent many
>> years developing one of the major PostgreSQL replication solutions.
>
> Well the flip side of that is that we want an interface that's useful
> for more than just one replication system. This is something basic
> enough that I think it will be useful for more than just replication
> if we design it generally enough. It should be useful for
> backup/restore processes and monitoring as well as various forms of
> replication including master-slave trigger based systems but also
> including PITR-based replication, log-parsing systems, multi-master
> trigger based systems, 2PC-based systems, etc.

Making it general enough to serve multiple needs is good, but we've
got to make sure that the extra complexity is buying us something.
Jan seems pretty confident that this could be used by Londiste also,
though it would be nice to have some confirmation from the Londiste
developer(s) on that. I think it may also have applications for
distributed transactions and multi-master replication, but I am not
too sure it helps much for PITR-based replication or log-parsing
systems. We want to design something that is good, but trying to
solve too many problems may end up solving none of them well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company