Re: cheaper snapshots - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: cheaper snapshots |
Date | |
Msg-id | 1311869151.3117.1527.camel@hvost |
In response to | cheaper snapshots (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: cheaper snapshots
|
List | pgsql-hackers |
On Wed, 2011-07-27 at 22:51 -0400, Robert Haas wrote:
> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I wonder whether we could do something involving WAL properties --- the
> > current tuple visibility logic was designed before WAL existed, so it's
> > not exploiting that resource at all. I'm imagining that the kernel of a
> > snapshot is just a WAL position, ie the end of WAL as of the time you
> > take the snapshot (easy to get in O(1) time). Visibility tests then
> > reduce to "did this transaction commit with a WAL record located before
> > the specified position?". You'd need some index datastructure that made
> > it reasonably cheap to find out the commit locations of recently
> > committed transactions, where "recent" means "back to recentGlobalXmin".
> > That seems possibly do-able, though I don't have a concrete design in
> > mind.
>
> I was mulling this idea over some more (the same ideas keep floating
> back to the top...). I don't think an LSN can actually work, because
> there's no guarantee that the order in which the WAL records are
> emitted is the same order in which the effects of the transactions
> become visible to new snapshots. For example:
>
> 1. Transaction A inserts its commit record, flushes WAL, and begins
> waiting for sync rep.
> 2. A moment later, transaction B sets synchronous_commit=off, inserts
> its commit record, requests a background WAL flush, and removes itself
> from the ProcArray.
> 3. Transaction C takes a snapshot.

It is Transaction A here which is behaving badly: it should also remove
itself from the ProcArray right after it inserts its commit record, because
for everybody except transaction A's client application it is committed at
this point. It just can't report back to the client before getting
confirmation that the commit has actually been sync-repped (or locally
written to stable storage).
At least as far as consistent snapshots are concerned, the right sequence
should be:

1) insert the commit record into WAL
2) remove yourself from the ProcArray (or use some other means to declare
   that your transaction is no longer running)
3) if so configured, wait for the WAL flush to stable storage and/or
   sync rep confirmation

Based on this, let me suggest a simple snapshot cache mechanism.

A simple snapshot cache mechanism
=================================

Have an array of running transactions, with one slot per backend:

    txid running_transactions[max_connections];

There are exactly 3 operations on this array.

1. insert backend's running transaction id
------------------------------------------

This is done at the moment of acquiring your transaction id from the
system, and is synchronized by the same mechanism as getting the
transaction id.

    running_transactions[my_backend] = current_transaction_id

2. remove backend's running transaction id
------------------------------------------

This is done at the moment of committing or aborting the transaction,
again synchronized by the mechanism that writes the commit record.

    running_transactions[my_backend] = NULL

This should be the first thing done after inserting the WAL commit record.

3. getting a snapshot
---------------------

memcpy() running_transactions to local memory, then construct a snapshot
from the copy.

It may be that you need to protect all 3 operations with a single
spinlock. If so, I'd propose the same spinlock that is used when getting
your transaction id (and placing the array near where the latest
transaction id is stored, so that they share a cache line).

But it is also possible that you can get logically consistent snapshots
by protecting only some of the operations. For example, if you protect
only the insert and the get-snapshot operations, then the worst that can
happen is that you get a snapshot that is a few commits older than what
you'd get with full locking, and that may well be OK for all real uses.
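To make the three operations concrete, here is a minimal sketch in C of the fully locked variant. It is an illustration under assumptions, not PostgreSQL code: MAX_CONNECTIONS, INVALID_TXID (playing the role of NULL), next_txid, xid_lock, the snapshot struct and all four function names are hypothetical, a C11 atomic_flag stands in for whatever spinlock guards transaction id assignment, inserting the WAL commit record is not modelled, and txid wraparound is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define MAX_CONNECTIONS 8   /* hypothetical stand-in for max_connections */
#define INVALID_TXID    0   /* plays the role of NULL in the text above */

typedef uint32_t txid;

/* One slot per backend, kept next to the txid counter as proposed. */
static txid running_transactions[MAX_CONNECTIONS];
static txid next_txid = 1;
static atomic_flag xid_lock = ATOMIC_FLAG_INIT;  /* assumed shared spinlock */

static void lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&xid_lock, memory_order_acquire))
        ;  /* spin until the flag is released */
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&xid_lock, memory_order_release);
}

/* Operation 1: take a txid and publish it under the same lock. */
static txid begin_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_CONNECTIONS);
    lock_acquire();
    txid xid = next_txid++;
    running_transactions[my_backend] = xid;
    lock_release();
    return xid;
}

/* Operation 2: clear the slot; meant to run right after the commit
 * (or abort) record is inserted, before any wait for flush or sync rep. */
static void end_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_CONNECTIONS);
    lock_acquire();
    running_transactions[my_backend] = INVALID_TXID;
    lock_release();
}

typedef struct {
    txid xmax;                      /* first txid not yet assigned */
    int  nrunning;                  /* number of entries in running[] */
    txid running[MAX_CONNECTIONS];  /* txids in progress at snapshot time */
} snapshot;

/* Operation 3: memcpy() the array under the lock, build the snapshot after. */
static snapshot get_snapshot(void)
{
    txid copy[MAX_CONNECTIONS];
    snapshot snap;

    lock_acquire();
    memcpy(copy, running_transactions, sizeof copy);
    snap.xmax = next_txid;
    lock_release();

    snap.nrunning = 0;
    for (int i = 0; i < MAX_CONNECTIONS; i++)
        if (copy[i] != INVALID_TXID)
            snap.running[snap.nrunning++] = copy[i];
    return snap;
}

/* True if xid had already ended when the snapshot was taken. This does not
 * say whether it committed or aborted; that still needs a separate lookup. */
static int finished_before_snapshot(const snapshot *snap, txid xid)
{
    if (xid >= snap->xmax)
        return 0;
    for (int i = 0; i < snap->nrunning; i++)
        if (snap->running[i] == xid)
            return 0;
    return 1;
}
```

Only the memcpy() and the read of the counter happen while the lock is held; compacting the copy into a snapshot is done on private memory. The relaxed variant would correspond to dropping the lock in end_transaction(), in which case the store to the slot would have to be an atomic one in real C.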
--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/