9.3: load path to mitigate load penalty for checksums - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | 9.3: load path to mitigate load penalty for checksums |
Date | |
Msg-id | 1338859564.28589.199.camel@sussancws0025 Whole thread Raw |
Responses |
Re: 9.3: load path to mitigate load penalty for checksums
Re: 9.3: load path to mitigate load penalty for checksums |
List | pgsql-hackers |
Introduction: ================================= A lot of progress has been made on the checksums issue, with Simon's excellent work beginning to gain consensus: http://archives.postgresql.org/message-id/CA +U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g@mail.gmail.com For the purposes of this proposal, I'll assume that's the general direction we'll be taking for CRCs. The major drawback of that proposal is that it increases the amount of work to be done after a large data load by requiring more WAL. Proposal: ================================= I propose a special user-initiated loading mode at the table granularity. During this time, readers must ignore PD_ALL_VISIBLE, HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers may set all of those bits before the writing transaction commits, obviating the need to rewrite (and WAL) the data again later. Ideally, there would be no work for VACUUM to do after the data load (unless a transaction aborted). This would also help the non-CRC case of course, but I expect CRCs to make this significantly more important. Goals: ================================= * Table granularity (doesn't affect other tables at all) * Allows concurrent loaders * Allows loading into existing tableswith existing data * Online (allow reads to proceed, even if degraded) Rough user interaction: ================================= INITIATE LOAD ON foo AS 'job name'; -- run concurrent loading sessions FINALIZE LOAD 'job name'; High-level design: ================================= By "hints" I mean the VM bit, PD_ALL_VISIBLE, and HEAP_XMIN_COMMITTED. By "ignorehints" I mean a flag in pg_class indicating that readers should ignore hints. By "optimistichints" I mean a flag in pg_class indicating that writers can optimistically set hints. Obviously, readers and writers would need a mechanism to honor those flags, but I haven't dug into the details yet (additional routines in tqual.c?). States: 0: normal * ignorehints = false * optimistichints = false 1: trying to enter data load mode, waitingon existing lockers (who might be scanning) to finish * ignorehints = true * optimistichints = false 2: data load mode * ignorehints = true * optimistichints = true 3: trying to leave data load mode, waitingon old snapshots to be released and aborted transactions to be cleaned up * ignorehints = true * optimistichints= false INITIATE LOAD would first transition from state 0 to 1 by acquiring a ShareUpdateExclusiveLock on the table (to be sure no concurrent INITIATE or FINALIZE LOAD is going on) and setting ignorehints = true. Then it moves from state 1 to state 2 by waiting for all transactions that hold a lock on that table. Any transactions that don't already have a lock will see the new flag when they try to get it. Now we're sure that all readers will see the "ignorehints" flag, so we can set the "optimistichints" flag to indicate that writers can write hints optimistically. FINALIZE LOAD would first move from state 2 to state 3 by acquiring a ShareUpdateExclusiveLock on the table setting optimistichints = false. Then, it would move from state 3 to state 0 by first waiting for all transactions that currently hold a lock on the table, to ensure they see the optimistichints=false flag. Then, it would remember the current xid as max_loader_xid, and wait until the global xmin is greater than max_loader_xid. This should ensure that all snapshots regard all loading transactions as complete. Also, it would need to run a lazy VACUUM to remove any tuples from aborted transactions. Details and optimizations =================================* We probably want a graceful way to handle multiple data loads happening on the same table. Rather than erroring out, we could treat it more like a reference count, and only do the work to move in to data load mode if not there already, and only move out of data load mode if we're the last loading job on the table.* In general, there are some usability issues to sort out, to make sure a table isn't put into data load mode and left that way. Right now, I'm mostly concerned with getting a working design, but those will be important, too.* We could optimize away the VACUUM going from 3 -> 0 if we are sure no writing transactions aborted.* INITIATE and FINALIZE probably need to use PreventTransactionChain() and multiple transactions, to avoid holding the ShareUpdateExclusiveLock for too long. Also, we want to keep people from using it in the same transaction as the loading xact, because they might not realize that they would get a concurrency of 1 that way (because of the ShareUpdateExclusiveLock). Thoughts? Regards,Jeff Davis
pgsql-hackers by date: