Re: WAL bypass for CTAS - Mailing list pgsql-patches
From | Simon Riggs |
---|---|
Subject | Re: WAL bypass for CTAS |
Date | |
Msg-id | 1119300912.3645.315.camel@localhost.localdomain Whole thread Raw |
In response to | Re: WAL bypass for CTAS (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: WAL bypass for CTAS
|
List | pgsql-patches |
On Mon, 2005-06-20 at 14:50 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > I enclose a complete patch for avoiding WAL usage for CREATE TABLE AS > > SELECT, when not in archive mode (PITR). The main use case for this is > > large BI environments that create summary tables or prejoined tables, > > though there are many general applications. > > Applied Thanks > after heavy corrections --- there were a number of things wrong > with this "simple" patch, starting with having gotten the tests > backwards :-( Sorry, I thought I had corrected that error before submission. I was aware that I had made that error earlier. > and extending to not having actually flushed the data > before commit (smgrimmedsync isn't enough, you have to > FlushRelationBuffers). I followed the logic as seen in nbtsort.c as you suggested. That code doesn't perform a FlushRelationBuffers and it looks like I fooled myself into thinking the CTAS/SELECT INTO case was also in local. Perhaps we should be building CTAS/SELECT INTO in local buffers anyway? It looks like we could save time by avoiding shared_buffers completely and build up a whole page before writing it anywhere. (But thats a story for another day). Perhaps this is also related to metapage errors, since the metapage is always the last page to be written? > A consideration we had all missed in the original discussions is that > if the transaction doesn't emit any WAL records at all, > RecordTransactionCommit will think that it need not WAL-log the > transaction commit, leading to the possibility that the commit is lost > even though all the data is preserved :-( > This is not a hazard for CREATE TABLE AS, since it will certainly have > emitted WAL records while creating the table's catalog entries. It will > be a very real hazard for COPY however. OK, but I haven't written that patch yet! > The cleanest solution I can > think of is that the COPY code should emit a WAL record for the first > tuple copied in, but not for later ones. To this end, I separated the > "use_wal" and "use_fsm" aspects of what the patch was doing. Not very clean, but will do as you suggest. > I didn't apply the freespace.c changes either; that struck me as a > serious kluge with no real benefit. We can just omit updating the FSM's > running average, if it even has one. (ISTM there's a reasonable > argument to be made that the tuple sizes during CREATE/COPY might not be > representative of later requests anyway.) I was striving for completeness only. I was doubtful about that part of the patch, but thought I'd add that rather than have you say I hadn't thought about the FSM avg_request_size. I put those changes in mainly for COPY. If you don't make any request at all to FSM then a relation never gets to the MRU relation FSM list. I agree that it is not strictly necessary, but leaving it off would be a change in behaviour, since COPY did previously cause the relation to get to the MRU. That could be a problem, since a relation might not then be allocated any FSM pages following a vacuum. Best Regards, Simon Riggs
pgsql-patches by date: