Re: WAL bypass for CTAS - Mailing list pgsql-patches

From Simon Riggs
Subject Re: WAL bypass for CTAS
Date
Msg-id 1119300912.3645.315.camel@localhost.localdomain
Whole thread Raw
In response to Re: WAL bypass for CTAS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WAL bypass for CTAS
List pgsql-patches
On Mon, 2005-06-20 at 14:50 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I enclose a complete patch for avoiding WAL usage for CREATE TABLE AS
> > SELECT, when not in archive mode (PITR). The main use case for this is
> > large BI environments that create summary tables or prejoined tables,
> > though there are many general applications.
>
> Applied

Thanks

> after heavy corrections --- there were a number of things wrong
> with this "simple" patch, starting with having gotten the tests
> backwards :-(

Sorry, I thought I had corrected that error before submission. I was
aware that I had made that error earlier.

> and extending to not having actually flushed the data
> before commit (smgrimmedsync isn't enough, you have to
> FlushRelationBuffers).

I followed the logic as seen in nbtsort.c as you suggested. That code
doesn't perform a FlushRelationBuffers and it looks like I fooled myself
into thinking the CTAS/SELECT INTO case was also in local.

Perhaps we should be building CTAS/SELECT INTO in local buffers anyway?
It looks like we could save time by avoiding shared_buffers completely
and build up a whole page before writing it anywhere. (But thats a story
for another day).

Perhaps this is also related to metapage errors, since the metapage is
always the last page to be written?

> A consideration we had all missed in the original discussions is that
> if the transaction doesn't emit any WAL records at all,
> RecordTransactionCommit will think that it need not WAL-log the
> transaction commit, leading to the possibility that the commit is lost
> even though all the data is preserved :-(

> This is not a hazard for CREATE TABLE AS, since it will certainly have
> emitted WAL records while creating the table's catalog entries.  It will
> be a very real hazard for COPY however.

OK, but I haven't written that patch yet!

> The cleanest solution I can
> think of is that the COPY code should emit a WAL record for the first
> tuple copied in, but not for later ones.  To this end, I separated the
> "use_wal" and "use_fsm" aspects of what the patch was doing.

Not very clean, but will do as you suggest.

> I didn't apply the freespace.c changes either; that struck me as a
> serious kluge with no real benefit.  We can just omit updating the FSM's
> running average, if it even has one.  (ISTM there's a reasonable
> argument to be made that the tuple sizes during CREATE/COPY might not be
> representative of later requests anyway.)

I was striving for completeness only. I was doubtful about that part of
the patch, but thought I'd add that rather than have you say I hadn't
thought about the FSM avg_request_size.

I put those changes in mainly for COPY. If you don't make any request at
all to FSM then a relation never gets to the MRU relation FSM list. I
agree that it is not strictly necessary, but leaving it off would be a
change in behaviour, since COPY did previously cause the relation to get
to the MRU. That could be a problem, since a relation might not then be
allocated any FSM pages following a vacuum.

Best Regards, Simon Riggs


pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: WAL bypass for CTAS
Next
From: Alvaro Herrera
Date:
Subject: Re: WAL bypass for CTAS