Thread: Move tablespace

Move tablespace

From
Simon Riggs
Date:
I just noticed that

ALTER TABLE foo SET TABLESPACE new_tablespace;

doesn't optimise writing WAL, so when streaming enabled it will copy the
whole table to WAL and across the wire. My understanding was that ALTER
TABLE had been optimised, though not as much as could be in this case.

Following patch writes a new WAL record that just says "copy foo to
newts" and during replay we flush buffers and then re-execute the copy
(but only when InArchiveRecovery). So the copy happens locally on the
standby, not copying from primary to standby. We do this just with a
little refactoring and a simple new WAL message.

So about 64 bytes rather than potentially gigabytes of data, which seems
important when using SR.

Simple patch, no new concepts, so I figure to apply this now.
(Yes, I need to bump WAL format id as well).

Objections?

--
 Simon Riggs           www.2ndQuadrant.com

Attachment

Re: Move tablespace

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Following patch writes a new WAL record that just says "copy foo to
> newts" and during replay we flush buffers and then re-execute the copy
> (but only when InArchiveRecovery). So the copy happens locally on the
> standby, not copying from primary to standby. We do this just with a
> little refactoring and a simple new WAL message.

And what happens to crash-recovery replay?  You can't have it both ways,
either the data is in WAL or it's missing.

> Objections?

This is NOT the time to be rushing in marginal performance
optimizations.  I don't think you've thought through all the corner
cases anyway.
        regards, tom lane


Re: Move tablespace

From
Simon Riggs
Date:
On Tue, 2010-04-20 at 21:03 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Following patch writes a new WAL record that just says "copy foo to
> > newts" and during replay we flush buffers and then re-execute the copy
> > (but only when InArchiveRecovery). So the copy happens locally on the
> > standby, not copying from primary to standby. We do this just with a
> > little refactoring and a simple new WAL message.
> 
> And what happens to crash-recovery replay?  You can't have it both ways,
> either the data is in WAL or it's missing.

The patch changes nothing in the case of crash recovery.

There is no WAL written if !XLogIsNeeded, so we *must* have already made
the decision that the absence of WAL is not a problem for crash
recovery. Note that currently we flush the new table to disk just like
we do for heap_sync(), whether or not WAL is written. 

> > Objections?
> 
> This is NOT the time to be rushing in marginal performance
> optimizations.  I don't think you've thought through all the corner
> cases anyway.

The performance gain isn't marginal, otherwise I wouldn't have bothered
writing it

* We avoid writing GB of unnecessary WAL data on primary
* We avoid streaming that WAL data to the standby

If you can see a corner case that I do not, please say.

-- Simon Riggs           www.2ndQuadrant.com



Re: Move tablespace

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Tue, 2010-04-20 at 21:03 -0400, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> Following patch writes a new WAL record that just says "copy foo to
>>> newts" and during replay we flush buffers and then re-execute the copy
>>> (but only when InArchiveRecovery). So the copy happens locally on the
>>> standby, not copying from primary to standby. We do this just with a
>>> little refactoring and a simple new WAL message.
>> And what happens to crash-recovery replay?  You can't have it both ways,
>> either the data is in WAL or it's missing.
> 
> The patch changes nothing in the case of crash recovery.

What happens if the record is replayed twice in archive recovery? For
example if you stop and restart a standby server after it has replayed
that record. What does the 2nd redo attempt do if the source file was
already deleted by the 1st recovery.

I also think we shouldn't be fiddling with this at this stage in the
release cycle.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Move tablespace

From
Simon Riggs
Date:
On Wed, 2010-04-21 at 14:37 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Tue, 2010-04-20 at 21:03 -0400, Tom Lane wrote:
> >> Simon Riggs <simon@2ndQuadrant.com> writes:
> >>> Following patch writes a new WAL record that just says "copy foo to
> >>> newts" and during replay we flush buffers and then re-execute the copy
> >>> (but only when InArchiveRecovery). So the copy happens locally on the
> >>> standby, not copying from primary to standby. We do this just with a
> >>> little refactoring and a simple new WAL message.
> >> And what happens to crash-recovery replay?  You can't have it both ways,
> >> either the data is in WAL or it's missing.
> > 
> > The patch changes nothing in the case of crash recovery.
> 
> What happens if the record is replayed twice in archive recovery? For
> example if you stop and restart a standby server after it has replayed
> that record. What does the 2nd redo attempt do if the source file was
> already deleted by the 1st recovery.

If the source is absent then we know that replay had successfully copied
the file and synced it, so we can just skip the record.

> I also think we shouldn't be fiddling with this at this stage in the
> release cycle.

OK, but not because I see a problem with the technique. 

-- Simon Riggs           www.2ndQuadrant.com



Re: Move tablespace

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Wed, 2010-04-21 at 14:37 +0300, Heikki Linnakangas wrote:
>> I also think we shouldn't be fiddling with this at this stage in the
>> release cycle.

> OK, but not because I see a problem with the technique. 

You made that plain already, but you have not convinced anyone else.
More to the point, ALTER SET TABLESPACE is not an operation that
happens so much that we ought to take any risks to optimize it.
        regards, tom lane


Re: Move tablespace

From
Simon Riggs
Date:
On Wed, 2010-04-21 at 11:53 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Wed, 2010-04-21 at 14:37 +0300, Heikki Linnakangas wrote:
> >> I also think we shouldn't be fiddling with this at this stage in the
> >> release cycle.
> 
> > OK, but not because I see a problem with the technique. 
> 
> You made that plain already, but you have not convinced anyone else.

Not pouting, just recording the fact that its an ongoing discussion.

> More to the point, ALTER SET TABLESPACE is not an operation that
> happens so much that we ought to take any risks to optimize it.

I think its the opposite: people don't run it because they can't.

But I'm happy not to press further at this stage of 9.0. I am always
optimistic about convincing you ;-)

-- Simon Riggs           www.2ndQuadrant.com