Thread: Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tatsuo Ishii
Date:
> Attached is a patch implementing this functionality.
>
> I've modified make_new_heap() as well as swap_relfilenodes() to not assume
> that tablespaces remain the same from old to new heap. I thought it better
> to go down this road than introduce a lot of duplicate code.

I have tried your patches and it works great. Thanks.

One thing I noticed was if I change tablespace for a table having
indexes, they are left in the old tablespace and the table itself was
moved to the new tablespace. I regard this is a good thing since I
could assign different table spaces for table and indexes.
It would be even better to assign different tablespaces for each
index.
--
Tatsuo Ishii

Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tatsuo Ishii
Date:
> > Attached is a patch implementing this functionality.
> >
> > I've modified make_new_heap() as well as swap_relfilenodes() to not assume
> > that tablespaces remain the same from old to new heap. I thought it better
> > to go down this road than introduce a lot of duplicate code.
>
> I have tried your patches and it works great. Thanks.
>
> One thing I noticed was if I change tablespace for a table having
> indexes, they are left in the old tablespace and the table itself was
> moved to the new tablespace. I regard this is a good thing since I
> could assign different table spaces for table and indexes.
> It would be even better to assign different tablespaces for each
> index.

Hm. It seems there's a problem with tablespaces. What I did was:

pgbench -i test
alter table accounts set tablespace mydb2;
\d accounts

backend crashes by signal 11...
--
Tatsuo Ishii

Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Gavin Sherry
Date:
On Sun, 20 Jun 2004, Tatsuo Ishii wrote:

> > > Attached is a patch implementing this functionality.
> > >
> > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume
> > > that tablespaces remain the same from old to new heap. I thought it better
> > > to go down this road than introduce a lot of duplicate code.
> >
> > I have tried your patches and it works great. Thanks.
> >
> > One thing I noticed was if I change tablespace for a table having
> > indexes, they are left in the old tablespace and the table itself was
> > moved to the new tablespace. I regard this is a good thing since I
> > could assign different table spaces for table and indexes.
> > It would be even better to assign different tablespaces for each
> > index.
>
> Hm. It seems there's a problem with tablespaces. What I did was:
>
> pgbench -i test
> alter table accounts set tablespace mydb2;
> \d accounts
>
> backend crashes by signal 11...

I seem to be clobbering memory some where but I cannot get assert or
valgrind to tell me. Anyone got any ideas?

Gavin


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tatsuo Ishii
Date:
> On Sun, 20 Jun 2004, Tatsuo Ishii wrote:
> 
> > > > Attached is a patch implementing this functionality.
> > > >
> > > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume
> > > > that tablespaces remain the same from old to new heap. I thought it better
> > > > to go down this road than introduce a lot of duplicate code.
> > >
> > > I have tried your patches and it works great. Thanks.
> > >
> > > One thing I noticed was if I change tablespace for a table having
> > > indexes, they are left in the old tablespace and the table itself was
> > > moved to the new tablespace. I regard this is a good thing since I
> > > could assign different table spaces for table and indexes.
> > > It would be even better to assign different tablespaces for each
> > > index.
> >
> > Hm. It seems there's a problem with tablespaces. What I did was:
> >
> > pgbench -i test
> > alter table accounts set tablespace mydb2;
> > \d accounts
> >
> > backend crashes by signal 11...
> 
> I seem to be clobbering memory some where but I cannot get assert or
> valgrind to tell me. Anyone got any ideas?

First of all I would like to ask you if you intend to leave indexes in
the old tables space or not.

Also I think we need to enhance ALTER INDEX to assign new table spaces
for indexes. Assigning different tables spaces for tables and indexes
are essential to gain more I/O speed IMO.
--
Tatsuo Ishii


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Gavin Sherry
Date:
On Mon, 21 Jun 2004, Tatsuo Ishii wrote:

> > On Sun, 20 Jun 2004, Tatsuo Ishii wrote:
> >
> > > > > Attached is a patch implementing this functionality.
> > > > >
> > > > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume
> > > > > that tablespaces remain the same from old to new heap. I thought it better
> > > > > to go down this road than introduce a lot of duplicate code.
> > > >
> > > > I have tried your patches and it works great. Thanks.
> > > >
> > > > One thing I noticed was if I change tablespace for a table having
> > > > indexes, they are left in the old tablespace and the table itself was
> > > > moved to the new tablespace. I regard this is a good thing since I
> > > > could assign different table spaces for table and indexes.
> > > > It would be even better to assign different tablespaces for each
> > > > index.
> > >
> > > Hm. It seems there's a problem with tablespaces. What I did was:
> > >
> > > pgbench -i test
> > > alter table accounts set tablespace mydb2;
> > > \d accounts
> > >
> > > backend crashes by signal 11...
> >
> > I seem to be clobbering memory some where but I cannot get assert or
> > valgrind to tell me. Anyone got any ideas?
>
> First of all I would like to ask you if you intend to leave indexes in
> the old tables space or not.

Yes, that is intentional.

>
> Also I think we need to enhance ALTER INDEX to assign new table spaces
> for indexes. Assigning different tables spaces for tables and indexes
> are essential to gain more I/O speed IMO.

I thought about this. ALTER INDEX doesn't exist yet and I figured that,
unlike the case of tables, its easy to drop and recreate indexes in new
tablespaces.

I'm still stumped as to where I am corrupting memory with this patch
though. (There was another bug: I wasn't detecting the case where users
set tablespace to the tablespace that the table is already in).

gavin


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tatsuo Ishii
Date:
> > Also I think we need to enhance ALTER INDEX to assign new table spaces
> > for indexes. Assigning different tables spaces for tables and indexes
> > are essential to gain more I/O speed IMO.
> 
> I thought about this. ALTER INDEX doesn't exist yet and I figured that,
> unlike the case of tables, its easy to drop and recreate indexes in new
> tablespaces.

Oh you are right. I forgot about CREATE INDEX ... TABLESPACE.

> I'm still stumped as to where I am corrupting memory with this patch
> though. (There was another bug: I wasn't detecting the case where users
> set tablespace to the tablespace that the table is already in).
> 
> gavin
> 


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
"Scott Marlowe"
Date:
On Sun, 2004-06-20 at 17:15, Tatsuo Ishii wrote:
> > > Also I think we need to enhance ALTER INDEX to assign new table spaces
> > > for indexes. Assigning different tables spaces for tables and indexes
> > > are essential to gain more I/O speed IMO.
> > 
> > I thought about this. ALTER INDEX doesn't exist yet and I figured that,
> > unlike the case of tables, its easy to drop and recreate indexes in new
> > tablespaces.
> 
> Oh you are right. I forgot about CREATE INDEX ... TABLESPACE.
> 
> > I'm still stumped as to where I am corrupting memory with this patch
> > though. (There was another bug: I wasn't detecting the case where users
> > set tablespace to the tablespace that the table is already in).

On a related note, will there be a way to have implicit index creation
occur in a seperate table space automagically?  I.e.

create table test (id int4 primary key, n1 int unique);

so that the indexes created in id and n1 here would have a different
default namespace than the table?  Just wondering.



Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Mon, 21 Jun 2004, Tatsuo Ishii wrote:
>> Also I think we need to enhance ALTER INDEX to assign new table spaces
>> for indexes. Assigning different tables spaces for tables and indexes
>> are essential to gain more I/O speed IMO.

> I thought about this. ALTER INDEX doesn't exist yet and I figured that,
> unlike the case of tables, its easy to drop and recreate indexes in new
> tablespaces.

The precedents we already have (ALTER OWNER, RENAME, SET STATISTICS)
are that ALTER TABLE applies to any relation type for which it makes
sense.  So I'd expect ALTER TABLE SET TABLESPACE to just work on
indexes, not that we'd go and invent an ALTER INDEX ... command.

Given that you implement the data transfer as a straight block-by-block
copy and not some kind of tuple-at-a-time thing, I would think that
it would be trivial to consider them the same case from an
implementation point of view, too.
        regards, tom lane


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Gavin Sherry
Date:
On Sun, 20 Jun 2004, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > On Mon, 21 Jun 2004, Tatsuo Ishii wrote:
> >> Also I think we need to enhance ALTER INDEX to assign new table spaces
> >> for indexes. Assigning different tables spaces for tables and indexes
> >> are essential to gain more I/O speed IMO.
>
> > I thought about this. ALTER INDEX doesn't exist yet and I figured that,
> > unlike the case of tables, its easy to drop and recreate indexes in new
> > tablespaces.
>
> The precedents we already have (ALTER OWNER, RENAME, SET STATISTICS)
> are that ALTER TABLE applies to any relation type for which it makes
> sense.  So I'd expect ALTER TABLE SET TABLESPACE to just work on
> indexes, not that we'd go and invent an ALTER INDEX ... command.

Yes, of course.

>
> Given that you implement the data transfer as a straight block-by-block
> copy and not some kind of tuple-at-a-time thing, I would think that
> it would be trivial to consider them the same case from an
> implementation point of view, too.

But I did implement it as a tuple at a time thing. I reused the code from
rebuild_relation()...

What did you have in mind?

Gavin


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> But I did implement it as a tuple at a time thing. I reused the code from
> rebuild_relation()...

> What did you have in mind?

Something about like
for (b = 0; b < RelationGetNumberOfBlocks(src); b++){    smgrread(src, b, buf);    smgrwrite(dst, b, buf);}

Given that the only files people are going to be troubling to reassign
to new tablespaces are enormous ones, you'd want the transfer to be as
efficient as reasonably possible.

The main thing this is omitting is "what about wal-logging the move"?
Perhaps we could emit one WAL record showing the source and dest
RelFileNodes and number of blocks for the copy, and then LSN-stamp
each copied block with that record's LSN.  However I'm not sure how to
replay that if the source file isn't there anymore when the replay needs
to run :-(.  Maybe you have to dump each block into WAL as you copy it.
That would be kinda ugly ... though in point of fact less of a WAL load
than writing individual tuples ...
        regards, tom lane


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Gavin Sherry
Date:
On Sun, 20 Jun 2004, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > But I did implement it as a tuple at a time thing. I reused the code from
> > rebuild_relation()...
>
> > What did you have in mind?
>
> Something about like
>
>     for (b = 0; b < RelationGetNumberOfBlocks(src); b++)
>     {
>         smgrread(src, b, buf);
>         smgrwrite(dst, b, buf);
>     }
>
> Given that the only files people are going to be troubling to reassign
> to new tablespaces are enormous ones, you'd want the transfer to be as
> efficient as reasonably possible.
>
> The main thing this is omitting is "what about wal-logging the move"?

Yes, that's what I was thinking.

> Perhaps we could emit one WAL record showing the source and dest
> RelFileNodes and number of blocks for the copy, and then LSN-stamp
> each copied block with that record's LSN.  However I'm not sure how to
> replay that if the source file isn't there anymore when the replay needs
> to run :-(.  Maybe you have to dump each block into WAL as you copy it.
> That would be kinda ugly ... though in point of fact less of a WAL load
> than writing individual tuples ...

Should I use the WAL-enabled case of  _bt_blwritepage() as a guide here?

Gavin


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Gavin Sherry
Date:
On Mon, 21 Jun 2004, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > On Sun, 20 Jun 2004, Tom Lane wrote:
> >> Maybe you have to dump each block into WAL as you copy it.
> >> That would be kinda ugly ... though in point of fact less of a WAL load
> >> than writing individual tuples ...
>
> > Should I use the WAL-enabled case of  _bt_blwritepage() as a guide here?
>
> Yeah, actually that is a very good parallel.  If PITR archiving isn't
> turned on, you don't have to dump pages into WAL; you can substitute
> an fsync before commit, instead.  And if it's a temp table then you
> don't have to do either.  (Not sure anyone would ever do SET TABLESPACE
> on a temp table, but might as well get it right.)
>
> The xlog action here of copying a page image is currently
> btree-specific, but maybe we should move it to a more widely visible
> place, such as heapam.c.  I don't see any value in having identical
> xlog recovery actions in several different modules.

I was just thinking that. I imagine that this would be useful for WAL
logging of createdb() when that functionality gets implemented.

We might also be able to use it as a speed up for cluster() (some time
in the future). That is, we could form a complete page in memory in
relation_rebuild() and then write it out directly.

Gavin


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Sun, 20 Jun 2004, Tom Lane wrote:
>> Maybe you have to dump each block into WAL as you copy it.
>> That would be kinda ugly ... though in point of fact less of a WAL load
>> than writing individual tuples ...

> Should I use the WAL-enabled case of  _bt_blwritepage() as a guide here?

Yeah, actually that is a very good parallel.  If PITR archiving isn't
turned on, you don't have to dump pages into WAL; you can substitute
an fsync before commit, instead.  And if it's a temp table then you
don't have to do either.  (Not sure anyone would ever do SET TABLESPACE
on a temp table, but might as well get it right.)

The xlog action here of copying a page image is currently
btree-specific, but maybe we should move it to a more widely visible
place, such as heapam.c.  I don't see any value in having identical
xlog recovery actions in several different modules.
        regards, tom lane


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Mon, 21 Jun 2004, Tatsuo Ishii wrote:
>> First of all I would like to ask you if you intend to leave indexes in
>> the old tables space or not.

> Yes, that is intentional.

There's a related issue: what about the table's TOAST table (if any)
and the index on same?

During CREATE TABLE, the toast table is assigned to the same tablespace
as the base table, as is its index.  I don't think this is wrong exactly
--- the fact that we separate data storage into main and toast tables is
an implementation detail that users shouldn't have to think about.

If you subscribe to that notion, then ALTER TABLE SET TABLESPACE had
better move the toast table and index too.  If you don't subscribe
to it, then ALTER TABLE SET TABLESPACE had better be able to move
toast tables, and the issue will have to be documented.
        regards, tom lane


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Gavin Sherry
Date:
I'm rewriting the patch so don't worry :-)

Thanks,

Gavin

On Mon, 21 Jun 2004, Mark Kirkwood wrote:

> I don't know if this provides any more info than you already have -
> but is my last few lines from a single process backend run with valgrind :
>
> ==19666== Syscall param write(buf) contains uninitialised or
> unaddressable byte(s)
> ==19666==    at 0x404D94F8: __GI___libc_write (in /lib/libc-2.3.2.so)
> ==19666==    by 0x80934F8: XLogFlush (xlog.c:1414)
> ==19666==    by 0x8090723: RecordTransactionCommit (xact.c:550)
> ==19666==    by 0x8090BC0: CommitTransaction (xact.c:931)
> ==19666==    Address 0x4219236A is not stack'd, malloc'd or free'd
> backend>         1: oid (typeid = 26, len = 4, typmod = -1, byval = t)
>          2: nspname     (typeid = 19, len = 64, typmod = -1, byval = f)
>          3: relname     (typeid = 19, len = 64, typmod = -1, byval = f)
>         ----
> ==19666==
> ==19666== Invalid write of size 4
> ==19666==    at 0x8109B00: DLMoveToFront (dllist.c:237)
> ==19666==    by 0x81B2EB5: SearchCatCache (catcache.c:1155)
> ==19666==    by 0x81B7D72: GetSysCacheOid (syscache.c:606)
> ==19666==    by 0x81B8C7A: get_relname_relid (lsyscache.c:879)
> ==19666==    Address 0xCC3D5C04 is not stack'd, malloc'd or free'd
> Segmentation fault
>
>
> Gavin Sherry wrote:
>
> >On Sun, 20 Jun 2004, Tatsuo Ishii wrote:
> >
> >
> >
> >>>>Attached is a patch implementing this functionality.
> >>>>
> >>>>I've modified make_new_heap() as well as swap_relfilenodes() to not assume
> >>>>that tablespaces remain the same from old to new heap. I thought it better
> >>>>to go down this road than introduce a lot of duplicate code.
> >>>>
> >>>>
> >>>I have tried your patches and it works great. Thanks.
> >>>
> >>>One thing I noticed was if I change tablespace for a table having
> >>>indexes, they are left in the old tablespace and the table itself was
> >>>moved to the new tablespace. I regard this is a good thing since I
> >>>could assign different table spaces for table and indexes.
> >>>It would be even better to assign different tablespaces for each
> >>>index.
> >>>
> >>>
> >>Hm. It seems there's a problem with tablespaces. What I did was:
> >>
> >>pgbench -i test
> >>alter table accounts set tablespace mydb2;
> >>\d accounts
> >>
> >>backend crashes by signal 11...
> >>
> >>
> >
> >
> >
>
>
> !DSPAM:40d66cf4282571539216297!
>
>


Re: [PATCHES] ALTER TABLE ... SET TABLESPACE

From
Mark Kirkwood
Date:
I don't know if this provides any more info than you already have -
but is my last few lines from a single process backend run with valgrind :

==19666== Syscall param write(buf) contains uninitialised or 
unaddressable byte(s)
==19666==    at 0x404D94F8: __GI___libc_write (in /lib/libc-2.3.2.so)
==19666==    by 0x80934F8: XLogFlush (xlog.c:1414)
==19666==    by 0x8090723: RecordTransactionCommit (xact.c:550)
==19666==    by 0x8090BC0: CommitTransaction (xact.c:931)
==19666==    Address 0x4219236A is not stack'd, malloc'd or free'd
backend>         1: oid (typeid = 26, len = 4, typmod = -1, byval = t)        2: nspname     (typeid = 19, len = 64,
typmod= -1, byval = f)        3: relname     (typeid = 19, len = 64, typmod = -1, byval = f)       ----
 
==19666==
==19666== Invalid write of size 4
==19666==    at 0x8109B00: DLMoveToFront (dllist.c:237)
==19666==    by 0x81B2EB5: SearchCatCache (catcache.c:1155)
==19666==    by 0x81B7D72: GetSysCacheOid (syscache.c:606)
==19666==    by 0x81B8C7A: get_relname_relid (lsyscache.c:879)
==19666==    Address 0xCC3D5C04 is not stack'd, malloc'd or free'd
Segmentation fault


Gavin Sherry wrote:

>On Sun, 20 Jun 2004, Tatsuo Ishii wrote:
>
>  
>
>>>>Attached is a patch implementing this functionality.
>>>>
>>>>I've modified make_new_heap() as well as swap_relfilenodes() to not assume
>>>>that tablespaces remain the same from old to new heap. I thought it better
>>>>to go down this road than introduce a lot of duplicate code.
>>>>        
>>>>
>>>I have tried your patches and it works great. Thanks.
>>>
>>>One thing I noticed was if I change tablespace for a table having
>>>indexes, they are left in the old tablespace and the table itself was
>>>moved to the new tablespace. I regard this is a good thing since I
>>>could assign different table spaces for table and indexes.
>>>It would be even better to assign different tablespaces for each
>>>index.
>>>      
>>>
>>Hm. It seems there's a problem with tablespaces. What I did was:
>>
>>pgbench -i test
>>alter table accounts set tablespace mydb2;
>>\d accounts
>>
>>backend crashes by signal 11...
>>    
>>
>
>  
>