Thread: Re: [PATCHES] ALTER TABLE ... SET TABLESPACE
> Attached is a patch implementing this functionality. > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume > that tablespaces remain the same from old to new heap. I thought it better > to go down this road than introduce a lot of duplicate code. I have tried your patches and it works great. Thanks. One thing I noticed was if I change tablespace for a table having indexes, they are left in the old tablespace and the table itself was moved to the new tablespace. I regard this is a good thing since I could assign different table spaces for table and indexes. It would be even better to assign different tablespaces for each index. -- Tatsuo Ishii
> > Attached is a patch implementing this functionality. > > > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume > > that tablespaces remain the same from old to new heap. I thought it better > > to go down this road than introduce a lot of duplicate code. > > I have tried your patches and it works great. Thanks. > > One thing I noticed was if I change tablespace for a table having > indexes, they are left in the old tablespace and the table itself was > moved to the new tablespace. I regard this is a good thing since I > could assign different table spaces for table and indexes. > It would be even better to assign different tablespaces for each > index. Hm. It seems there's a problem with tablespaces. What I did was: pgbench -i test alter table accounts set tablespace mydb2; \d accounts backend crashes by signal 11... -- Tatsuo Ishii
On Sun, 20 Jun 2004, Tatsuo Ishii wrote: > > > Attached is a patch implementing this functionality. > > > > > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume > > > that tablespaces remain the same from old to new heap. I thought it better > > > to go down this road than introduce a lot of duplicate code. > > > > I have tried your patches and it works great. Thanks. > > > > One thing I noticed was if I change tablespace for a table having > > indexes, they are left in the old tablespace and the table itself was > > moved to the new tablespace. I regard this is a good thing since I > > could assign different table spaces for table and indexes. > > It would be even better to assign different tablespaces for each > > index. > > Hm. It seems there's a problem with tablespaces. What I did was: > > pgbench -i test > alter table accounts set tablespace mydb2; > \d accounts > > backend crashes by signal 11... I seem to be clobbering memory some where but I cannot get assert or valgrind to tell me. Anyone got any ideas? Gavin
> On Sun, 20 Jun 2004, Tatsuo Ishii wrote: > > > > > Attached is a patch implementing this functionality. > > > > > > > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume > > > > that tablespaces remain the same from old to new heap. I thought it better > > > > to go down this road than introduce a lot of duplicate code. > > > > > > I have tried your patches and it works great. Thanks. > > > > > > One thing I noticed was if I change tablespace for a table having > > > indexes, they are left in the old tablespace and the table itself was > > > moved to the new tablespace. I regard this is a good thing since I > > > could assign different table spaces for table and indexes. > > > It would be even better to assign different tablespaces for each > > > index. > > > > Hm. It seems there's a problem with tablespaces. What I did was: > > > > pgbench -i test > > alter table accounts set tablespace mydb2; > > \d accounts > > > > backend crashes by signal 11... > > I seem to be clobbering memory some where but I cannot get assert or > valgrind to tell me. Anyone got any ideas? First of all I would like to ask you if you intend to leave indexes in the old tables space or not. Also I think we need to enhance ALTER INDEX to assign new table spaces for indexes. Assigning different tables spaces for tables and indexes are essential to gain more I/O speed IMO. -- Tatsuo Ishii
On Mon, 21 Jun 2004, Tatsuo Ishii wrote: > > On Sun, 20 Jun 2004, Tatsuo Ishii wrote: > > > > > > > Attached is a patch implementing this functionality. > > > > > > > > > > I've modified make_new_heap() as well as swap_relfilenodes() to not assume > > > > > that tablespaces remain the same from old to new heap. I thought it better > > > > > to go down this road than introduce a lot of duplicate code. > > > > > > > > I have tried your patches and it works great. Thanks. > > > > > > > > One thing I noticed was if I change tablespace for a table having > > > > indexes, they are left in the old tablespace and the table itself was > > > > moved to the new tablespace. I regard this is a good thing since I > > > > could assign different table spaces for table and indexes. > > > > It would be even better to assign different tablespaces for each > > > > index. > > > > > > Hm. It seems there's a problem with tablespaces. What I did was: > > > > > > pgbench -i test > > > alter table accounts set tablespace mydb2; > > > \d accounts > > > > > > backend crashes by signal 11... > > > > I seem to be clobbering memory some where but I cannot get assert or > > valgrind to tell me. Anyone got any ideas? > > First of all I would like to ask you if you intend to leave indexes in > the old tables space or not. Yes, that is intentional. > > Also I think we need to enhance ALTER INDEX to assign new table spaces > for indexes. Assigning different tables spaces for tables and indexes > are essential to gain more I/O speed IMO. I thought about this. ALTER INDEX doesn't exist yet and I figured that, unlike the case of tables, its easy to drop and recreate indexes in new tablespaces. I'm still stumped as to where I am corrupting memory with this patch though. (There was another bug: I wasn't detecting the case where users set tablespace to the tablespace that the table is already in). gavin
> > Also I think we need to enhance ALTER INDEX to assign new table spaces > > for indexes. Assigning different tables spaces for tables and indexes > > are essential to gain more I/O speed IMO. > > I thought about this. ALTER INDEX doesn't exist yet and I figured that, > unlike the case of tables, its easy to drop and recreate indexes in new > tablespaces. Oh you are right. I forgot about CREATE INDEX ... TABLESPACE. > I'm still stumped as to where I am corrupting memory with this patch > though. (There was another bug: I wasn't detecting the case where users > set tablespace to the tablespace that the table is already in). > > gavin >
On Sun, 2004-06-20 at 17:15, Tatsuo Ishii wrote: > > > Also I think we need to enhance ALTER INDEX to assign new table spaces > > > for indexes. Assigning different tables spaces for tables and indexes > > > are essential to gain more I/O speed IMO. > > > > I thought about this. ALTER INDEX doesn't exist yet and I figured that, > > unlike the case of tables, its easy to drop and recreate indexes in new > > tablespaces. > > Oh you are right. I forgot about CREATE INDEX ... TABLESPACE. > > > I'm still stumped as to where I am corrupting memory with this patch > > though. (There was another bug: I wasn't detecting the case where users > > set tablespace to the tablespace that the table is already in). On a related note, will there be a way to have implicit index creation occur in a seperate table space automagically? I.e. create table test (id int4 primary key, n1 int unique); so that the indexes created in id and n1 here would have a different default namespace than the table? Just wondering.
Gavin Sherry <swm@linuxworld.com.au> writes: > On Mon, 21 Jun 2004, Tatsuo Ishii wrote: >> Also I think we need to enhance ALTER INDEX to assign new table spaces >> for indexes. Assigning different tables spaces for tables and indexes >> are essential to gain more I/O speed IMO. > I thought about this. ALTER INDEX doesn't exist yet and I figured that, > unlike the case of tables, its easy to drop and recreate indexes in new > tablespaces. The precedents we already have (ALTER OWNER, RENAME, SET STATISTICS) are that ALTER TABLE applies to any relation type for which it makes sense. So I'd expect ALTER TABLE SET TABLESPACE to just work on indexes, not that we'd go and invent an ALTER INDEX ... command. Given that you implement the data transfer as a straight block-by-block copy and not some kind of tuple-at-a-time thing, I would think that it would be trivial to consider them the same case from an implementation point of view, too. regards, tom lane
On Sun, 20 Jun 2004, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > On Mon, 21 Jun 2004, Tatsuo Ishii wrote: > >> Also I think we need to enhance ALTER INDEX to assign new table spaces > >> for indexes. Assigning different tables spaces for tables and indexes > >> are essential to gain more I/O speed IMO. > > > I thought about this. ALTER INDEX doesn't exist yet and I figured that, > > unlike the case of tables, its easy to drop and recreate indexes in new > > tablespaces. > > The precedents we already have (ALTER OWNER, RENAME, SET STATISTICS) > are that ALTER TABLE applies to any relation type for which it makes > sense. So I'd expect ALTER TABLE SET TABLESPACE to just work on > indexes, not that we'd go and invent an ALTER INDEX ... command. Yes, of course. > > Given that you implement the data transfer as a straight block-by-block > copy and not some kind of tuple-at-a-time thing, I would think that > it would be trivial to consider them the same case from an > implementation point of view, too. But I did implement it as a tuple at a time thing. I reused the code from rebuild_relation()... What did you have in mind? Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: > But I did implement it as a tuple at a time thing. I reused the code from > rebuild_relation()... > What did you have in mind? Something about like for (b = 0; b < RelationGetNumberOfBlocks(src); b++){ smgrread(src, b, buf); smgrwrite(dst, b, buf);} Given that the only files people are going to be troubling to reassign to new tablespaces are enormous ones, you'd want the transfer to be as efficient as reasonably possible. The main thing this is omitting is "what about wal-logging the move"? Perhaps we could emit one WAL record showing the source and dest RelFileNodes and number of blocks for the copy, and then LSN-stamp each copied block with that record's LSN. However I'm not sure how to replay that if the source file isn't there anymore when the replay needs to run :-(. Maybe you have to dump each block into WAL as you copy it. That would be kinda ugly ... though in point of fact less of a WAL load than writing individual tuples ... regards, tom lane
On Sun, 20 Jun 2004, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > But I did implement it as a tuple at a time thing. I reused the code from > > rebuild_relation()... > > > What did you have in mind? > > Something about like > > for (b = 0; b < RelationGetNumberOfBlocks(src); b++) > { > smgrread(src, b, buf); > smgrwrite(dst, b, buf); > } > > Given that the only files people are going to be troubling to reassign > to new tablespaces are enormous ones, you'd want the transfer to be as > efficient as reasonably possible. > > The main thing this is omitting is "what about wal-logging the move"? Yes, that's what I was thinking. > Perhaps we could emit one WAL record showing the source and dest > RelFileNodes and number of blocks for the copy, and then LSN-stamp > each copied block with that record's LSN. However I'm not sure how to > replay that if the source file isn't there anymore when the replay needs > to run :-(. Maybe you have to dump each block into WAL as you copy it. > That would be kinda ugly ... though in point of fact less of a WAL load > than writing individual tuples ... Should I use the WAL-enabled case of _bt_blwritepage() as a guide here? Gavin
On Mon, 21 Jun 2004, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > On Sun, 20 Jun 2004, Tom Lane wrote: > >> Maybe you have to dump each block into WAL as you copy it. > >> That would be kinda ugly ... though in point of fact less of a WAL load > >> than writing individual tuples ... > > > Should I use the WAL-enabled case of _bt_blwritepage() as a guide here? > > Yeah, actually that is a very good parallel. If PITR archiving isn't > turned on, you don't have to dump pages into WAL; you can substitute > an fsync before commit, instead. And if it's a temp table then you > don't have to do either. (Not sure anyone would ever do SET TABLESPACE > on a temp table, but might as well get it right.) > > The xlog action here of copying a page image is currently > btree-specific, but maybe we should move it to a more widely visible > place, such as heapam.c. I don't see any value in having identical > xlog recovery actions in several different modules. I was just thinking that. I imagine that this would be useful for WAL logging of createdb() when that functionality gets implemented. We might also be able to use it as a speed up for cluster() (some time in the future). That is, we could form a complete page in memory in relation_rebuild() and then write it out directly. Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: > On Sun, 20 Jun 2004, Tom Lane wrote: >> Maybe you have to dump each block into WAL as you copy it. >> That would be kinda ugly ... though in point of fact less of a WAL load >> than writing individual tuples ... > Should I use the WAL-enabled case of _bt_blwritepage() as a guide here? Yeah, actually that is a very good parallel. If PITR archiving isn't turned on, you don't have to dump pages into WAL; you can substitute an fsync before commit, instead. And if it's a temp table then you don't have to do either. (Not sure anyone would ever do SET TABLESPACE on a temp table, but might as well get it right.) The xlog action here of copying a page image is currently btree-specific, but maybe we should move it to a more widely visible place, such as heapam.c. I don't see any value in having identical xlog recovery actions in several different modules. regards, tom lane
Gavin Sherry <swm@linuxworld.com.au> writes: > On Mon, 21 Jun 2004, Tatsuo Ishii wrote: >> First of all I would like to ask you if you intend to leave indexes in >> the old tables space or not. > Yes, that is intentional. There's a related issue: what about the table's TOAST table (if any) and the index on same? During CREATE TABLE, the toast table is assigned to the same tablespace as the base table, as is its index. I don't think this is wrong exactly --- the fact that we separate data storage into main and toast tables is an implementation detail that users shouldn't have to think about. If you subscribe to that notion, then ALTER TABLE SET TABLESPACE had better move the toast table and index too. If you don't subscribe to it, then ALTER TABLE SET TABLESPACE had better be able to move toast tables, and the issue will have to be documented. regards, tom lane
I'm rewriting the patch so don't worry :-) Thanks, Gavin On Mon, 21 Jun 2004, Mark Kirkwood wrote: > I don't know if this provides any more info than you already have - > but is my last few lines from a single process backend run with valgrind : > > ==19666== Syscall param write(buf) contains uninitialised or > unaddressable byte(s) > ==19666== at 0x404D94F8: __GI___libc_write (in /lib/libc-2.3.2.so) > ==19666== by 0x80934F8: XLogFlush (xlog.c:1414) > ==19666== by 0x8090723: RecordTransactionCommit (xact.c:550) > ==19666== by 0x8090BC0: CommitTransaction (xact.c:931) > ==19666== Address 0x4219236A is not stack'd, malloc'd or free'd > backend> 1: oid (typeid = 26, len = 4, typmod = -1, byval = t) > 2: nspname (typeid = 19, len = 64, typmod = -1, byval = f) > 3: relname (typeid = 19, len = 64, typmod = -1, byval = f) > ---- > ==19666== > ==19666== Invalid write of size 4 > ==19666== at 0x8109B00: DLMoveToFront (dllist.c:237) > ==19666== by 0x81B2EB5: SearchCatCache (catcache.c:1155) > ==19666== by 0x81B7D72: GetSysCacheOid (syscache.c:606) > ==19666== by 0x81B8C7A: get_relname_relid (lsyscache.c:879) > ==19666== Address 0xCC3D5C04 is not stack'd, malloc'd or free'd > Segmentation fault > > > Gavin Sherry wrote: > > >On Sun, 20 Jun 2004, Tatsuo Ishii wrote: > > > > > > > >>>>Attached is a patch implementing this functionality. > >>>> > >>>>I've modified make_new_heap() as well as swap_relfilenodes() to not assume > >>>>that tablespaces remain the same from old to new heap. I thought it better > >>>>to go down this road than introduce a lot of duplicate code. > >>>> > >>>> > >>>I have tried your patches and it works great. Thanks. > >>> > >>>One thing I noticed was if I change tablespace for a table having > >>>indexes, they are left in the old tablespace and the table itself was > >>>moved to the new tablespace. I regard this is a good thing since I > >>>could assign different table spaces for table and indexes. > >>>It would be even better to assign different tablespaces for each > >>>index. > >>> > >>> > >>Hm. It seems there's a problem with tablespaces. What I did was: > >> > >>pgbench -i test > >>alter table accounts set tablespace mydb2; > >>\d accounts > >> > >>backend crashes by signal 11... > >> > >> > > > > > > > > > !DSPAM:40d66cf4282571539216297! > >
I don't know if this provides any more info than you already have - but is my last few lines from a single process backend run with valgrind : ==19666== Syscall param write(buf) contains uninitialised or unaddressable byte(s) ==19666== at 0x404D94F8: __GI___libc_write (in /lib/libc-2.3.2.so) ==19666== by 0x80934F8: XLogFlush (xlog.c:1414) ==19666== by 0x8090723: RecordTransactionCommit (xact.c:550) ==19666== by 0x8090BC0: CommitTransaction (xact.c:931) ==19666== Address 0x4219236A is not stack'd, malloc'd or free'd backend> 1: oid (typeid = 26, len = 4, typmod = -1, byval = t) 2: nspname (typeid = 19, len = 64, typmod= -1, byval = f) 3: relname (typeid = 19, len = 64, typmod = -1, byval = f) ---- ==19666== ==19666== Invalid write of size 4 ==19666== at 0x8109B00: DLMoveToFront (dllist.c:237) ==19666== by 0x81B2EB5: SearchCatCache (catcache.c:1155) ==19666== by 0x81B7D72: GetSysCacheOid (syscache.c:606) ==19666== by 0x81B8C7A: get_relname_relid (lsyscache.c:879) ==19666== Address 0xCC3D5C04 is not stack'd, malloc'd or free'd Segmentation fault Gavin Sherry wrote: >On Sun, 20 Jun 2004, Tatsuo Ishii wrote: > > > >>>>Attached is a patch implementing this functionality. >>>> >>>>I've modified make_new_heap() as well as swap_relfilenodes() to not assume >>>>that tablespaces remain the same from old to new heap. I thought it better >>>>to go down this road than introduce a lot of duplicate code. >>>> >>>> >>>I have tried your patches and it works great. Thanks. >>> >>>One thing I noticed was if I change tablespace for a table having >>>indexes, they are left in the old tablespace and the table itself was >>>moved to the new tablespace. I regard this is a good thing since I >>>could assign different table spaces for table and indexes. >>>It would be even better to assign different tablespaces for each >>>index. >>> >>> >>Hm. It seems there's a problem with tablespaces. What I did was: >> >>pgbench -i test >>alter table accounts set tablespace mydb2; >>\d accounts >> >>backend crashes by signal 11... >> >> > > >