Thread: CLUSTER TODO item
Hi guys, I've been looking at CLUSTER today. I've put together a patch which recreates all the indices which current CLUSTER drops and copies relacl from the old pg_class tuple and puts it in the new one. I was working on updating pg_inherits to handle the new OID when it occured to me that pg_inherits is not the only system table corrupted. pg_trigger, pg_rewrite (and therefore views) and pg_description need to be updated as well. It seems like the easiest thing to do would be to update the new relation to have to OID of the old relation. Is there any reason why we would not want to do this? Gavin
> Hi guys, > > I've been looking at CLUSTER today. I've put together a patch which > recreates all the indices which current CLUSTER drops and copies relacl > from the old pg_class tuple and puts it in the new one. > > I was working on updating pg_inherits to handle the new OID when it > occured to me that pg_inherits is not the only system table > corrupted. pg_trigger, pg_rewrite (and therefore views) and pg_description > need to be updated as well. It seems like the easiest thing to do would be > to update the new relation to have to OID of the old relation. Is there > any reason why we would not want to do this? We did strange things with this in the past because we didn't have pg_class.relfilenode. Now that we do, you can just recreate the heap table using a different oid filename and update relfilenode when you are done. Keep the same oid. Of course, as you noted, the old indexes have to be recreated because the heap has changed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
>> I've been looking at CLUSTER today. I've put together a patch which >> recreates all the indices which current CLUSTER drops and copies relacl >> from the old pg_class tuple and puts it in the new one. This is entirely the wrong way to go at it. > We did strange things with this in the past because we didn't have > pg_class.relfilenode. Now that we do, you can just recreate the heap > table using a different oid filename and update relfilenode when you are > done. Yes. CLUSTER should not do one single thing to the system catalogs, except heap_update the pg_class rows for the table and indexes with new relfilenode values. That is the facility that a rewrite of CLUSTER was waiting for, so now that we have it, it's pointless to put more work into the old CLUSTER implementation. Note: I'm not convinced that relfilenode and pg_class.oid are each used in exactly the right spots. Once we have cases where they can differ, we may well find some bugs to flush out. But that needs to happen anyway, so don't let it dissuade you from doing CLUSTER the right way. regards, tom lane
On Sun, 23 Sep 2001, Tom Lane wrote: > > Note: I'm not convinced that relfilenode and pg_class.oid are each > used in exactly the right spots. Once we have cases where they can > differ, we may well find some bugs to flush out. But that needs to > happen anyway, so don't let it dissuade you from doing CLUSTER the > right way. I think I may have broken stuff. I'm not sure. I've fiddled a fair bit but I'm still segfaulting in the storage manager. It might be because I'm heap_creating and then just stealing relfilenode - I don't know. Anyway, a patch is attached which clusters and then recreates the indexes - but then segfaults. Am I going about this all wrong? Thanks Gavin
On Thu, 11 Oct 2001, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Can I get a status on this? > > It's not gonna happen for 7.2, I think ... > > regards, tom lane I'd love it to go out with 7.2 but I've had no time to work on this patch lately. The reason I need time is that, after having fiddled a fair bit, I've decided that there really needs to be support for the creation of a new relfilenode in the storage manager. The current patch works like this: Create new heap (heap_create()) Copy tuples from old heap to new heap via index scan Form a new pg_class tuple simple_heap_update() update catalogue indices rebuild existing indices This causes an overflow in the localbuf.c so I guess this is wrong (included in patch on 24/sep) =). I've looked at various combinations of this: memcpy() the old Relation into a new Relation, update smgrunlink() the old Relation and heap_storage_create() the new relation. This dies because smgrunlink only schedules the drop, where as heap_storage_create() actually creates a file on the file system (open() returns with EEXIST). I've also tried just copying the structure but heap_open() relies on OID not relfilenode. I'm probably going about it the wrong way, but it strikes me that there needs to be a way to abstract the relfilenode from OID in the heap access code so that one can easily manipulate the heap on disk without having to play with OIDs. I would have included code examples/clearer description but the box I don't have access to the box I created the patches on atm =(. Gavin
Can I get a status on this? > On Sun, 23 Sep 2001, Tom Lane wrote: > > > > Note: I'm not convinced that relfilenode and pg_class.oid are each > > used in exactly the right spots. Once we have cases where they can > > differ, we may well find some bugs to flush out. But that needs to > > happen anyway, so don't let it dissuade you from doing CLUSTER the > > right way. > > I think I may have broken stuff. I'm not sure. I've fiddled a fair bit but > I'm still segfaulting in the storage manager. It might be because I'm > heap_creating and then just stealing relfilenode - I don't know. > > Anyway, a patch is attached which clusters and then recreates the indexes > - but then segfaults. > > Am I going about this all wrong? > > Thanks > > Gavin Content-Description: [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Can I get a status on this? It's not gonna happen for 7.2, I think ... regards, tom lane
Is there an updated version of this patch for 7.3? --------------------------------------------------------------------------- Gavin Sherry wrote: > On Thu, 11 Oct 2001, Tom Lane wrote: > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Can I get a status on this? > > > > It's not gonna happen for 7.2, I think ... > > > > regards, tom lane > > I'd love it to go out with 7.2 but I've had no time to work on this patch > lately. The reason I need time is that, after having fiddled a fair bit, > I've decided that there really needs to be support for the creation of a > new relfilenode in the storage manager. > > The current patch works like this: > > Create new heap (heap_create()) > Copy tuples from old heap to new heap via index scan > Form a new pg_class tuple > simple_heap_update() > update catalogue indices > rebuild existing indices > > This causes an overflow in the localbuf.c so I guess this is wrong > (included in patch on 24/sep) =). I've looked at various combinations of > this: > > memcpy() the old Relation into a new Relation, update smgrunlink() the old > Relation and heap_storage_create() the new relation. This dies because > smgrunlink only schedules the drop, where as heap_storage_create() actually > creates a file on the file system (open() returns with EEXIST). > > I've also tried just copying the structure but heap_open() relies on OID > not relfilenode. > > I'm probably going about it the wrong way, but it strikes me that there > needs to be a way to abstract the relfilenode from OID in the heap access > code so that one can easily manipulate the heap on disk without having to > play with OIDs. > > I would have included code examples/clearer description but the box I > don't have access to the box I created the patches on atm =(. > > Gavin > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Gavin, do you have an updated patch you want applied to 7.3? --------------------------------------------------------------------------- Gavin Sherry wrote: > On Thu, 11 Oct 2001, Tom Lane wrote: > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Can I get a status on this? > > > > It's not gonna happen for 7.2, I think ... > > > > regards, tom lane > > I'd love it to go out with 7.2 but I've had no time to work on this patch > lately. The reason I need time is that, after having fiddled a fair bit, > I've decided that there really needs to be support for the creation of a > new relfilenode in the storage manager. > > The current patch works like this: > > Create new heap (heap_create()) > Copy tuples from old heap to new heap via index scan > Form a new pg_class tuple > simple_heap_update() > update catalogue indices > rebuild existing indices > > This causes an overflow in the localbuf.c so I guess this is wrong > (included in patch on 24/sep) =). I've looked at various combinations of > this: > > memcpy() the old Relation into a new Relation, update smgrunlink() the old > Relation and heap_storage_create() the new relation. This dies because > smgrunlink only schedules the drop, where as heap_storage_create() actually > creates a file on the file system (open() returns with EEXIST). > > I've also tried just copying the structure but heap_open() relies on OID > not relfilenode. > > I'm probably going about it the wrong way, but it strikes me that there > needs to be a way to abstract the relfilenode from OID in the heap access > code so that one can easily manipulate the heap on disk without having to > play with OIDs. > > I would have included code examples/clearer description but the box I > don't have access to the box I created the patches on atm =(. > > Gavin > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Hi Bruce, I put this on hold since I sent that email. This is basically because the API doesn't lend itself well to attaching new relfilenodes to existing heaps. I definately want to put this patch in but just need to find time to modify the API to support this correctly. Gavin On Fri, 22 Feb 2002, Bruce Momjian wrote: > > Is there an updated version of this patch for 7.3? > > --------------------------------------------------------------------------- > > Gavin Sherry wrote: > > On Thu, 11 Oct 2001, Tom Lane wrote: > > > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > > Can I get a status on this? > > > > > > It's not gonna happen for 7.2, I think ... > > > > > > regards, tom lane > > > > I'd love it to go out with 7.2 but I've had no time to work on this patch > > lately. The reason I need time is that, after having fiddled a fair bit, > > I've decided that there really needs to be support for the creation of a > > new relfilenode in the storage manager. > > > > The current patch works like this: > > > > Create new heap (heap_create()) > > Copy tuples from old heap to new heap via index scan > > Form a new pg_class tuple > > simple_heap_update() > > update catalogue indices > > rebuild existing indices > > > > This causes an overflow in the localbuf.c so I guess this is wrong > > (included in patch on 24/sep) =). I've looked at various combinations of > > this: > > > > memcpy() the old Relation into a new Relation, update smgrunlink() the old > > Relation and heap_storage_create() the new relation. This dies because > > smgrunlink only schedules the drop, where as heap_storage_create() actually > > creates a file on the file system (open() returns with EEXIST). > > > > I've also tried just copying the structure but heap_open() relies on OID > > not relfilenode. > > > > I'm probably going about it the wrong way, but it strikes me that there > > needs to be a way to abstract the relfilenode from OID in the heap access > > code so that one can easily manipulate the heap on disk without having to > > play with OIDs. > > > > I would have included code examples/clearer description but the box I > > don't have access to the box I created the patches on atm =(. > > > > Gavin > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Don't 'kill -9' the postmaster > > > >
Gavin, please continue working on this feature and continue discussion the hackers list. --------------------------------------------------------------------------- Gavin Sherry wrote: > On Thu, 11 Oct 2001, Tom Lane wrote: > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Can I get a status on this? > > > > It's not gonna happen for 7.2, I think ... > > > > regards, tom lane > > I'd love it to go out with 7.2 but I've had no time to work on this patch > lately. The reason I need time is that, after having fiddled a fair bit, > I've decided that there really needs to be support for the creation of a > new relfilenode in the storage manager. > > The current patch works like this: > > Create new heap (heap_create()) > Copy tuples from old heap to new heap via index scan > Form a new pg_class tuple > simple_heap_update() > update catalogue indices > rebuild existing indices > > This causes an overflow in the localbuf.c so I guess this is wrong > (included in patch on 24/sep) =). I've looked at various combinations of > this: > > memcpy() the old Relation into a new Relation, update smgrunlink() the old > Relation and heap_storage_create() the new relation. This dies because > smgrunlink only schedules the drop, where as heap_storage_create() actually > creates a file on the file system (open() returns with EEXIST). > > I've also tried just copying the structure but heap_open() relies on OID > not relfilenode. > > I'm probably going about it the wrong way, but it strikes me that there > needs to be a way to abstract the relfilenode from OID in the heap access > code so that one can easily manipulate the heap on disk without having to > play with OIDs. > > I would have included code examples/clearer description but the box I > don't have access to the box I created the patches on atm =(. > > Gavin > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I put some work i to this last night. Here's how it works. 1) sanity checks 2) save information (IndexInfo *) about indices on the relation being clustered. 3) build an uncatalogued heap so that the API can be used without too much fuss and get a new relfilenode (heap_create_with_catalog(...,RELKIND_UNCATALOGED,...) 4) scan the index, inserting tuples in index order into the uncatalogued relation (ie, insert data into new relfilenode). 5) update relcache/system with new relfilenode for clustered relation 6) drop/create indices, given (2) 7) smgrunlink() old relfilenode 8) Remove uncatalogued relation from relation from relcache There are two issues here. One is minor: At step three I should create a new toast table entry, and then attach it to the clustered relation at step five. The second point is that postgres doesn't seem to like me creating uncataloged relations and then doing RelationForgetRelation(). I will look a little harder though. Gavin On Sun, 24 Feb 2002, Bruce Momjian wrote: > > Gavin, please continue working on this feature and continue discussion > the hackers list. > > --------------------------------------------------------------------------- > > Gavin Sherry wrote: > > On Thu, 11 Oct 2001, Tom Lane wrote: > > > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > > Can I get a status on this? > > > > > > It's not gonna happen for 7.2, I think ... > > > > > > regards, tom lane > > > > I'd love it to go out with 7.2 but I've had no time to work on this patch > > lately. The reason I need time is that, after having fiddled a fair bit, > > I've decided that there really needs to be support for the creation of a > > new relfilenode in the storage manager. > > > > The current patch works like this: > > > > Create new heap (heap_create()) > > Copy tuples from old heap to new heap via index scan > > Form a new pg_class tuple > > simple_heap_update() > > update catalogue indices > > rebuild existing indices > > > > This causes an overflow in the localbuf.c so I guess this is wrong > > (included in patch on 24/sep) =). I've looked at various combinations of > > this: > > > > memcpy() the old Relation into a new Relation, update smgrunlink() the old > > Relation and heap_storage_create() the new relation. This dies because > > smgrunlink only schedules the drop, where as heap_storage_create() actually > > creates a file on the file system (open() returns with EEXIST). > > > > I've also tried just copying the structure but heap_open() relies on OID > > not relfilenode. > > > > I'm probably going about it the wrong way, but it strikes me that there > > needs to be a way to abstract the relfilenode from OID in the heap access > > code so that one can easily manipulate the heap on disk without having to > > play with OIDs. > > > > I would have included code examples/clearer description but the box I > > don't have access to the box I created the patches on atm =(. > > > > Gavin > > > > > >