Re: Table AM and DDLs - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Table AM and DDLs |
Date | |
Msg-id | 20210321231626.mt32oc653b7vmdn2@alap3.anarazel.de Whole thread Raw |
Responses |
Table AM and DROP TABLE [ Was: Table AM and DDLs]
|
List | pgsql-hackers |
Hi, On 2021-03-03 22:15:18 +0100, Mats Kindahl wrote: > On Tue, Feb 23, 2021 at 2:11 AM Andres Freund <andres@anarazel.de> wrote: > Thanks for the answer and sorry about the late reply. Mine is even later ;) > > I don't think that's quite right. It's not exactly obvious from the > > name, but RelationDropStorage() does not actually drop storage. Instead > > it *schedules* the storage to be dropped upon commit. > > > > The reason for deferring the dropping of table storage is that DDL in > > postgres is transactional. Therefore we cannot remove the storage at the > > moment the DROP TABLE is executed - only when the transaction that > > performed the DDL commits. Therefore just providing you with a callback > > that runs in heap_drop_with_catalog() doesn't really achieve much - > > you'd not have a way to execute the "actual" dropping of the relation at > > the later stage. > > > > Yeah, I found the chain (performDeletion -> deleteOneObject -> doDeletion > -> heap_drop_with_catalog) where the delete was just scheduled for deletion > but it appeared like this was the place to actually perform the "actual" > delete. Looking closer, I see this was the wrong location. However, the > intention was to get a callback when the "actual" delete should happen. > Before that, the blocks are still potentially alive and could be read, so > shouldn't be recycled. > > It seems the right location seems to be in the storage manager (smgr_unlink > in smgr.c), but that does not seem to be extensible, or are there any plans > to make it available so that you can implement something other than just > "magnetic disk"? There've been patches to add new types of storage below smgr.c, but not in way that can be done outside of build time. As far as I recall. > > Before I explain some more: Could you describe in a bit more detail what > > kind of optimization you'd like to make? > > > > This is not really about any optimizations, it more about a good API for > tables and managing storage. If a memory table can be implemented entirely > in the extension and storage managed fully, there is a lot of interesting > potential for various implementations of table backends. For this to work I > think it is necessary to be able to handle schema changes for the backend > storage in addition to scans, inserts, updates, and deletes, but I am not > sure if it is already possible in some way that I haven't discovered or if > I should just try to propose something (making the storage manager API > extensible seems like a good first attempt). As long as you have a compatible definition of what is acceptable "in place" ALTER TABLE (e.g. adding new columns, changing between compatible types), and what requires a table rewrite (e.g. an incompatible column type change), I don't see a real problem. Except for the unlink thing above. Any schema change requiring a table rewrite will trigger a new relation to be created, which in turn will involve tableam. After that you'll just get called back to re-insert all the tuples in the original relation. If you want a different definition on what needs a rewrite, good luck, it'll be a heck of a lot more work. > > Due to postgres' transactional DDL you cannot really change the storage > > layout of *existing data* when that DDL command is executed - the data > > still needs to be interpretable in case the DDL is rolled back > > (including when crashing). > > > > No, didn't expect this, but some means to see that a schema change is about > to happen. For anything that's not in-place you'll see a new table being created (c.f. ATRewriteTables() calling make_new_heap()). The relfilenode identifying the data (as opposed to the oid, identifying a relation), will then be swapped with the current table's relfilenode via finish_heap_swap(). > > Other changes, e.g. changing the type of a column "sufficiently", will > > cause a so called table rewrite. Which means that a new relation will be > > created (including a call to relation_set_new_filenode()), then that new > > relation will get all the new data inserted, and then > > pg_class->relfilenode for the "original" relation will be changed to the > > "rewritten" table (there's two variants of this, once for rewrites due > > to ALTER TABLE and a separate one for VACUUM FULL/CLUSTER). > But that is not visible in the access method interface. If I add debug > output to the memory table, I only see a call to needs_toast_table. If > there were a new call to create a new block and some additional information > about , this would be possible to handle. It should be. If I e.g. do CREATE TABLE blarg(id int4 not null); I get one call to table_relation_set_new_filenode() #0 table_relation_set_new_filenode (rel=0x7f84c2417b70, newrnode=0x7f84c2417b70, persistence=112 'p', freezeXid=0x7ffc8c61263c,minmulti=0x7ffc8c612638) at /home/andres/src/postgresql/src/include/access/tableam.h:1596 #1 0x000055b1901e9116 in heap_create (relname=0x7ffc8c612900 "blarg", relnamespace=2200, reltablespace=0, relid=3016410,relfilenode=3016410, accessmtd=2, tupDesc=0x55b191d2a8c8, relkind=114 'r', relpersistence=112 'p', shared_relation=false, mapped_relation=false, allow_system_table_mods=false, relfrozenxid=0x7ffc8c61263c, relminmxid=0x7ffc8c612638) at /home/andres/src/postgresql/src/backend/catalog/heap.c:436 #2 0x000055b1901eab28 in heap_create_with_catalog (relname=0x7ffc8c612900 "blarg", relnamespace=2200, reltablespace=0, relid=3016410,reltypeid=0, reloftypeid=0, ownerid=10, accessmtd=2, tupdesc=0x55b191d2a8c8, cooked_constraints=0x0, relkind=114 'r', relpersistence=112'p', shared_relation=false, mapped_relation=false, oncommit=ONCOMMIT_NOOP, reloptions=0, use_user_acl=true, allow_system_table_mods=false, is_internal=false,relrewrite=0, typaddress=0x0) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1291 #3 0x000055b19030002a in DefineRelation (stmt=0x55b191d31478, relkind=114 'r', ownerId=10, typaddress=0x0, queryString=0x55b191c35bc0 "CREATE TABLE blarg(id int8 not null then when I do ALTER TABLE blarg ALTER COLUMN id TYPE int8; I see #0 table_relation_set_new_filenode (rel=0x7f84c241f2a8, newrnode=0x7f84c241f2a8, persistence=112 'p', freezeXid=0x7ffc8c61275c,minmulti=0x7ffc8c612758) at /home/andres/src/postgresql/src/include/access/tableam.h:1596 #1 0x000055b1901e9116 in heap_create (relname=0x7ffc8c612860 "pg_temp_3016404", relnamespace=2200, reltablespace=0, relid=3016407,relfilenode=3016407, accessmtd=2, tupDesc=0x7f84c24162a0, relkind=114 'r', relpersistence=112 'p', shared_relation=false, mapped_relation=false,allow_system_table_mods=true, relfrozenxid=0x7ffc8c61275c, relminmxid=0x7ffc8c612758) at /home/andres/src/postgresql/src/backend/catalog/heap.c:436 #2 0x000055b1901eab28 in heap_create_with_catalog (relname=0x7ffc8c612860 "pg_temp_3016404", relnamespace=2200, reltablespace=0,relid=3016407, reltypeid=0, reloftypeid=0, ownerid=10, accessmtd=2, tupdesc=0x7f84c24162a0, cooked_constraints=0x0, relkind=114 'r', relpersistence=112'p', shared_relation=false, mapped_relation=false, oncommit=ONCOMMIT_NOOP, reloptions=0, use_user_acl=false, allow_system_table_mods=true, is_internal=true,relrewrite=3016404, typaddress=0x0) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1291 > I *was* expecting either a call of set_filenode with a new xact id, or > something like that, and with some information so that you can locate the > schema change planned (e.g., digging through pg_class and friends), I just > don't see that when I add debug output. You should. And it'll have the new table "schema" associated. E.g. in the above example the new table will have rel->rd_att->natts == 1 rel->rd_att->attrs[0].atttypid == 20 (i.e. int8) Greetings, Andres Freund
pgsql-hackers by date: