Re: Custom table AMs need to include heapam.h because of BulkInsertState - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Custom table AMs need to include heapam.h because of BulkInsertState |
Date | |
Msg-id | CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com Whole thread Raw |
In response to | Re: Custom table AMs need to include heapam.h because ofBulkInsertState (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Custom table AMs need to include heapam.h because ofBulkInsertState
|
List | pgsql-hackers |
On Sat, Jun 1, 2019 at 3:20 PM Andres Freund <andres@anarazel.de> wrote: > > I'd like to think that the best way to deal with that and reduce the > > confusion would be to move anything related to bulk inserts into their > > own header/file, meaning the following set: > > - ReleaseBulkInsertStatePin > > - GetBulkInsertState > > - FreeBulkInsertState > > There is the argument that we could also move that part into tableam.h > > itself though as some of the rather generic table-related callbacks, > > but that seems grotty. So I think that we could just move that stuff > > as backend/access/common/bulkinsert.c. > > Yea, I think we should do that at some point. But I'm not sure this is > the right design. Bulk insert probably needs to rather be something > that's allocated inside the AM. As far as I can see, any on-disk, row-oriented, block-based AM is likely to want the same implementation as the heap. Column stores might want to pin multiple buffers, and an in-memory AM might have a completely different set of requirements, but something like zheap really has no reason to depart from what the heap does. I think it's really important that new table AMs not only have the option to do something different than the heap in any particular area, but that they also have the option to do the SAME thing as the heap without having to duplicate a bunch of code. So I think it would be reasonable to start by doing some pure code movement here, along the lines proposed by Michael -- not sure if src/backend/access/common is right or if it should be src/backend/access/table -- and then add the abstraction afterwards. Worth noting is ReadBufferBI() also needs moving and is a actually a bigger problem than the functions that Michael mentioned, because the other functions are accessible if you're willing to stoop to including heap-specific headers, but that function is static and you'll have to just copy-and-paste it. Uggh. Here's a draft design for adding some abstraction, roughly modeled on the abstraction Andres added for TupleTableSlots: 1. a BulkInsertState becomes a struct whose only member is a pointer to const BulkInsertStateOps *const ops 2. that structure has a member for each defined operation on a BulkInsertState: void (*free)(BulkInsertState *); void (*release_pin)(BulkInsertState *); // maybe rename to make it more generic 3. table AM gets a new member BulkInsertState *(*create_bistate)(Relation Rel) and a corresponding function table_create_bistate(), analogous to table_create_slot(), which can call the constructor function for the appropriate type of BulkInsertState and return the result 4. each type of BulkInsertState has its own functions for making use of it, akin to ReadBufferBI. That particular function signature is only likely to be correct for something that does more-or-less what the existing type of BulkInsertState does; if you're using a column-store that pins multiple buffers or something, you'll need your own code path. But that's OK, because ReadBufferBI or whatever other functions you have are only going to get called from AM-specific code, which will know what type of BulkInsertState they have got, because they are in control of which kind of BulkInsertState gets created for their relations as per point #4, so they can just call the right functions. 5. The current implementation of BulkInsertState gets renamed to BlockBulkInsertState (or something else) and is used by heap and any AMs that like it. I see Michael's point about the relationship between finish_bulk_insert() and the BulkInsertState, and maybe if we could figure that out we could avoid the need for a BulkInsertState to have a free method (or maybe any methods at all, in which case it could just be an opaque struct, like a Node). However, it looks to me as though copy.c can create a bunch of BulkInsertStates but only call finish_bulk_insert() once, so unless that's a bug in need of fixing I don't quite see how to make that approach work. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: