Thread: Custom table AMs need to include heapam.h because of BulkInsertState
Hi all, I have been playing lately with the table AM API to do some stuff, and I got surprised that in the minimum set of headers which needs to be included for a table AM we have a hard dependency with heapam.h for BulkInsertState and vacuum.h for VacuumParams. I am fine to live with the dependency with vacuum.h as it is not that strange. However for BulkInsertState we get a hard dependency with a heap-related area and it seems to me that we had better move that part out of heapam.c, as we want a clear dependency cut with the heap AM for any new custom table AM. I'd like to think that the best way to deal with that and reduce the confusion would be to move anything related to bulk inserts into their own header/file, meaning the following set: - ReleaseBulkInsertStatePin - GetBulkInsertState - FreeBulkInsertState There is the argument that we could also move that part into tableam.h itself though as some of the rather generic table-related callbacks, but that seems grotty. So I think that we could just move that stuff as backend/access/common/bulkinsert.c. Thoughts? -- Michael
Attachment
Hi, On 2019-06-01 15:09:24 -0400, Michael Paquier wrote: > I have been playing lately with the table AM API to do some stuff, and > I got surprised that in the minimum set of headers which needs to be > included for a table AM we have a hard dependency with heapam.h for > BulkInsertState and vacuum.h for VacuumParams. I've noted this before as a future todo. > I'd like to think that the best way to deal with that and reduce the > confusion would be to move anything related to bulk inserts into their > own header/file, meaning the following set: > - ReleaseBulkInsertStatePin > - GetBulkInsertState > - FreeBulkInsertState > There is the argument that we could also move that part into tableam.h > itself though as some of the rather generic table-related callbacks, > but that seems grotty. So I think that we could just move that stuff > as backend/access/common/bulkinsert.c. Yea, I think we should do that at some point. But I'm not sure this is the right design. Bulk insert probably needs to rather be something that's allocated inside the AM. Greetings, Andres Freund
On Sat, Jun 01, 2019 at 12:19:43PM -0700, Andres Freund wrote: > Yea, I think we should do that at some point. But I'm not sure this is > the right design. Bulk insert probably needs to rather be something > that's allocated inside the AM. Yeah, actually you may be right that I am not taking the correct path here. At quick glance it looks that there is a strong relationship between the finish_bulk_insert callback and the bistate free already, so we could do much better than moving the code around. Perhaps we could just have a TODO? As one of the likely-doable items. -- Michael
Attachment
On Sat, Jun 1, 2019 at 3:09 PM Michael Paquier <michael@paquier.xyz> wrote: > I am fine to live with the dependency with vacuum.h as it is not that > strange. However for BulkInsertState we get a hard dependency with a > heap-related area and it seems to me that we had better move that part > out of heapam.c, as we want a clear dependency cut with the heap AM > for any new custom table AM. Yeah, I noticed this, too. +1 for doing something about it. Not sure exactly what is the best approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 04, 2019 at 10:18:03AM -0400, Robert Haas wrote: > On Sat, Jun 1, 2019 at 3:09 PM Michael Paquier <michael@paquier.xyz> wrote: >> I am fine to live with the dependency with vacuum.h as it is not that >> strange. However for BulkInsertState we get a hard dependency with a >> heap-related area and it seems to me that we had better move that part >> out of heapam.c, as we want a clear dependency cut with the heap AM >> for any new custom table AM. > > Yeah, I noticed this, too. +1 for doing something about it. Not sure > exactly what is the best approach. One thing which is a bit tricky is that for example with COPY FROM we have a code path which is able to release a buffer held by the bulk insert state. So I think that we could get easily out by combining the bistate free path with finish_bulk_insert, create the bistate within the AM when doing a single or multi tuple insert, and having one extra callback to release a buffer held. Still this last bit does not completely feel right in terms of flexibility and readability. Note as well that we never actually use bistate when calling table_tuple_insert_speculative() on HEAD. I guess that the argument is here for consistency with the tuple_insert callback. Could we do something separately about that? -- Michael
Attachment
On Thu, Jun 6, 2019 at 10:29 PM Michael Paquier <michael@paquier.xyz> wrote: > One thing which is a bit tricky is that for example with COPY FROM we > have a code path which is able to release a buffer held by the bulk > insert state. Are you talking about the call to ReleaseBulkInsertStatePin, or something else? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jun 1, 2019 at 3:20 PM Andres Freund <andres@anarazel.de> wrote: > > I'd like to think that the best way to deal with that and reduce the > > confusion would be to move anything related to bulk inserts into their > > own header/file, meaning the following set: > > - ReleaseBulkInsertStatePin > > - GetBulkInsertState > > - FreeBulkInsertState > > There is the argument that we could also move that part into tableam.h > > itself though as some of the rather generic table-related callbacks, > > but that seems grotty. So I think that we could just move that stuff > > as backend/access/common/bulkinsert.c. > > Yea, I think we should do that at some point. But I'm not sure this is > the right design. Bulk insert probably needs to rather be something > that's allocated inside the AM. As far as I can see, any on-disk, row-oriented, block-based AM is likely to want the same implementation as the heap. Column stores might want to pin multiple buffers, and an in-memory AM might have a completely different set of requirements, but something like zheap really has no reason to depart from what the heap does. I think it's really important that new table AMs not only have the option to do something different than the heap in any particular area, but that they also have the option to do the SAME thing as the heap without having to duplicate a bunch of code. So I think it would be reasonable to start by doing some pure code movement here, along the lines proposed by Michael -- not sure if src/backend/access/common is right or if it should be src/backend/access/table -- and then add the abstraction afterwards. Worth noting is ReadBufferBI() also needs moving and is a actually a bigger problem than the functions that Michael mentioned, because the other functions are accessible if you're willing to stoop to including heap-specific headers, but that function is static and you'll have to just copy-and-paste it. Uggh. Here's a draft design for adding some abstraction, roughly modeled on the abstraction Andres added for TupleTableSlots: 1. a BulkInsertState becomes a struct whose only member is a pointer to const BulkInsertStateOps *const ops 2. that structure has a member for each defined operation on a BulkInsertState: void (*free)(BulkInsertState *); void (*release_pin)(BulkInsertState *); // maybe rename to make it more generic 3. table AM gets a new member BulkInsertState *(*create_bistate)(Relation Rel) and a corresponding function table_create_bistate(), analogous to table_create_slot(), which can call the constructor function for the appropriate type of BulkInsertState and return the result 4. each type of BulkInsertState has its own functions for making use of it, akin to ReadBufferBI. That particular function signature is only likely to be correct for something that does more-or-less what the existing type of BulkInsertState does; if you're using a column-store that pins multiple buffers or something, you'll need your own code path. But that's OK, because ReadBufferBI or whatever other functions you have are only going to get called from AM-specific code, which will know what type of BulkInsertState they have got, because they are in control of which kind of BulkInsertState gets created for their relations as per point #4, so they can just call the right functions. 5. The current implementation of BulkInsertState gets renamed to BlockBulkInsertState (or something else) and is used by heap and any AMs that like it. I see Michael's point about the relationship between finish_bulk_insert() and the BulkInsertState, and maybe if we could figure that out we could avoid the need for a BulkInsertState to have a free method (or maybe any methods at all, in which case it could just be an opaque struct, like a Node). However, it looks to me as though copy.c can create a bunch of BulkInsertStates but only call finish_bulk_insert() once, so unless that's a bug in need of fixing I don't quite see how to make that approach work. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, (David, see bottom if you're otherwise not interested). On 2019-06-07 09:48:29 -0400, Robert Haas wrote: > On Sat, Jun 1, 2019 at 3:20 PM Andres Freund <andres@anarazel.de> wrote: > > > I'd like to think that the best way to deal with that and reduce the > > > confusion would be to move anything related to bulk inserts into their > > > own header/file, meaning the following set: > > > - ReleaseBulkInsertStatePin > > > - GetBulkInsertState > > > - FreeBulkInsertState > > > There is the argument that we could also move that part into tableam.h > > > itself though as some of the rather generic table-related callbacks, > > > but that seems grotty. So I think that we could just move that stuff > > > as backend/access/common/bulkinsert.c. > > > > Yea, I think we should do that at some point. But I'm not sure this is > > the right design. Bulk insert probably needs to rather be something > > that's allocated inside the AM. > > As far as I can see, any on-disk, row-oriented, block-based AM is > likely to want the same implementation as the heap. I'm pretty doubtful about that. It'd e.g. would make a ton of sense to keep the VM pinned, even for heap. You could also do a lot better with toast. And for zheap we'd - unless we change the design - quite possibly benefit from keeping the last needed tpd buffer around. > Here's a draft design for adding some abstraction, roughly modeled on > the abstraction Andres added for TupleTableSlots: Hm, I'm not sure I see the need for a vtable based approach here. Won't every AM know exactly what they need / have? I'm not convinced it's worthwhile to treat that separately from the tableam. I.e. have a BulkInsertState struct with *no* members, and then, as you suggest: > > 3. table AM gets a new member BulkInsertState > *(*create_bistate)(Relation Rel) and a corresponding function > table_create_bistate(), analogous to table_create_slot(), which can > call the constructor function for the appropriate type of > BulkInsertState and return the result but also route the following through the AM: > 2. that structure has a member for each defined operation on a BulkInsertState: > > void (*free)(BulkInsertState *); > void (*release_pin)(BulkInsertState *); // maybe rename to make it more generic Where free would just be part of finish_bulk_insert, and release_pin a new callback. > 4. each type of BulkInsertState has its own functions for making use > of it, akin to ReadBufferBI. Right, I don't think that's avoidable unfortunately. > I see Michael's point about the relationship between > finish_bulk_insert() and the BulkInsertState, and maybe if we could > figure that out we could avoid the need for a BulkInsertState to have > a free method (or maybe any methods at all, in which case it could > just be an opaque struct, like a Node). Right, so we actually eneded up at the same place. And you found a bug: > However, it looks to me as though copy.c can create a bunch of > BulkInsertStates but only call finish_bulk_insert() once, so unless > that's a bug in need of fixing I don't quite see how to make that > approach work. That is a bug. Not a currently "active" one with in-core AMs (no dangerous bulk insert flags ever get set for partitioned tables), but we obviously need to fix it nevertheless. Robert, seems we'll have to - regardless of where we come down on fixing this bug - have to make copy use multiple BulkInsertState's, even in the CIM_SINGLE (with proute) case. Or do you have a better idea? David, any opinions on how to best fix this? It's not extremely obvious how to do so best in the current setup of the partition actually being hidden somewhere a few calls away (i.e. the table_open happening in ExecInitPartitionInfo()). Greetings, Andres Freund
On Fri, Jun 07, 2019 at 08:55:36AM -0400, Robert Haas wrote: > Are you talking about the call to ReleaseBulkInsertStatePin, or something else? Yes, I was referring to ReleaseBulkInsertStatePin() -- Michael
Attachment
On Sat, 8 Jun 2019 at 04:51, Andres Freund <andres@anarazel.de> wrote: > > On 2019-06-07 09:48:29 -0400, Robert Haas wrote: > > However, it looks to me as though copy.c can create a bunch of > > BulkInsertStates but only call finish_bulk_insert() once, so unless > > that's a bug in need of fixing I don't quite see how to make that > > approach work. > > That is a bug. Not a currently "active" one with in-core AMs (no > dangerous bulk insert flags ever get set for partitioned tables), but we > obviously need to fix it nevertheless. > > Robert, seems we'll have to - regardless of where we come down on fixing > this bug - have to make copy use multiple BulkInsertState's, even in the > CIM_SINGLE (with proute) case. Or do you have a better idea? > > David, any opinions on how to best fix this? It's not extremely obvious > how to do so best in the current setup of the partition actually being > hidden somewhere a few calls away (i.e. the table_open happening in > ExecInitPartitionInfo()). That's been overlooked. I agree it's not a bug with heap, since heapam_finish_bulk_insert() only does anything there when we're skipping WAL, which we don't do in copy.c for partitioned tables. However, who knows what other AMs will need, so we'd better fix that. My proposed patch is attached. I ended up moving the call to CopyMultiInsertInfoCleanup() down to after we call table_finish_bulk_insert for the main table. This might not be required but I lack imagination right now to what AMs might put in the finish_bulk_insert call, so doing this seems safer. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Jun 7, 2019 at 12:51 PM Andres Freund <andres@anarazel.de> wrote: > > As far as I can see, any on-disk, row-oriented, block-based AM is > > likely to want the same implementation as the heap. > > I'm pretty doubtful about that. It'd e.g. would make a ton of sense to > keep the VM pinned, even for heap. You could also do a lot better with > toast. And for zheap we'd - unless we change the design - quite > possibly benefit from keeping the last needed tpd buffer around. That's fair enough to a point, but I'm not trying to enforce code reuse; I'm trying to make it possible. If it's good enough for the heap, which is really the gold standard for AMs until somebody manages to do better, it's entirely reasonable for somebody else to want to just do it the way the heap does. We gain nothing by making that difficult. > > Here's a draft design for adding some abstraction, roughly modeled on > > the abstraction Andres added for TupleTableSlots: > > Hm, I'm not sure I see the need for a vtable based approach here. Won't > every AM know exactly what they need / have? I'm not convinced it's > worthwhile to treat that separately from the tableam. I.e. have a > BulkInsertState struct with *no* members, and then, as you suggest: Hmm, so what we would we do here? Just 'struct BulkInsertState; typedef struct BulkInsertState BulkInsertState;' ... and then never actually define the struct anywhere? > > 3. table AM gets a new member BulkInsertState > > *(*create_bistate)(Relation Rel) and a corresponding function > > table_create_bistate(), analogous to table_create_slot(), which can > > call the constructor function for the appropriate type of > > BulkInsertState and return the result > > but also route the following through the AM: > > > 2. that structure has a member for each defined operation on a BulkInsertState: > > > > void (*free)(BulkInsertState *); > > void (*release_pin)(BulkInsertState *); // maybe rename to make it more generic > > Where free would just be part of finish_bulk_insert, and release_pin a > new callback. OK, that's an option. I guess we'd change free_bulk_insert to take the BulkInsertState as an additional option? > Robert, seems we'll have to - regardless of where we come down on fixing > this bug - have to make copy use multiple BulkInsertState's, even in the > CIM_SINGLE (with proute) case. Or do you have a better idea? Nope. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 10 Jun 2019 at 11:45, David Rowley <david.rowley@2ndquadrant.com> wrote: > > On Sat, 8 Jun 2019 at 04:51, Andres Freund <andres@anarazel.de> wrote: > > David, any opinions on how to best fix this? It's not extremely obvious > > how to do so best in the current setup of the partition actually being > > hidden somewhere a few calls away (i.e. the table_open happening in > > ExecInitPartitionInfo()). > > That's been overlooked. I agree it's not a bug with heap, since > heapam_finish_bulk_insert() only does anything there when we're > skipping WAL, which we don't do in copy.c for partitioned tables. > However, who knows what other AMs will need, so we'd better fix that. > > My proposed patch is attached. > > I ended up moving the call to CopyMultiInsertInfoCleanup() down to > after we call table_finish_bulk_insert for the main table. This might > not be required but I lack imagination right now to what AMs might put > in the finish_bulk_insert call, so doing this seems safer. Andres, do you want to look at this before I look again? Do you see any issue with calling table_finish_bulk_insert() when the partition's CopyMultiInsertBuffer is evicted from the CopyMultiInsertInfo rather than at the end of the copy? It can mean that we call the function multiple times per partition. I assume the function is only really intended to flush bulk inserted tuple to the storage, so calling it more than once would just mean an inefficiency rather than a bug. Let me know your thoughts. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On June 12, 2019 6:42:11 PM PDT, David Rowley <david.rowley@2ndquadrant.com> wrote: >On Mon, 10 Jun 2019 at 11:45, David Rowley ><david.rowley@2ndquadrant.com> wrote: >> >> On Sat, 8 Jun 2019 at 04:51, Andres Freund <andres@anarazel.de> >wrote: >> > David, any opinions on how to best fix this? It's not extremely >obvious >> > how to do so best in the current setup of the partition actually >being >> > hidden somewhere a few calls away (i.e. the table_open happening in >> > ExecInitPartitionInfo()). >> >> That's been overlooked. I agree it's not a bug with heap, since >> heapam_finish_bulk_insert() only does anything there when we're >> skipping WAL, which we don't do in copy.c for partitioned tables. >> However, who knows what other AMs will need, so we'd better fix that. >> >> My proposed patch is attached. >> >> I ended up moving the call to CopyMultiInsertInfoCleanup() down to >> after we call table_finish_bulk_insert for the main table. This might >> not be required but I lack imagination right now to what AMs might >put >> in the finish_bulk_insert call, so doing this seems safer. > >Andres, do you want to look at this before I look again? > >Do you see any issue with calling table_finish_bulk_insert() when the >partition's CopyMultiInsertBuffer is evicted from the >CopyMultiInsertInfo rather than at the end of the copy? It can mean >that we call the function multiple times per partition. I assume the >function is only really intended to flush bulk inserted tuple to the >storage, so calling it more than once would just mean an inefficiency >rather than a bug. > >Let me know your thoughts. I'm out on vacation until Monday (very needed, pretty damn exhausted). So I can't really give you a in depth answer rightnow. Off the cuff, I'd say it's worthwhile to try somewhat hard to avoid superfluous finish calls. They can be quite expensive(fsync), and we ought to have nearly all the state for doing it only as much as necessary. Possibly we need onebool per partition to track whether any rows where inserted, but thats peanuts in comparison to all the other state. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, 14 Jun 2019 at 07:53, Andres Freund <andres@anarazel.de> wrote: > > On June 12, 2019 6:42:11 PM PDT, David Rowley <david.rowley@2ndquadrant.com> wrote: > >Do you see any issue with calling table_finish_bulk_insert() when the > >partition's CopyMultiInsertBuffer is evicted from the > >CopyMultiInsertInfo rather than at the end of the copy? It can mean > >that we call the function multiple times per partition. I assume the > >function is only really intended to flush bulk inserted tuple to the > >storage, so calling it more than once would just mean an inefficiency > >rather than a bug. > > > >Let me know your thoughts. > > I'm out on vacation until Monday (very needed, pretty damn exhausted). So I can't really give you a in depth answer rightnow. > > Off the cuff, I'd say it's worthwhile to try somewhat hard to avoid superfluous finish calls. They can be quite expensive(fsync), and we ought to have nearly all the state for doing it only as much as necessary. Possibly we need onebool per partition to track whether any rows where inserted, but thats peanuts in comparison to all the other state. No worries. I'll just park this patch here until you're ready to give it a look. With the attached version I'm just calling table_finish_bulk_insert() once per partition at the end of CopyFrom(). We've got an array with all the ResultRelInfos we touched in the proute, so it's mostly a matter of looping over that array and calling the function on each ResultRelInfo's ri_RelationDesc. However, to make it more complex, PartitionTupleRouting is private to execPartition.c so we can't do this directly... After staring at my screen for a while, I decided to write a function that calls a callback function on each ResultRelInfo in the PartitionTupleRouting. The three alternative ways I thought of were: 1) Put PartitionTupleRouting back into execPartition.h and write the loop over each ResultRelInfo in copy.c. 2) Write a specific function in execPartition.c that calls table_finish_bulk_insert() 3) Modify ExecCleanupTupleRouting to pass in the ti_options and a bool to say if it should call table_finish_bulk_insert() or not. I didn't really like either of those. For #1, I'd rather keep it private. For #2, it just seems a bit too specific a function to go into execPartition.c. For #3 I really don't want to slow down ExecCleanupTupleRouting() any. I designed those to be as fast as possible since they're called for single-row INSERTs into partitioned tables. Quite a bit of work went into PG12 to make those fast. Of course, someone might see one of the alternatives as better than what the patch does, so comments welcome. The other thing I noticed is that we call table_finish_bulk_insert(cstate->rel, ti_options); in copy.c regardless of if we've done any bulk inserts or not. Perhaps that should be under an if (insertMethod != CIM_SINGLE) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Sat, Jun 15, 2019 at 12:25:12PM +1200, David Rowley wrote: > With the attached version I'm just calling table_finish_bulk_insert() > once per partition at the end of CopyFrom(). We've got an array with > all the ResultRelInfos we touched in the proute, so it's mostly a > matter of looping over that array and calling the function on each > ResultRelInfo's ri_RelationDesc. However, to make it more complex, > PartitionTupleRouting is private to execPartition.c so we can't do > this directly... After staring at my screen for a while, I decided to > write a function that calls a callback function on each ResultRelInfo > in the PartitionTupleRouting. Don't take me bad, but I find the solution of defining and using a new callback to call the table AM callback not really elegant, and keeping all table AM callbacks called at a higher level than the executor makes the code easier to follow. Shouldn't we try to keep any calls to table_finish_bulk_insert() within copy.c for each partition instead? > The other thing I noticed is that we call > table_finish_bulk_insert(cstate->rel, ti_options); in copy.c > regardless of if we've done any bulk inserts or not. Perhaps that > should be under an if (insertMethod != CIM_SINGLE) Yeah, good point. -- Michael
Attachment
On Mon, 24 Jun 2019 at 22:16, Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jun 15, 2019 at 12:25:12PM +1200, David Rowley wrote: > > With the attached version I'm just calling table_finish_bulk_insert() > > once per partition at the end of CopyFrom(). We've got an array with > > all the ResultRelInfos we touched in the proute, so it's mostly a > > matter of looping over that array and calling the function on each > > ResultRelInfo's ri_RelationDesc. However, to make it more complex, > > PartitionTupleRouting is private to execPartition.c so we can't do > > this directly... After staring at my screen for a while, I decided to > > write a function that calls a callback function on each ResultRelInfo > > in the PartitionTupleRouting. > > Don't take me bad, but I find the solution of defining and using a new > callback to call the table AM callback not really elegant, and keeping > all table AM callbacks called at a higher level than the executor > makes the code easier to follow. Shouldn't we try to keep any calls > to table_finish_bulk_insert() within copy.c for each partition > instead? I'm not quite sure if I follow you since the call to table_finish_bulk_insert() is within copy.c still. The problem was that PartitionTupleRouting is private to execPartition.c, and we need a way to determine which of the partitions we routed tuples to. It seems inefficient to flush all of them if only a small number had tuples inserted into them and to me, it seems inefficient to add some additional tracking in CopyFrom(), like a hash table to store partition Oids that we inserted into. Using PartitionTupleRouting makes sense. It's just a question of how to access it, which is not so easy due to it being private. I did suggest a few other ways that we could solve this. I'm not so clear on which one of those you're suggesting or if you're thinking of something new. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, 24 Jun 2019 at 23:12, David Rowley <david.rowley@2ndquadrant.com> wrote: > > On Mon, 24 Jun 2019 at 22:16, Michael Paquier <michael@paquier.xyz> wrote: > > > > Don't take me bad, but I find the solution of defining and using a new > > callback to call the table AM callback not really elegant, and keeping > > all table AM callbacks called at a higher level than the executor > > makes the code easier to follow. Shouldn't we try to keep any calls > > to table_finish_bulk_insert() within copy.c for each partition > > instead? > > I'm not quite sure if I follow you since the call to > table_finish_bulk_insert() is within copy.c still. > > The problem was that PartitionTupleRouting is private to > execPartition.c, and we need a way to determine which of the > partitions we routed tuples to. It seems inefficient to flush all of > them if only a small number had tuples inserted into them and to me, > it seems inefficient to add some additional tracking in CopyFrom(), > like a hash table to store partition Oids that we inserted into. Using > PartitionTupleRouting makes sense. It's just a question of how to > access it, which is not so easy due to it being private. > > I did suggest a few other ways that we could solve this. I'm not so > clear on which one of those you're suggesting or if you're thinking of > something new. Any further thoughts on this Michael? Or Andres? Do you have a preference to which of the approaches (mentioned upthread) I use for the fix? If I don't hear anything I'll probably just push the first fix. The inefficiency does not affect heap, so likely the people with the most interest in improving that will be authors of other table AMs that actually do something during table_finish_bulk_insert() for partitions. We could revisit this in PG13 if someone comes up with a need to improve things here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, 30 Jun 2019 at 17:54, David Rowley <david.rowley@2ndquadrant.com> wrote: > Any further thoughts on this Michael? > > Or Andres? Do you have a preference to which of the approaches > (mentioned upthread) I use for the fix? > > If I don't hear anything I'll probably just push the first fix. The > inefficiency does not affect heap, so likely the people with the most > interest in improving that will be authors of other table AMs that > actually do something during table_finish_bulk_insert() for > partitions. We could revisit this in PG13 if someone comes up with a > need to improve things here. I've pushed the original patch plus a small change to only call table_finish_bulk_insert() for the target of the copy when we're using bulk inserts. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 02, 2019 at 01:26:26AM +1200, David Rowley wrote: > I've pushed the original patch plus a small change to only call > table_finish_bulk_insert() for the target of the copy when we're using > bulk inserts. Yes, sorry for coming late at the party here. What I meant previously is that I did not find the version published at [1] to be natural with its structure to define an executor callback which then calls a callback for table AMs, still I get your point that it would be better to try to avoid unnecessary fsync calls on partitions where no tuples have been redirected with a COPY. The version 1 of the patch attached at [2] felt much more straight-forward and cleaner by keeping all the table AM callbacks within copy.c. This has been reverted as of f5db56f, still it seems to me that this was moving in the right direction. [1]: https://postgr.es/m/CAKJS1f95sB21LBF=1MCsEV+XLtA_JC3mtXx5kgDuHDsOGoWhKg@mail.gmail.com [2]: https://postgr.es/m/CAKJS1f_0t-K0_3xe+erXPQ-jgaOb6tRZayErCXF2RpGdUVMt9g@mail.gmail.com -- Michael
Attachment
On Wed, 3 Jul 2019 at 19:35, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jul 02, 2019 at 01:26:26AM +1200, David Rowley wrote: > > I've pushed the original patch plus a small change to only call > > table_finish_bulk_insert() for the target of the copy when we're using > > bulk inserts. > > Yes, sorry for coming late at the party here. What I meant previously > is that I did not find the version published at [1] to be natural with > its structure to define an executor callback which then calls a > callback for table AMs, still I get your point that it would be better > to try to avoid unnecessary fsync calls on partitions where no tuples > have been redirected with a COPY. The version 1 of the patch attached > at [2] felt much more straight-forward and cleaner by keeping all the > table AM callbacks within copy.c. > > This has been reverted as of f5db56f, still it seems to me that this > was moving in the right direction. I think the only objection to doing it the way [2] did was, if there are more than MAX_PARTITION_BUFFERS partitions then we may end up evicting the CopyMultiInsertBuffer out of the CopyMultiInsertInfo and thus cause a call to table_finish_bulk_insert() before we're done with the copy. It's not impossible that this could happen many times for a given partition. I agree that a working version of [2] is cleaner than [1] but it's just the thought of those needless calls. For [1], I wasn't very happy with the way it turned out which is why I ended up suggesting a few other ideas. I just don't really like either of them any better than [1], so I didn't chase those up, and that's why I ended up going for [2]. If you think any of the other ideas I suggested are better (apart from [2]) then let me know and I can see about writing a patch. Otherwise, I plan to just fix [2] and push. > [1]: https://postgr.es/m/CAKJS1f95sB21LBF=1MCsEV+XLtA_JC3mtXx5kgDuHDsOGoWhKg@mail.gmail.com > [2]: https://postgr.es/m/CAKJS1f_0t-K0_3xe+erXPQ-jgaOb6tRZayErCXF2RpGdUVMt9g@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, 3 Jul 2019 at 19:35, Michael Paquier <michael@paquier.xyz> wrote: > This has been reverted as of f5db56f, still it seems to me that this > was moving in the right direction. I've pushed this again, this time with the cleanup code done in the right order. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi David, On Wed, Jul 10, 2019 at 09:40:59PM +1200, David Rowley wrote: > On Wed, 3 Jul 2019 at 19:35, Michael Paquier <michael@paquier.xyz> wrote: >> This has been reverted as of f5db56f, still it seems to me that this >> was moving in the right direction. > > I've pushed this again, this time with the cleanup code done in the > right order. I have spent some time lately analyzing f7c830f as I was curious about the logic behind it, and FWIW the result looks good. Thanks! -- Michael
Attachment
Hi, Sorry for not chiming in again earlier, I was a bit exhausted... On 2019-07-03 19:46:06 +1200, David Rowley wrote: > I think the only objection to doing it the way [2] did was, if there > are more than MAX_PARTITION_BUFFERS partitions then we may end up > evicting the CopyMultiInsertBuffer out of the CopyMultiInsertInfo and > thus cause a call to table_finish_bulk_insert() before we're done with > the copy. Right. > It's not impossible that this could happen many times for a > given partition. I agree that a working version of [2] is cleaner > than [1] but it's just the thought of those needless calls. I think it's fairly important to optimize this. E.g. emitting unnecessary fsyncs as it'd happen for heap is a pretty huge constant to add to bulk loading. > For [1], I wasn't very happy with the way it turned out which is why I > ended up suggesting a few other ideas. I just don't really like either > of them any better than [1], so I didn't chase those up, and that's > why I ended up going for [2]. Yea, I don't like [1] either - they all seems too tied to copy.c's usage. Ideas: 1) Have ExecFindPartition() return via a bool* whether the partition is being accessed for the first time. In copy.c push the partition onto a list of to-be-bulk-finished tables. 2) Add a execPartition.c function that returns all the used tables from a PartitionTupleRouting*. both seem cleaner to me than your proposals in [1], albeit not perfect either. I think knowing which partitions are referenced is a reasonable thing to want from the partition machinery. But using bulk-insert etc seems outside of execPartition.c's remit, so doing that in copy.c seems to make sense. Greetings, Andres Freund
On Wed, 17 Jul 2019 at 06:46, Andres Freund <andres@anarazel.de> wrote: > 1) Have ExecFindPartition() return via a bool* whether the partition is > being accessed for the first time. In copy.c push the partition onto > a list of to-be-bulk-finished tables. > 2) Add a execPartition.c function that returns all the used tables from > a PartitionTupleRouting*. #2 seems better than #1 as it does not add overhead to ExecFindPartition(). Are you thinking this should go back into v12, or just for v13 only? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2019-07-18 11:29:37 +1200, David Rowley wrote: > On Wed, 17 Jul 2019 at 06:46, Andres Freund <andres@anarazel.de> wrote: > > 1) Have ExecFindPartition() return via a bool* whether the partition is > > being accessed for the first time. In copy.c push the partition onto > > a list of to-be-bulk-finished tables. > > 2) Add a execPartition.c function that returns all the used tables from > > a PartitionTupleRouting*. > > #2 seems better than #1 as it does not add overhead to ExecFindPartition(). I don't see how #1 would add meaningful overhead compared to the other costs of that function. Wouldn't it just be adding if (isnew) *isnew = false; to the "/* ResultRelInfo already built */" branch, and the reverse to the else? That got to be several orders of magnitude cheaper than e.g. FormPartitionKeyDatum() which is unconditionally executed? > Are you thinking this should go back into v12, or just for v13 only? Not sure, tbh. Probably depends a bit on how complicated it'd look? Greetings, Andres Freund
On Thu, 18 Jul 2019 at 11:36, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2019-07-18 11:29:37 +1200, David Rowley wrote: > > On Wed, 17 Jul 2019 at 06:46, Andres Freund <andres@anarazel.de> wrote: > > > 1) Have ExecFindPartition() return via a bool* whether the partition is > > > being accessed for the first time. In copy.c push the partition onto > > > a list of to-be-bulk-finished tables. > > > 2) Add a execPartition.c function that returns all the used tables from > > > a PartitionTupleRouting*. > > > > #2 seems better than #1 as it does not add overhead to ExecFindPartition(). > > I don't see how #1 would add meaningful overhead compared to the other > costs of that function. Wouldn't it just be adding if (isnew) *isnew = > false; to the "/* ResultRelInfo already built */" branch, and the > reverse to the else? Yes > That got to be several orders of magnitude cheaper > than e.g. FormPartitionKeyDatum() which is unconditionally executed? Probably. However, I spent quite a bit of time trying to make that function as fast as possible in v12, and since #2 seems like a perfectly good alternative, I'd rather go with that than to add pollution to ExecFindPartition's signature. Also, #2 seems better since it keeps CopyFrom() from having to maintain a list. I think we all agreed somewhere that that code is more complex than we'd all like it to be. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2019-07-18 11:57:44 +1200, David Rowley wrote: > However, I spent quite a bit of time trying to make that function as > fast as possible in v12, and since #2 seems like a perfectly good > alternative, I'd rather go with that than to add pollution to > ExecFindPartition's signature. Also, #2 seems better since it keeps > CopyFrom() from having to maintain a list. I think we all agreed > somewhere that that code is more complex than we'd all like it to be. Fair enough. One last thought for #1: I was wondering about is whether a the bool * approach might be useful for nodeModifyTable.c too? I thought that maybe that could be used to avoid some checks for setting up per partition state, but it seems not to be the case ATM. Greetings, Andres Freund
On Wed, 17 Jul 2019 at 06:46, Andres Freund <andres@anarazel.de> wrote: > 2) Add a execPartition.c function that returns all the used tables from > a PartitionTupleRouting*. Here's a patch which implements it that way. I struggled a bit to think of a good name for the execPartition.c function. I ended up with ExecGetRoutedToRelations. I'm open to better ideas. I also chose to leave the change of function signatures done in f7c830f1a in place. I don't think the additional now unused parameter is that out of place. Also, the function is inlined, so removing it wouldn't help performance any. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services