Thread: RFC: list API / memory allocations
Hi List,

In many scenarios memory allocation is one of the top 3 functions showing up in profiles. Looking at hierarchical profiles (-fno-omit-frame-pointer), at least during parsing, planning and executor startup, most of that is spent around the list API.

Many of those allocations - especially in the parse-analyze phase - can be avoided because the lists are immutable and their size is known upfront. Some examples:

parser: list_make$n, buildRelationAliases
planner: expression_tree_mutator, get_relation_info
executor: ExecInitExpr

For that I added new functions/defines which allocate all the needed memory in one hunk:

list_immutable_make$n(),
List *list_new_immutable_n(NodeTag t, size_t n);
List *list_make_n(NodeTag t, size_t n, ...);

With those I could avoid about 1/3 of the allocations in some example scenarios (pgbench, custom benchmarks) by replacing a few notable callsites to use statically allocated lists.

The obvious problem with that approach is that those Lists and ListCells cannot be individually deallocated - which, especially in the planner, isn't done anyway. To check that I added an allocate_only flag to both when USE_ASSERT_CHECKING is defined.

Using that approach I measured improvements between 0.5-20% depending on the statement (using -M simple). Complex statements which are quick to execute and not too complicated to plan are the ones benefiting most, which is not surprising.

The questions I have here:

* Is that approach acceptable? I converted the most notable callsites and the benefit is quite measurable. On the other hand, one has to be rather careful when analyzing whether lists will be deallocated later on. Also, it's touching rather much code.
* I plan to add list_copy_immutable as another function, as that is the next biggest scenario where memory is allocated in foreseeable scenarios.
* Any suggestions what name to use instead of "immutable"? That doesn't really capture the real meaning of "allocate only", but I didn't find a better name.
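A minimal sketch of the one-hunk allocation described above, using simplified stand-ins for the List/ListCell structs of that era (names and layout are illustrative, not the actual patch; malloc stands in for palloc):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the linked-list List/ListCell structs. */
typedef struct ListCell ListCell;
struct ListCell
{
    void       *ptr_value;
    ListCell   *next;
};

typedef struct List
{
    int         length;
    ListCell   *head;
    ListCell   *tail;
} List;

/*
 * Allocate the list header and all n cells in a single allocation
 * (palloc in the real code).  Cells of such a list must never be
 * freed individually - hence "immutable"/"allocate only".
 */
static List *
list_new_immutable_n(size_t n)
{
    List       *list = malloc(sizeof(List) + n * sizeof(ListCell));
    ListCell   *cells = (ListCell *) (list + 1);
    size_t      i;

    list->length = (int) n;
    list->head = (n > 0) ? &cells[0] : NULL;
    list->tail = (n > 0) ? &cells[n - 1] : NULL;

    for (i = 0; i < n; i++)
    {
        cells[i].ptr_value = NULL;
        cells[i].next = (i + 1 < n) ? &cells[i + 1] : NULL;
    }
    return list;
}
```

The point of the layout is that header and cells are one contiguous block, so building a three-element list costs one allocator round trip instead of four.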
Then there is the 2/3 of calls which I couldn't beat with that approach, mostly because the size of the lists is too hard to figure out. My current approach to that would be to preallocate listcells by some caller-defined amount. A *very* quick hack which always overallocates Lists to contain 10 elements yields promising results (around another 2-20% of parse-only time).

The problem with that is that the most sensible way seems to be to define the amount of preallocation per list, which is currently not easily representable because empty lists are just 0/NIL. So I have 3 solutions, all of which are not that great:

* Allow empty Lists. This possibly requires a bit of code churn but seems somewhat sensible.
* Save the amount of preallocation needed in NIL by defining that no pointer can be less than, say, 0x100. That also requires some churn and gets points for ugliness.
* Always overallocate new lists. This is rather wasteful.

Any ideas?

If anybody wants the preliminary patches I am happy to send them, but I would much rather prefer to rework them somewhat after getting input.

Andres
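The preallocation variant sketched above could look roughly like this (again with simplified stand-in structs; the "spare" field and function names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ListCell ListCell;
struct ListCell
{
    void       *ptr_value;
    ListCell   *next;
};

typedef struct List
{
    int         length;
    ListCell   *head;
    ListCell   *tail;
    ListCell   *spare;          /* preallocated, not-yet-used cells */
} List;

/* Create an empty list with 'prealloc' cells carved from one allocation. */
static List *
list_new_prealloc(size_t prealloc)
{
    List       *list = malloc(sizeof(List) + prealloc * sizeof(ListCell));
    ListCell   *cells = (ListCell *) (list + 1);
    size_t      i;

    list->length = 0;
    list->head = list->tail = NULL;
    list->spare = (prealloc > 0) ? &cells[0] : NULL;
    for (i = 0; i < prealloc; i++)
        cells[i].next = (i + 1 < prealloc) ? &cells[i + 1] : NULL;
    return list;
}

static void
list_append(List *list, void *datum)
{
    ListCell   *cell;

    if (list->spare != NULL)
    {
        cell = list->spare;     /* consume a preallocated cell */
        list->spare = cell->next;
    }
    else
        cell = malloc(sizeof(ListCell));    /* fall back to the heap */

    cell->ptr_value = datum;
    cell->next = NULL;
    if (list->tail)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
    list->length++;
}
```

This also shows why the 0/NIL representation is the sticking point: a per-list preallocation amount needs somewhere to live before the first cell exists, which is exactly what an empty-but-allocated List header would provide.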
Andres Freund <andres@anarazel.de> writes:
> In many scenarios memory allocation is one of the top 3 functions showing up
> in profiles. Looking at hierarchical profiles (-fno-omit-frame-pointer) at
> least during parsing, planning and executor startup most of that is spent
> around the list API.
> Many - especially in the parse-analyze phase - of those allocations can be
> avoided because the lists are immutable and their size is known upfront.

The fundamental problem with all of those proposals is that now you have some lists in the system that aren't like other lists, and will result in dumping core if the wrong sorts of operations are applied to them. I don't particularly care for introducing that kind of fragility into the system in return for marginal speed gains. I'm not impressed by Asserts showing that no such thing happens in the cases you tested; the test coverage won't be complete, and even if it is, innocent-looking code changes later on could create new problems.

Now, if you could do something that *doesn't* restrict what operations could be applied to the lists, that would be good.

I've wished for a long while that we could allocate the list header and the first list cell in a single palloc cycle. This would basically require getting list_delete_cell to refrain from pfree'ing a cell that got allocated that way, which is easy as long as you have the list header at hand, but what happens if the list is later concat'd to another? A subsequent delete operation would be referring to the other list header and would come to the wrong conclusion.

While thinking about this just now, it occurred to me that maybe the issues could be dodged if the cell, not the header, were first in the combined palloc block. list_concat is then no longer a problem, as long as it doesn't try to throw away the second list's header. But I haven't thought long enough to be sure how well that would work.

			regards, tom lane
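The cell-before-header layout Tom muses about can be sketched as follows (struct and function names are invented for illustration; this is a thought experiment, not committed code):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ListCell ListCell;
struct ListCell
{
    void       *ptr_value;
    ListCell   *next;
};

typedef struct List
{
    int         length;
    ListCell   *head;
    ListCell   *tail;
} List;

/*
 * One allocation holding the first cell *followed by* the header.
 * Because the cell comes first, a destructive concat can abandon the
 * second list's header without orphaning the cell sharing its block:
 * the cell (and hence the whole block) stays reachable via the chain.
 */
typedef struct ListBlock
{
    ListCell    first_cell;     /* must stay first in the block */
    List        header;
} ListBlock;

static List *
list_make1(void *datum)
{
    ListBlock  *blk = malloc(sizeof(ListBlock));

    blk->first_cell.ptr_value = datum;
    blk->first_cell.next = NULL;
    blk->header.length = 1;
    blk->header.head = blk->header.tail = &blk->first_cell;
    return &blk->header;
}

/* Destructive concat: splice list2's cells onto list1; list2's header
 * becomes dead space inside its block rather than being freed. */
static List *
list_concat(List *list1, List *list2)
{
    list1->tail->next = list2->head;
    list1->tail = list2->tail;
    list1->length += list2->length;
    return list1;
}
```

The unresolved part - which Tom flags himself - is list_delete_cell still needing some way to recognize an embedded cell without the original header at hand.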
On Friday, November 18, 2011 10:11:29 PM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > In many scenarios memory allocation is one of the top 3 functions showing
> > up in profiles. Looking at hierarchical profiles
> > (-fno-omit-frame-pointer) at least during parsing, planning and executor
> > startup most of that is spent around the list API.
> >
> > Many - especially in the parse-analyze phase - of those allocations can
> > be avoided because the lists are immutable and their size is known
> > upfront.
>
> The fundamental problem with all of those proposals is that now you have
> some lists in the system that aren't like other lists, and will result
> in dumping core if the wrong sorts of operations are applied to them.
> I don't particularly care for introducing that kind of fragility into
> the system in return for marginal speed gains. I'm not impressed by
> Asserts showing that no such thing happens in the cases you tested;
> the test coverage won't be complete, and even if it is, innocent-looking
> code changes later on could create new problems.

Yes. I dislike that myself (as noted). It seems rather fragile, although I think at least during parsing we could simply refrain from freeing lists in general.

I don't think that the gains are marginal though. After covering only a small number of cases there are cases - neither uncommon nor artificial - gaining more than 20% parsing speed. One prime example of a workload benefiting hugely is something like:

SELECT * FROM pg_class WHERE oid = ...;

Essentially all queries which request few rows with a large number of columns benefit rather measurably.

> Now, if you could do something that *doesn't* restrict what operations
> could be applied to the lists, that would be good.

If every list cell/header grew another field "allocate_only" (which I currently added for cassert only), those could just get skipped when deleting. Some places are directly freeing list headers, but that seems to be a bad idea anyway.
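The "skip allocate-only cells when deleting" idea could look roughly like this (simplified structs again; in the patch under discussion the flag exists only under USE_ASSERT_CHECKING, whereas here it actually steers the free):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct ListCell ListCell;
struct ListCell
{
    void       *ptr_value;
    bool        allocate_only;  /* cell lives inside a batch allocation */
    ListCell   *next;
};

typedef struct List
{
    int         length;
    ListCell   *head;
    ListCell   *tail;
} List;

/*
 * Unlink 'cell' (whose predecessor is 'prev', or NULL if it is the
 * head) and free it only if it was allocated individually.  Batch
 * allocated cells are merely unlinked and linger until the memory
 * context is reset.
 */
static void
list_delete_cell(List *list, ListCell *cell, ListCell *prev)
{
    if (prev)
        prev->next = cell->next;
    else
        list->head = cell->next;
    if (list->tail == cell)
        list->tail = prev;
    list->length--;

    if (!cell->allocate_only)
        free(cell);
}
```

This is exactly the trade-off discussed next: unlinked allocate-only cells stay resident until context reset, which is harmless during parsing but not free in general.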
The disadvantage is that those would essentially be there until context reset unless done via some special API. Also not that nice... On the other hand, only very few callsites free list(-cells) during parsing anyway. From a quick look I didn't see a measurable increase in memory usage due to the new field with that approach.

The only way out of that seems to be to add refcounted list cells/headers :(. I fear that would be rather complex, expensive and failure prone, so I don't like to go there.

> I've wished for a long while that we could allocate the list header and
> the first list cell in a single palloc cycle.

Yea. Although that is only a rather small portion of the problems/allocations.

> This would basically require getting list_delete_cell to refrain from
> pfree'ing a cell that got allocated that way, which is easy as long as
> you have the list header at hand, but what happens if the list is later
> concat'd to another? A subsequent delete operation would be referring to
> the other list header and would come to the wrong conclusion.

I don't think any such scheme is safe.

> While thinking about this just now, it occurred to me that maybe the
> issues could be dodged if the cell, not the header, were first in the
> combined palloc block. list_concat is then no longer a problem, as long
> as it doesn't try to throw away the second list's header. But I haven't
> thought long enough to be sure how well that would work.

I don't think that would work without carefully revising list usage all around... Several places remove nodes from a list and then do list_free() on the remainder.

Something aside: for my POC memory allocator I added "intrusive" lists which have the next/prev elements embedded in the stored element. I wonder if some of the list usage could be replaced by such a scheme. Obviously, for every embedded list_node a Node can only be in one list...

Andres
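The "intrusive list" aside can be sketched as below - links embedded in the stored element, so no per-cell allocation at all (names are illustrative; the idea is similar in spirit to the circular doubly-linked lists PostgreSQL later grew):

```c
#include <assert.h>
#include <stddef.h>

/* Link embedded in the stored element; no separate cell allocation. */
typedef struct list_node
{
    struct list_node *next;
    struct list_node *prev;
} list_node;

typedef struct list_head
{
    list_node   node;           /* circular sentinel */
} list_head;

static void
list_init(list_head *head)
{
    head->node.next = head->node.prev = &head->node;
}

static void
list_push_tail(list_head *head, list_node *n)
{
    n->prev = head->node.prev;
    n->next = &head->node;
    n->prev->next = n;
    head->node.prev = n;
}

static void
list_remove(list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Recover the containing struct from an embedded node. */
#define list_container(type, field, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))
```

The limitation noted above is visible in the layout: membership costs one embedded list_node per list, so a Node can sit in only as many lists as it has embedded links.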
Andres,

* Andres Freund (andres@anarazel.de) wrote:
> For that I added new functions/defines which allocate all the needed memory in
> one hunk:
> list_immutable_make$n(),
> List *list_new_immutable_n(NodeTag t, size_t n);
> List *list_make_n(NodeTag t, size_t n, ...);

A while back, I posted a patch to try and address this same issue. The approach that I took was to always pre-allocate a certain (#defined) amount (I think it was 5 or 10 elements). There were a number of places that caused problems with that approach because they hacked on the list element structures directly (instead of using the macros/functions) - you'll want to watch out for those areas in any work on lists.

That patch is here: http://archives.postgresql.org/pgsql-hackers/2011-05/msg01213.php

The thread on it might also be informative.

I do like your approach of being able to pass the ultimate size of the list in.. Perhaps the two approaches could be merged? I was able to make everything work with my approach, provided all the callers used the list API (I did that by making sure the links, etc, actually pointed to the right places in the pre-allocated array). One downside was that the size ended up being larger than it might have been in some cases.

Thanks,

	Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Now, if you could do something that *doesn't* restrict what operations
> could be applied to the lists, that would be good.

If the API is followed, I believe my previous patch works for everything, but it didn't allow varying the size of the new list. Perhaps we could combine the two approaches, though there would be more overhead from dealing with a variable bitmap for what's currently used.

> I've wished for a long while that we could allocate the list header and
> the first list cell in a single palloc cycle.

You've mentioned that before and, to be honest, I could have sworn that we're doing that already..

Thanks,

	Stephen
On Sat, Nov 19, 2011 at 12:33 PM, Stephen Frost <sfrost@snowman.net> wrote:
> You've mentioned that before and, to be honest, I could have sworn that
> we're doing that already..

I tried to write a patch for that at one point, but it crashed and burned over the exact same set of issues discussed upthread, which I wasn't able to resolve satisfactorily. It's just really difficult to change the API for something like memory allocation after the fact; it's too hard to find the bits of code that do whatever naughty thing you don't want them to.

One random idea... would there be any sense in having a palloc-like function that is defined to allocate multiple objects at once? In other words, if you need 4 list cells, then instead of asking palloc for them one at a time, you make one function call and get four pointers back at one go. I'm not sure whether that would help at all; palloc might not be able to take advantage of the additional information usefully.

To some extent I feel like this is all optimizing something that's likely already so well-optimized that future gains, if any, are likely to be small. I feel like the only way we're likely to get much of a win here is if we can reduce the amount of memory that has to be allocated in the first place (allocate fewer data structures, don't copy them as much, etc.).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
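Robert's multiple-objects-per-call idea could be prototyped along these lines (function name and signature are invented for illustration; this is not an existing palloc API, and malloc stands in for the allocator):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate 'count' objects of 'size' bytes each in one underlying
 * allocation and hand the individual pointers back through 'out'.
 * A palloc-based version could likewise pay the allocator's
 * bookkeeping cost once instead of 'count' times.  Assumes 'size'
 * is already a multiple of the required alignment, as allocator
 * chunk sizes typically are.
 */
static void *
alloc_multi(size_t size, size_t count, void **out)
{
    char       *chunk = malloc(size * count);
    size_t      i;

    for (i = 0; i < count; i++)
        out[i] = chunk + i * size;
    return chunk;               /* caller frees the whole chunk at once */
}
```

The catch, as noted in the message above, is the same one haunting the combined header/cell schemes: the objects can no longer be freed individually, so every consumer has to know they came from a batch.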
Did we make any headway on this?

---------------------------------------------------------------------------

On Sat, Nov 19, 2011 at 12:31:09PM -0500, Stephen Frost wrote:
> Andres,
>
> * Andres Freund (andres@anarazel.de) wrote:
> > For that I added new functions/defines which allocate all the needed memory in
> > one hunk:
> > list_immutable_make$n(),
> > List *list_new_immutable_n(NodeTag t, size_t n);
> > List *list_make_n(NodeTag t, size_t n, ...);
>
> A while back, I posted a patch to try and address this same issue. The
> approach that I took was to always pre-allocate a certain (#defined)
> amount (I think it was 5 or 10 elements). There were a number of places
> that caused problems with that approach because they hacked on the list
> element structures directly (instead of using the macros/functions) -
> you'll want to watch out for those areas in any work on lists.
>
> That patch is here:
> http://archives.postgresql.org/pgsql-hackers/2011-05/msg01213.php
>
> The thread on it might also be informative.
>
> I do like your approach of being able to pass the ultimate size of the
> list in.. Perhaps the two approaches could be merged? I was able to
> make everything work with my approach, provided all the callers used the
> list API (I did that by making sure the links, etc, actually pointed to
> the right places in the pre-allocated array). One downside was that the
> size ended up being larger than it might have been in some cases.
>
> Thanks,
>
> 	Stephen

-- 
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Is this a TODO?

---------------------------------------------------------------------------

On Sat, Nov 19, 2011 at 12:31:09PM -0500, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > For that I added new functions/defines which allocate all the needed memory in
> > one hunk:
> > list_immutable_make$n(),
> > List *list_new_immutable_n(NodeTag t, size_t n);
> > List *list_make_n(NodeTag t, size_t n, ...);
>
> I do like your approach of being able to pass the ultimate size of the
> list in.. Perhaps the two approaches could be merged?

-- 
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> Did we make any headway on this?

I've got the code written but I need to test it in a stable environment to see what kind of improvement it provides. I've actually been fighting for the past 2 months with the box that I want to use and think I may just give up and steal a different server.

It hasn't fallen off my list of things to do, just hasn't been a high priority.

Thanks,

	Stephen
On Thu, Aug 16, 2012 at 09:51:28PM -0400, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > Did we make any headway on this?
>
> I've got the code written but I need to test it in a stable environment
> to see what kind of improvement it provides. I've actually been fighting
> for the past 2 months with the box that I want to use and think I may
> just give up and steal a different server.
>
> It hasn't fallen off my list of things to do, just hasn't been a high
> priority.

OK, thanks for the status update.

-- 
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +