Thread: counting pallocs
The attached patch provides some rough instrumentation for determining where palloc calls are coming from. This is obviously just for noodling around with, not for commit, and there may well be bugs. But enjoy.

I gave this a quick spin on a couple of test workloads: a very short pgbench test, a very short pgbench -S test, and the regression tests. On the pgbench test, the top culprits are ExecInitExpr() and expression_tree_mutator(); in both cases, the lappend() call for the T_List case is the major contributor. Other significant contributors include _copyVar(), which I haven't drilled into terribly far but seems to be coming mostly from add_vars_to_targetlist(); buildRelationAliases() via lappend, pstrdup, and makeString; ExecAllocTupleTableSlot(); and makeColumnRef() via makeNode, lcons, and makeString.

The pgbench -S results are similar, but build_physical_tlist() also pops up fairly high.

On the regression tests, heap_tuple_untoast_attr() is at the very top of the list, specifically for the VARATT_IS_SHORT() case. It might be good to disaggregate this some more, but I'm too tired for that right now. index_form_tuple()'s palloc0 call comes in second, and heap_form_minimal_tuple()'s palloc0 is third. LockAcquireExtended()'s allocation of a new LOCALLOCK entry also comes in pretty high; ExecInitExpr() shows up here too; and heap_form_tuple() shows up as well.

One piece of reasonably low-hanging fruit appears to be OpExpr. It seems like it would be better all around to put Node *arg1 and Node *arg2 in there instead of a list... aside from saving pallocs, it seems like it would generally simplify the code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
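For readers who want to experiment with the idea of attributing allocations to their callers, here is a minimal standalone sketch. It is not the attached patch, whose contents are not shown in this thread, and it does not touch palloc or any PostgreSQL code. The names counted_alloc, COUNTED_ALLOC, site_count, report_sites, make_cell, and make_string are all hypothetical; the sketch just wraps malloc in a macro that records the calling function and line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SITES 64

/* One entry per distinct (function, line) call site. Hypothetical layout. */
typedef struct AllocSite
{
    const char *func;
    int         line;
    long        count;
} AllocSite;

static AllocSite sites[MAX_SITES];
static int       nsites = 0;

/* Record the call site, then hand the request to malloc. */
void *
counted_alloc(size_t size, const char *func, int line)
{
    int i;

    /* Linear search is fine for a toy with a handful of call sites. */
    for (i = 0; i < nsites; i++)
        if (sites[i].line == line && strcmp(sites[i].func, func) == 0)
            break;

    if (i == nsites)
    {
        assert(nsites < MAX_SITES);
        sites[i].func = func;
        sites[i].line = line;
        sites[i].count = 0;
        nsites++;
    }
    sites[i].count++;
    return malloc(size);
}

/* Callers use this macro so the site is captured automatically. */
#define COUNTED_ALLOC(size) counted_alloc((size), __func__, __LINE__)

/* Total allocations made from any line of the named function. */
long
site_count(const char *func)
{
    long total = 0;

    for (int i = 0; i < nsites; i++)
        if (strcmp(sites[i].func, func) == 0)
            total += sites[i].count;
    return total;
}

/* Print one line per call site. */
void
report_sites(void)
{
    for (int i = 0; i < nsites; i++)
        printf("%s:%d %ld\n", sites[i].func, sites[i].line, sites[i].count);
}

/* Two stand-in callers, so there is something to count. */
void *make_cell(void)   { return COUNTED_ALLOC(16); }
void *make_string(void) { return COUNTED_ALLOC(32); }
```

Calling make_cell() a few times and then report_sites() prints a per-site tally. A real tool would need a cheaper lookup than a linear scan, and would probably want more than one level of caller, since helpers like lappend are reached from many places.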
On 17.05.2012 06:43, Robert Haas wrote:
> The attached patch provides some rough instrumentation for determining
> where palloc calls are coming from. This is obviously just for
> noodling around with, not for commit, and there may well be bugs. But
> enjoy.
>
> I gave this a quick spin on a couple of test workloads: a very short
> pgbench test, a very short pgbench -S test, and the regression tests.
> On the pgbench test, the top culprits are ExecInitExpr() and
> expression_tree_mutator(); in both cases, the lappend() call for the
> T_List case is the major contributor. Other significant contributors
> include _copyVar(), which I haven't drilled into terribly far but
> seems to be coming mostly from add_vars_to_targetlist();
> buildRelationAliases() via lappend, pstrdup, and makeString;
> ExecAllocTupleTableSlot(); and makeColumnRef() via makeNode, lcons,
> and makeString.

What percentage of total CPU usage is the palloc() overhead in these tests? If we could totally eliminate the palloc() overhead, how much faster would the test run?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, May 17, 2012 at 2:28 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> What percentage of total CPU usage is the palloc() overhead in these tests?
> If we could totally eliminate the palloc() overhead, how much faster would
> the test run?

AllocSetAlloc is often the top CPU consumer in profiling results, but it's typically only in the single-digit percentages. However, there's also some distributed overhead that's more difficult to measure. For example, the fact that OpExpr uses a List instead of directly pointing to its arguments costs us three pallocs - plus three more if we ever copy it - but it also means that accessing the first element of an OpExpr requires three pointer dereferences instead of one, and accessing the second one requires four pointer dereferences instead of one. There's no real way to isolate the overhead of that, but it's got to cost at least something.

The reality - I'm not sure whether it's a happy reality or a sad reality - is that most CPU profiles of PostgreSQL are pretty flat. The nails that stick up have, for the most part, long since been pounded down. If we want to make further improvements to our parse and plan time, and I do, because I think we lag our competitors, then I think this is the kind of stuff we need to look at.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
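To make the palloc and dereference counts in the message above concrete, here is a small standalone sketch. It is an illustration only: the structs are heavily simplified stand-ins (SimpleList, Cell, ListOpExpr, DirectOpExpr are hypothetical names, not PostgreSQL's definitions), malloc stands in for palloc, and the list is modelled as a header plus one separately allocated cell per element, which is the layout the counts in the message imply.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an expression node. */
typedef struct Node { int tag; } Node;

/* Simplified linked list: a header plus one cell per element. */
typedef struct Cell
{
    void        *ptr_value;
    struct Cell *next;
} Cell;

typedef struct SimpleList
{
    int   length;
    Cell *head;
    Cell *tail;
} SimpleList;

/* Layout 1: the operator keeps its arguments in a list. */
typedef struct ListOpExpr
{
    int         opno;
    SimpleList *args;
} ListOpExpr;

/* Layout 2: the operator points at its two arguments directly. */
typedef struct DirectOpExpr
{
    int   opno;
    Node *arg1;
    Node *arg2;
} DirectOpExpr;

static long alloc_calls = 0;

/* malloc wrapper that counts calls, standing in for palloc. */
void *
counted_malloc(size_t size)
{
    void *p = malloc(size);

    assert(p != NULL);
    alloc_calls++;
    return p;
}

long get_alloc_calls(void) { return alloc_calls; }

/* Building a two-element list: header + cell + cell = 3 allocations. */
SimpleList *
list_make2_sketch(void *a, void *b)
{
    SimpleList *list = counted_malloc(sizeof(SimpleList));
    Cell       *first = counted_malloc(sizeof(Cell));
    Cell       *second = counted_malloc(sizeof(Cell));

    first->ptr_value = a;
    first->next = second;
    second->ptr_value = b;
    second->next = NULL;
    list->length = 2;
    list->head = first;
    list->tail = second;
    return list;
}

/* Three dependent loads: op -> list header -> first cell. */
Node *list_first_arg(const ListOpExpr *op)
{
    return op->args->head->ptr_value;
}

/* Four dependent loads: op -> list header -> first cell -> second cell. */
Node *list_second_arg(const ListOpExpr *op)
{
    return op->args->head->next->ptr_value;
}

/* One load each: the argument pointer is stored in the node itself. */
Node *direct_first_arg(const DirectOpExpr *op)  { return op->arg1; }
Node *direct_second_arg(const DirectOpExpr *op) { return op->arg2; }
```

The direct layout needs no allocations beyond the node itself, and copying it does not have to rebuild a list. The sketch says nothing about how much time either difference costs in practice, which is the part the message describes as hard to isolate.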
Robert Haas <robertmhaas@gmail.com> writes:
> One piece of reasonably low-hanging fruit appears to be OpExpr. It
> seems like it would be better all around to put Node *arg1 and Node
> *arg2 in there instead of a list... aside from saving pallocs, it
> seems like it would generally simplify the code.

Obviously, Stephen Frost's list-allocation patch would affect your results here ... but I wonder how much the above change would affect *his* results. Specifically, the observation that most lists are 1 or 2 elements long would presumably become less true, but I wonder by how much exactly.

regards, tom lane