Thread: counting pallocs

counting pallocs

From

Robert Haas

Date:

17 May 2012, 00:43:41

The attached patch provides some rough instrumentation for determining
where palloc calls are coming from.  This is obviously just for
noodling around with, not for commit, and there may well be bugs.  But
enjoy.
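
For readers without the attachment, here is a minimal sketch of the general idea of counting allocations per call site. It is not the attached patch, whose contents are not reproduced here. Every name in it (counted_alloc, AllocSite, lookup_site, MAX_SITES) is hypothetical, and it wraps plain malloc() rather than palloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SITES 64            /* hypothetical fixed-size table */

typedef struct AllocSite
{
    const char *file;           /* __FILE__ of the calling site */
    int         line;           /* __LINE__ of the calling site */
    long        count;          /* allocations seen from this site */
} AllocSite;

AllocSite sites[MAX_SITES];
int       nsites = 0;

/* Find or create the counter for (file, line); NULL if the table is full. */
AllocSite *
lookup_site(const char *file, int line)
{
    for (int i = 0; i < nsites; i++)
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            return &sites[i];
    if (nsites == MAX_SITES)
        return NULL;
    sites[nsites].file = file;
    sites[nsites].line = line;
    sites[nsites].count = 0;
    return &sites[nsites++];
}

/* Bump the caller's counter, then allocate with plain malloc(). */
void *
counted_alloc_impl(size_t size, const char *file, int line)
{
    AllocSite *site = lookup_site(file, line);

    if (site != NULL)
        site->count++;
    return malloc(size);
}

/* The macro captures the location where the allocation is requested. */
#define counted_alloc(size) counted_alloc_impl((size), __FILE__, __LINE__)

/* Print one line per call site seen so far. */
void
report_sites(void)
{
    for (int i = 0; i < nsites; i++)
        printf("%s:%d: %ld allocations\n",
               sites[i].file, sites[i].line, sites[i].count);
}
```

A linear scan keeps the sketch short. A real tool would more likely use a hash table, and would probably record the caller further up the stack, since a raw file and line cannot tell apart the different callers of a helper such as lappend().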

I gave this a quick spin on a couple of test workloads: a very short
pgbench test, a very short pgbench -S test, and the regression tests.
On the pgbench test, the top culprits are ExecInitExpr() and
expression_tree_mutator(); in both cases, the lappend() call for the
T_List case is the major contributor.  Other significant contributors
include _copyVar(), which I haven't drilled into terribly far but
seems to be coming mostly from add_vars_to_targetlist();
buildRelationAliases() via lappend, pstrdup, and makeString;
ExecAllocTupleTableSlot(); and makeColumnRef() via makeNode, lcons,
and makeString.

The pgbench -S results are similar, but build_physical_tlist() also
pops up fairly high.

On the regression tests, heap_tuple_untoast_attr() is at the very top
of the list, and specifically for the VARATT_IS_SHORT() case.  It
might be good to disaggregate this some more, but I'm too tired for
that right now.  index_form_tuple()'s palloc0 call comes in second,
and heap_form_minimal_tuple()'s palloc0 is third.
LockAcquireExtended()'s allocation of a new LOCALLOCK entry also comes
in pretty high; ExecInitExpr() shows up here too; and heap_form_tuple()
shows up as well.

One piece of reasonably low-hanging fruit appears to be OpExpr.  It
seems like it would be better all around to put Node *arg1 and Node
*arg2 in there instead of a list...  aside from saving pallocs, it
seems like it would generally simplify the code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

count-pallocs.patch

Re: counting pallocs

From

Heikki Linnakangas

Date:

17 May 2012, 03:29:12

On 17.05.2012 06:43, Robert Haas wrote:
> The attached patch provides some rough instrumentation for determining
> where palloc calls are coming from.  This is obviously just for
> noodling around with, not for commit, and there may well be bugs.  But
> enjoy.
>
> I gave this a quick spin on a couple of test workloads: a very short
> pgbench test, a very short pgbench -S test, and the regression tests.
> On the pgbench test, the top culprits are ExecInitExpr() and
> expression_tree_mutator(); in both cases, the lappend() call for the
> T_List case is the major contributor.  Other significant contributors
> include _copyVar(), which I haven't drilled into terribly far but
> seems to be coming mostly from add_vars_to_targetlist();
> buildRelationAliases() via lappend, pstrdup, and makeString;
> ExecAllocTupleTableSlot(); and makeColumnRef() via makeNode, lcons,
> and makeString.

What percentage of total CPU usage is the palloc() overhead in these 
tests? If we could totally eliminate the palloc() overhead, how much 
faster would the test run?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: counting pallocs

From

Robert Haas

Date:

17 May 2012, 09:30:36

On Thu, May 17, 2012 at 2:28 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> What percentage of total CPU usage is the palloc() overhead in these tests?
> If we could totally eliminate the palloc() overhead, how much faster would
> the test run?

AllocSetAlloc is often the top CPU consumer in profiling results, but
it's typically only in the single-digit percentages.  However, there's
also some distributed overhead that's more difficult to measure.  For
example, the fact that OpExpr uses a List instead of directly pointing
to its arguments costs us three pallocs - plus three more if we ever
copy it - but it also means that accessing the first element of an
OpExpr requires three pointer dereferences instead of one, and
accessing the second one requires four pointer dereferences instead of
one.  There's no real way to isolate the overhead of that, but it's
got to cost at least something.
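
The palloc and dereference counts above can be sketched with simplified stand-in types. These are not the real PostgreSQL definitions: SimpleList and Cell only mimic a linked-list List with a header and one cell per element, OpExprWithList and OpExprDirect are hypothetical reductions of OpExpr to its argument fields, and counting_malloc is an assumed wrapper around malloc() standing in for palloc().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { int tag; } Node;

/* Stand-in for a linked-list cell: one allocation per element. */
typedef struct Cell
{
    void        *ptr_value;
    struct Cell *next;
} Cell;

/* Stand-in for the list header: one more allocation. */
typedef struct SimpleList
{
    int   length;
    Cell *head;
    Cell *tail;
} SimpleList;

/* Layout A: arguments reached through a list. */
typedef struct OpExprWithList { SimpleList *args; } OpExprWithList;

/* Layout B: arguments stored directly in the node. */
typedef struct OpExprDirect { Node *arg1; Node *arg2; } OpExprDirect;

int alloc_calls = 0;            /* counts allocations made below */

void *
counting_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
        abort();
    alloc_calls++;
    return p;
}

/* Building a two-element list costs three allocations: header + 2 cells. */
SimpleList *
make_list2(Node *a, Node *b)
{
    SimpleList *list = counting_malloc(sizeof(SimpleList));
    Cell       *c1 = counting_malloc(sizeof(Cell));
    Cell       *c2 = counting_malloc(sizeof(Cell));

    c1->ptr_value = a;
    c1->next = c2;
    c2->ptr_value = b;
    c2->next = NULL;
    list->length = 2;
    list->head = c1;
    list->tail = c2;
    return list;
}

/* Three dependent loads: op->args, ->head, ->ptr_value. */
Node *
first_arg_list(const OpExprWithList *op)
{
    return op->args->head->ptr_value;
}

/* Four dependent loads: op->args, ->head, ->next, ->ptr_value. */
Node *
second_arg_list(const OpExprWithList *op)
{
    return op->args->head->next->ptr_value;
}

/* One load: op->arg2. */
Node *
second_arg_direct(const OpExprDirect *op)
{
    return op->arg2;
}
```

The sketch only shows where the extra allocations and loads come from. It says nothing about how much they cost in practice.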

The reality - I'm not sure whether it's a happy reality or a sad
reality - is that most CPU profiles of PostgreSQL are pretty flat.
The nails that stick up have, for the most part, long since been
pounded down.  If we want to make further improvements to our parse
and plan time, and I do, because I think we lag our competitors, then
I think this is the kind of stuff we need to look at.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: counting pallocs

From

Tom Lane

Date:

17 May 2012, 09:44:16

Robert Haas <robertmhaas@gmail.com> writes:
> One piece of reasonably low-hanging fruit appears to be OpExpr.  It
> seems like it would be better all around to put Node *arg1 and Node
> *arg2 in there instead of a list...  aside from saving pallocs, it
> seems like it would generally simplify the code.

Obviously, Stephen Frost's list-allocation patch would affect your
results here ... but I wonder how much the above change would affect
*his* results.  Specifically, the observation that most lists are 1
or 2 elements long would presumably become less true, but I wonder
by how much exactly.
        regards, tom lane