Re: EXPLAIN (plan off, rewrite off) for benchmarking - Mailing list pgsql-hackers

From Robert Haas
Subject Re: EXPLAIN (plan off, rewrite off) for benchmarking
Date
Msg-id CA+Tgmobn6Pw0T2SG-dD1biYkn4mSzpczWzSV=PJ4f_guKQJV8w@mail.gmail.com
In response to Re: EXPLAIN (plan off, rewrite off) for benchmarking  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: EXPLAIN (plan off, rewrite off) for benchmarking
List pgsql-hackers
On Mon, Nov 21, 2011 at 8:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> ... Maybe we could find a way to reduce the size of the parse
>> tree (i.e. fewer nodes), or the number of times that it has to be
>> walked/copied.
>
> We could eliminate some annoying tree-copy steps if we could institute
> the policy that parse analysis doesn't scribble on the raw parse tree,
> rewriter doesn't modify parse analysis output, and planner doesn't
> modify rewriter output.  However, it would be a lot of work, and I'm not
> entirely sure that we'd end up with a significant speed benefit.  In a
> lot of places, the only way to not scribble on the input is to copy it
> anyway ...

This is probably a stupid question, but why does it matter if parse
analysis scribbles on the raw parse tree, or the rewriter on the parse
analysis output?  I understand that we may sometimes need to replan
the output of the rewriter, so we'd better not modify it
destructively, but I would have thought that parsing and parse analysis
would always be done together, in which case it doesn't obviously
matter.  I'm probably missing something here...

Another thing we might want to consider doing is introducing some kind
of container data structure other than List.  I think that trying to
change the semantics of the existing List datatype in any meaningful
way is probably doomed, but if we introduce a new abstraction and
gradually convert things over, we have a lot more flexibility.  What
I'm imagining is something that is optimized for holding a small
number of pointers (say, 4) that are normally added in order, but
which can grow if needed, at some cost in performance.  For example:

struct Thingy
{
    unsigned short nused;
    unsigned short nalloc;
    struct Thingy *next;        /* if we run out of elements in this Thingy,
                                 * we can link to another Thingy with more
                                 * space */
    void       *item[FLEXIBLE_ARRAY_MEMBER];    /* really nalloc */
};

This would mean fewer palloc() calls and less overall memory usage
than a List, and might also improve memory access locality.  Right
now, walking a three-element list could involve pulling in one cache
line for the list, three more for the list cells, and then
(presumably) three more for the elements themselves.  There's not much
help for the fact that the elements might end up in different cache
lines, but at least if we did something like this you wouldn't need to
bounce around to find the list cells.  This is particularly obnoxious
for things like RangeVars, where we build up the list representation
mostly as an intermediate step and then flatten it again.
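
Just to make that concrete, here is a rough standalone sketch of how
appending and walking might look for such a structure.  The helper names
(thingy_make, thingy_append, thingy_walk), the use of plain malloc rather
than palloc, and the growth policy are all purely illustrative, not a
concrete API proposal:

#include <stddef.h>
#include <stdlib.h>

#define THINGY_INIT_ITEMS 4     /* optimized for a small number of pointers */

struct Thingy
{
    unsigned short nused;
    unsigned short nalloc;
    struct Thingy *next;        /* overflow chunk, normally NULL */
    void       *item[];         /* C99 flexible array member; really nalloc */
};

/* Allocate one chunk with room for nalloc pointers (error handling omitted). */
static struct Thingy *
thingy_make(unsigned short nalloc)
{
    struct Thingy *t;

    t = malloc(offsetof(struct Thingy, item) + nalloc * sizeof(void *));
    t->nused = 0;
    t->nalloc = nalloc;
    t->next = NULL;
    return t;
}

/* Append in order; only chain on another chunk once the current one is full. */
static void
thingy_append(struct Thingy *t, void *p)
{
    while (t->nused == t->nalloc)
    {
        if (t->next == NULL)
            t->next = thingy_make(t->nalloc);
        t = t->next;
    }
    t->item[t->nused++] = p;
}

/* Walk all elements; the common small case never leaves the first chunk. */
static void
thingy_walk(struct Thingy *t, void (*visit) (void *))
{
    for (; t != NULL; t = t->next)
    {
        int         i;

        for (i = 0; i < t->nused; i++)
            visit(t->item[i]);
    }
}

The point is just that the common small case lives in a single allocation,
so a walk touches one cache line for the container itself instead of
chasing a separate cell for every element; a real version would presumably
allocate in a memory context and grow more gracefully.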

Now, one obvious problem with this line of attack is that each
individual patch in this area isn't likely to save anything noticeable
on real workloads.  But I don't think that should stop us from trying.
With every performance patch that goes in, the number of workloads
that are dominated by the cost of backend-local memory allocation is
growing.  The cost is extremely distributed and it's very hard to pin
down where it's all coming from, but we've gotta start somewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

