Re: Performance improvement hints + measurement - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance improvement hints + measurement
Date
Msg-id 15856.968977050@sss.pgh.pa.us
In response to Performance improvement hints  (devik@cdi.cz)
List pgsql-hackers
devik@cdi.cz writes:
>> You could probably generalize the existing code for hashjoin tables
>> to support hash aggregation as well.  Now that I think about it, that
>> sounds like a really cool idea.  Should put it on the TODO list.

> Yep. It should be easy. It could be used as part of Hash
> node by extending ExecHash to return all hashed rows and
> adding value{1,2}[nbuckets] to HashJoinTableData.

Actually I think what we want is a hash table indexed by the
grouping-column value(s) and storing the current running aggregate
states for each agg function being computed.  You wouldn't really
need to store any of the original tuples.  You might want to form
the agg states for each entry into a tuple just for convenience of
storage though.
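
To make the shape of that table concrete, here is a minimal standalone sketch, assuming a single integer grouping column and two aggregates (a count and a sum). All names (AggEntry, AggTable, agg_lookup, agg_advance, agg_free) are hypothetical and are not taken from the PostgreSQL sources; a real version would live in the executor and allocate with palloc rather than calloc.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 64             /* arbitrary size chosen for this sketch */

/* One entry per distinct grouping value; no input tuples are kept. */
typedef struct AggEntry
{
    int              key;       /* grouping-column value */
    long             count;     /* running state for a COUNT-style aggregate */
    long             sum;       /* running state for a SUM-style aggregate */
    struct AggEntry *next;      /* next entry in the same bucket chain */
} AggEntry;

typedef struct AggTable
{
    AggEntry *buckets[NBUCKETS];
} AggTable;

static unsigned
hash_key(int key)
{
    /* simple multiplicative hash; unsigned arithmetic wraps safely */
    return ((unsigned) key * 2654435761u) % NBUCKETS;
}

/* Find the entry for key; create a zeroed one if asked to and it is absent. */
static AggEntry *
agg_lookup(AggTable *table, int key, int create)
{
    unsigned  h = hash_key(key);
    AggEntry *entry;

    for (entry = table->buckets[h]; entry != NULL; entry = entry->next)
        if (entry->key == key)
            return entry;
    if (!create)
        return NULL;
    entry = calloc(1, sizeof(AggEntry));
    assert(entry != NULL);
    entry->key = key;
    entry->next = table->buckets[h];
    table->buckets[h] = entry;
    return entry;
}

/* Fold one input row into the running states for its group. */
static void
agg_advance(AggTable *table, int key, long value)
{
    AggEntry *entry = agg_lookup(table, key, 1);

    entry->count += 1;
    entry->sum += value;
}

/* Release every entry and reset the table to empty. */
static void
agg_free(AggTable *table)
{
    int i;

    for (i = 0; i < NBUCKETS; i++)
    {
        while (table->buckets[i] != NULL)
        {
            AggEntry *next = table->buckets[i]->next;

            free(table->buckets[i]);
            table->buckets[i] = next;
        }
    }
}
```

Starting from a zero-initialized AggTable, calling agg_advance once per input row and then walking the buckets yields one result row per group, without ever storing the input rows themselves.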

> By the way, what is the "portal" and "slot" ?

As far as the hash code is concerned, a portal is just a memory
allocation context.  Destroying the portal gets rid of all the
memory allocated therein, without the hassle of finding and freeing
each palloc'd block individually.
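
The idea can be illustrated with a toy allocator, offered only as a sketch: every allocation is linked onto its context, so one destroy call releases them all. The names Arena, arena_alloc and arena_destroy are hypothetical and this is not how the backend's portal or memory-context code is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of each allocation; the union forces alignment. */
typedef union BlockHeader
{
    union BlockHeader *next;    /* next block owned by the same arena */
    max_align_t        align;   /* keeps the payload suitably aligned */
} BlockHeader;

typedef struct Arena
{
    BlockHeader *blocks;        /* list of every block allocated so far */
    size_t       nblocks;       /* number of live blocks */
} Arena;

/* Allocate size bytes that belong to the arena; returns NULL on failure. */
static void *
arena_alloc(Arena *arena, size_t size)
{
    BlockHeader *block = malloc(sizeof(BlockHeader) + size);

    if (block == NULL)
        return NULL;
    block->next = arena->blocks;
    arena->blocks = block;
    arena->nblocks++;
    return block + 1;           /* payload starts right after the header */
}

/* Free everything allocated in the arena; returns how many blocks went away. */
static size_t
arena_destroy(Arena *arena)
{
    size_t freed = 0;

    while (arena->blocks != NULL)
    {
        BlockHeader *next = arena->blocks->next;

        free(arena->blocks);
        arena->blocks = next;
        freed++;
    }
    arena->nblocks = 0;
    return freed;
}
```

The caller never tracks individual pointers: it allocates freely while building the hash table and calls arena_destroy once when the table is no longer needed.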

As for slots, you are probably thinking of tuple table slots, which
are used to hold the tuples returned by plan nodes.  The input
tuples read by the hash node are stored in a slot that's filled 
by the child Plan node each time it's called.  Similarly, the hash
join node has to return a new tuple in its output slot each time
it's called.  It's a pretty simplistic form of memory management,
but it works fine for plan node output tuples.
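
A stripped-down picture of that calling convention, assuming a "tuple" is just one integer: each node owns an output slot that it overwrites on every call, and the parent reads the child's slot before asking for the next row. Slot, ScanNode, FilterNode, scan_next and filter_next are hypothetical names for this sketch, not the executor's real structures.

```c
#include <assert.h>
#include <stddef.h>

/* A slot holds at most one tuple at a time. */
typedef struct Slot
{
    int value;                  /* the "tuple": a single integer here */
    int is_empty;               /* nonzero means no more rows */
} Slot;

/* Child node: returns rows from an in-memory array, one per call. */
typedef struct ScanNode
{
    const int *rows;
    size_t     nrows;
    size_t     pos;
    Slot       out;             /* output slot, overwritten on each call */
} ScanNode;

static Slot *
scan_next(ScanNode *scan)
{
    assert(scan != NULL);
    if (scan->pos < scan->nrows)
    {
        scan->out.value = scan->rows[scan->pos++];
        scan->out.is_empty = 0;
    }
    else
        scan->out.is_empty = 1;
    return &scan->out;
}

/* Parent node: pulls from the child and passes through even values only. */
typedef struct FilterNode
{
    ScanNode *child;
    Slot      out;              /* the parent's own output slot */
} FilterNode;

static Slot *
filter_next(FilterNode *filter)
{
    for (;;)
    {
        Slot *in = scan_next(filter->child);

        if (in->is_empty)
        {
            filter->out.is_empty = 1;
            return &filter->out;
        }
        if (in->value % 2 == 0)
        {
            filter->out = *in;  /* copy into our slot before returning */
            return &filter->out;
        }
    }
}
```

Because a slot is reused on the next call, a caller that needs a row to outlive that call must copy it out first, which is the reason this scheme is simple but limited.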

If you are interested in working on this idea, you should be looking
at current sources --- both the memory management for hash tables
and the implementation of aggregate state storage have changed
materially since 7.0, so code based on 7.0 would need a lot of work
to be usable.
        regards, tom lane

