Home > mailing lists

Re: Solving hash table overrun problems - Mailing list pgsql-hackers

From	Aaron Birkland
Subject	Re: Solving hash table overrun problems
Date	March 7, 2005 00:46:34
Msg-id	19ab0ccd05030616463919926a@mail.gmail.com Whole thread Raw
In response to	Re: Solving hash table overrun problems (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

> > This also brings up a line of thought I had a while ago on a related
> > topic.  Something like a "HashDistinct" might be useful, if it had no
> > startup cost.  

> We already have that: the planner will use a HashAgg node in this
> fashion in some contexts (I think just as one of the ways to do IN,
> at the moment).  

Hmm.. I see HashAggregate being used that way in the IN queries, but I
have not observed it used in a way that incurrs no startup cost.  It
looked to me that in doing hash aggregation in ExecAgg (nodeAgg.c),
agg_fill_hash_table() would have to be called, which iterate through
every output of the child plan building the hash table before it
returns, thus incurring at least the startup cost of executing the
entire subplan of the child node at the aggregation stage.  I'm not
too familiar with the code, so there is probably something I'm missing
somewhere :(

> It's not yet bright enough to consider doing it for
> SELECT DISTINCT.  The DISTINCT planning code is old and crufty and
> pretty tightly interwired with ORDER BY ... it needs work.

Yes, SELECT DISTINCT was my motivating example, though in my specific
application latency (i.e. not having to wait for the entire query
below the DISTINCT operation to finish) was also an important factor,
hence my thoughts on a zero startup cost hash aggregation and
wondering if it would really be any kind of win in the end.
 -Aaron

pgsql-hackers by date:

From: mchron@aueb.gr
Date: 07 March 2005, 00:38:53
Subject: ERROR: unrecognized node type in PostgresMain( )

From: Tom Lane
Date: 07 March 2005, 02:36:30
Subject: Re: ERROR: unrecognized node type in PostgresMain( )

Re: Solving hash table overrun problems - Mailing list pgsql-hackers

Previous

Next