BUG #8687: HashAggregate runs out of memory - Mailing list pgsql-bugs

From Jeff Janes
Subject BUG #8687: HashAggregate runs out of memory
Date
Msg-id CAMkU=1ySdc8TqfU=MUhDTUKB32s8ZzxZJB0yFWenYZrK_OiXCg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #8687: HashAggregate runs out of memory  ("Sylvester, Peter A." <peters@mitre.org>)
List pgsql-bugs
On Monday, December 23, 2013, Sylvester, Peter A. wrote:

> Michael,
>
> Attached is the script for reproducing the problem that I was seeing.
> The actual update  was being run from a single plsql session, so I don't
> think multiple work_mem areas would have been allocated.
> Note that the join produces more records than the larger table, which is
> likely a contributor to the issue.
>

Just doing the select is sufficient to show the problem, inserting the
result into another table is not necessary.

Did this work in a previous version of PostgreSQL?  Otherwise, I don't
think that there is anything very interesting going on.  It thinks the
result will be 5 million rows, but instead it is 171 million.  So the hash
table takes up a lot more memory than it thought it would.

You are unlikely to get good selectivity estimates when applying the like
operator to a function's results, as it doesn't store histograms for that.
 So, try to avoid doing that where selectivity estimates are important.
 You could store the upper version in its own column, for example. Or you
could build a function-based index on upper(text2).  Either one would
provide the stats necessary to come up with better estimates, and the index
might be useful in its own right as well.

I would say that this is not really a bug.  There is an opportunity for
improvement, such as hybrid hashing or dynamically switching from one plan
to another when it realizes things are going awry, but I doubt those will
happen any time soon, nor will they be back-patched to existing versions
if/when they are implemented.

Cheers,

Jeff

pgsql-bugs by date:

Previous
From: Brett Neumeier
Date:
Subject: Re: BUG #8684: Tables with custom range domain type cannot be analyzed
Next
From: Peter Geoghegan
Date:
Subject: Obsolete comment above _bt_doinsert()