Thread: BUG #8687: HashAggregate runs out of memory

BUG #8687: HashAggregate runs out of memory

From
peters@mitre.org
Date:
The following bug has been logged on the website:

Bug reference:      8687
Logged by:          Peter Sylvester
Email address:      peters@mitre.org
PostgreSQL version: 9.3.2
Operating system:   CENTOS6
Description:

I have a query that causes an out-of-memory condition with a HashAggregate
plan.


The query involves a join between a 10M-row table and a 1K-row table, with
work_mem=300MB. The process space grows past 6GB, the machine runs out of
memory and swap, and the OS kills the backend process.


I have a test script which I can upload to recreate the issue.

Re: BUG #8687: HashAggregate runs out of memory

From
Michael Paquier
Date:
On Fri, Dec 20, 2013 at 12:09 PM,  <peters@mitre.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8687
> Logged by:          Peter Sylvester
> Email address:      peters@mitre.org
> PostgreSQL version: 9.3.2
> Operating system:   CENTOS6
> Description:
>
> I have a query that causes an out-of-memory condition with a HashAggregate
> plan.
>
>
> The query involves a join between a 10M-row table and a 1K-row table, with
> work_mem=300MB. The process space grows past 6GB, the machine runs out of
> memory and swap, and the OS kills the backend process.
>
> I have a test script which I can upload to recreate the issue.
And this would be welcome. In this case the most interesting part is the
test case you can provide; perhaps it will show some memory not being freed
in the code path your query exercises. It is hard to tell what might be
going wrong here, so first: are you sure you weren't caught out by the fact
that work_mem can be allocated separately for each sort or hash operation
in a single query?

Regards,
--
Michael

Re: BUG #8687: HashAggregate runs out of memory

From
"Sylvester, Peter A."
Date:
Michael,

Attached is the script for reproducing the problem that I was seeing.
The actual update was being run from a single plsql session, so I don't think multiple work_mem areas would have
been allocated.
Note that the join produces more records than the larger table, which is likely a contributor to the issue.
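
For example (these are not our actual tables, just an illustration of the
fan-out), a join whose condition matches several inner rows per outer row
returns more rows than the bigger input:

SELECT b.id, s.pattern
FROM big_t b                     -- say 10,000,000 rows
JOIN small_t s                   -- say 1,000 rows
  ON upper(b.text2) LIKE s.pattern;
-- If each of the 10M rows matches several of the 1,000 patterns, the join
-- output is several times larger than either input, and that is what the
-- HashAggregate on top has to digest.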

--Peter


Attachment

BUG #8687: HashAggregate runs out of memory

From
Jeff Janes
Date:
On Monday, December 23, 2013, Sylvester, Peter A. wrote:

> Michael,
>
> Attached is the script for reproducing the problem that I was seeing.
> The actual update  was being run from a single plsql session, so I don't
> think multiple work_mem areas would have been allocated.
> Note that the join produces more records than the larger table, which is
> likely a contributor to the issue.
>

Just doing the select is sufficient to show the problem; inserting the
result into another table is not necessary.

Did this work in a previous version of PostgreSQL?  If not, I don't think
there is anything very interesting going on.  The planner thinks the result
will be 5 million rows, but it is actually 171 million, so the hash table
takes up a lot more memory than the planner expected.
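
You can see the mismatch directly in the plan with something like this (the
names are placeholders for whatever the script actually uses):

EXPLAIN ANALYZE
SELECT b.id, count(*)
FROM big_t b
JOIN small_t s ON upper(b.text2) LIKE s.pattern
GROUP BY b.id;
-- Compare the estimated row count on the HashAggregate line with the
-- actual one. With roughly 34x more groups than the planner expected, the
-- hash table needs far more memory than budgeted, and HashAggregate in
-- 9.3 never spills to disk, so it simply grows past work_mem.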

You are unlikely to get good selectivity estimates when applying the LIKE
operator to a function's result, because the planner keeps no histograms
for such expressions.  So try to avoid doing that where selectivity
estimates are important.  You could store the uppercased version in its own
column, for example, or you could build a function-based index on
upper(text2).  Either one would provide the statistics needed for better
estimates, and the index might be useful in its own right as well.
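
For example (big_t is again a placeholder for the real table):

-- Function-based (expression) index: a subsequent ANALYZE gathers
-- statistics on upper(text2) itself, which the planner can then use.
CREATE INDEX big_t_upper_text2_idx ON big_t (upper(text2));
ANALYZE big_t;

-- Or keep the uppercased value in a plain column of its own, kept in sync
-- by a trigger or by whatever process writes the table:
ALTER TABLE big_t ADD COLUMN text2_upper text;
UPDATE big_t SET text2_upper = upper(text2);
ANALYZE big_t;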

I would say that this is not really a bug.  There is an opportunity for
improvement, such as hybrid hashing or dynamically switching from one plan
to another when the executor realizes things are going awry, but I doubt
those will happen any time soon, nor will they be back-patched to existing
versions if/when they are implemented.

Cheers,

Jeff