Thread: postmaster consuming /lots/ of memory with hash aggregate. why?

postmaster consuming /lots/ of memory with hash aggregate. why?

From: Jon Nelson
Date:
I have a postgres 8.4.5 instance on CentOS 5 (x86_64) which appears to
go crazy with the amount of memory it consumes.
When I run the query below, in a matter of a few seconds memory
balloons to 5.3G (virtual), 4.6G (resident) and 1840 (shared), and
eventually the oom killer is invoked, killing the entire process.

Physical memory is 8GB but other processes on the box consume
approximately 4GB of that.

The settings changed from their defaults:

effective_cache_size = 4GB
work_mem = 16MB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 16
shared_buffers = 384MB
checkpoint_segments = 64

and

default_statistics_target = 100

The query is this:

insert into d_2010_09_13_sum
        select FOO.i, FOO.n, sum(FOO.cc) as cc, sum(FOO.oc) as oc
        from (
          select * from d_2010_09_12_sum
          union all
          select * from d_2010_09_13
        ) AS FOO group by i, n;

here is the explain:

 Subquery Scan "*SELECT*"  (cost=1200132.06..1201332.06 rows=40000 width=80)
   ->  HashAggregate  (cost=1200132.06..1200732.06 rows=40000 width=41)
         ->  Append  (cost=0.00..786531.53 rows=41360053 width=41)
               ->  Seq Scan on d_2010_09_12_sum  (cost=0.00..520066.48 rows=27272648 width=42)
               ->  Seq Scan on d_2010_09_13  (cost=0.00..266465.05 rows=14087405 width=40)

Both source tables freshly vacuum analyze'd.
The row estimates are correct for both source tables.

If I use "set enable_hashagg = false" I get this plan:

 Subquery Scan "*SELECT*"  (cost=8563632.73..9081838.25 rows=40000 width=80)
   ->  GroupAggregate  (cost=8563632.73..9081238.25 rows=40000 width=41)
         ->  Sort  (cost=8563632.73..8667033.84 rows=41360441 width=41)
               Sort Key: d_2010_09_12_sum.i, d_2010_09_12_sum.n
               ->  Result  (cost=0.00..786535.41 rows=41360441 width=41)
                     ->  Append  (cost=0.00..786535.41 rows=41360441 width=41)
                           ->  Seq Scan on d_2010_09_12_sum  (cost=0.00..520062.04 rows=27272204 width=42)
                           ->  Seq Scan on d_2010_09_13  (cost=0.00..266473.37 rows=14088237 width=40)

and postmaster's memory never exceeds (roughly) 548M (virtual), 27M
(resident), 5M (shared).

I even set default_statistics_target to 1000 and re-ran "vacuum
analyze verbose" on both tables - no change.
If I set work_mem to 1MB (from 16MB) then the GroupAggregate variation
is chosen instead.
Experimentally, HashAggregate is chosen when work_mem is 16MB, 8MB,
6MB, 5MB but not 4MB and on down.
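
For reference, the experiment amounts to something like this (a sketch;
the SET commands only affect the current session, and the work_mem
thresholds are just the values observed on this data):

    -- try one of these at a time, then re-run EXPLAIN
    SET work_mem = '5MB';        -- 5MB and above: planner picks HashAggregate
    SET work_mem = '4MB';        -- 4MB and below: planner picks GroupAggregate
    SET enable_hashagg = false;  -- forces GroupAggregate regardless of work_mem
    EXPLAIN
    insert into d_2010_09_13_sum
        select FOO.i, FOO.n, sum(FOO.cc) as cc, sum(FOO.oc) as oc
        from ( select * from d_2010_09_12_sum
               union all
               select * from d_2010_09_13 ) AS FOO
        group by i, n;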

Two things I don't understand:

1. Why, when hash aggregation is allowed, does memory absolutely
explode (eventually invoking the wrath of the oom killer)? 16MB for
work_mem does not seem outrageously high. For that matter, neither
does 5MB.

2. Why do both HashAggregate and GroupAggregate say the cost estimate
is 40000 rows?

--
Jon

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: "Pierre C"
Date:
> 2. Why do both HashAggregate and GroupAggregate say the cost estimate
> is 40000 rows?

I've reproduced this:


CREATE TABLE popo AS
  SELECT (x % 1000) AS a, (x % 1001) AS b
  FROM generate_series(1, 1000000) AS x;

VACUUM ANALYZE popo;

EXPLAIN ANALYZE
  SELECT a, b, count(*)
  FROM (SELECT * FROM popo UNION ALL SELECT * FROM popo) AS foo
  GROUP BY a, b;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=43850.00..44350.00 rows=40000 width=8) (actual time=1893.441..2341.780 rows=1000000 loops=1)
    ->  Append  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.025..520.581 rows=2000000 loops=1)
          ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.025..142.639 rows=1000000 loops=1)
          ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.003..114.257 rows=1000000 loops=1)
  Total runtime: 2438.741 ms
(5 rows)

Time: 2439.247 ms

I guess the row count estimate depends on the correlation of a and b,
which pg knows nothing about. In the first example there is no
correlation; now with full correlation:


UPDATE popo SET a = b;
VACUUM FULL popo;
VACUUM FULL popo;
ANALYZE popo;

EXPLAIN ANALYZE
  SELECT a, b, count(*)
  FROM (SELECT * FROM popo UNION ALL SELECT * FROM popo) AS foo
  GROUP BY a, b;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=43850.00..44350.00 rows=40000 width=8) (actual time=1226.201..1226.535 rows=1001 loops=1)
    ->  Append  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.008..518.068 rows=2000000 loops=1)
          ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..128.609 rows=1000000 loops=1)
          ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.005..128.502 rows=1000000 loops=1)
  Total runtime: 1226.797 ms
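
For comparison, here is a sketch of what the planner has to work with
versus reality (pg_stats only exposes per-column n_distinct estimates;
nothing describes how a and b are correlated):

    -- what the planner sees: per-column distinct-value estimates only
    SELECT attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'popo' AND attname IN ('a', 'b');

    -- the actual number of groups, for comparison
    SELECT count(*) FROM (SELECT DISTINCT a, b FROM popo) AS d;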

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Jon Nelson
Date:
On Fri, Nov 5, 2010 at 7:26 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> [...]
>
> 1. Why, when hash aggregation is allowed, does memory absolutely
> explode (eventually invoking the wrath of the oom killer)? 16MB for
> work_mem does not seem outrageously high. For that matter, neither
> does 5MB.
>
> 2. Why do both HashAggregate and GroupAggregate say the cost estimate
> is 40000 rows?

Unfortunately, I've found that as my database grows, I generally have
to disable hash aggregates for fear of even simple-seeming queries
running out of memory, even with work_mem = 1MB.

In some cases I've seen memory usage (with hashagg) grow to well over
5GB, while with group aggregate it barely moves. Am *I* doing something
wrong? Some of these queries are on partitioned tables (typically
querying the parent), and the resulting UNION or UNION ALL really
starts to hurt. When the server runs out of memory and kills off the
postmaster process a few minutes, or even hours, into the query, it
doesn't make anybody very happy.

Is there some setting I can turn on to see when memory is being
allocated (and, apparently, not deallocated)?

The latest query has a HashAggregate that looks like this:
HashAggregate  (cost=19950525.30..19951025.30 rows=40000 width=37)
but there are, in reality, approximately 200 million rows (when I run
the query with GroupAggregate, that's what I get).

Why does it keep choosing 40,000 rows?

I suppose I could use the newly-learned ALTER USER trick to disable
hash aggregation for the primary user, because disabling hash
aggregation system-wide sounds fairly drastic. However, if I *don't*
disable it, the query quickly balloons memory usage to the point where
the process is killed off.
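
For the record, the "ALTER USER trick" would look roughly like this (a
sketch; the role name is hypothetical):

    -- disable hash aggregation for one role only
    ALTER ROLE report_user SET enable_hashagg = off;
    -- and to undo it later
    ALTER ROLE report_user RESET enable_hashagg;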

--
Jon

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Pavel Stehule
Date:
Hello,

Look at the output of EXPLAIN ANALYZE. Your statistics are probably
out of date, and that can confuse the planner; EXPLAIN ANALYZE will
show it.

Regards

Pavel Stehule

2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
> [...]
>
> Is there some setting I can turn on to see when memory is being
> allocated (and, apparently, not deallocated)?
>
> The latest query has a HashAggregate that looks like this:
> HashAggregate  (cost=19950525.30..19951025.30 rows=40000 width=37)
> but there are, in reality, approximately 200 million rows (when I run
> the query with GroupAggregate, that's what I get).
>
> Why does it keep choosing 40,000 rows?

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Jon Nelson
Date:
On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello,
>
> Look at the output of EXPLAIN ANALYZE. Your statistics are probably
> out of date, and that can confuse the planner; EXPLAIN ANALYZE will
> show it.

As I noted earlier, I did set default_statistics_target to 1000 and
re-ran VACUUM ANALYZE, and the plan did not change.

What other diagnostics can I provide? This still doesn't answer the
40000 row question, though. It seems absurd to me that the planner
would give up and just use 40000 rows (0.02 percent of the actual
result).

--
Jon

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Pavel Stehule
Date:
2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
> On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Hello,
>>
>> Look at the output of EXPLAIN ANALYZE. Your statistics are probably
>> out of date, and that can confuse the planner; EXPLAIN ANALYZE will
>> show it.
>
> As I noted earlier, I did set default_statistics_target to 1000 and
> re-ran VACUUM ANALYZE, and the plan did not change.

That change by itself may do nothing; it only sets the default in the
config file. Did you use ALTER TABLE ... ALTER COLUMN ... SET
STATISTICS ..., followed by ANALYZE?
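
(A sketch of that suggestion, using the tables and grouping columns
from the original query; the statistics target of 1000 is illustrative:)

    ALTER TABLE d_2010_09_12_sum ALTER COLUMN i SET STATISTICS 1000;
    ALTER TABLE d_2010_09_12_sum ALTER COLUMN n SET STATISTICS 1000;
    ALTER TABLE d_2010_09_13 ALTER COLUMN i SET STATISTICS 1000;
    ALTER TABLE d_2010_09_13 ALTER COLUMN n SET STATISTICS 1000;
    ANALYZE d_2010_09_12_sum;
    ANALYZE d_2010_09_13;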

>
> What other diagnostics can I provide? This still doesn't answer the
> 40000 row question, though. It seems absurd to me that the planner
> would give up and just use 40000 rows (0.02 percent of the actual
> result).
>

There may be an operation that isn't well supported by the statistics
machinery; in that case the planner falls back to a default percentage
of the rows rather than a statistics-based estimate.

Pavel

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Jon Nelson
Date:
On Thu, Nov 11, 2010 at 10:38 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
>> On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> Hello,
>>>
>>> Look at the output of EXPLAIN ANALYZE. Your statistics are probably
>>> out of date, and that can confuse the planner; EXPLAIN ANALYZE will
>>> show it.
>>
>> As I noted earlier, I did set default_statistics_target to 1000 and
>> re-ran VACUUM ANALYZE, and the plan did not change.
>
> That change by itself may do nothing; it only sets the default in the
> config file. Did you use ALTER TABLE ... ALTER COLUMN ... SET
> STATISTICS ..., followed by ANALYZE?

No. To be clear: are you saying that changing the value for
default_statistics_target, restarting postgresql, and re-running
VACUUM ANALYZE does *not* change the statistics for columns
created/populated *prior* to the sequence of operations, and that one
/must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?

That does not jibe with the documentation, which appears to suggest
that setting a new default_statistics_target, restarting postgresql,
and then re-ANALYZE'ing a table should be sufficient (provided the
columns have not had a statistics target explicitly set).
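
(One way to check what ANALYZE actually stored for those columns, as a
sketch; pg_stats is the standard statistics view, and the table and
column names are the ones from the original query:)

    SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename IN ('d_2010_09_12_sum', 'd_2010_09_13')
      AND attname IN ('i', 'n');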

>> What other diagnostics can I provide? This still doesn't answer the
>> 40000 row question, though. It seems absurd to me that the planner
>> would give up and just use 40000 rows (0.02 percent of the actual
>> result).
>>
>
> There may be an operation that isn't well supported by the statistics
> machinery; in that case the planner falls back to a default percentage
> of the rows rather than a statistics-based estimate.

The strange thing is that the value 40000 keeps popping up in totally
different contexts, with different tables, databases, etc. I tried
digging through the code, and the only thing I found was that numGroups
was being set to 40000, but I couldn't see where.

--
Jon

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Pavel Stehule
Date:
2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
> On Thu, Nov 11, 2010 at 10:38 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> [...]
>>
>> That change by itself may do nothing; it only sets the default in the
>> config file. Did you use ALTER TABLE ... ALTER COLUMN ... SET
>> STATISTICS ..., followed by ANALYZE?
>
> No. To be clear: are you saying that changing the value for
> default_statistics_target, restarting postgresql, and re-running
> VACUUM ANALYZE does *not* change the statistics for columns
> created/populated *prior* to the sequence of operations, and that one
> /must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?
>

Yes.

But I was wrong; the documentation is correct. The problem is elsewhere.

> That does not jibe with the documentation, which appears to suggest
> that setting a new default_statistics_target, restarting postgresql,
> and then re-ANALYZE'ing a table should be sufficient (provided the
> columns have not had a statistics target explicitly set).
>

>>> What other diagnostics can I provide? This still doesn't answer the
>>> 40000 row question, though. It seems absurd to me that the planner
>>> would give up and just use 40000 rows (0.02 percent of the actual
>>> result).
>>>
>>
>> There may be an operation that isn't well supported by the statistics
>> machinery; in that case the planner falls back to a default percentage
>> of the rows rather than a statistics-based estimate.
>
> The strange thing is that the value 40000 keeps popping up in totally
> different contexts, with different tables, databases, etc. I tried
> digging through the code, and the only thing I found was that numGroups
> was being set to 40000, but I couldn't see where.
>


If I remember correctly, you can influence the group estimate by
setting n_distinct with ALTER TABLE ... ALTER COLUMN ... SET
(n_distinct = ...).

Maybe you can use that.

Regards

Pavel Stehule

http://www.postgresql.org/docs/9.0/interactive/sql-altertable.html

Re: postmaster consuming /lots/ of memory with hash aggregate. why?

From: Robert Haas
Date:
On Fri, Nov 12, 2010 at 11:12 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> If I remember correctly, you can influence the group estimate by
> setting n_distinct with ALTER TABLE ... ALTER COLUMN ... SET
> (n_distinct = ...).
>
> Maybe you can use that.

I'm not sure where the number 40,000 is coming from either, but I
think Pavel's suggestion is a good one.  If you're grouping on a
column with N distinct values, then it stands to reason there will be
N groups, and the planner is known to misestimate n_distinct on large
tables, even with very high statistics targets, which is why 9.0
allows a manual override.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
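
For what it's worth, 40000 looks a lot like 200 x 200: when the planner
has no usable distinct-value statistics for a grouping column (as can
happen when grouping over the output of a UNION ALL subquery), it
appears to fall back to a default guess of 200 distinct values per
column, and the per-column estimates are multiplied for a two-column
GROUP BY. The 9.0 per-column override Robert mentions is spelled
roughly like this (a sketch; the column and value are illustrative,
and it is not available on the 8.4.5 instance discussed above):

    -- 9.0+ only; value is an illustrative guess at the true distinct count
    ALTER TABLE d_2010_09_12_sum ALTER COLUMN i SET (n_distinct = 2000000);
    ANALYZE d_2010_09_12_sum;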