Thread: postmaster consuming /lots/ of memory with hash aggregate. why?
I have a postgres 8.4.5 instance on CentOS 5 (x86_64) which appears to
go crazy with the amount of memory it consumes. When I run the query
below, in a matter of a few seconds memory balloons to 5.3G (virtual),
4.6G (resident) and 1840 (shared), and eventually the oom killer is
invoked, killing the entire process.

Physical memory is 8GB but other processes on the box consume
approximately 4GB of that.

The settings changed from their defaults:

effective_cache_size = 4GB
work_mem = 16MB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 16
shared_buffers = 384MB
checkpoint_segments = 64

and

default_statistics_target = 100

The query is this:

insert into d_2010_09_13_sum
select FOO.i, FOO.n, sum(FOO.cc) as cc, sum(FOO.oc) as oc
from (
  select * from d_2010_09_12_sum
  union all
  select * from d_2010_09_13
) AS FOO group by i, n;

Here is the explain:

Subquery Scan "*SELECT*"  (cost=1200132.06..1201332.06 rows=40000 width=80)
  ->  HashAggregate  (cost=1200132.06..1200732.06 rows=40000 width=41)
        ->  Append  (cost=0.00..786531.53 rows=41360053 width=41)
              ->  Seq Scan on d_2010_09_12_sum  (cost=0.00..520066.48 rows=27272648 width=42)
              ->  Seq Scan on d_2010_09_13  (cost=0.00..266465.05 rows=14087405 width=40)

Both source tables were freshly VACUUM ANALYZE'd. The row estimates are
correct for both source tables.

If I use "set enable_hashagg = false" I get this plan:

Subquery Scan "*SELECT*"  (cost=8563632.73..9081838.25 rows=40000 width=80)
  ->  GroupAggregate  (cost=8563632.73..9081238.25 rows=40000 width=41)
        ->  Sort  (cost=8563632.73..8667033.84 rows=41360441 width=41)
              Sort Key: d_2010_09_12_sum.i, d_2010_09_12_sum.n
              ->  Result  (cost=0.00..786535.41 rows=41360441 width=41)
                    ->  Append  (cost=0.00..786535.41 rows=41360441 width=41)
                          ->  Seq Scan on d_2010_09_12_sum  (cost=0.00..520062.04 rows=27272204 width=42)
                          ->  Seq Scan on d_2010_09_13  (cost=0.00..266473.37 rows=14088237 width=40)

and postmaster's memory never exceeds (roughly) 548M (virtual), 27M
(resident), 5M (shared).

I even set default_statistics_target to 1000 and re-ran "vacuum analyze
verbose" on both tables - no change. If I set work_mem to 1MB (from
16MB) then the GroupAggregate variation is chosen instead.
Experimentally, HashAggregate is chosen when work_mem is 16MB, 8MB,
6MB, 5MB but not 4MB and on down.

Two things I don't understand:

1. Why, when hash aggregation is allowed, does memory absolutely
   explode (eventually invoking the wrath of the oom killer)? 16MB for
   work_mem does not seem outrageously high. For that matter, neither
   does 5MB.

2. Why do both HashAggregate and GroupAggregate say the row estimate
   is 40000?

-- 
Jon
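For reference, a quick way to see what the planner believes about the
distinct counts of the grouping columns on the two source tables (a
sketch only; the table and column names are taken from the query above,
and no output from this thread is shown). A negative n_distinct means a
fraction of the row count.

-- What ANALYZE has recorded for the grouping columns on the base tables.
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('d_2010_09_12_sum', 'd_2010_09_13')
  AND attname IN ('i', 'n');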
> 2. Why do both HashAggregate and GroupAggregate say the row estimate
> is 40000?

I've reproduced this:

CREATE TABLE popo AS SELECT (x%1000) AS a, (x%1001) AS b
FROM generate_series( 1,1000000 ) AS x;

VACUUM ANALYZE popo;

EXPLAIN ANALYZE SELECT a,b,count(*)
FROM (SELECT * FROM popo UNION ALL SELECT * FROM popo) AS foo
GROUP BY a,b;

                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=43850.00..44350.00 rows=40000 width=8) (actual time=1893.441..2341.780 rows=1000000 loops=1)
   ->  Append  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.025..520.581 rows=2000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.025..142.639 rows=1000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.003..114.257 rows=1000000 loops=1)
 Total runtime: 2438.741 ms
(5 rows)

Time: 2439.247 ms

I guess the row count estimate depends on the correlation of a and b,
which pg has no idea about. In the first example there is no
correlation; now with full correlation:

UPDATE popo SET a=b;
VACUUM FULL popo;
VACUUM FULL popo;
ANALYZE popo;

EXPLAIN ANALYZE SELECT a,b,count(*)
FROM (SELECT * FROM popo UNION ALL SELECT * FROM popo) AS foo
GROUP BY a,b;

                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=43850.00..44350.00 rows=40000 width=8) (actual time=1226.201..1226.535 rows=1001 loops=1)
   ->  Append  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.008..518.068 rows=2000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..128.609 rows=1000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.005..128.502 rows=1000000 loops=1)
 Total runtime: 1226.797 ms
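For contrast, a sketch (not output from this thread): grouping the base
table directly, without the UNION ALL subquery, lets the planner see the
per-column statistics, so the group estimate should track n_distinct(a)
and n_distinct(b) rather than whatever default applies to the subquery
output above.

-- Same aggregation, but over the table itself so column stats are visible.
EXPLAIN ANALYZE SELECT a, b, count(*) FROM popo GROUP BY a, b;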
I also found this. Perhaps it is related?

http://postgresql.1045698.n5.nabble.com/Hash-Aggregate-plan-picked-for-very-large-table-out-of-memory-td1883299.html

-- 
Jon
On Fri, Nov 5, 2010 at 7:26 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> I have a postgres 8.4.5 instance on CentOS 5 (x86_64) which appears to
> go crazy with the amount of memory it consumes.
> [...]
> Two things I don't understand:
>
> 1. Why, when hash aggregation is allowed, does memory absolutely
>    explode (eventually invoking the wrath of the oom killer)? 16MB for
>    work_mem does not seem outrageously high. For that matter, neither
>    does 5MB.
>
> 2. Why do both HashAggregate and GroupAggregate say the row estimate
>    is 40000?

Unfortunately, I've found that as my database size grows, I've
generally had to disable hash aggregates for fear of even
simple-seeming queries running out of memory, even with work_mem = 1MB.

In some cases I saw memory usage (with hashagg) grow to well over 5GB,
while with group aggregate it barely moves. Am *I* doing something
wrong?

Some of these queries are on partitioned tables (typically querying the
parent), and the resulting UNION or UNION ALL really starts to hurt;
when the server runs out of memory and kills off the postmaster process
a few minutes or even hours into the query, it doesn't make anybody
very happy.
Is there some setting I can turn on to see when memory is being
allocated (and, apparently, not deallocated)?

The latest query has a HashAggregate that looks like this:

  HashAggregate  (cost=19950525.30..19951025.30 rows=40000 width=37)

but there are, in reality, approximately 200 million rows (when I run
the query with GroupAggregate, that's what I get).

Why does it keep choosing 40,000 rows?

I suppose I could use the newly-learned ALTER USER trick to disable
hash aggregation for the primary user, because disabling hash
aggregation system-wide sounds fairly drastic. However, if I *don't*
disable it, the query quickly balloons memory usage to the point where
the process is killed off.

-- 
Jon
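A sketch of the "ALTER USER trick" mentioned above (the role name is
hypothetical), plus a narrower per-statement alternative using the
query from the start of the thread:

-- Disable hash aggregation for future sessions of one role only.
ALTER USER report_user SET enable_hashagg = off;

-- Or disable it just for this one statement.
BEGIN;
SET LOCAL enable_hashagg = off;
INSERT INTO d_2010_09_13_sum
SELECT foo.i, foo.n, sum(foo.cc) AS cc, sum(foo.oc) AS oc
FROM (SELECT * FROM d_2010_09_12_sum
      UNION ALL
      SELECT * FROM d_2010_09_13) AS foo
GROUP BY i, n;
COMMIT;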
Hello,

look at the EXPLAIN ANALYZE command. Probably your statistics are out
of date, and then the planner can be confused. EXPLAIN ANALYZE will
show it.

Regards

Pavel Stehule

2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
> [...]
> In some cases I saw memory usage (with hashagg) grow to well over 5GB,
> while with group aggregate it barely moves. Am *I* doing something
> wrong?
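Following that suggestion, a sketch (names from the query at the top of
the thread). Note that EXPLAIN ANALYZE actually executes the statement,
so with hash aggregation enabled it could hit the same OOM; disabling
hashagg first still exposes the estimated-versus-actual row gap at the
aggregate node.

SET enable_hashagg = off;
-- Only the SELECT part, so nothing is inserted while measuring.
EXPLAIN ANALYZE
SELECT foo.i, foo.n, sum(foo.cc) AS cc, sum(foo.oc) AS oc
FROM (SELECT * FROM d_2010_09_12_sum
      UNION ALL
      SELECT * FROM d_2010_09_13) AS foo
GROUP BY i, n;
-- Compare the estimated rows with the actual rows on the aggregate line.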
On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> look at the EXPLAIN ANALYZE command. Probably your statistics are out
> of date, and then the planner can be confused. EXPLAIN ANALYZE will
> show it.

As I noted earlier, I did set statistics to 1000 and re-ran vacuum
analyze, and the plan did not change.

What other diagnostics can I provide? This still doesn't answer the
40000-row question, though. It seems absurd to me that the planner
would give up and just use 40000 rows (0.02 percent of the actual
result).

-- 
Jon
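A hypothetical sanity check (table and column names from the earlier
query) that the per-column targets really are at their default, so that
ANALYZE is using default_statistics_target; attstattarget = -1 means
"use the default".

-- Show the per-column statistics targets for the grouping columns.
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'd_2010_09_12_sum'::regclass
  AND attname IN ('i', 'n');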
2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
> As I noted earlier, I did set statistics to 1000 and re-ran vacuum
> analyze, and the plan did not change.

That change may do nothing; it is only the default in the config. Did
you use ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ...? And then
ANALYZE?

> What other diagnostics can I provide? This still doesn't answer the
> 40000-row question, though. It seems absurd to me that the planner
> would give up and just use 40000 rows (0.02 percent of the actual
> result).

There can be some operation that is not well supported; then the
planner uses some percentage of the rows instead of a statistics-based
estimate.

Pavel
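Spelling out what Pavel is suggesting (a sketch; the table and column
names come from the original query, and 1000 is just an example
target):

-- Raise the statistics target on the grouping columns, then re-sample.
ALTER TABLE d_2010_09_12_sum ALTER COLUMN i SET STATISTICS 1000;
ALTER TABLE d_2010_09_12_sum ALTER COLUMN n SET STATISTICS 1000;
ANALYZE d_2010_09_12_sum;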
On Thu, Nov 11, 2010 at 10:38 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> That change may do nothing; it is only the default in the config. Did
> you use ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ...? And then
> ANALYZE?

No. To be clear: are you saying that changing the value for
default_statistics_target, restarting postgresql, and re-running
VACUUM ANALYZE does *not* change the statistics for columns
created/populated *prior* to that sequence of operations, and that one
/must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?

That does not jibe with the documentation, which appears to suggest
that setting a new default_statistics_target, restarting postgresql,
and then re-ANALYZE'ing a table should be sufficient (provided the
columns have not had a statistics target explicitly set).

> There can be some operation that is not well supported; then the
> planner uses some percentage of the rows instead of a statistics-based
> estimate.

The strange thing is that the value 40000 keeps popping up in totally
different contexts, with different tables, databases, etc. I tried
digging through the code and the only thing I found was that numGroups
was being set to 40000, but I couldn't see where.

-- 
Jon
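One way to probe where a constant like 40000 could come from, assuming
(this is an assumption, not confirmed anywhere in the thread) that the
planner falls back to a default of 200 distinct values per grouping
column when it has no statistics for the subquery output: 200 x 200 =
40000 for a two-column GROUP BY. Using the popo reproduction posted
earlier:

-- Under that assumption, a one-column GROUP BY through the same
-- UNION ALL subquery should be estimated at roughly 200 groups.
EXPLAIN
SELECT a, count(*)
FROM (SELECT * FROM popo UNION ALL SELECT * FROM popo) AS foo
GROUP BY a;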
2010/11/12 Jon Nelson <jnelson+pgsql@jamponi.net>:
> No. To be clear: are you saying that changing the value for
> default_statistics_target, restarting postgresql, and re-running
> VACUUM ANALYZE does *not* change the statistics for columns
> created/populated *prior* to that sequence of operations, and that one
> /must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?

Yes. But I was wrong; the documentation is correct. The problem is
elsewhere.

> The strange thing is that the value 40000 keeps popping up in totally
> different contexts, with different tables, databases, etc. I tried
> digging through the code and the only thing I found was that numGroups
> was being set to 40000, but I couldn't see where.

If I remember well, you can set the number of distinct values with
ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...).

http://www.postgresql.org/docs/9.0/interactive/sql-altertable.html

Maybe you can use that.

Regards

Pavel Stehule
On Fri, Nov 12, 2010 at 11:12 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> If I remember well, you can set the number of distinct values with
> ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...).
>
> Maybe you can use that.

I'm not sure where the number 40,000 is coming from either, but I think
Pavel's suggestion is a good one. If you're grouping on a column with N
distinct values, then it stands to reason there will be N groups, and
the planner is known to underestimate n_distinct on large tables, even
with very high statistics targets, which is why 9.0 allows a manual
override.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
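The manual override mentioned above, as a sketch (PostgreSQL 9.0 or
later, shown against the popo test table from earlier in the thread;
whether the override survives the UNION ALL subquery in the original
query would still need to be tested):

-- Tell the planner how many distinct values to assume, then re-ANALYZE
-- so the setting is picked up. Negative values mean a fraction of the
-- row count (e.g. -0.001).
ALTER TABLE popo ALTER COLUMN a SET (n_distinct = 1000);
ALTER TABLE popo ALTER COLUMN b SET (n_distinct = 1001);
ANALYZE popo;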