Thread: Out of memory error during large hashagg
I've reported variants of this in the past, but this case is entirely repeatable.

Executing this query:

select st_id, min(seed_id) as "initial_seed_id", count(*) as "seed_count"
from seed group by st_id;

The query plan and table stats are:

                              QUERY PLAN
-----------------------------------------------------------------------
 HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
   ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)

 relname | relpages |  reltuples
---------+----------+-------------
 seed    |   428880 | 5.26984e+07

Environment:
Debian Linux, kernel 2.6.13.1-20050914
Dual Opterons w/ 8G RAM
PostgreSQL 8.1.3

Pg config:
shared_buffers = 50000
work_mem = 262144
maintenance_work_mem = 262144
max_fsm_pages = 100000

When the query blows up I get this spew in the server log:

TopMemoryContext: 40960 total in 5 blocks; 12192 free (11 chunks); 28768 used
TopTransactionContext: 8192 total in 1 blocks; 5784 free (0 chunks); 2408 used
Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks); 6328 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
MessageContext: 24576 total in 2 blocks; 15080 free (4 chunks); 9496 used
smgr relation table: 8192 total in 1 blocks; 1840 free (0 chunks); 6352 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
ExecutorState: 8192 total in 1 blocks; 1928 free (17 chunks); 6264 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AggContext: -2001739776 total in 285 blocks; 4784 free (141 chunks); -2001744560 used
TupleHashTable: 497279000 total in 70 blocks; 1772200 free (259 chunks); 495506800 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 516096 total in 6 blocks; 46448 free (6 chunks); 469648 used
pg_shdepend_depender_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_shdepend_reference_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_depend_depender_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_depend_reference_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
seed_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
seed_station_music_id_idx: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
seed_music_id_idx: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
seed_station_id: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
feedback_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
feedback_date_created_is_positive_idx: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
feedback_station_music_id_idx: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
feedback_station_id_idx: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
listener_segment_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
listener_expiration_date_idx: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
listener_default_web_name_key: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
listener_default_username_key: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
listener_default_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_database_datname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_cast_source_target_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
MdSmgr: 8192 total in 1 blocks; 4768 free (1 chunks); 3424 used
LockTable (locallock hash): 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (3 chunks); 16 used
2006-09-17 21:31:42.753 PDT [d:radio_prod_full u:radio s: 450e0813.6b5d 4] ERROR:  out of memory
2006-09-17 21:31:42.753 PDT [d:radio_prod_full u:radio s: 450e0813.6b5d 5] DETAIL:  Failed on request of size 88.
2006-09-17 21:31:42.753 PDT [d:radio_prod_full u:radio s: 450e0813.6b5d 6] STATEMENT:  select st_id, min(seed_id) as "initial_seed_id", count(*) as "seed_count" from seed group by st_id;

If I execute "set enable_hashagg=off;" before running this query, it completes successfully.

Thanks!

-Casey
On Mon, 2006-09-18 at 14:08 -0700, Casey Duncan wrote:
> I've reported variants of this in the past, but this case is entirely
> repeatable.
>
> Executing this query:
>
> select st_id, min(seed_id) as "initial_seed_id", count(*) as "seed_count"
> from seed group by st_id;
>
> The query plan and table stats are:
>
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>    ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)
>
>  relname | relpages |  reltuples
> ---------+----------+-------------
>  seed    |   428880 | 5.26984e+07

The hashagg doesn't yet have the ability to spill to disk, so a bad
estimate of ndistinct will cause this to fail (in any release). This is
a known issue for certain distributions of data only. The workaround is
the one you know about already: enable_hashagg = off.

I'm interested in collecting info on the distribution of data.
Can you post:

select tablename, attname, n_distinct from pg_stats
where attname = 'st_id';

select count(distinct st_id) from seed;

and also the table definition, including the PK?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
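[For readers of the archives: the workaround can be scoped to a single transaction so hash aggregation stays enabled for everything else. This is a sketch of one way to do it, not part of the original exchange:]

```sql
-- Disable hash aggregation only for the duration of this transaction.
-- The planner falls back to GroupAggregate (sort + aggregate), which
-- can spill its sort to disk instead of overrunning work_mem.
BEGIN;
SET LOCAL enable_hashagg = off;

SELECT st_id,
       min(seed_id) AS "initial_seed_id",
       count(*)     AS "seed_count"
FROM seed
GROUP BY st_id;

COMMIT;  -- SET LOCAL reverts automatically at transaction end
```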
On Sep 19, 2006, at 1:51 AM, Simon Riggs wrote:

> On Mon, 2006-09-18 at 14:08 -0700, Casey Duncan wrote:
>> I've reported variants of this in the past, but this case is entirely
>> repeatable.
>>
>> Executing this query:
>>
>> select st_id, min(seed_id) as "initial_seed_id", count(*) as "seed_count"
>> from seed group by st_id;
>>
>> The query plan and table stats are:
>>
>>                               QUERY PLAN
>> -----------------------------------------------------------------------
>>  HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>>    ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)
>>
>>  relname | relpages |  reltuples
>> ---------+----------+-------------
>>  seed    |   428880 | 5.26984e+07
>
> The hashagg doesn't yet have the ability to spill to disk, so a bad
> estimate of ndistinct will cause this to fail (in any release). This
> is a known issue for certain distributions of data only. The
> workaround is the one you know about already: enable_hashagg = off

Ok, is that something for the TODO list? I took a glance and didn't see it.

> I'm interested in collecting info on the distribution of data.
> Can you post:
>
> select tablename, attname, n_distinct from pg_stats
> where attname = 'st_id';

 tablename  | attname | n_distinct
------------+---------+------------
 st         | st_id   |         -1
 seed       | st_id   |     164656
 feed       | st_id   |      14250
 book       | st_id   |      14856
 legacy_st  | st_id   |         -1
(5 rows)

I ran analyze after this, but the results were roughly the same.
> select count(distinct st_id) from seed;

  count
----------
 40418083
(1 row)

Looks a tad bit different than the above ;^)

> and also the table definition, including the PK

              Table "public.seed"
    Column    |            Type             |   Modifiers
--------------+-----------------------------+---------------
 seed_id      | bigint                      | not null
 mc_id        | character varying(20)       |
 st_id        | bigint                      |
 date_created | timestamp without time zone | default now()
Indexes:
    "seed_pkey" PRIMARY KEY, btree (seed_id)
    "seed_st_mc_id_idx" UNIQUE, btree (mc_id, st_id)
    "seed_mc_id_idx" btree (mc_id)
    "seed_st_id" btree (st_id)
Foreign-key constraints:
    "seed_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON DELETE RESTRICT

Thanks

-Casey
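[Editor's note: the mismatch above explains the failure mode. The hash table was sized for ~165k groups but had to hold ~40M. A rough back-of-envelope, assuming ~88 bytes of per-group state (the allocation size named in the error's DETAIL line) and ignoring hash-table and tuple overhead, shows why a 256MB work_mem had no chance:]

```sql
-- Hypothetical back-of-envelope arithmetic, not the planner's actual math:
-- ~40.4M groups at roughly 88 bytes apiece, versus work_mem = 262144 (kB).
SELECT 40418083::bigint * 88 / (1024 * 1024) AS approx_mb_needed,  -- ~3392 MB
       262144 / 1024                         AS work_mem_mb;       -- 256 MB
```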
Casey Duncan <casey@pandora.com> writes:
> select st_id, min(seed_id) as "initial_seed_id", count(*) as
> "seed_count" from seed group by st_id;

> The query plan and table stats are:

>                               QUERY PLAN
> -----------------------------------------------------------------------
>  HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>    ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)

How many distinct st_id values are there really? The planner's
evidently expecting 164656, but I suppose that's wrong? What's in
pg_stats for st_id?

			regards, tom lane
I posted that in a subsequent mail, but here it is again:

> I'm interested in collecting info on the distribution of data.
> Can you post:
>
> select tablename, attname, n_distinct from pg_stats
> where attname = 'st_id';

 tablename  | attname | n_distinct
------------+---------+------------
 st         | st_id   |         -1
 seed       | st_id   |     164656
 feed       | st_id   |      14250
 book       | st_id   |      14856
 legacy_st  | st_id   |         -1
(5 rows)

I ran analyze after this, but the results were roughly the same.

> select count(distinct st_id) from seed;

  count
----------
 40418083
(1 row)

Looks a tad bit different than the above ;^)

> and also the table definition, including the PK

              Table "public.seed"
    Column    |            Type             |   Modifiers
--------------+-----------------------------+---------------
 seed_id      | bigint                      | not null
 mc_id        | character varying(20)       |
 st_id        | bigint                      |
 date_created | timestamp without time zone | default now()
Indexes:
    "seed_pkey" PRIMARY KEY, btree (seed_id)
    "seed_st_mc_id_idx" UNIQUE, btree (mc_id, st_id)
    "seed_mc_id_idx" btree (mc_id)
    "seed_st_id" btree (st_id)
Foreign-key constraints:
    "seed_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON DELETE RESTRICT

I imagine this means I need to crank up the statistics on that column.

Thanks

-Casey

On Sep 23, 2006, at 3:17 PM, Tom Lane wrote:

> Casey Duncan <casey@pandora.com> writes:
>> select st_id, min(seed_id) as "initial_seed_id", count(*) as
>> "seed_count" from seed group by st_id;
>
>> The query plan and table stats are:
>
>>                               QUERY PLAN
>> -----------------------------------------------------------------------
>>  HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>>    ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)
>
> How many distinct st_id values are there really? The planner's
> evidently expecting 164656, but I suppose that's wrong? What's in
> pg_stats for st_id?
>
> 			regards, tom lane
Casey Duncan <casey@pandora.com> writes:
> seed | st_id | 164656

> I ran analyze after this, but the results were roughly the same.

What's the statistics target set to, and did you try increasing it?
Can we see the rest of the pg_stats row for this column? (I'm mainly
interested in the most-common-value frequencies.)

			regards, tom lane
On Sep 24, 2006, at 8:59 AM, Tom Lane wrote:

> Casey Duncan <casey@pandora.com> writes:
>> seed | st_id | 164656
>
>> I ran analyze after this, but the results were roughly the same.
>
> What's the statistics target set to, and did you try increasing it?
> Can we see the rest of the pg_stats row for this column? (I'm mainly
> interested in the most-common-value frequencies.)

Here's the full pg_stats row (expanded for readability):

schemaname        | public
tablename         | seed
attname           | st_id
null_frac         | 0
avg_width         | 8
n_distinct        | 138674
most_common_vals  | {42151109694333522,66226913946054313,108711403296197641,135177215114301806,1130147629642373,1597650524892734,16434507676114329,17430059620571257,22269564345616346,28879557667799977}
most_common_freqs | {0.001,0.001,0.001,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
histogram_bounds  | {69337952030227,23194309459225658,44727149474281842,67399809385152536,93773403960447474,116874916355760832,138282618203581456,141866119873210938,145491433047267294,149167246450593786,152782384850071963}
correlation       | 0.829168

I haven't tried increasing the stats target. What would be a suitable
value to try?

Thanks

-Casey
Casey Duncan <casey@pandora.com> writes:
> I haven't tried increasing the stats target. What would be a suitable
> value to try?

Try 100 (instead of the default 10) ... you can go as high as 1000,
though hopefully that's overkill.

			regards, tom lane
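[Editor's note: for the archives, the per-column way to apply Tom's suggestion, rather than raising default_statistics_target globally, would look roughly like this; the value 100 is just his suggested starting point:]

```sql
-- Collect a larger ANALYZE sample for st_id only, then re-gather stats.
ALTER TABLE seed ALTER COLUMN st_id SET STATISTICS 100;
ANALYZE seed;

-- Check whether the n_distinct estimate moved closer to the real
-- count(distinct st_id) of ~40.4M:
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'seed' AND attname = 'st_id';
```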