Thread: Out of memory error during large hashagg

From: Casey Duncan
I've reported variants of this in the past, but this case is entirely
repeatable.

Executing this query:

select st_id, min(seed_id) as "initial_seed_id", count(*) as "seed_count"
from seed group by st_id;

The query plan and table stats are:

                               QUERY PLAN
-----------------------------------------------------------------------
HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
    ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)


 relname | relpages |  reltuples
---------+----------+-------------
 seed    |   428880 | 5.26984e+07

Environment:

Debian linux, kernel v. 2.6.13.1-20050914
Dual opterons w/8G RAM
Postgresql v 8.1.3

Pg Config:

shared_buffers = 50000
work_mem = 262144
maintenance_work_mem = 262144
max_fsm_pages = 100000
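
For reference, a quick conversion of those two memory settings into bytes (shared_buffers is counted in 8 kB pages and work_mem in kB, per the 8.1 defaults):

-- Unit conversion for the settings above:
SELECT 50000  * 8192 AS shared_buffers_bytes,  -- ~390 MB
       262144 * 1024 AS work_mem_bytes;        -- 256 MB per sort/hash step

So each hash aggregation is budgeted roughly 256 MB in the planner's costing.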

When the query blows up I get this spew in the server log:

TopMemoryContext: 40960 total in 5 blocks; 12192 free (11 chunks);
28768 used
TopTransactionContext: 8192 total in 1 blocks; 5784 free (0 chunks);
2408 used
Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks);
6328 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks);
3256 used
MessageContext: 24576 total in 2 blocks; 15080 free (4 chunks); 9496
used
smgr relation table: 8192 total in 1 blocks; 1840 free (0 chunks);
6352 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
ExecutorState: 8192 total in 1 blocks; 1928 free (17 chunks); 6264 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AggContext: -2001739776 total in 285 blocks; 4784 free (141 chunks);
-2001744560 used
TupleHashTable: 497279000 total in 70 blocks; 1772200 free (259
chunks); 495506800 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 516096 total in 6 blocks; 46448 free (6 chunks);
469648 used
pg_shdepend_depender_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_shdepend_reference_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_depend_depender_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_depend_reference_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
seed_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
seed_station_music_id_idx: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
seed_music_id_idx: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
seed_station_id: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
feedback_pkey: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
feedback_date_created_is_positive_idx: 1024 total in 1 blocks; 328
free (0 chunks); 696 used
feedback_station_music_id_idx: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
feedback_station_id_idx: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
listener_segment_pkey: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
listener_expiration_date_idx: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
listener_default_web_name_key: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
listener_default_username_key: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
listener_default_pkey: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free
(0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free
(0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_opclass_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_database_datname_index: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_cast_source_target_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks);
768 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
MdSmgr: 8192 total in 1 blocks; 4768 free (1 chunks); 3424 used
LockTable (locallock hash): 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (3 chunks); 16 used
2006-09-17 21:31:42.753 PDT [d:radio_prod_full u:radio s:
450e0813.6b5d 4] ERROR:  out of memory
2006-09-17 21:31:42.753 PDT [d:radio_prod_full u:radio s:
450e0813.6b5d 5] DETAIL:  Failed on request of size 88.
2006-09-17 21:31:42.753 PDT [d:radio_prod_full u:radio s:
450e0813.6b5d 6] STATEMENT:  select st_id, min(seed_id) as
"initial_seed_id",
                 count(*) as "seed_count"
         from seed group by st_id;


If I execute "set enable_hashagg = off;" before running this query, it
completes successfully.
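
For anyone landing on this thread: the workaround can be scoped to a single transaction with SET LOCAL, so other queries in the session keep using hash aggregation. A sketch (standard PostgreSQL syntax; table and column names as above):

BEGIN;
-- Force a sort-based GroupAggregate, which can spill to disk,
-- instead of the in-memory HashAggregate:
SET LOCAL enable_hashagg = off;
SELECT st_id, min(seed_id) AS "initial_seed_id", count(*) AS "seed_count"
FROM seed
GROUP BY st_id;
COMMIT;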

Thanks!

-Casey

Re: Out of memory error during large hashagg

From: Simon Riggs
On Mon, 2006-09-18 at 14:08 -0700, Casey Duncan wrote:
> I've reported variants of this in the past, but this case is entirely
> repeatable.
>
> Executing this query:
>
> select st_id, min(seed_id) as "initial_seed_id", count(*) as
> "seed_count"
> from seed group by st_id;
>
> The query plan and table stats are:
>
>                                QUERY PLAN
> -----------------------------------------------------------------------
> HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>     ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)
>
>
>   relname | relpages |  reltuples
> ---------+----------+-------------
> seed    |   428880 | 5.26984e+07
>

The hashagg doesn't yet have spill-to-disk capability, so a bad estimate
of ndistinct will cause it to run out of memory (on any release). This is
a known issue, but only for certain data distributions. The workaround is
the one you already know: enable_hashagg = off

I'm interested in collecting info on the distribution of data.
Can you post:

select tablename, attname, n_distinct from pg_stats
where attname = 'st_id';

select count(distinct st_id) from seed;

and also the table definition, including the PK

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: Out of memory error during large hashagg

From: Casey Duncan
On Sep 19, 2006, at 1:51 AM, Simon Riggs wrote:

> On Mon, 2006-09-18 at 14:08 -0700, Casey Duncan wrote:
>> I've reported variants of this in the past, but this case is entirely
>> repeatable.
>>
>> Executing this query:
>>
>> select st_id, min(seed_id) as "initial_seed_id", count(*) as
>> "seed_count"
>> from seed group by st_id;
>>
>> The query plan and table stats are:
>>
>>                                QUERY PLAN
>> ---------------------------------------------------------------------
>> --
>> HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>>     ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562
>> width=16)
>>
>>
>>   relname | relpages |  reltuples
>> ---------+----------+-------------
>> seed    |   428880 | 5.26984e+07
>>
>
> The hashagg doesn't yet have spill-to-disk capability, so a bad
> estimate of ndistinct will cause it to run out of memory (on any
> release). This is a known issue, but only for certain data
> distributions. The workaround is the one you already know:
> enable_hashagg = off

Ok, is that something for the TODO list? I took a glance and didn't
see it.

>
> I'm interested in collecting info on the distribution of data.
> Can you post:
>
> select tablename, attname, n_distinct from pg_stats
> where attname = 'st_id';

   tablename   | attname | n_distinct
--------------+---------+------------
st            | st_id   |         -1
seed          | st_id   |     164656
feed          | st_id   |      14250
book          | st_id   |      14856
legacy_st     | st_id   |         -1
(5 rows)

I ran analyze after this, but the results were roughly the same.

> select count(distinct st_id) from seed;

   count
----------
40418083
(1 row)

Looks a tad different from the above ;^)
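
A back-of-envelope sketch of why that gap matters (the ~250 bytes per group is an assumption covering the hash entry, the group key, and the two aggregate states, not a figure from this thread):

-- Planner expected 164,656 groups; the table actually has 40,418,083.
SELECT 164656::bigint   * 250 AS planned_hash_bytes,  -- ~40 MB, fits in work_mem
       40418083::bigint * 250 AS actual_hash_bytes;   -- ~10 GB, exceeds the 8 GB box

Under that assumption the planned hash table fits comfortably in the 256 MB work_mem, while the real one would need more memory than the machine has.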

> and also the table definition, including the PK

Table "public.seed"
     Column    |            Type             |   Modifiers
--------------+-----------------------------+---------------
seed_id      | bigint                      | not null
mc_id        | character varying(20)       |
st_id        | bigint                      |
date_created | timestamp without time zone | default now()
Indexes:
     "seed_pkey" PRIMARY KEY, btree (seed_id)
     "seed_st_mc_id_idx" UNIQUE, btree (mc_id, st_id)
     "seed_mc_id_idx" btree (mc_id)
     "seed_st_id" btree (st_id)
Foreign-key constraints:
     "seed_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON
DELETE RESTRICT

Thanks

-Casey

Re: Out of memory error during large hashagg

From: Tom Lane
Casey Duncan <casey@pandora.com> writes:
> select st_id, min(seed_id) as "initial_seed_id", count(*) as
> "seed_count" from seed group by st_id;

> The query plan and table stats are:

>                                QUERY PLAN
> -----------------------------------------------------------------------
> HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>     ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562 width=16)

How many distinct st_id values are there really?  The planner's
evidently expecting 164656 but I suppose that's wrong?  What's
in pg_stats for st_id?

            regards, tom lane

Re: Out of memory error during large hashagg

From: Casey Duncan
I posted that in my reply to Simon, but here it is again:

> I'm interested in collecting info on the distribution of data.
> Can you post:
>
> select tablename, attname, n_distinct from pg_stats
> where attname = 'st_id';
>

   tablename   | attname | n_distinct
--------------+---------+------------
st            | st_id   |         -1
seed          | st_id   |     164656
feed          | st_id   |      14250
book          | st_id   |      14856
legacy_st     | st_id   |         -1
(5 rows)

I ran analyze after this, but the results were roughly the same.


> select count(distinct st_id) from seed;
>

   count
----------
40418083
(1 row)

Looks a tad different from the above ;^)

> and also the table definition, including the PK
>

Table "public.seed"
     Column    |            Type             |   Modifiers
--------------+-----------------------------+---------------
seed_id      | bigint                      | not null
mc_id        | character varying(20)       |
st_id        | bigint                      |
date_created | timestamp without time zone | default now()
Indexes:
     "seed_pkey" PRIMARY KEY, btree (seed_id)
     "seed_st_mc_id_idx" UNIQUE, btree (mc_id, st_id)
     "seed_mc_id_idx" btree (mc_id)
     "seed_st_id" btree (st_id)
Foreign-key constraints:
     "seed_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON
DELETE RESTRICT

I imagine this means I need to crank up the statistics on that column.

Thanks

-Casey


On Sep 23, 2006, at 3:17 PM, Tom Lane wrote:

> Casey Duncan <casey@pandora.com> writes:
>> select st_id, min(seed_id) as "initial_seed_id", count(*) as
>> "seed_count" from seed group by st_id;
>
>> The query plan and table stats are:
>
>>                                QUERY PLAN
>> ---------------------------------------------------------------------
>> --
>> HashAggregate  (cost=1362694.83..1365164.68 rows=164656 width=16)
>>     ->  Seq Scan on seed  (cost=0.00..964065.62 rows=53150562
>> width=16)
>
> How many distinct st_id values are there really?  The planner's
> evidently expecting 164656 but I suppose that's wrong?  What's
> in pg_stats for st_id?
>
>             regards, tom lane

Re: Out of memory error during large hashagg

From: Tom Lane
Casey Duncan <casey@pandora.com> writes:
> seed          | st_id   |     164656

> I ran analyze after this, but the results were roughly the same.

What's the statistics target set to, and did you try increasing it?
Can we see the rest of the pg_stats row for this column (I'm mainly
interested in the most-common-value frequencies)?

            regards, tom lane

Re: Out of memory error during large hashagg

From: Casey Duncan
On Sep 24, 2006, at 8:59 AM, Tom Lane wrote:

> Casey Duncan <casey@pandora.com> writes:
>> seed          | st_id   |     164656
>
>> I ran analyze after this, but the results were roughly the same.
>
> What's the statistics target set to, and did you try increasing it?
> Can we see the rest of the pg_stats row for this column (I'm mainly
> interested in the most-common-value frequencies)?

Here's the full pg_stats row:

schemaname        | public
tablename         | seed
attname           | st_id
null_frac         | 0
avg_width         | 8
n_distinct        | 138674
most_common_vals  | {42151109694333522,66226913946054313,108711403296197641,135177215114301806,1130147629642373,1597650524892734,16434507676114329,17430059620571257,22269564345616346,28879557667799977}
most_common_freqs | {0.001,0.001,0.001,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
histogram_bounds  | {69337952030227,23194309459225658,44727149474281842,67399809385152536,93773403960447474,116874916355760832,138282618203581456,141866119873210938,145491433047267294,149167246450593786,152782384850071963}
correlation       | 0.829168

I haven't tried increasing the stats target. What would be a suitable
value to try?

Thanks

-Casey

Re: Out of memory error during large hashagg

From: Tom Lane
Casey Duncan <casey@pandora.com> writes:
> I haven't tried increasing the stats target. What would be a suitable
> value to try?

Try 100 (instead of the default 10) ... you can go as high as 1000,
though hopefully that's overkill.

            regards, tom lane
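
In concrete terms, the suggestion would look something like this (standard ALTER TABLE ... SET STATISTICS syntax; table, column, and target value are from this thread):

-- Raise the per-column statistics target, then re-gather statistics:
ALTER TABLE seed ALTER COLUMN st_id SET STATISTICS 100;
ANALYZE seed;

-- Verify whether the n_distinct estimate improved:
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'seed' AND attname = 'st_id';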