Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Memory-Bounded Hash Aggregation
Msg-id: 20200219191636.gvdywx32kwbix6kd@development
In response to: Re: Memory-Bounded Hash Aggregation (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Memory-Bounded Hash Aggregation
List: pgsql-hackers
Hi,

I've started reviewing the 20200218 version of the patch. In general it
seems fine, but I have a couple minor comments and two crashes.

1) explain.c currently does this:

I wonder if we could show something for plain explain (without analyze).
At least the initial estimate of partitions, etc. I know not showing
those details until after execution is what e.g. sort does, but I find
it a bit annoying.

A related comment is that maybe this should also report the initial
number of partitions, not just the total number. With just the total
it's impossible to say whether there were any repartitions, etc.

2) The ExecBuildAggTrans comment should probably explain "spilled".

3) I wonder if we need to invent new opcodes? Wouldn't it be simpler to
just add a new flag to the agg_* structs instead? I haven't tried
hacking this, so maybe it's a silly idea.

4) lookup_hash_entries says

    /* check to see if we need to spill the tuple for this grouping set */

But that seems bogus, because AFAIK we can't spill tuples for grouping
sets. So maybe this should say just "grouping"?

5) I was curious what happens in case of extreme skew, when a lot of (or
all) rows consistently fall into a single partition. So I did this:

    create table t (a int, b real);
    insert into t select i, random()
      from generate_series(-2000000000, 2000000000) s(i)
     where mod(hashint4(i), 16384) = 0;
    analyze t;

    set work_mem = '64kB';
    set max_parallel_workers_per_gather = 0;
    set enable_sort = 0;

    explain select a, sum(b) from t group by a;

                              QUERY PLAN
    ---------------------------------------------------------------
     HashAggregate  (cost=23864.26..31088.52 rows=244631 width=8)
       Group Key: a
       ->  Seq Scan on t  (cost=0.00..3529.31 rows=244631 width=8)
    (3 rows)

This however quickly fails on this assert in BuildTupleHashTableExt
(see backtrace1.txt):

    Assert(nbuckets > 0);

The value is computed in hash_choose_num_buckets, and there seem to be
no protections against returning bogus values like 0.
So maybe we should return Max(nbuckets, 1024) or something like that,
similarly to hash join. OTOH maybe it's simply due to
agg_refill_hash_table() passing bogus values to the function?

6) Another thing that occurred to me was what happens to grouping sets,
which we can't spill to disk. So I did this:

    create table t2 (a int, b int, c int);

    -- run repeatedly, until there are about 20M rows in t2 (1GB)
    with tx as (select array_agg(a) as a, array_agg(b) as b
                  from (select a, b from t order by random()) foo),
         ty as (select array_agg(a) as a
                  from (select a from t order by random()) foo)
    insert into t2 select unnest(tx.a), unnest(ty.a), unnest(tx.b)
      from tx, ty;

    analyze t2;

This produces a table with two independent columns, skewed the same as
the column t.a. I don't know how much of this actually matters,
considering grouping sets don't spill, so maybe the independence is
sufficient and the skew may be irrelevant?

And then do this:

    set work_mem = '200MB';
    set max_parallel_workers_per_gather = 0;
    set enable_sort = 0;

    explain select a, b, sum(c) from t2 group by cube (a,b);

                                   QUERY PLAN
    ---------------------------------------------------------------------
     MixedAggregate  (cost=0.00..833064.27 rows=2756495 width=16)
       Hash Key: a, b
       Hash Key: a
       Hash Key: b
       Group Key: ()
       ->  Seq Scan on t2  (cost=0.00..350484.44 rows=22750744 width=12)
    (6 rows)

which fails with a segfault at execution time:

    tuplehash_start_iterate (tb=0x18, iter=iter@entry=0x2349340)
    870             for (i = 0; i < tb->size; i++)
    (gdb) bt
    #0  tuplehash_start_iterate (tb=0x18, iter=iter@entry=0x2349340)
    #1  0x0000000000654e49 in agg_retrieve_hash_table_in_memory ...

That's not surprising, because the 0x18 pointer is obviously bogus. I
guess this is simply an offset of 0x18 (24 bytes) added to a NULL
pointer?

Disabling hashagg spill (setting both GUCs to off) makes no difference,
but on master it fails like this:

    ERROR:  out of memory
    DETAIL:  Failed on request of size 3221225472 in memory context
             "ExecutorState".
which is annoying, but expected with an under-estimate and hashagg. And
much better than just crashing the whole cluster.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services