custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date:
Msg-id: 525058FE.1060609@fuzzy.cz
Responses: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption (Tomáš Janoušek <tomi@nomi.cz>)
List: pgsql-hackers
Hi,

I've been playing with a custom COUNT(DISTINCT) aggregate based on hash
tables - it seems to be working as planned, except for memory
consumption, which is much higher than I expected.

The code is available at https://github.com/tvondra/count_distinct and
in short it works like this:

1) There's a simple hash table for each group (so with HashAggregate
   it's effectively a hash table of hash tables). The hash table
   contains only the distinct values for that particular group, so the
   result is simply the number of items in the table.

2) The table starts very small (4 buckets), and when it grows too
   large (more than 20 items per bucket on average) it gets resized on
   the fly to 4x the size. So the number of buckets grows 4 -> 16 ->
   64 -> 256 -> ... (and the max number of items in the table grows
   80 -> 320 -> 1280 -> ...); see the sketch below.
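In pseudo-C, the growth rule from (2) looks roughly like this (a
simplified sketch with made-up helper names, not the actual extension
code):

   #include <stdbool.h>
   #include <stdint.h>

   #define AVG_BUCKET_LIMIT  20  /* resize when avg items/bucket exceeds this */
   #define GROWTH_FACTOR      4  /* buckets grow 4 -> 16 -> 64 -> 256 -> ... */

   /* hypothetical helper: should the table grow after adding an item? */
   static bool
   table_needs_growth(uint32_t nitems, uint32_t nbuckets)
   {
       /* capacity thresholds: 4*20=80, 16*20=320, 64*20=1280, ... */
       return nitems > (uint64_t) nbuckets * AVG_BUCKET_LIMIT;
   }

   /* hypothetical helper: next table size under the 4x policy */
   static uint32_t
   grown_nbuckets(uint32_t nbuckets)
   {
       return nbuckets * GROWTH_FACTOR;
   }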

Let's assume I have a table like this:

  CREATE TABLE test_table (a int, b int);
  INSERT INTO test_table
       SELECT mod(i,4000), (100000*random())::int
         FROM generate_series(1,80000000) s(i);

And let's run a query like this:
  SELECT a, count_distinct(b) FROM test_table GROUP BY a;

I.e. there are ~4k groups with ~20k distinct values for each group.
This query, however, consumes >5GB of memory, which is much more than
I expected, and I've spent a fair amount of time looking for memory
leaks in the code, so I'm wondering if I'm missing something important.

The custom hash tables are built from these very simple structures:
   typedef struct hash_element_t {
       uint32  hash;
       uint32  length;
       char   *value;           /* the actual value (say 4B for int4) */
   } hash_element_t;

   typedef struct hash_bucket_t {
       uint32  nitems;
       hash_element_t *items;   /* array of hash_element_t items */
   } hash_bucket_t;

   typedef struct hash_table_t {
       uint16  nbits;           /* number of significant bits of the hash */
       uint32  nbuckets;        /* number of buckets (2^nbits) */
       uint32  nitems;          /* current number of elements in the table */
       hash_bucket_t *buckets;  /* array of hash_bucket_t */
   } hash_table_t;

I'm on a 64-bit architecture and the example works with int32, which
means the sizes should be about this:

   hash_element_t => 20B
   hash_bucket_t  => 4B + (20B * items in the bucket [in steps of 5])
   hash_table_t   => 4B + space for buckets
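As a sanity check of those numbers, a standalone snippet like this
prints the compiled struct sizes (stdint types substituted so it
builds outside the extension); note that alignment padding can make
sizeof() larger than a field-by-field count:

   #include <stdio.h>
   #include <stdint.h>

   typedef struct { uint32_t hash; uint32_t length; char *value; } hash_element_t;
   typedef struct { uint32_t nitems; hash_element_t *items; } hash_bucket_t;
   typedef struct { uint16_t nbits; uint32_t nbuckets; uint32_t nitems;
                    hash_bucket_t *buckets; } hash_table_t;

   int main(void)
   {
       printf("hash_element_t: %zu\n", sizeof(hash_element_t)); /* 16 on LP64 */
       printf("hash_bucket_t:  %zu\n", sizeof(hash_bucket_t));  /* 16 on LP64 */
       printf("hash_table_t:   %zu\n", sizeof(hash_table_t));   /* 24 on LP64 */
       return 0;
   }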

In the example above, there are ~20k unique values in each group. The
threshold is 20 items per bucket on average, so that's 1024 buckets,
and the buckets are almost full.

So for a single group, the hash table size is about

  4B + 1024 * (4B + 20 * 20B) = 413700B ~ 400 kB

There are 4000 groups, so the total estimate is ~1.6GB.

However when executed (9.2, 9.3 and HEAD behave exactly the same), the
query consumes ~5GB of RAM (excluding shared buffers).

These are the memory context stats: http://pastebin.com/JHjnehvC

The interesting part is

  ExecutorState: 24576 total in 2 blocks; 16368 free (6 chunks); 8208 used
    ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    AggContext: 5106224640 total in 4612 blocks; 565779368 free
                (1072086 chunks); 4540445272 used
      TupleHashTable: 253952 total in 5 blocks; 55360 free
                      (16 chunks); 198592 used
    ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

So there's ~5GB in the AggContext, ~500MB of that is free.

I'm aware that the HashAggregate itself will allocate some more
memory, and although I'm not sure how much that should be, I would
expect it to be a "small amount" compared to the 1.6GB.

So I'm wondering: what is using the remaining ~3.5GB? Is this the
overhead of HashAggregate, or am I missing something (e.g. a memory
leak in the code)?
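For what it's worth, one possible source I haven't ruled out is
per-chunk palloc overhead. Assuming AllocSet contexts round small
requests up to the next power of 2 and add a ~16B chunk header on
64-bit (numbers I haven't verified here), a rough model of the cost
of each separately palloc'd value looks like this:

   #include <stdio.h>

   #define CHUNK_HEADER 16   /* assumed AllocChunkData overhead on 64-bit */

   /* assumed cost of palloc(requested): header + power-of-2 rounding */
   static size_t alloc_cost(size_t requested)
   {
       size_t chunk = 8;     /* assumed minimum chunk size */
       while (chunk < requested)
           chunk <<= 1;
       return CHUNK_HEADER + chunk;
   }

   int main(void)
   {
       /* each 4B int4 value is palloc'd separately in my code */
       printf("one 4B value costs %zu bytes\n", alloc_cost(4));        /* 24 */
       printf("~20k values per group: %zu bytes\n", 20000 * alloc_cost(4));
       return 0;
   }

Under those assumptions each 4B value would really cost ~24B, i.e. an
extra ~400kB per group beyond my estimate, which would account for a
large part of the difference.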


Tomas


