Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date
Msg-id 5251E736.3010203@fuzzy.cz
Whole thread Raw
In response to Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption  (Tomáš Janoušek <tomi@nomi.cz>)
Responses Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption  ("ktm@rice.edu" <ktm@rice.edu>)
Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption  (Atri Sharma <atri.jiit@gmail.com>)
List pgsql-hackers
On 6.10.2013 20:37, Tomáš Janoušek wrote:
> Hi,
> 
> On Sat, Oct 05, 2013 at 08:22:54PM +0200, Tomas Vondra wrote:
>> I'm on 64-bit architecture and the example works with int32, which means
>> the sizes should be about this:
>>
>>     hash_element_t => 20B
>>     hash_bucket_t  => 4B + (20B * items in the bucket [in steps of 5])
>>     hash_table_t   => 4B + space for buckets
>>
>> In the example above, there's ~20k unique values in each group. The
>> threshold is 20 items per bucket on average, so that's 1024 buckets, and
>> the buckets are almost full.
>>
>> So for single group, the hash table size is about
>>
>>    4B + 1024 * (4B + 20 * 20B) = 413700B = ~ 400 kB
>>
>> There are 4000 groups, so the total estimate is ~1.6GB in total.
>>
>> However when executed (9.2, 9.3 and HEAD behave exactly the same), the
>> query consumes almost ~5GB of RAM (excluding shared buffers).
> 
> I think the missing thing is the memory allocator bookkeeping overhead.
> You're assuming that hash_element_t.value takes 8B for the pointer and 4B for
> the value itself, but using malloc it takes another at least 20 bytes, and
> from a quick glance at backend/utils/mmgr/aset.c it seems that palloc is
> certainly not without its overhead either.
> 
> Also, each additional level of pointers adds execution overhead and increases
> the likelihood of cache misses.  I'd suggest a few improvements, if I may:
> 
> 1. Drop hash_element_t altogether, store length in hash_bucket_t and alloc
>    hash_bucket_t.items of size nitems * length bytes.  I doubt that storing
>    the hash values has a benefit worth the storage and code complexity
>    overhead (you're storing fixed-size ints, not large blobs that are
>    expensive to compare and hash).

Good idea - I'll move the length to the hash table.

You're right that keeping the hash for int32/64 values is probably
useless, however I planned to eventually extend the code to support
larger values (possibly even varlena types, i.e. text/bytea). So I'll
keep this for now, but I'll keep this as a possible optimization.

> 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2].
>    For fun, try not hashing those ints at all and see how that performs (that,
>    I think, is what you get from HashSet<int> in Java/C#).

I've used crc32 mostly because it's easily available in the code (i.e.
I'm lazy), but I've done some quick research / primitive benchmarking
too. For example hashing 2e9 integers takes this much time:

FNV-1   = 11.9
FNV-1a  = 11.9
jenkins = 38.8
crc32   = 32.0

So it's not really "slow" and the uniformity seems to be rather good.

I'll try FNV in the future, however I don't think that's the main issue
right now.

> 3. Consider dropping buckets in favor of open addressing (linear probing,
>    quadratic, whatever).  This avoids another level of pointer indirection.

OK, this sounds really interesting. It should be fairly straightforward
for fixed-length data types (in that case I can get rid of the pointers
altogether).

> It's been a few years since I've done stuff this low level, so I won't go into
> suggesting a different data structure -- I have honestly no idea what's the
> best way to count the number of distinct integers in a list.

Thanks for the hints!

Tomas



pgsql-hackers by date:

Previous
From: Kohei KaiGai
Date:
Subject: Re: Triggers on foreign tables
Next
From: Bruce Momjian
Date:
Subject: Re: record identical operator - Review