Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Default setting for enable_hashagg_disk
Date	July 24, 2020 15:18:48
Msg-id	CA+TgmoZHnQ4j0_bf+HH03_SwRxq6Hxh6W6+9xPiKvGbmOXaQPQ@mail.gmail.com
In response to	Re: Default setting for enable_hashagg_disk (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Default setting for enable_hashagg_disk Re: Default setting for enable_hashagg_disk
List	pgsql-hackers

On Thu, Jul 23, 2020 at 9:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>                            2MB     4MB    8MB    64MB    256MB
>     -----------------------------------------------------------
>      hash                 6.71    6.70   6.73    6.44     5.81
>      hash CP_SMALL_TLIST  5.28    5.26   5.24    5.04     4.54
>      sort                 3.41    3.41   3.41    3.57     3.45
>
> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
> much more than the 3.4GB of data written by sort (which has to spill
> everything, while hashagg only spills rows not covered by the groups
> that fit into work_mem).
>
> I initially assumed this is due to writing the hash value to the tapes,
> and the rows are fairly narrow (only about 40B per row), so a 4B hash
> could make a difference - but certainly not this much. Moreover, that
> does not explain the difference between master and the now-reverted
> CP_SMALL_TLIST, I think.

This is all really good analysis, I think, but this seems like the key
finding. It seems like we don't really understand what's actually
getting written. Whether we use hash or sort doesn't seem like it
should have this kind of impact on how much data gets written, and
whether we use CP_SMALL_TLIST or project when needed doesn't seem like
it should matter like this either.
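A rough back-of-the-envelope check makes the size of the gap concrete. The sketch below is a hypothetical illustration, not something from the thread. It assumes ~40-byte rows, a 4-byte hash value stored with each spilled row, that the sort figure approximates the raw row volume, and that hashagg writes each input row at most once (no repartitioning passes). Under those assumptions, the stored hash alone adds only about 10%.

```python
# Hypothetical estimate, not taken from the original thread.
# Assumptions: ~40-byte rows, a 4-byte hash per spilled row, sort's
# written volume approximates raw row volume, and hashagg writes each
# input row at most once (no recursive repartitioning).

ROW_BYTES = 40          # approximate row width quoted in the thread
HASH_BYTES = 4          # hash value assumed to be stored per spilled row
SORT_WRITTEN_GB = 3.41  # sort spills every row (2MB column of the table)
HASH_WRITTEN_GB = 6.71  # observed hashagg writes (2MB column of the table)


def expected_hash_upper_bound(sort_gb: float) -> float:
    """Upper bound on hashagg writes if it spilled every row exactly once,
    with the extra per-row hash as the only difference from sort."""
    return sort_gb * (ROW_BYTES + HASH_BYTES) / ROW_BYTES


bound = expected_hash_upper_bound(SORT_WRITTEN_GB)
print(f"expected at most ~{bound:.2f} GB, observed {HASH_WRITTEN_GB:.2f} GB")
print(f"unexplained ratio: {HASH_WRITTEN_GB / bound:.2f}x")
```

With the 2MB figures from the table, the bound comes out around 3.75GB against 6.71GB observed, so under these assumptions something other than the stored hash is accounting for most of the extra writes.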

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company