Re: Disk-based hash aggregate's cost model - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Disk-based hash aggregate's cost model
Date 2020-09-04
Msg-id 20200904190137.scrucxyjkxyc2bmk@development
In response to Re: Disk-based hash aggregate's cost model  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Disk-based hash aggregate's cost model  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Fri, Sep 04, 2020 at 11:31:36AM -0700, Jeff Davis wrote:
>On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
>> Those charts show that the CP_SMALL_TLIST resulted in smaller temp
>> files (per EXPLAIN ANALYZE the difference is ~25%) and also lower
>> query durations (also in the ~25% range).
>
>I was able to reproduce the problem, thank you.
>
>Only two attributes are needed, so the CP_SMALL_TLIST projected schema
>only needs a single-byte null bitmap.
>
>But if the attributes are just set to NULL rather than projected away,
>the null bitmap size is based on all 16 attributes, bumping it to two
>bytes.
>
>MAXALIGN(23 + 1) = 24
>MAXALIGN(23 + 2) = 32
>
>I think that explains it. It's not ideal, but projection has a cost as
>well, so I don't think we necessarily need to do something here.
>
>If we are motivated to improve this in v14, we could potentially have a
>different schema for spilled tuples, and perform real projection at
>spill time. But I don't know if that's worth the extra complexity.
>

Thanks for the investigation and explanation.
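
To spell out the arithmetic: two attributes fit in a one-byte null
bitmap while sixteen need two bytes, and that extra byte pushes the
tuple header across a MAXALIGN boundary. A tiny standalone sketch
(MAXALIGN and BITMAPLEN here are simplified stand-ins for the backend
macros, assuming the usual 8-byte MAXIMUM_ALIGNOF):

#include <stdio.h>

/* simplified MAXALIGN: round LEN up to an 8-byte boundary */
#define MAXALIGN(LEN)    (((unsigned long) (LEN) + 7) & ~((unsigned long) 7))
/* null bitmap: one bit per attribute, rounded up to whole bytes */
#define BITMAPLEN(NATTS) (((NATTS) + 7) / 8)

int
main(void)
{
    const int hdr = 23;     /* fixed part of the heap tuple header */

    printf("2 atts:  %lu bytes\n", MAXALIGN(hdr + BITMAPLEN(2)));   /* 24 */
    printf("16 atts: %lu bytes\n", MAXALIGN(hdr + BITMAPLEN(16)));  /* 32 */
    return 0;
}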

Wouldn't it be enough to just use a slot with a smaller tuple descriptor?
All we'd need to do is create the descriptor in ExecInitAgg after
calling find_hash_columns, use it for rslot/wslot, and then "map" the
attributes in hashagg_spill_tuple (which already almost does that, so
the extra cost should be ~0) and when reading the spilled tuples back.
So I'm not quite buying the argument that this would make a measurable
difference ...
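
Roughly something like this (an untested sketch - the helper name and
arguments are made up, but CreateTemplateTupleDesc and
TupleDescCopyEntry are the existing tupdesc APIs I'd use):

/*
 * Hypothetical helper: build a narrow descriptor covering only the
 * columns find_hash_columns() identified as needed, copying their
 * definitions from the input descriptor.
 */
static TupleDesc
build_spill_tupledesc(TupleDesc inputDesc,
                      const AttrNumber *neededAtts, int nNeeded)
{
    TupleDesc   spillDesc = CreateTemplateTupleDesc(nNeeded);

    for (int i = 0; i < nNeeded; i++)
        TupleDescCopyEntry(spillDesc, (AttrNumber) (i + 1),
                           inputDesc, neededAtts[i]);

    return spillDesc;
}

hashagg_spill_tuple would then write slots built with this descriptor
instead of nulling out the unneeded columns, so the spilled tuples get
the one-byte bitmap (and drop the unneeded columns entirely).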

That being said, I won't insist on fixing this in v13 - at least we know
what the issue is and we can fix it later. The costing seems like a more
serious open item.

OTOH I don't think this example is particularly extreme, and I wouldn't
be surprised if we see even worse examples in practice - tables tend to
be quite wide, and aggregating just a few columns seems likely.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


