Re: Disk-based hash aggregate's cost model - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Disk-based hash aggregate's cost model
Date	September 4, 2020 18:31:36
Msg-id	011877614fa1279c97ce6e897ea2f0dc90124483.camel@j-davis.com
In response to	Re: Disk-based hash aggregate's cost model (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Disk-based hash aggregate's cost model
List	pgsql-hackers

On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
> Those charts show that the CP_SMALL_TLIST resulted in smaller temp
> files (per EXPLAIN ANALYZE the difference is ~25%) and also lower
> query durations (also in the ~25% range).

I was able to reproduce the problem, thank you.

Only two attributes are needed, so the CP_SMALL_TLIST projected schema
only needs a single-byte null bitmap.

But if we just set the unneeded attributes to NULL rather than
projecting them away, the null bitmap size is based on all 16
attributes, bumping the bitmap size to two bytes.

MAXALIGN(23 + 1) = 24
MAXALIGN(23 + 2) = 32
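
The arithmetic above can be reproduced with a small sketch. It is an
illustration only, not PostgreSQL source: the 23-byte fixed header comes
from the figures in this message, the bitmap length of one bit per
attribute rounded up to whole bytes mirrors the BITMAPLEN macro, and the
8-byte alignment is an assumption that holds on typical 64-bit builds.
The helper names are made up for this example.

```python
# Sketch of the tuple header size calculation discussed above.
# Assumption: MAXALIGN rounds up to 8 bytes (typical 64-bit platform).
MAXIMUM_ALIGNOF = 8
# Fixed header size in bytes, taken from the figures in the message.
FIXED_HEADER_BYTES = 23


def maxalign(nbytes: int) -> int:
    """Round nbytes up to the next multiple of MAXIMUM_ALIGNOF."""
    return (nbytes + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def bitmap_len(natts: int) -> int:
    """Null bitmap size in bytes: one bit per attribute, rounded up."""
    return (natts + 7) // 8


def header_size(natts: int) -> int:
    """Aligned header size for a tuple whose schema has natts attributes."""
    return maxalign(FIXED_HEADER_BYTES + bitmap_len(natts))


if __name__ == "__main__":
    # Projected schema with 2 attributes vs. full schema with 16.
    print(header_size(2))
    print(header_size(16))
```

Under these assumptions the two-attribute schema gets a 24-byte header
and the 16-attribute schema a 32-byte one, so each spilled tuple carries
8 extra bytes of header when the attributes are only set to NULL.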

I think that explains it. It's not ideal, but projection has a cost as
well, so I don't think we necessarily need to do something here.

If we are motivated to improve this in v14, we could potentially have a
different schema for spilled tuples, and perform real projection at
spill time. But I don't know if that's worth the extra complexity.

Regards,
    Jeff Davis
