Re: Disk-based hash aggregate's cost model - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Disk-based hash aggregate's cost model
Msg-id 011877614fa1279c97ce6e897ea2f0dc90124483.camel@j-davis.com
In response to Re: Disk-based hash aggregate's cost model  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Disk-based hash aggregate's cost model
List pgsql-hackers
On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
> Those charts show that the CP_SMALL_TLIST resulted in smaller temp files
> (per EXPLAIN ANALYZE the difference is ~25%) and also lower query
> durations (also in the ~25% range).

I was able to reproduce the problem, thank you.

Only two attributes are needed, so the CP_SMALL_TLIST projected schema
only needs a single-byte null bitmap.

But if we just set the unneeded attributes to NULL rather than projecting
them away, the null bitmap size is based on all 16 attributes, which bumps
the bitmap to two bytes.

MAXALIGN(23 + 1) = 24
MAXALIGN(23 + 2) = 32
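To make the padding arithmetic concrete, here is a small standalone C sketch. The helpers `bitmap_len` and `max_align` are local stand-ins that mirror PostgreSQL's `BITMAPLEN` and `MAXALIGN` macros rather than including the real headers. It assumes 8-byte maximum alignment, as on typical 64-bit builds, and the 23-byte fixed heap tuple header used in the numbers above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, as on typical 64-bit builds. */
#define ALIGNOF_MAX 8

/* Assumption: 23-byte fixed tuple header, as in the arithmetic above. */
#define FIXED_HEADER_SIZE 23

/* Bytes needed for a null bitmap with one bit per attribute
 * (local stand-in for BITMAPLEN). */
static size_t bitmap_len(int natts)
{
    return (size_t) (natts + 7) / 8;
}

/* Round len up to the next multiple of ALIGNOF_MAX
 * (local stand-in for MAXALIGN). */
static size_t max_align(size_t len)
{
    return (len + ALIGNOF_MAX - 1) & ~((size_t) ALIGNOF_MAX - 1);
}

/* Header size including the null bitmap (present only if some attribute
 * is NULL), padded so that user data starts on an aligned boundary. */
static size_t tuple_header_size(int natts, bool has_nulls)
{
    size_t len = FIXED_HEADER_SIZE;

    if (has_nulls)
        len += bitmap_len(natts);
    return max_align(len);
}
```

Under these assumptions, `tuple_header_size(2, true)` returns 24 and `tuple_header_size(16, true)` returns 32: the second bitmap byte pushes the header across an 8-byte boundary, so each NULL-filled tuple carries 8 extra bytes of padding.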

I think that explains it. It's not ideal, but projection has a cost as
well, so I don't think we necessarily need to do something here.

If we are motivated to improve this in v14, we could potentially have a
different schema for spilled tuples, and perform real projection at
spill time. But I don't know if that's worth the extra complexity.
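A rough sketch of the two options, using a hypothetical `SimpleRow` type (not PostgreSQL's tuple format) and hypothetical helpers `null_unneeded`, `project_for_spill`, and `header_size`. It only models how the width of the spilled schema drives the header size, assuming a 23-byte fixed header, a one-bit-per-attribute null bitmap when any attribute is NULL, and 8-byte alignment. It ignores datum storage entirely and is not a description of how the executor would actually do this.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTS 16

/* Hypothetical simplified row: values plus per-attribute null flags.
 * This is NOT PostgreSQL's tuple format; it only models schema width. */
typedef struct SimpleRow
{
    int  natts;
    long values[MAX_ATTS];
    bool isnull[MAX_ATTS];
} SimpleRow;

/* Option 1: keep the full schema and set unneeded attributes to NULL. */
static void null_unneeded(SimpleRow *row, const bool *needed)
{
    for (int i = 0; i < row->natts; i++)
    {
        if (!needed[i])
        {
            row->values[i] = 0;
            row->isnull[i] = true;
        }
    }
}

/* Option 2: real projection into a narrower spill schema that keeps
 * only the attributes listed in keep[]. */
static SimpleRow project_for_spill(const SimpleRow *row, const int *keep, int nkeep)
{
    SimpleRow out = {0};

    out.natts = nkeep;
    for (int i = 0; i < nkeep; i++)
    {
        out.values[i] = row->values[keep[i]];
        out.isnull[i] = row->isnull[keep[i]];
    }
    return out;
}

/* Padded header size: 23-byte fixed header, plus a null bitmap sized by
 * natts if any attribute is NULL, rounded up to 8 bytes. */
static size_t header_size(const SimpleRow *row)
{
    size_t len = 23;
    bool   has_nulls = false;

    for (int i = 0; i < row->natts; i++)
    {
        if (row->isnull[i])
            has_nulls = true;
    }
    if (has_nulls)
        len += (size_t) (row->natts + 7) / 8;
    return (len + 7) & ~(size_t) 7;
}
```

Projecting a 16-attribute row down to two non-NULL attributes gives a 24-byte header, while NULLing out the other 14 attributes in place gives 32 bytes. The copy loop in `project_for_spill` stands in for the extra per-tuple work that projection costs.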

Regards,
    Jeff Davis
