Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200720004827.4c46ikcdrfum6xyu@development
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Default setting for enable_hashagg_disk
List pgsql-hackers
On Sun, Jul 19, 2020 at 02:17:15PM -0700, Jeff Davis wrote:
>On Sat, 2020-07-18 at 21:15 -0400, Tom Lane wrote:
>> Jeff Davis <pgsql@j-davis.com> writes:
>> > What is your opinion about pessimizing the HashAgg disk costs (not
>> > affecting HashAgg plans expected to stay in memory)? Tomas Vondra
>> > presented some evidence that Sort had some better IO patterns in
>> > some
>> > cases that weren't easily reflected in a principled way in the cost
>> > model.
>>
>> Hm, was that in some other thread?  I didn't find any such info
>> in a quick look through this one.
>
>
>https://www.postgresql.org/message-id/2df2e0728d48f498b9d6954b5f9080a34535c385.camel%40j-davis.com
>

FWIW the two messages to look at are these two:

1) report with initial data
https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2%40development

2) updated stats, with the block pre-allocation and tlist projection
https://www.postgresql.org/message-id/20200521001255.kfaihp3afv6vy6uq%40development

But I'm not convinced we actually need to tweak the costing - we've
ended up fixing two things, and I think a lot of the differences in I/O
patterns disappeared thanks to this.

For sort, the stats of request sizes look like this:

       type |  bytes  | count |   pct
      ------+---------+-------+-------
       RA   |  131072 | 26034 | 59.92
       RA   |   16384 |  6160 | 14.18
       RA   |    8192 |  3636 |  8.37
       RA   |   32768 |  3406 |  7.84
       RA   |   65536 |  3270 |  7.53
       RA   |   24576 |   361 |  0.83
       ...
       W    | 1310720 |  8070 | 34.26
       W    |  262144 |  1213 |  5.15
       W    |  524288 |  1056 |  4.48
       W    | 1056768 |   689 |  2.93
       W    |  786432 |   292 |  1.24
       W    |  802816 |   199 |  0.84
       ...

And for the hashagg, it looks like this:

       type |  bytes  | count  |  pct
      ------+---------+--------+--------
       RA   |  131072 | 200816 |  70.93
       RA   |    8192 |  23640 |   8.35
       RA   |   16384 |  19324 |   6.83
       RA   |   32768 |  19279 |   6.81
       RA   |   65536 |  19273 |   6.81
       ...
       W    | 1310720 |  18000 |  65.91
       W    |  524288 |   2074 |   7.59
       W    | 1048576 |    660 |   2.42
       W    |    8192 |    409 |   1.50
       W    |  786432 |    354 |   1.30
       ...

so it's actually a tad better than sort, because larger proportion of
both reads and writes is in larger chunks (reads 128kB, writes 1280kB).
I think the device had default read-ahead setting, which I assume
explains the 128kB.

For the statistics of deltas between requests - for sort

       type | block_delta | count |   pct
      ------+-------------+-------+-------
       RA   |         256 | 13432 | 30.91
       RA   |          16 |  3291 |  7.57
       RA   |          32 |  3272 |  7.53
       RA   |          64 |  3266 |  7.52
       RA   |         128 |  2877 |  6.62
       RA   |        1808 |  1278 |  2.94
       RA   |       -2320 |   483 |  1.11
       RA   |       28928 |   386 |  0.89
       ...
       W    |        2560 |  7856 | 33.35
       W    |        2064 |  4921 | 20.89
       W    |        2080 |   586 |  2.49
       W    |       30960 |   300 |  1.27
       W    |        2160 |   253 |  1.07
       W    |        1024 |   248 |  1.05
       ...

and for hashagg:

       type | block_delta | count  |  pct
      ------+-------------+--------+-------
       RA   |         256 | 180955 | 63.91
       RA   |          32 |  19274 |  6.81
       RA   |          64 |  19273 |  6.81
       RA   |         128 |  19264 |  6.80
       RA   |          16 |  19203 |  6.78
       RA   |       30480 |   9835 |  3.47

At first this might look worse than sort, but 256 sectors matches the
128kB from the request size stats, and it's good match (64% vs. 70%).


There's a minor problem here, though - these stats were collected before
we fixed the tlist issue, so hashagg was spilling about 10x the amount
of data compared to sort+groupagg. So maybe that's the first thing we
should do, before contemplating changes to the costing - collecting
fresh data. I can do that, if needed.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_subscription.subslotname is wrongly marked NOT NULL
Next
From: Greg Nancarrow
Date:
Subject: Re: Libpq support to connect to standby server as priority