Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200625231154.uknvrnqwmahv6rtu@development
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, Jun 25, 2020 at 02:28:02PM -0700, Jeff Davis wrote:
>On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote:
>> It is my understanding that spill of sorts is mostly read
>> sequentially,
>> while hash reads are random.  Is that right?  Is that not being
>> costed
>> properly?
>
>I don't think there's a major problem with the cost model, but it could
>probably use some tweaking.
>
>Hash writes are random. The hash reads should be mostly sequential (for
>large partitions it will be 128-block extents, or 1MB). The cost model
>assumes 50% sequential and 50% random.
>

The important bit here is that while the logical writes are random,
those are effectively combined in page cache and the physical writes are
pretty sequential. So I think the cost model is fairly reasonable.

Note: Judging by iosnoop stats shared in the thread linked by Jeff.

>Sorts are written sequentially and read randomly, but there's
>prefetching to keep the reads from being too random. The cost model
>assumes 75% sequential and 25% random.
>
>Overall, the IO pattern is better for Sort, but not dramatically so.
>Tomas Vondra did some nice analysis here:
>
>
>https://www.postgresql.org/message-id/20200525021045.dilgcsmgiu4l5jpa@development
>
>That resulted in getting the prealloc and projection patches in.
>
>Regards,
>    Jeff Davis
>
>

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Tomas Vondra
Date:
Subject: Re: Default setting for enable_hashagg_disk