On Thu, Jun 25, 2020 at 02:28:02PM -0700, Jeff Davis wrote:
>On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote:
>> It is my understanding that spill of sorts is mostly read
>> sequentially,
>> while hash reads are random. Is that right? Is that not being
>> costed
>> properly?
>
>I don't think there's a major problem with the cost model, but it could
>probably use some tweaking.
>
>Hash writes are random. The hash reads should be mostly sequential (for
>large partitions it will be 128-block extents, or 1MB). The cost model
>assumes 50% sequential and 50% random.
>
The important bit here is that while the logical writes are random,
those are effectively combined in page cache and the physical writes are
pretty sequential. So I think the cost model is fairly reasonable.
Note: Judging by iosnoop stats shared in the thread linked by Jeff.
>Sorts are written sequentially and read randomly, but there's
>prefetching to keep the reads from being too random. The cost model
>assumes 75% sequential and 25% random.
>
>Overall, the IO pattern is better for Sort, but not dramatically so.
>Tomas Vondra did some nice analysis here:
>
>
>https://www.postgresql.org/message-id/20200525021045.dilgcsmgiu4l5jpa@development
>
>That resulted in getting the prealloc and projection patches in.
>
>Regards,
> Jeff Davis
>
>
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services