Home > mailing lists

Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Default setting for enable_hashagg_disk
Date	June 25, 2020 21:28:02
Msg-id	e43828bbf4c1dc81ce8a38e9b769fd7bda45d5b4.camel@j-davis.com Whole thread Raw
In response to	Re: Default setting for enable_hashagg_disk (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Default setting for enable_hashagg_disk
List	pgsql-hackers

Tree view

On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote:
> It is my understanding that spill of sorts is mostly read
> sequentially,
> while hash reads are random.  Is that right?  Is that not being
> costed
> properly?

I don't think there's a major problem with the cost model, but it could
probably use some tweaking.

Hash writes are random. The hash reads should be mostly sequential (for
large partitions it will be 128-block extents, or 1MB). The cost model
assumes 50% sequential and 50% random.

Sorts are written sequentially and read randomly, but there's
prefetching to keep the reads from being too random. The cost model
assumes 75% sequential and 25% random.

Overall, the IO pattern is better for Sort, but not dramatically so.
Tomas Vondra did some nice analysis here:

https://www.postgresql.org/message-id/20200525021045.dilgcsmgiu4l5jpa@development

That resulted in getting the prealloc and projection patches in.

Regards,
    Jeff Davis

pgsql-hackers by date:

From: Alastair McKinley
Date: 25 June 2020, 20:47:30
Subject: Re: CUBE_MAX_DIM

From: Melanie Plageman
Date: 25 June 2020, 22:09:44
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats

Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

Previous

Next