
Re: Trouble with hashagg spill I/O pattern and costing - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Trouble with hashagg spill I/O pattern and costing
Date	May 26, 2020 19:15:11
Msg-id	20200526191511.d7pwz4awrvdm4p6j@development
In response to	Re: Trouble with hashagg spill I/O pattern and costing (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Trouble with hashagg spill I/O pattern and costing
List	pgsql-hackers


On Tue, May 26, 2020 at 11:40:07AM -0700, Jeff Davis wrote:
>On Tue, 2020-05-26 at 16:15 +0200, Tomas Vondra wrote:
>> I'm not familiar with logtape internals but IIRC the blocks are
>> linked
>> by each block having a pointer to the prev/next block, which means we
>> can't prefetch more than one block ahead I think. But maybe I'm
>> wrong,
>> or maybe fetching even just one block ahead would help ...
>
>We'd have to get creative. Keeping a directory in the LogicalTape
>structure might work, but I'm worried the memory requirements would be
>too high.
>
>One idea is to add a "prefetch block" to the TapeBlockTrailer (perhaps
>only in the forward direction?). We could modify the prealloc list so
>that we always know the next K blocks that will be allocated to the
>tape. All for v14, of course, but I'd be happy to hack together a
>prototype to collect data.
>

Yeah. I agree prefetching is definitely out of v13 scope. It might be
interesting to see how useful it would be, if you're willing to spend
some time on a prototype.
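
To make the trailer idea quoted above a bit more concrete, here is a rough
sketch of what a "prefetch block" pointer could look like. It is not code
from logtape.c: SketchTrailer is a simplified stand-in for TapeBlockTrailer,
and the prefetch field, NO_BLOCK, and sketch_link_tape() are all hypothetical
names made up for illustration. The sketch assumes the writer already knows
the order in which blocks will be allocated to the tape, which is what the
"always know the next K blocks" change to the prealloc list would provide.

```c
#include <stddef.h>

#define NO_BLOCK (-1L)

/*
 * Simplified stand-in for logtape.c's TapeBlockTrailer. The prev/next
 * links mirror the existing design; "prefetch" is a hypothetical addition.
 */
typedef struct SketchTrailer
{
    long prev;      /* previous block on this tape, or NO_BLOCK */
    long next;      /* next block on this tape, or NO_BLOCK */
    long prefetch;  /* hypothetical: block k steps ahead, or NO_BLOCK */
} SketchTrailer;

/*
 * Fill in the trailers for one tape whose blocks are allocated in the
 * order blocks[0..nblocks-1]. trailers[] is indexed by block number.
 *
 * With only prev/next, a reader holding block i learns about block i+1
 * and nothing further. Storing the block k steps ahead lets it issue a
 * prefetch for that block while still consuming the ones in between.
 */
void
sketch_link_tape(SketchTrailer *trailers, const long *blocks,
                 size_t nblocks, size_t k)
{
    for (size_t i = 0; i < nblocks; i++)
    {
        SketchTrailer *t = &trailers[blocks[i]];

        t->prev = (i > 0) ? blocks[i - 1] : NO_BLOCK;
        t->next = (i + 1 < nblocks) ? blocks[i + 1] : NO_BLOCK;
        t->prefetch = (i + k < nblocks) ? blocks[i + k] : NO_BLOCK;
    }
}
```

Note that the writer has to know blocks[i + k] at the moment it writes block
i, so k cannot exceed the number of preallocated blocks known for the tape.
The sketch only covers the forward direction.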

>
>Do you have any other thoughts on the current prealloc patch for v13,
>or is it about ready for commit?
>

I think it's pretty much ready to go.

I have some doubts about the maximum value (128 probably means
read-ahead values above 256 are pointless, although I have not
tested that). But it's still a huge improvement with 128, so let's get
that committed.

I've been thinking about actually computing the expected number of
blocks per tape, and tying the maximum to that, somehow. But that's
something we can look at in the future.
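
As a rough illustration of what tying the maximum to the expected number of
blocks per tape might mean, here is one possible reading. Everything in it is
an assumption rather than a proposal: the names are hypothetical, it assumes
the spilled data is spread evenly across tapes, it assumes 8kB blocks, and
the lower bound of 8 is a placeholder. Only the upper bound of 128 comes from
the discussion above.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ        8192   /* assumed block size in bytes */
#define SKETCH_PREALLOC_MIN  8      /* placeholder lower bound */
#define SKETCH_PREALLOC_MAX  128    /* maximum discussed in this thread */

/*
 * Expected number of blocks written to each tape, assuming the spilled
 * bytes are distributed evenly. Rounds up to whole blocks.
 */
uint64_t
sketch_expected_blocks_per_tape(uint64_t spill_bytes, uint32_t ntapes)
{
    uint64_t per_tape;

    if (ntapes == 0)
        return 0;

    per_tape = spill_bytes / ntapes;
    return (per_tape + SKETCH_BLCKSZ - 1) / SKETCH_BLCKSZ;
}

/*
 * Cap on the preallocation size for a tape: never preallocate more
 * blocks than the tape is expected to use, clamped to [MIN, MAX].
 */
uint32_t
sketch_prealloc_cap(uint64_t spill_bytes, uint32_t ntapes)
{
    uint64_t expected = sketch_expected_blocks_per_tape(spill_bytes, ntapes);

    if (expected < SKETCH_PREALLOC_MIN)
        return SKETCH_PREALLOC_MIN;
    if (expected > SKETCH_PREALLOC_MAX)
        return SKETCH_PREALLOC_MAX;
    return (uint32_t) expected;
}
```

For example, 1MB spilled across 4 tapes gives 32 expected blocks per tape, so
the cap would be 32 rather than 128. A real version would need a better
estimate than an even split, since partition sizes can be skewed.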

As for the tlist fix, I think that's mostly ready too - the one thing we
should probably do is apply it only for AGG_HASHED. For AGG_SORTED it's
not really necessary.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
