Re: Trouble with hashagg spill I/O pattern and costing - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Trouble with hashagg spill I/O pattern and costing
Date
Msg-id 20200521214122.6opovye4mnbqczlc@development
In response to Re: Trouble with hashagg spill I/O pattern and costing  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Trouble with hashagg spill I/O pattern and costing
List pgsql-hackers
On Thu, May 21, 2020 at 02:16:37PM -0700, Jeff Davis wrote:
>On Thu, 2020-05-21 at 21:13 +0200, Tomas Vondra wrote:
>> 2) We could make it self-tuning, by increasing the number of blocks
>> we pre-allocate. So every time we exhaust the range, we double the
>> number of blocks (with a reasonable maximum, like 1024 or so). Or we
>> might just increment it by 32, or something.
>
>Attached a new version that uses the doubling behavior, and cleans it
>up a bit. It also returns the unused prealloc blocks back to
>lts->freeBlocks when the tape is rewound for reading.
>

Ah, returning the unused blocks is a nice idea; that should limit the
overhead quite a bit, I think.
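
To make the scheme under discussion concrete, here is a minimal sketch of
per-tape preallocation with doubling, plus handing back the unused tail on
rewind. It is not the attached patch: the struct and function names are
hypothetical, the initial batch of 8 blocks is an assumption, and the cap of
1024 is just the "reasonable maximum" floated above. The real code returns
blocks to lts->freeBlocks; the sketch only counts how many would be returned.

```c
#include <assert.h>

#define PREALLOC_MIN 8      /* assumed initial batch size (not from the thread) */
#define PREALLOC_MAX 1024   /* cap suggested upthread ("1024 or so") */

/* Hypothetical shared temp file: hands out contiguous block numbers. */
typedef struct FileSketch
{
    long nblocks_allocated; /* next never-allocated block number */
} FileSketch;

/* Hypothetical per-tape state; not the actual logtape.c structures. */
typedef struct TapeSketch
{
    long next_block;        /* next preallocated block to hand out */
    long range_end;         /* one past the last preallocated block */
    int  batch;             /* size of the latest batch, 0 before the first */
} TapeSketch;

/* Return the next block for this tape, preallocating a new range if needed. */
static long
tape_get_block(FileSketch *file, TapeSketch *tape)
{
    if (tape->next_block == tape->range_end)
    {
        /* Range exhausted: start at the minimum, then double up to the cap. */
        if (tape->batch == 0)
            tape->batch = PREALLOC_MIN;
        else
            tape->batch *= 2;
        if (tape->batch > PREALLOC_MAX)
            tape->batch = PREALLOC_MAX;

        tape->next_block = file->nblocks_allocated;
        tape->range_end = tape->next_block + tape->batch;
        file->nblocks_allocated = tape->range_end;
    }
    return tape->next_block++;
}

/*
 * Called when the tape is rewound for reading: give up the unused tail of
 * the current range and report how many blocks were handed back.
 */
static long
tape_release_unused(TapeSketch *tape)
{
    long unused = tape->range_end - tape->next_block;

    assert(unused >= 0);
    tape->range_end = tape->next_block;
    return unused;
}
```

With these assumed constants, a tape that writes 9 blocks gets a range of 8
and then a range of 16, and gives back 15 blocks on rewind. Without the
release step those 15 blocks would stay reserved for the tape.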

>> IIUC the danger of pre-allocating blocks is that we might not fill
>> them, resulting in temp file much larger than necessary. It might be
>> harmless on some (most?) current filesystems that don't actually
>> allocate space for blocks that are never written, but it also
>> confuses our accounting of temporary file sizes. So we should try to
>> limit that, and growing the number of pre-allocated blocks over time
>> seems reasonable.
>
>There's another danger here: it doesn't matter how well the filesystem
>deals with sparse writes, because ltsWriteBlock fills in the holes with
>zeros anyway. That's potentially a significant amount of wasted IO
>effort if we aren't careful.
>

True. I'll give it a try on both machines and report some numbers. Might
take a couple of days.
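
For reference, the zero-filling cost can be modelled roughly as below. This
is only a sketch of the behavior described above (a write past the current
end of file first fills the gap with zero blocks); the names are
hypothetical and it does no actual I/O, it just counts block writes.

```c
#include <assert.h>

/* Hypothetical model of a temp file that is never allowed to contain holes. */
typedef struct FileModel
{
    long nblocks_written;   /* current file length, in blocks */
    long zero_blocks;       /* blocks written purely as zero filler
                             * (some may later be overwritten with data) */
} FileModel;

/* "Write" block blocknum; returns the number of physical block writes. */
static long
model_write_block(FileModel *f, long blocknum)
{
    long writes = 0;

    assert(blocknum >= 0);

    /* Fill any gap between the end of the file and blocknum with zeros. */
    while (blocknum > f->nblocks_written)
    {
        f->nblocks_written++;
        f->zero_blocks++;
        writes++;
    }

    /* The real write extends the file only if it lands exactly at the end. */
    if (blocknum == f->nblocks_written)
        f->nblocks_written++;
    writes++;

    return writes;
}
```

In this model, writing block 0 and then block 16 costs 17 block writes, 15
of them zero filler, regardless of whether the filesystem supports sparse
files.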


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


