Home > mailing lists

Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Tuplesort merge pre-reading
Date	October 3, 2016 13:39:20
Msg-id	9f7d8bb4-e954-37e4-1db4-ae2702fea4c7@iki.fi Whole thread Raw
In response to	Re: Tuplesort merge pre-reading (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Tuplesort merge pre-reading
List	pgsql-hackers

Tree view

On 09/30/2016 04:08 PM, Peter Geoghegan wrote:
> On Thu, Sep 29, 2016 at 4:10 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Bah, I fumbled the initSlabAllocator() function, attached is a fixed
>> version.
>
> This looks much better. It's definitely getting close. Thanks for
> being considerate of my more marginal concerns. More feedback:

Fixed most of the things you pointed out, thanks.

> * Minor issues with initSlabAllocator():
>
> You call the new function initSlabAllocator() as follows:
>
>> +   if (state->tuples)
>> +       initSlabAllocator(state, numInputTapes + 1);
>> +   else
>> +       initSlabAllocator(state, 0);
>
> Isn't the number of slots (the second argument to initSlabAllocator())
> actually just numInputTapes when we're "state->tuples"? And so,
> shouldn't the "+ 1" bit happen within initSlabAllocator() itself? It
> can just inspect "state->tuples" itself. In short, maybe push a bit
> more into initSlabAllocator(). Making the arguments match those passed
> to initTapeBuffers() a bit later would be nice, perhaps.

The comment above that explains the "+ 1". init_slab_allocator allocates 
the number of slots that was requested, and the caller is responsible 
for deciding how many slots are needed. Yeah, we could remove the 
argument and move the logic altogether into init_slab_allocator(), but I 
think it's clearer this way. Matter of taste, I guess.

> * This could be simpler, I think:
>
>> +   /* Release the read buffers on all the other tapes, by rewinding them. */
>> +   for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
>> +   {
>> +       if (tapenum == state->result_tape)
>> +           continue;
>> +       LogicalTapeRewind(state->tapeset, tapenum, true);
>> +   }
>
> Can't you just use state->tapeRange, and remove the "continue"? I
> recommend referring to "now-exhausted input tapes" here, too.

Don't think so. result_tape == tapeRange only when the merge was done in 
a single pass (or you're otherwise lucky).

> * I'm not completely prepared to give up on using
> MemoryContextAllocHuge() within logtape.c just yet. Maybe you're right
> that it couldn't possibly matter that we impose a MaxAllocSize cap
> within logtape.c (per tape), but I have slight reservations that I
> need to address. Maybe a better way of putting it would be that I have
> some reservations about possible regressions at the very high end,
> with very large workMem. Any thoughts on that?

Meh, I can't imagine that using more than 1 GB for a read-ahead buffer 
could make any difference in practice. If you have a very large 
work_mem, you'll surely get away with a single merge pass, and 
fragmentation won't become an issue. And 1GB should be more than enough 
to trigger OS read-ahead.

Committed with some final kibitzing. Thanks for the review!

PS. This patch didn't fix bug #14344, the premature reuse of memory with 
tuplesort_gettupleslot. We'll still need to come up with 1. a 
backportable fix for that, and 2. perhaps a different fix for master.

- Heikki

pgsql-hackers by date:

From: Stas Kelvich
Date: 03 October 2016, 13:01:57
Subject: Re: WIP: About CMake v2

From: Christoph Berg
Date: 03 October 2016, 13:43:49
Subject: Re: pgbench - allow backslash continuations in \set expressions

Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers

Previous

Next