Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAMkU=1wJ0FmyBpBZ0rj1pP3Dsr0kdfiGst6X7uVQyD7YBXk64Q@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Sat, Dec 12, 2015 at 2:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I have a question about the terminology used in this patch.  What is a
>> tuple proper?  What is it in contradistinction to?  I would think that
>> a tuple which is located in its own palloc'ed space is the "proper"
>> one, leaving a tuple allocated in the bulk memory pool to be
>> called...something else.  I don't know what the
>> non-judgmental-sounding antonym of postpositive "proper" is.
>
> "Tuple proper" is a term that appears 5 times in tuplesort.c today. As
> it says at the top of that file:
>
> /*
>  * The objects we actually sort are SortTuple structs.  These contain
>  * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
>  * which is a separate palloc chunk --- we assume it is just one chunk and
>  * can be freed by a simple pfree().  SortTuples also contain the tuple's
>  * first key column in Datum/nullflag format, and an index integer.

Those usages make sense to me, as they are locally self-contained and
it is clear what they are in contradistinction to.   But your usage is
spread throughout (even in function names, not just comments) and
seems to contradict the current usage as yours are not separately
palloced, as the "proper" ones described here are.  I think that
"proper" only works when the same comment also defines the
alternative, rather than as some file-global description.  Maybe
"pooltuple" rather than "tupleproper"


>
>> Also, if I am reading this correctly, when we refill a pool from a
>> logical tape we still transform each tuple as it is read from the disk
>> format to the memory format.  This inflates the size quite a bit, at
>> least for single-datum tuples.  If we instead just read the disk
>> format directly into the pool, and converted them into the in-memory
>> format when each tuple came due for the merge heap, would that destroy
>> the locality of reference you are seeking to gain?
>
> Are you talking about alignment?

Maybe alignment, but also the size of the SortTuple struct itself,
which is not present on tape but is present in memory if I understand
correctly.

When reading 128kb (32 blocks) worth of in-memory pool, it seems like
it only gets to read 16 to 18 blocks of tape to fill them up, in the
case of building an index on single column 32-byte random md5 digests.
I don't exactly know where all of that space goes, I'm taking an
experimentalist approach.

Cheers,

Jeff



pgsql-hackers by date:

Previous
From: Paul Ramsey
Date:
Subject: [PATCH] PostGIS Doc Urls
Next
From: David Fetter
Date:
Subject: Re: [sqlsmith] Failed to generate plan on lateral subqueries