Re: Re: Abbreviated keys for Datum tuplesort - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Re: Abbreviated keys for Datum tuplesort
Date
Msg-id 87wq2knxf2.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Re: Abbreviated keys for Datum tuplesort  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Re: Abbreviated keys for Datum tuplesort
List pgsql-hackers
>>>>> "Peter" == Peter Geoghegan <pg@heroku.com> writes:
Peter> I attach a slightly tweaked version of Andrew's original.

You changed this:
static intcomparetup_datum(const SortTuple *a, const SortTuple *b, Tuplesortstate *state){
-    int         compare;
+    int32       compare;    compare = ApplySortComparator(a->datum1, a->isnull1,

Since ApplySortComparator returns int, and "compare" is used to store
the return value of comparetup_datum which is also declared int, this
seems inappropriate even as a "stylistic tweak".

Also, your changes to the block comment for SortTuple now hide the fact
that datum1 is potentially the abbreviated value in tuple as well as
single-Datum cases.  Here are the versions for comparison (mine is
first):

***************
*** 136,175 **** /*  * The objects we actually sort are SortTuple structs.  These contain  * a pointer to the tuple
proper(might be a MinimalTuple or IndexTuple),  * which is a separate palloc chunk --- we assume it is just one chunk
and * can be freed by a simple pfree().  SortTuples also contain the tuple's  * first key column in Datum/nullflag
format,and an index integer.  *  * Storing the first key column lets us save heap_getattr or index_getattr  * calls
duringtuple comparisons.  We could extract and save all the key  * columns not just the first, but this would increase
codecomplexity and  * overhead, and wouldn't actually save any comparison cycles in the common  * case where the first
keydetermines the comparison result.  Note that  * for a pass-by-reference datatype, datum1 points into the "tuple"
storage. *
 
-  * There is one special case: when the sort support infrastructure provides an
-  * "abbreviated key" representation, where the key is (typically) a pass by
-  * value proxy for a pass by reference type.  In this case, the abbreviated key
-  * is stored in datum1 in place of the actual first key column.
-  *  * When sorting single Datums, the data value is represented directly by
!  * datum1/isnull1 for pass by value types (or null values).  If the datatype is
!  * pass-by-reference and isnull1 is false, then "tuple" points to a separately
!  * palloc'd data value, otherwise "tuple" is NULL.  The value of datum1 is then
!  * either the same pointer as "tuple", or is an abbreviated key value as
!  * described above.  Accordingly, "tuple" is always used in preference to
!  * datum1 as the authoritative value for pass-by-reference cases.  *  * While building initial runs, tupindex holds
thetuple's run number.  During  * merge passes, we re-use it to hold the input tape number that each tuple in  * the
heapwas read from, or to hold the index of the next tuple pre-read  * from the same tape in the case of pre-read
entries. tupindex goes unused  * if the sort occurs entirely in memory.  */ typedef struct {     void       *tuple;
    /* the tuple proper */     Datum       datum1;         /* value of first key column */     bool        isnull1;
  /* is first key column NULL? */     int         tupindex;       /* see notes above */ } SortTuple;
 
--- 136,170 ---- /*  * The objects we actually sort are SortTuple structs.  These contain  * a pointer to the tuple
proper(might be a MinimalTuple or IndexTuple),  * which is a separate palloc chunk --- we assume it is just one chunk
and * can be freed by a simple pfree().  SortTuples also contain the tuple's  * first key column in Datum/nullflag
format,and an index integer.  *  * Storing the first key column lets us save heap_getattr or index_getattr  * calls
duringtuple comparisons.  We could extract and save all the key  * columns not just the first, but this would increase
codecomplexity and  * overhead, and wouldn't actually save any comparison cycles in the common  * case where the first
keydetermines the comparison result.  Note that  * for a pass-by-reference datatype, datum1 points into the "tuple"
storage. *  * When sorting single Datums, the data value is represented directly by
 
!  * datum1/isnull1.  If the datatype is pass-by-reference and isnull1 is false,
!  * then datum1 points to a separately palloc'd data value that is also pointed
!  * to by the "tuple" pointer; otherwise "tuple" is NULL.  There is one special
!  * case:  when the sort support infrastructure provides an "abbreviated key"
!  * representation, where the key is (typically) a pass by value proxy for a
!  * pass by reference type.  *  * While building initial runs, tupindex holds the tuple's run number.  During  * merge
passes,we re-use it to hold the input tape number that each tuple in  * the heap was read from, or to hold the index of
thenext tuple pre-read  * from the same tape in the case of pre-read entries.  tupindex goes unused  * if the sort
occursentirely in memory.  */ typedef struct {     void       *tuple;          /* the tuple proper */     Datum
datum1;        /* value of first key column */     bool        isnull1;        /* is first key column NULL? */     int
      tupindex;       /* see notes above */ } SortTuple;
 



-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Performance improvement for joins where outer side is unique
Next
From: Andreas Karlsson
Date:
Subject: Re: Using 128-bit integers for sum, avg and statistics aggregates