Re: [HACKERS] Small improvement to compactify_tuples - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Small improvement to compactify_tuples
Date
Msg-id 21392.1509677196@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Small improvement to compactify_tuples  (Sokolov Yura <funny.falcon@postgrespro.ru>)
Responses Re: [HACKERS] Small improvement to compactify_tuples  (Claudio Freire <klaussfreire@gmail.com>)
Re: [HACKERS] Small improvement to compactify_tuples  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] Small improvement to compactify_tuples  (Юрий Соколов <funny.falcon@gmail.com>)
List pgsql-hackers
Sokolov Yura <funny.falcon@postgrespro.ru> writes:
> [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ]

I started to review this patch.  I spent a fair amount of time on
beautifying the code, because I found it rather ugly and drastically
undercommented.  Once I had it to the point where it seemed readable,
I went to check the shellsort algorithm against Wikipedia's entry,
and found that this appears to be an incorrect implementation of
shellsort: where pg_shell_sort_pass has

        for (_i = off; _i < _n; _i += off) \

it seems to me that we need to have

        for (_i = off; _i < _n; _i += 1) \

or maybe just _i++.  As-is, this isn't h-sorting the whole file,
but just the subset of entries that have multiple-of-h indexes
(ie, the first of the h distinct subfiles that should get sorted).
The bug is masked by the final pass of plain insertion sort, but
we are not getting the benefit we should get from the earlier passes.

However, I'm a bit dubious that it's worth fixing that; instead
my inclination would be to rip out the shellsort implementation
entirely.  The code is only using it for the nitems <= 48 case
(which makes the first three offset steps certainly no-ops) and
I am really unconvinced that it's worth expending the code space
for a shellsort rather than plain insertion sort in that case,
especially when we have good reason to think that the input data
is nearly sorted.

BTW, the originally given test case shows no measurable improvement
on my box.  I was eventually able to convince myself by profiling
that the patch makes us spend less time in compactify_tuples, but
this test case isn't a very convincing one.

So, quite aside from the bug, I'm not excited about committing the
attached as-is.  I think we should remove pg_shell_sort and just
use pg_insertion_sort.  If somebody can show a test case that
provides a measurable speed improvement from the extra code,
I could be persuaded to reconsider.

I also wonder if the nitems <= 48 cutoff needs to be reconsidered
in light of this.  But since I can hardly measure any benefit from
the patch at all, I'm not in the best position to test different
values for that cutoff.

Have not looked at the 0002 patch yet.

            regards, tom lane

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 41642eb..1af1b85 100644
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
***************
*** 18,23 ****
--- 18,24 ----
  #include "access/itup.h"
  #include "access/xlog.h"
  #include "storage/checksum.h"
+ #include "utils/inline_sort.h"
  #include "utils/memdebug.h"
  #include "utils/memutils.h"

*************** typedef struct itemIdSortData
*** 425,439 ****
  } itemIdSortData;
  typedef itemIdSortData *itemIdSort;

! static int
  itemoffcompare(const void *itemidp1, const void *itemidp2)
  {
-     /* Sort in decreasing itemoff order */
      return ((itemIdSort) itemidp2)->itemoff -
          ((itemIdSort) itemidp1)->itemoff;
  }

  /*
   * After removing or marking some line pointers unused, move the tuples to
   * remove the gaps caused by the removed items.
   */
--- 426,542 ----
  } itemIdSortData;
  typedef itemIdSortData *itemIdSort;

! /* Comparator for sorting in decreasing itemoff order */
! static inline int
  itemoffcompare(const void *itemidp1, const void *itemidp2)
  {
      return ((itemIdSort) itemidp2)->itemoff -
          ((itemIdSort) itemidp1)->itemoff;
  }

  /*
+  * Sort an array of itemIdSort's on itemoff, descending.
+  *
+  * This uses Shell sort.  Given that array is small and itemoffcompare
+  * can be inlined, it is much faster than general-purpose qsort.
+  */
+ static void
+ sort_itemIds_small(itemIdSort itemidbase, int nitems)
+ {
+     pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+ }
+
+ /*
+  * Sort an array of itemIdSort's on itemoff, descending.
+  *
+  * This uses bucket sort:
+  * - single pass of stable prefix sort on high 8 bits of itemoffs
+  * - then insertion sort on buckets larger than 1 element
+  */
+ static void
+ sort_itemIds(itemIdSort itemidbase, int nitems)
+ {
+     /* number of buckets to use: */
+ #define NSPLIT 256
+     /* divisor to scale input values into 0..NSPLIT-1: */
+ #define PREFDIV (BLCKSZ / NSPLIT)
+     /* per-bucket counts; we need two extra elements, see below */
+     uint16        count[NSPLIT + 2];
+     itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+     int            i,
+                 max,
+                 total,
+                 pos,
+                 highbits;
+
+     Assert(nitems <= lengthof(copy));
+
+     /*
+      * Count how many items in each bucket.  We assume all itemoff values are
+      * less than BLCKSZ, therefore dividing by PREFDIV gives a value less than
+      * NSPLIT.
+      */
+     memset(count, 0, sizeof(count));
+     for (i = 0; i < nitems; i++)
+     {
+         highbits = itemidbase[i].itemoff / PREFDIV;
+         count[highbits]++;
+     }
+
+     /*
+      * Now convert counts to bucket position info, placing the buckets in
+      * decreasing order.  After this loop, count[k+1] is start of bucket k
+      * (for 0 <= k < NSPLIT), count[k] is end+1 of bucket k, and therefore
+      * count[k] - count[k+1] is length of bucket k.
+      *
+      * Also detect whether any buckets have more than one element.  For this
+      * purpose, "max" is set to the OR of all the counts (not really the max).
+      */
+     max = total = count[NSPLIT - 1];
+     for (i = NSPLIT - 2; i >= 0; i--)
+     {
+         max |= count[i];
+         total += count[i];
+         count[i] = total;
+     }
+     Assert(count[0] == nitems);
+
+     /*
+      * Now copy the data to be sorted into appropriate positions in the copy[]
+      * array.  We increment each bucket-start pointer as we insert data into
+      * its bucket; hence, after this loop count[k+1] is the end+1 of bucket k,
+      * count[k+2] is the start of bucket k, and count[k+1] - count[k+2] is the
+      * length of bucket k.
+      */
+     for (i = 0; i < nitems; i++)
+     {
+         highbits = itemidbase[i].itemoff / PREFDIV;
+         pos = count[highbits + 1]++;
+         copy[pos] = itemidbase[i];
+     }
+     Assert(count[1] == nitems);
+
+     /*
+      * If any buckets are larger than 1 item, we must sort them.  They should
+      * be small enough to make insertion sort effective.
+      */
+     if (max > 1)
+     {
+         /* i is bucket number plus 1 */
+         for (i = NSPLIT; i > 0; i--)
+         {
+             pg_insertion_sort(itemIdSortData,
+                               copy + count[i + 1],
+                               count[i] - count[i + 1],
+                               itemoffcompare);
+         }
+     }
+
+     /* And transfer the sorted data back to the caller */
+     memcpy(itemidbase, copy, sizeof(itemIdSortData) * nitems);
+ }
+
+ /*
   * After removing or marking some line pointers unused, move the tuples to
   * remove the gaps caused by the removed items.
   */
*************** compactify_tuples(itemIdSort itemidbase,
*** 445,452 ****
      int            i;

      /* sort itemIdSortData array into decreasing itemoff order */
!     qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
!           itemoffcompare);

      upper = phdr->pd_special;
      for (i = 0; i < nitems; i++)
--- 548,558 ----
      int            i;

      /* sort itemIdSortData array into decreasing itemoff order */
!     /* empirically, bucket sort is worth the trouble above 48 items */
!     if (nitems > 48)
!         sort_itemIds(itemidbase, nitems);
!     else
!         sort_itemIds_small(itemidbase, nitems);

      upper = phdr->pd_special;
      for (i = 0; i < nitems; i++)
diff --git a/src/include/utils/inline_sort.h b/src/include/utils/inline_sort.h
index ...c97a248 .
*** a/src/include/utils/inline_sort.h
--- b/src/include/utils/inline_sort.h
***************
*** 0 ****
--- 1,88 ----
+ /*-------------------------------------------------------------------------
+  *
+  * inline_sort.h
+  *      Macros to perform specialized types of sorts.
+  *
+  *
+  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * src/include/utils/inline_sort.h
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef INLINE_SORT_H
+ #define INLINE_SORT_H
+
+ /*
+  * pg_shell_sort - sort for small arrays with inlinable comparator.
+  *
+  * This is best used with arrays smaller than 200 elements, and could be
+  * safely used with up to 1000 elements.  But it degrades fast after that.
+  *
+  * Since this is implemented as a macro it can be optimized together with
+  * comparison function; using a macro or inlinable function is recommended.
+  *
+  * Arguments:
+  *     elem_t - type of array elements (for declaring temporary variables)
+  *     array    - pointer to elements to be sorted
+  *     nitems - number of elements to be sorted
+  *     cmp    - comparison function that accepts addresses of 2 elements
+  *              (same API as qsort comparison function).
+  * cmp argument should be a function or macro name.
+  * array and nitems arguments are evaluated only once.
+  *
+  * This uses Shellsort (see e.g. wikipedia's entry), with gaps selected as
+  * "gap(i) = smallest prime number below e^i".  These are close to the gaps
+  * recommended by Incerpi & Sedwick, but look to be better on average.
+  */
+ #define pg_shell_sort(elem_t, array, nitems, cmp) \
+     do { \
+         elem_t *_arr = (array); \
+         int        _n = (nitems); \
+         static const int _offsets[] = {401, 139, 53, 19, 7, 3}; \
+         int        _noff; \
+         for (_noff = 0; _noff < lengthof(_offsets); _noff++) \
+         { \
+             int        _off = _offsets[_noff]; \
+             pg_shell_sort_pass(elem_t, cmp, _off, _arr, _n); \
+         } \
+         pg_shell_sort_pass(elem_t, cmp, 1, _arr, _n); \
+     } while (0)
+
+ /*
+  * pg_insertion_sort - plain insertion sort.
+  * Useful for very small array, or if array was almost sorted already.
+  * Same API as pg_shell_sort.
+  */
+ #define pg_insertion_sort(elem_t, array, nitems, cmp) \
+     do { \
+         elem_t *_arr = (array); \
+         int        _n = (nitems); \
+         pg_shell_sort_pass(elem_t, cmp, 1, _arr, _n); \
+     } while (0)
+
+ /*
+  * One pass of Shellsort: simple insertion sort of the subset of entries
+  * at stride "off".  Not intended to be used outside of above macros.
+  */
+ #define pg_shell_sort_pass(elem_t, cmp, off, _arr, _n) \
+     do { \
+         int        _i; \
+         for (_i = off; _i < _n; _i += off) \
+         { \
+             if (cmp(_arr + _i - off, _arr + _i) > 0) \
+             { \
+                 elem_t    _temp = _arr[_i]; \
+                 int        _j = _i; \
+                 do \
+                 { \
+                     _arr[_j] = _arr[_j - off]; \
+                     _j -= off; \
+                 } while (_j >= off && cmp(_arr + _j - off, &_temp) > 0); \
+                 _arr[_j] = _temp; \
+             } \
+         } \
+     } while (0)
+
+ #endif                            /* INLINE_SORT_H */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: [HACKERS] Skip unneeded temp file in 'make html'
Next
From: David Rowley
Date:
Subject: Re: [HACKERS] path toward faster partition pruning