Use quick select instead of qsort to get median - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject Use quick select instead of qsort to get median
Date
Msg-id OS0PR01MB5716A0A4B9CFC4D8A662C64994E49@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
Responses Re: Use quick select instead of qsort to get median  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
Hi,

When I was writing an extension which need to get the median of an array, I
tried to find if postgres provide some api that can do that. I found all the
places in postgres invoke qsort() and then get the median. I was thinking can
we do better by using "quick select" and is it worth it.

Currently, there are some places[1] in the code that need the median and can
use "quick select" instead. And some of them(spg_box_quad_picksplit /
spg_range_quad_picksplit) are invoked frequently when INSERT or CREATE INDEX.
So, Peronally, It's acceptable to introduce a quick select api to improve these
places.

Since most of the logic of "quick select" is similar to quick sort, I think
we can reuse the code in sort_template.h. We only need to let the sort stop
when find the target top Kth element.

Attach a POC patch about this idea. I did some simple performance tests, I can
see about 10% performance gain in this test[2].

Thoughts ?

[1]
1.
entry_dealloc
    ...
    /* Record the (approximate) median usage */
    if (i > 0)
        pgss->cur_median_usage = entries[i / 2]->counters.usage;
2.
spg_box_quad_picksplit
    qsort(lowXs, in->nTuples, sizeof(float8), compareDoubles);
    ...
    centroid->low.x = lowXs[median];

3.
spg_range_quad_picksplit
    qsort(lowerBounds, nonEmptyCount, sizeof(RangeBound),
    ...
    centroid = range_serialize(typcache, &lowerBounds[median],

4.
spg_quad_picksplit
    qsort(sorted, in->nTuples, sizeof(*sorted), x_cmp);
    ...
    centroid->x = sorted[median]->x;



[2]
drop table quad_box_tbl;
CREATE unlogged TABLE quad_box_tbl (id int, b box);
truncate quad_box_tbl ;
explain (verbose, analyze)INSERT INTO quad_box_tbl
  SELECT (x - 1) * 10 + x, box(point(x * 10, x * 20), point(x * 10, x * 20 + 5))
  FROM generate_series(1, 1000000) x order by random();

-----test create index
drop index quad_box_tbl_idx;
CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);

-------test results
PATCH:
Time: 2609.664 ms (00:02.610)

HEAD:
Time: 2903.765 ms (00:02.944)

Best regards,
Houzj


Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: window build doesn't apply PG_CPPFLAGS correctly
Next
From: Pavel Stehule
Date:
Subject: Re: window build doesn't apply PG_CPPFLAGS correctly