pgsql: Speed up sort-order-comparison tests in create_index_spgist. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Speed up sort-order-comparison tests in create_index_spgist.
Msg-id E1hEgpP-0004jj-Ec@gemulon.postgresql.org
List pgsql-committers
Speed up sort-order-comparison tests in create_index_spgist.

This test script verifies that KNN searches of an SP-GiST index
produce the same sort order as a seqscan-and-sort.  The FULL JOINs
used for that are exceedingly slow, however.  Investigation shows
that the initial join is on the rank() values, and those contain
many duplicates because the data set includes 1000 duplicate
points.  The join therefore produces 1000000 (1000 x 1000) rows
that have to be thrown away again by the join filter.
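
The arithmetic behind that row count can be sketched in a few lines of
Python. This is only an illustration, not the regression test itself:
the helper names (rank, equijoin_pair_count) and the toy data set are
assumptions made for this example. It counts how many row pairs an
equality join produces when the keys are rank()-style values with ties,
compared with keys that are unique per row.

```python
from collections import Counter


def rank(values):
    """SQL-style rank(): tied values share a rank, later ranks skip ahead."""
    ordered = sorted(values)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)  # remember the first position of each value
    return [first_pos[v] for v in ordered]


def equijoin_pair_count(left_keys, right_keys):
    """Number of row pairs an equality join on these keys would produce."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(n * right[k] for k, n in left.items())


# Hypothetical data: 1000 identical distances plus 5 distinct ones.
distances = [0.0] * 1000 + [1.0, 2.0, 3.0, 4.0, 5.0]

ranks = rank(distances)
pairs_rank = equijoin_pair_count(ranks, ranks)

# Keys that are unique per row, numbered 1..N.
unique_keys = list(range(1, len(distances) + 1))
pairs_unique = equijoin_pair_count(unique_keys, unique_keys)

print(pairs_rank, pairs_unique)
```

With tied ranks, the 1000 duplicates alone contribute 1000 x 1000
pairs; with unique keys, each row matches exactly one partner.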

We can improve matters by using row_number() instead of rank(),
so that the initial join keys are unique.  The catch is that
doing so makes the results sensitive to the ordering of rows with
equal distances from the reference point.  That doesn't matter
for the actually-equal points, but as luck would have it, the
data set also contains two distinct points that have identical
distances to the origin.  So those two rows could legitimately
appear in either order, causing unwanted output from the check
queries.

However, it doesn't seem like it's the job of this test to
check whether the <-> operator correctly computes distances;
its charter is just to verify that SP-GiST emits the values
in distance order.  So we can dodge the indeterminacy problem
by having the check compare only row numbers and distances,
not the actual point values.
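
A small Python sketch of why that dodge works, under stated
assumptions: the points, the helper names (numbered, mismatches), and
the two orderings are hypothetical and merely stand in for the seqscan
and index-scan outputs. Two distinct points at the same distance from
the origin may legitimately come back in either order, so comparing
point values reports spurious differences, while comparing only
(row number, distance) does not.

```python
import math


def distance_to_origin(p):
    """Euclidean distance of a 2-D point from the origin."""
    return math.sqrt(p[0] * p[0] + p[1] * p[1])


def numbered(ordered_points):
    """Attach a 1-based row number and the distance to each point."""
    return [(i, distance_to_origin(p), p)
            for i, p in enumerate(ordered_points, start=1)]


def mismatches(a, b, key):
    """Count positions where the two result lists differ under `key`."""
    return sum(1 for x, y in zip(a, b) if key(x) != key(y))


# Hypothetical data: (3, 4) and (4, 3) are distinct but both at distance 5.
points = [(3.0, 4.0), (4.0, 3.0), (1.0, 0.0), (0.0, 6.0)]

# Two equally valid distance orderings that break the tie differently
# (Python's sort is stable, so the input order decides the tie).
seq_rows = numbered(sorted(points, key=distance_to_origin))
idx_rows = numbered(sorted(reversed(points), key=distance_to_origin))

# Comparing full rows, point values included, flags the tied rows.
strict_mismatches = mismatches(seq_rows, idx_rows, key=lambda r: r)

# Comparing only (row number, distance) ignores the tie order.
loose_mismatches = mismatches(seq_rows, idx_rows, key=lambda r: (r[0], r[1]))

print(strict_mismatches, loose_mismatches)
```

The looser comparison still detects any row that is emitted out of
distance order, which is what the test is meant to verify.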

This change reduces the run time of create_index_spgist by a good
three-quarters, on my machine, with ensuing beneficial effects on
the runtime of create_index (thanks to interactions with CREATE
INDEX CONCURRENTLY tests in the latter).  I see a net improvement
of more than 2X in the runtime of their parallel test group.

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/5874c7055702e1cf5e58543f11dfcff6de2cc260

Modified Files
--------------
src/test/regress/expected/create_index_spgist.out | 54 ++++++++++-------------
src/test/regress/sql/create_index_spgist.sql      | 54 ++++++++++-------------
2 files changed, 48 insertions(+), 60 deletions(-)
