[PATCH] Add sortsupport for range types and btree_gist - Mailing list pgsql-hackers

From Christoph Heiss
Subject [PATCH] Add sortsupport for range types and btree_gist
Date
Msg-id 437ccbcf-8f80-2919-411d-a3af88becf6c@cybertec.at
Whole thread Raw
Responses Re: [PATCH] Add sortsupport for range types and btree_gist
Re: [PATCH] Add sortsupport for range types and btree_gist
Re: [PATCH] Add sortsupport for range types and btree_gist
List pgsql-hackers
Hi all!

The motivation behind this is that incrementally building up a GiST 
index for certain input data can create a terrible tree structure.
Furthermore, exclusion constraints are commonly implemented using GiST 
indices and for that use case, data is mostly orderable.

By sorting the data before inserting it into the index results in a much 
better index structure, leading to significant performance improvements.

Testing was done using following setup, with about 50 million rows:

    CREATE EXTENSION btree_gist;
    CREATE TABLE t (id uuid, block_range int4range);
    CREATE INDEX ON before USING GIST (id, block_range);
    COPY t FROM '..' DELIMITER ',' CSV HEADER;

using

    SELECT * FROM t WHERE id = '..' AND block_range && '..'

as test query, using a unpatched instance and one with the patch applied.

Some stats for fetching 10,000 random rows using the query above,
100 iterations to get good averages.

The benchmarking was done on a unpatched instance compiled using the 
exact same options as with the patch applied.
[ Results are noted in a unpatched -> patched fashion. ]

First set of results are after the initial CREATE TABLE, CREATE INDEX 
and a COPY to the table, thereby incrementally building the index.

Shared Hit Blocks (average): 110.97 -> 78.58
Shared Read Blocks (average): 58.90 -> 47.42
Execution Time (average): 1.10 -> 0.83 ms
I/O Read Time (average): 0.19 -> 0.15 ms

After a REINDEX on the table, the results improve even more:

Shared Hit Blocks (average): 84.24 -> 8.54
Shared Read Blocks (average): 49.89 -> 0.74
Execution Time (average): 0.84 -> 0.065 ms
I/O Read Time (average): 0.16 -> 0.004 ms

Additionally, the time a REINDEX takes also improves significantly:

     672407.584 ms (11:12.408) -> 130670.232 ms (02:10.670)

Most of the sortsupport for btree_gist was implemented by re-using 
already existing infrastructure. For the few remaining types (bit, bool, 
cash, enum, interval, macaddress8 and time) I manually implemented them 
directly in btree_gist.
It might make sense to move them into the backend for uniformity, but I 
wanted to get other opinions on that first.

`make check-world` reports no regressions.

Attached below, besides the patch, are also two scripts for benchmarking.

`bench-gist.py` to benchmark the actual patch, example usage of this 
would be e.g. `./bench-gist.py -o results.csv public.table`. This 
expects a local instance with no authentication and default `postgres` 
user. The port can be set using the `--port` option.

`plot.py` prints average values (as used above) and creates boxplots for 
each statistic from the result files produced with `bench-gist.py`. 
Depends on matplotlib and pandas.

Additionally, if needed, the sample dataset used to benchmark this is 
available to independently verify the results [1].

Thanks,
Christoph Heiss

---

[1] https://drive.google.com/file/d/1SKRiUYd78_zl7CeD8pLDoggzCCh0wj39
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: replacing role-level NOINHERIT with a grant-level option
Next
From: Przemysław Sztoch
Date:
Subject: Re: [PATCH] Completed unaccent dictionary with many missing characters