Re: [PATCH] Add sortsupport for range types and btree_gist - Mailing list pgsql-hackers
From: Bernd Helmle
Subject: Re: [PATCH] Add sortsupport for range types and btree_gist
Msg-id: 77f4bacbeb9d223f90b3a0bd55ef7d65407bfddd.camel@oopsware.de
In response to: [PATCH] Add sortsupport for range types and btree_gist (Christoph Heiss <christoph.heiss@cybertec.at>)
Responses: Re: [PATCH] Add sortsupport for range types and btree_gist
List: pgsql-hackers
Hi,

No deep code review yet, but the CF is approaching its end and I didn't have time to look at this earlier :/ Below are some things I've tested so far.

On Wednesday, 15.06.2022 at 12:45 +0200, Christoph Heiss wrote:
> Testing was done using the following setup, with about 50 million rows:
>
> CREATE EXTENSION btree_gist;
> CREATE TABLE t (id uuid, block_range int4range);
> CREATE INDEX ON t USING GIST (id, block_range);
> COPY t FROM '..' DELIMITER ',' CSV HEADER;
>
> using
>
> SELECT * FROM t WHERE id = '..' AND block_range && '..'
>
> as test query, using an unpatched instance and one with the patch
> applied.
>
> Some stats for fetching 10,000 random rows using the query above,
> 100 iterations to get good averages.

Here are my results with repeating this:

HEAD:

-- token index (buffering=auto)
CREATE INDEX
Time: 700213,110 ms (11:40,213)

HEAD patched:

-- token index (buffering=auto)
CREATE INDEX
Time: 136229,400 ms (02:16,229)

So index creation speed on the test set (table filled with the tokens, index created afterwards) gets a lot of speedup with this patch and the default buffering strategy.

> The benchmarking was done on an unpatched instance compiled using the
> exact same options as with the patch applied.
>
> [ Results are noted in an unpatched -> patched fashion. ]
>
> First set of results are after the initial CREATE TABLE, CREATE INDEX
> and a COPY to the table, thereby incrementally building the index.
>
> Shared Hit Blocks (average): 110.97 -> 78.58
> Shared Read Blocks (average): 58.90 -> 47.42
> Execution Time (average): 1.10 -> 0.83 ms
> I/O Read Time (average): 0.19 -> 0.15 ms

I've changed this a little and did the following:

CREATE EXTENSION btree_gist;
CREATE TABLE t (id uuid, block_range int4range);
COPY t FROM '..' DELIMITER ',' CSV HEADER;
CREATE INDEX ON t USING GIST (id, block_range);

So creating the index _after_ having loaded the tokens.
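For reference, the two CREATE INDEX timings above work out to roughly a 5x build-time speedup. A quick back-of-the-envelope check (assuming the comma in the psql "Time:" output is a locale decimal separator):

```python
# Sanity check of the index build speedup quoted above.
# psql "Time:" values, converted from German-locale output to plain floats.
head_ms = 700213.110     # buffering=auto, unpatched HEAD
patched_ms = 136229.400  # buffering=auto, HEAD patched

speedup = head_ms / patched_ms
print(f"index build speedup: {speedup:.2f}x")  # about 5.14x
```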
My configuration was:

shared_buffers = 4GB
max_wal_size = 6GB
effective_cache_size = 4GB  # (default, index fits)
maintenance_work_mem = 1GB

Here are my numbers from the attached benchmark script, HEAD -> HEAD patched:

Shared Hit Blocks (avg) : 76.81 -> 9.17
Shared Read Blocks (avg): 0.43 -> 0.11
Execution Time (avg)    : 0.40 -> 0.05
IO Read Time (avg)      : 0.001 -> 0.0007

So with these settings I see an improvement with the provided test set.

Since this patch adds sortsupport for all other existing opclasses, I thought I'd give it a try with another test set. I adapted the benchmark script (see attached) to use the "pgbench_accounts" table, which I changed to have a btree_gist index on column "aid" instead of the primary key. I let pgbench fill its tables with scale = 1000, dropped the primary key, and created the btree_gist index on "aid" with the default buffering strategy:

pgbench -s 1000 -i bernd
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
CREATE INDEX ON pgbench_accounts USING gist(aid);

Then I ran the benchmark script bench-gist-pgbench_accounts.py. The numbers are, HEAD -> HEAD patched:

Shared Hit Blocks (avg) : 4.85 -> 8.75
Shared Read Blocks (avg): 0.14 -> 0.17
Execution Time (avg)    : 0.01 -> 0.05
IO Read Time (avg)      : 0.0003 -> 0.0009

So the numbers got worse here. The outcome is even worse when you run pgbench against that modified table. Running

pgbench -s 1000 -c 16 -j 16 -S -Mprepared -T 300

on my workstation at least 3 times gives me the following numbers:

HEAD:

tps = 215338.784398 (without initial connection time)
tps = 212826.513727 (without initial connection time)
tps = 212102.857891 (without initial connection time)

HEAD patched:

tps = 126487.796716 (without initial connection time)
tps = 125076.391528 (without initial connection time)
tps = 124538.946388 (without initial connection time)

So this doesn't look good.
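Averaging the three runs per build puts the throughput loss in concrete terms; a quick computation over the tps figures above:

```python
from statistics import mean

# tps figures from the three pgbench runs quoted above
head_tps = [215338.784398, 212826.513727, 212102.857891]
patched_tps = [126487.796716, 125076.391528, 124538.946388]

# Relative throughput regression of the patched build vs. HEAD
regression = 1 - mean(patched_tps) / mean(head_tps)
print(f"throughput regression: {regression:.1%}")  # about 41.3%
```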
While this patch achieves a real improvement for the provided tokens, it makes performance for at least int4 on this test worse. Though the picture changes again if you build the index buffered:

tps = 198409.248911 (without initial connection time)
tps = 194431.827394 (without initial connection time)
tps = 195657.532281 (without initial connection time)

which is again close to current HEAD (I have no idea why it is even *that* much slower, since "buffering=on" shouldn't employ sortsupport, no?). Of course, build time for the index in this case is much slower again:

-- pgbench_accounts index (buffered)
CREATE INDEX
Time: 900912,924 ms (15:00,913)

So while the patch provides a huge improvement in index creation speed, it's sometimes still required to carefully check the resulting index quality.

[...]

> Most of the sortsupport for btree_gist was implemented by re-using
> already existing infrastructure. For the few remaining types (bit, bool,
> cash, enum, interval, macaddr8 and time) I manually implemented them
> directly in btree_gist.
> It might make sense to move them into the backend for uniformity, but I
> wanted to get other opinions on that first.

Hmm, I'd say we leave them in the contrib module until they are required somewhere else too, or make a separate patch for them? Do we have plans for such a requirement in the backend already?

Attached is a rebased patch against current HEAD.

Thanks,

Bernd
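For the fixed-width types in that list, a sortsupport comparator is conceptually just a memcmp-style three-way compare that lets the sorted GiST build order all keys up front. As a rough illustration only (this is NOT the actual PostgreSQL SortSupport C API, and macaddr8_cmp here is a hypothetical stand-in), the idea for an 8-byte type like macaddr8 looks like:

```python
from functools import cmp_to_key

def macaddr8_cmp(a: bytes, b: bytes) -> int:
    """Three-way compare of two 8-byte MAC addresses, memcmp-style.
    Python's bytes comparison is lexicographic, matching memcmp."""
    return (a > b) - (a < b)

# A sorted index build sorts all keys once with the comparator,
# then writes index pages bottom-up in that order.
addrs = [bytes.fromhex(h) for h in
         ("08002b0102030407", "08002b0102030405", "08002b0102030406")]
ordered = sorted(addrs, key=cmp_to_key(macaddr8_cmp))
print([a.hex() for a in ordered])
# → ['08002b0102030405', '08002b0102030406', '08002b0102030407']
```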