Re: Minmax indexes - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Minmax indexes |
Date | |
Msg-id | 20140805234131.GD9388@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: Minmax indexes (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Minmax indexes
|
List | pgsql-hackers |
Thanks for all the feedback on version 9. Here's version 13. (The intermediate versions are just tags in my private tree which I created each time I rebased. Please bear with me here.) I have chosen to keep the name "minmax", even if the opclasses now let one implement completely different things on top of it such as geometry bounding boxes and bloom filters (aka bitmap indexes). I don't see a need for a rename: essentially, in PR we can just say "we have these neat minmax indexes that other databases also have, but instead of just being used for integer data, they can also be used for geometry, GIS and bitmap indexes, so as always we're more powerful than everyone else when implementing new database features". This new version includes some changes per feedback. Most notoriously, the opclass definition is different now: instead of relying on the "sortable" opclass implementation extracting the oprcode for each operator strategy (i.e. the functions that underlie < <= >= >), I chose to have catalog entries in pg_amproc for the underlying support functions. The new definition makes a lot of sense to me now, after thinking long about this stuff and carefully reading the "Catalog Entries for Indexes" chapter in docs. The way it works now is that there are five pg_amop entries in an opclass, just like previously (corresponding to the underlying < <= = >= > operators). This lets the optimizer choose the index when a query uses those operators. There are also seven pg_amproc entries. The first three are identical to all minmax opclasses: "opcinfo" (version 9 called it "getopers"), "consistent" (v9 name "compare") and "add_value" (v9 name "maybeUpdateValues", not a loved name evidently). A minmax opclass on top of a sortable datatype has four additional support functions: one for each function underlying the < <= >= > operators. Other opclasses would define their own support functions here, which would correspond to functions used to implement the "consistent" and "compare" functions internally. I don't claim this is 100% correct, but in particular I think it's now possible to implement cross-datatype comparisons, so that a minmax index defined on an int8 column works when the query uses an int4 operator, for example. (The current patch doesn't actually add such catalog entries, though. I think some minor code changes are required for this to actually work. However with the previous opclass definition it would have been outright impossible.) I fixed the bug reported by Masao-kun that collatable datatypes weren't cleanly supported. Collation OIDs are passed down now, although I don't claim that it is bulletproof. This could use some more testing. I haven't yet updated the revmap definition per Heikki's review. I am not sure I want to do that right away. I think we could live with what we have now, and see about changing this later on in the 9.5 cycle if we think a different definition is better. I think what we have is pretty solid even if there are some theoretical holes. As a very quick test, I created a 10 million tuples table with an int4 column on my laptop. The table is ~346 MB. Creating a btree index on it takes 8 seconds. A minmax index takes 1.6 seconds. The btree index is 214 MB. The minmax index, with pages_per_range=1 is 1 MB. With pages_per_range=16 (default) it is 48kB. Very unscientific results follow. This is the btree doing an index-only scan: alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Only Scan using bti2 on t (cost=0.43..1692.75 rows=54416 width=4) (actual time=0.106..23.329 rows=54518 loops=1) Index Cond: ((a > 991243) AND (a < 1045762)) Heap Fetches: 0 Buffers: shared hit=1 read=152 Planning time: 0.695 ms Execution time: 31.565 ms (6 filas) Duración: 33,662 ms Turn off index-only scan, do a regular index scan: alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Index Scan using bti2 on t (cost=0.43..1932.75 rows=54416 width=4) (actual time=0.066..31.027 rows=54518 loops=1) Index Cond: ((a > 991243) AND (a < 1045762)) Buffers: shared hit=394 Planning time: 0.250 ms Execution time: 39.218 ms (5 filas) Duración: 40,385 ms Use the 16-pages-per-range minmax index: alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t (cost=16.60..47402.01 rows=54416 width=4) (actual time=4.266..43.948 rows=54518 loops=1) Recheck Cond: ((a > 991243) AND (a < 1045762)) Rows Removed by Index Recheck: 32266 Heap Blocks: lossy=384 Buffers: shared hit=244 read=142 -> Bitmap Index Scan on ti2 (cost=0.00..3.00 rows=54416 width=0) (actual time=1.061..1.061 rows=3840 loops=1) Index Cond: ((a > 991243) AND (a < 1045762)) Buffers: shared hit=2 Planning time: 0.215 ms Execution time: 51.820 ms (10 filas) This is the 1-page-per-range minmax index: alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t (cost=157.60..47543.01 rows=54416 width=4) (actual time=82.479..98.642 rows=54518 loops=1) Recheck Cond: ((a > 991243) AND (a < 1045762)) Rows Removed by Index Recheck: 174 Heap Blocks: lossy=242 Buffers: shared hit=385 -> Bitmap Index Scan on ti (cost=0.00..144.00 rows=54416 width=0) (actual time=82.448..82.448 rows=2420 loops=1) Index Cond: ((a > 991243) AND (a < 1045762)) Buffers: shared hit=143 Planning time: 0.280 ms Execution time: 103.542 ms (10 filas) Duración: 104,952 ms This is a seqscan. Notice the high number of buffer accesses: alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..194248.00 rows=54416 width=4) (actual time=161.338..1201.535 rows=54518 loops=1) Filter: ((a > 991243) AND (a < 1045762)) Rows Removed by Filter: 9945482 Buffers: shared hit=10672 read=33576 Planning time: 0.189 ms Execution time: 1204.501 ms (6 filas) Duración: 1205,304 ms Of course, this isn't nearly a worst-case scenario for minmax, as the data is perfectly correlated. The pages_per_range=16 index benefits particularly from that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: