Re: Minmax indexes - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Minmax indexes
Date
Msg-id 20140805234131.GD9388@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: Minmax indexes  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Minmax indexes
List pgsql-hackers
Thanks for all the feedback on version 9.  Here's version 13.  (The
intermediate versions are just tags in my private tree which I created
each time I rebased.  Please bear with me here.)

I have chosen to keep the name "minmax", even if the opclasses now let
one implement completely different things on top of it such as geometry
bounding boxes and bloom filters (aka bitmap indexes).  I don't see a
need for a rename: essentially, in PR we can just say "we have these
neat minmax indexes that other databases also have, but instead of just
being used for integer data, they can also be used for geometry, GIS and
bitmap indexes, so as always we're more powerful than everyone else when
implementing new database features".

This new version includes some changes per feedback.  Most notoriously,
the opclass definition is different now: instead of relying on the
"sortable" opclass implementation extracting the oprcode for each
operator strategy (i.e. the functions that underlie < <= >= >), I chose
to have catalog entries in pg_amproc for the underlying support
functions.  The new definition makes a lot of sense to me now, after
thinking long about this stuff and carefully reading the
"Catalog Entries for Indexes" chapter in docs.

The way it works now is that there are five pg_amop entries in an
opclass, just like previously (corresponding to the underlying < <= = >= >
operators).  This lets the optimizer choose the index when a query uses
those operators.  There are also seven pg_amproc entries.  The first
three are identical to all minmax opclasses: "opcinfo" (version 9 called
it "getopers"), "consistent" (v9 name "compare") and "add_value" (v9
name "maybeUpdateValues", not a loved name evidently).  A minmax opclass
on top of a sortable datatype has four additional support functions: one
for each function underlying the < <= >= > operators.  Other opclasses
would define their own support functions here, which would correspond to
functions used to implement the "consistent" and "compare" functions
internally.

I don't claim this is 100% correct, but in particular I think it's now
possible to implement cross-datatype comparisons, so that a minmax index
defined on an int8 column works when the query uses an int4 operator,
for example.  (The current patch doesn't actually add such catalog
entries, though.  I think some minor code changes are required for this
to actually work.  However with the previous opclass definition it would
have been outright impossible.)

I fixed the bug reported by Masao-kun that collatable datatypes weren't
cleanly supported.  Collation OIDs are passed down now, although I don't
claim that it is bulletproof.  This could use some more testing.

I haven't yet updated the revmap definition per Heikki's review.  I am
not sure I want to do that right away.  I think we could live with what
we have now, and see about changing this later on in the 9.5 cycle if we
think a different definition is better.  I think what we have is pretty
solid even if there are some theoretical holes.

As a very quick test, I created a 10 million tuples table with an int4
column on my laptop.  The table is ~346 MB.  Creating a btree index on
it takes 8 seconds.  A minmax index takes 1.6 seconds.  The btree index
is 214 MB.  The minmax index, with pages_per_range=1 is 1 MB.  With
pages_per_range=16 (default) it is 48kB.

Very unscientific results follow.  This is the btree doing an index-only
scan:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
                                                       QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using bti2 on t  (cost=0.43..1692.75 rows=54416 width=4) (actual time=0.106..23.329 rows=54518
loops=1)
   Index Cond: ((a > 991243) AND (a < 1045762))
   Heap Fetches: 0
   Buffers: shared hit=1 read=152
 Planning time: 0.695 ms
 Execution time: 31.565 ms
(6 filas)

Duración: 33,662 ms

Turn off index-only scan, do a regular index scan:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using bti2 on t  (cost=0.43..1932.75 rows=54416 width=4) (actual time=0.066..31.027 rows=54518 loops=1)
   Index Cond: ((a > 991243) AND (a < 1045762))
   Buffers: shared hit=394
 Planning time: 0.250 ms
 Execution time: 39.218 ms
(5 filas)

Duración: 40,385 ms

Use the 16-pages-per-range minmax index:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=16.60..47402.01 rows=54416 width=4) (actual time=4.266..43.948 rows=54518 loops=1)
   Recheck Cond: ((a > 991243) AND (a < 1045762))
   Rows Removed by Index Recheck: 32266
   Heap Blocks: lossy=384
   Buffers: shared hit=244 read=142
   ->  Bitmap Index Scan on ti2  (cost=0.00..3.00 rows=54416 width=0) (actual time=1.061..1.061 rows=3840 loops=1)
         Index Cond: ((a > 991243) AND (a < 1045762))
         Buffers: shared hit=2
 Planning time: 0.215 ms
 Execution time: 51.820 ms
(10 filas)

This is the 1-page-per-range minmax index:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=157.60..47543.01 rows=54416 width=4) (actual time=82.479..98.642 rows=54518 loops=1)
   Recheck Cond: ((a > 991243) AND (a < 1045762))
   Rows Removed by Index Recheck: 174
   Heap Blocks: lossy=242
   Buffers: shared hit=385
   ->  Bitmap Index Scan on ti  (cost=0.00..144.00 rows=54416 width=0) (actual time=82.448..82.448 rows=2420 loops=1)
         Index Cond: ((a > 991243) AND (a < 1045762))
         Buffers: shared hit=143
 Planning time: 0.280 ms
 Execution time: 103.542 ms
(10 filas)

Duración: 104,952 ms

This is a seqscan.  Notice the high number of buffer accesses:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..194248.00 rows=54416 width=4) (actual time=161.338..1201.535 rows=54518 loops=1)
   Filter: ((a > 991243) AND (a < 1045762))
   Rows Removed by Filter: 9945482
   Buffers: shared hit=10672 read=33576
 Planning time: 0.189 ms
 Execution time: 1204.501 ms
(6 filas)

Duración: 1205,304 ms

Of course, this isn't nearly a worst-case scenario for minmax, as the
data is perfectly correlated.  The pages_per_range=16 index benefits
particularly from that.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: David G Johnston
Date:
Subject: Re: Append to a GUC parameter ?
Next
From: Josh Berkus
Date:
Subject: Re: Minmax indexes