Thread: "The index is not optimal" GiST warnings

"The index is not optimal" GiST warnings

From
James Robinson
Date:
Folks,

We are using triggers and daterange-enhanced side-tables for temporal logging of inserts/updates/deletes of data so that we can perform arbitrary point-in-time reporting. Given a slightly augmented many-many relationship table like:

                                  Table "housing.unit_funding_source"
      Column       |  Type  |                                Modifiers                                 
-------------------+--------+--------------------------------------------------------------------------
 id                | bigint | not null default nextval('housing.unit_funding_source_id_seq'::regclass)
 unit_id           | bigint | not null
 funding_source_id | bigint | not null
 funding_id        | text   | 
Indexes:
    "unit_funding_source_pkey" PRIMARY KEY, btree (id)
    "unit_funding_source_unit_id_key" UNIQUE CONSTRAINT, btree (unit_id, funding_source_id) CLUSTER



("id" column is a concession to our older generation ORM)

we have a trigger maintained historical archive table documenting the open or closed date ranges documenting when data was visible in the main table:

                                      Table "housing.unit_funding_source_snapshot"
         Column          |   Type    |                                     Modifiers                                     
-------------------------+-----------+-----------------------------------------------------------------------------------
 seen                    | daterange | not null
 id                      | bigint    | not null
 unit_id                 | bigint    | not null
 funding_source_id       | bigint    | not null
 funding_identifier      | text      | 
 listing_deleted_code_id | integer   | 
 listing_deleted_text    | text      | 

(essentially the same table but with 'seen' describing the either open-ended (row still visible in housing.unit_funding_source), or closed (row no longer visible) date range. Basic temporal range logging pattern.

My question stems from the building of the GiST constraint to both guard against 'overlapping daterange for same logical row' as well as used for point-in-time queries (i.e. consider only the rows which contain midnight of an arbitrary day, or those which overlap with an entire month). When going by the recipes and we express the constraint with the daterange column and operator && first:

ALTER TABLE ONLY housing.unit_funding_source_snapshot
    ADD CONSTRAINT no_overlapping_rows_orig EXCLUDE USING gist (seen WITH &&, unit_id WITH =, funding_source_id WITH =)
    DEFERRABLE INITIALLY DEFERRED;

This ends up building a GiST index with 'seen' as the first column:

# \d housing.no_overlapping_rows_orig
      Index "housing.no_overlapping_rows_orig"
      Column       |    Type     |    Definition     
-------------------+-------------+-------------------
 seen              | daterange   | seen
 unit_id           | gbtreekey16 | unit_id
 funding_source_id | gbtreekey16 | funding_source_id
gist, for table "housing.unit_funding_source_snapshot", deferrable, initially deferred


and provokes the following warning 14 times (on PostgreSQL 9.5):

DEBUG:  picksplit method for column 1 of index "no_overlapping_rows_orig" failed
HINT:  The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command.

When rephrasing to have the columns reordered favoring the primary key combination from the original table:

ALTER TABLE ONLY housing.unit_funding_source_snapshot
ADD CONSTRAINT no_overlapping_rows_two EXCLUDE USING gist (unit_id WITH =, funding_source_id WITH =, seen WITH &&)
DEFERRABLE INITIALLY DEFERRED;

we only get one such warning.

Question: 

It seems that, contrary to expectation / experience with btree indices, that the index backing no_overlapping_rows_two, with 'seen' as the non-leftmost column, still is useable / preferred for daterange-only queries:

# explain analyze select count(*) from  housing.unit_funding_source_snapshot where seen && '[2016-09-01, 2016-10-01)'::daterange;
                                                                                QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=481.83..481.84 rows=1 width=0) (actual time=7.184..7.184 rows=1 loops=1)
   ->  Index Only Scan using no_overlapping_rows_two on unit_funding_source_snapshot  (cost=0.28..456.59 rows=10098 width=0) (actual time=0.074..6.298 rows=10095 loops=1)
          Index Cond: (seen && '[2016-09-01,2016-10-01)'::daterange)
         Heap Fetches: 10095
 Planning time: 0.153 ms
 Execution time: 7.228 ms
(6 rows)

When that constraint / index is dropped, then will opt to use the original index with 'seen' as the leftmost. Is it less preferred due to being a larger index / disoptimal splits having produced a deeper tree?

# alter table housing.unit_funding_source_snapshot drop constraint no_overlapping_rows_two;
ALTER TABLE
# explain analyze select count(*) from  housing.unit_funding_source_snapshot where seen && '[2016-09-01, 2016-10-01)'::daterange;
                                                                                  QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=491.43..491.44 rows=1 width=0) (actual time=6.584..6.584 rows=1 loops=1)
   ->  Index Only Scan using no_overlapping_rows_orig on unit_funding_source_snapshot  (cost=0.28..466.19 rows=10098 width=0) (actual time=0.052..5.935 rows=10095 loops=1)
         Index Cond: (seen && '[2016-09-01,2016-10-01)'::daterange)
         Heap Fetches: 10095
 Planning time: 0.110 ms
 Execution time: 6.610 ms
(6 rows)


Searching the web / mailing list archives finds little to nothing regarding "The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command."

Should we make the effort to track down these indices / constraints and reorder the columns to put the original table's primary key fields first, then the range column last?

------
James Robinson
Socialserve.com by Emphasys Software



Re: "The index is not optimal" GiST warnings

From
Tom Lane
Date:
James Robinson <jrobinson@emphasys-software.com> writes:
> ... provokes the following warning 14 times (on PostgreSQL 9.5):

>     DEBUG:  picksplit method for column 1 of index "no_overlapping_rows_orig" failed
>     HINT:  The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one
inthe CREATE INDEX command. 

This just indicates that you had a whole index page full of entries with
the same daterange value.  I wouldn't put a lot of concern into it
(there's a reason it's only a DEBUG message), especially if there are
only 14 occurrences in what I assume is a pretty large index.

> ... It seems that, contrary to expectation / experience with btree indices, that the index backing
no_overlapping_rows_two,with 'seen' as the non-leftmost column, still is useable / preferred for daterange-only
queries:

GiST indexes have a preference for queries on the leading column rather
than later columns, but it's less strong than is true for btree.  AFAICS
the planner doesn't account for such an effect at all.  If you're seeing
a cost estimate difference, that must come just from one index being
physically larger than the other.  That wouldn't be a surprising thing,
due to possibly different choices about page splits, but it's hard to say
whether it really corresponds to any noticeable difference in access
speed.  Your actual-runtime results suggest that the daterange-first
index is faster, and I'd tend to believe that over any other evidence.

            regards, tom lane